RE: And the Winner Is...
Andrew,
The task is _not_ CPU bound. The clue is that using
the basic algorithm it takes the same amount of time
on either a P100 or a faster 500 MHz system. If it were
simply CPU speed, the 500 MHz system should be 5x faster
than the 100 MHz system. But that's not the case.
To go faster you need a faster algorithm.
I know you're not serious about re-writing in assembly,
but if you tell the gcc compiler to print out the
assembly instructions it generates (the -S option) you can
see it is quite good. It is hard to beat by hand coding.
Things to worry about if speed is an issue:
1) The cache. Are you moving data to/from consecutive
locations in RAM? Incrementing the wrong index of
a two dimensional array will "thrash the cache".
2) Can you do a block copy rather than a byte at a time?
If you have a 32 bit CPU it is best to work with 32
bit "chunks". On a 64 bit CPU, work with 64 bits
at a time.
3) If some operations require a "wait", can you do something
productive during the wait time? A delay loop is a
total waste of time.
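
Points 1 and 2 can be sketched in C roughly as follows. This is only an illustration: the array size and all function names are made up for the example, and how much difference you see depends on your CPU, cache sizes and compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROWS 512   /* hypothetical sizes, chosen only for illustration */
#define COLS 512

/* Cache-friendly: the inner loop walks the LAST index, so consecutive
   accesses touch consecutive addresses (C stores arrays row by row). */
long sum_row_major(int grid[ROWS][COLS])
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Cache-hostile: the inner loop walks the FIRST index, so each access
   jumps COLS * sizeof(int) bytes ahead. Same result, more cache misses. */
long sum_column_major(int grid[ROWS][COLS])
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}

/* One byte per loop iteration. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Block copy: memcpy is free to move word-sized (or larger) chunks. */
void copy_block(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
}
```

Both sum functions return the same total and both copy functions produce the same bytes; only the memory access pattern differs, which is the whole point. Time them on your own data before deciding which one matters.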
Writing in assembly is not worth it, but reading the
assembly generated by the compiler is worth it in the few
cases where we care about speed.
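
As a small example of something to read, here is a sketch of a loop you could feed to gcc. The file name sum.c and the function name are just placeholders; -S and -O2 are standard gcc options.

```c
/* Save as sum.c (hypothetical file name) and run:
 *     gcc -O2 -S sum.c
 * gcc stops after compilation and writes the assembly to sum.s. */
#include <stddef.h>

/* Adds up count integers; a simple loop whose generated
   assembly is short enough to read in one sitting. */
long sum_array(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Comparing the output with and without -O2 gives a feel for how much work the optimizer is already doing.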
> -----Original Message-----
> From: Andrew Bennett [mailto:andrew.bennett@ns.sympatico.ca]
>
> Not enough of a margin. I am rewriting the whole
> thing in Assembler ... operating system and all ;>)
> Tom - it shouldn't take more than a couple of years
> to rewrite your Basic stuff in Assembler. :>(