Re: Re: Re: Re: Re: Re: Testing was biased!
Originally posted by eric_n_dfw
I'm no SIMD expert, but I think you're fooling yourself if you believe that "the whole OS is vector optimized."
Only certain things can be optimized for SIMD processing and I'd be surprised if much of the Darwin core has any AltiVec calls.
Quartz probably makes use of it for some of the eye candy - I'd venture a guess that a lot of the alpha-channel stuff and the genie effect and things like that use AltiVec a lot. (I know dropping a G4/400 into my B&W G3/400 sped up Aqua a bit for me - and was like putting NOS in FCP!!)
Unfortunately, even with AltiVec doing the heavy lifting in applications like QuickTime, FCP, iMovie, iTunes and iDVD, I still believe it is hamstrung by the slow memory access speed. <soapbox>We need REAL DDR, like yesterday Steve! </soapbox>
All true. Assuming 1 vector floating point multiply-add per cycle (for example), that's (if I understand it correctly) 96 bits of data per cycle (3 source, 1 destination). On a 1GHz G4+ that's 96,000,000,000 bits per second, or 12GB/sec (there's gotta be something wrong with my math; that seems too high). A 167MHz bus can theoretically get 1.3GB/sec.
<edit>
Well, I was wrong. It's worse than I thought. On a 400MHz G4 (not G4+), using AltiVec to add two streams of numbers together can eat up 12.8GB/sec (two 128-bit source vectors per cycle). Using multiply-adds and writing the results back to memory roughly doubles that, to about 25.6GB/sec, since each one touches four 128-bit vectors (it turns out that multiply-adds don't work like a = b * c + d, they work like abcd = efgh * ijkl + mnop, all in one cycle).
</edit>
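For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. It assumes one AltiVec instruction issues every cycle, every operand is a full 128-bit vector streamed from or to memory (no cache or register reuse), and the bus is 64 bits wide. The function names are made up for illustration, and this is a worst-case estimate, not a measurement.

```python
VECTOR_BYTES = 16  # an AltiVec register holds 128 bits (four 32-bit floats)


def stream_demand_gb_per_s(clock_hz, vectors_per_cycle):
    """Worst-case memory traffic in GB/sec if every operand vector
    has to cross the bus on every cycle (assumption: no reuse)."""
    return clock_hz * vectors_per_cycle * VECTOR_BYTES / 1e9


def bus_peak_gb_per_s(bus_hz, bus_width_bytes=8):
    """Theoretical bus peak in GB/sec; the 64-bit (8-byte) width is an assumption."""
    return bus_hz * bus_width_bytes / 1e9


# 400MHz G4, adding two streams: two source vectors read per cycle.
add_two_streams = stream_demand_gb_per_s(400e6, 2)

# 400MHz G4, multiply-add with write-back: three sources plus one destination.
fused_madd = stream_demand_gb_per_s(400e6, 4)

# 167MHz bus, one 8-byte transfer per bus clock.
bus = bus_peak_gb_per_s(167e6)

print(round(add_two_streams, 1), "GB/sec to add two streams")
print(round(fused_madd, 1), "GB/sec for multiply-add with write-back")
print(round(bus, 2), "GB/sec bus peak")
```

Under those assumptions the two-stream add comes out around 12.8GB/sec, the multiply-add case around 25.6GB/sec, and the bus around 1.3GB/sec, so the vector unit can ask for far more data than the bus can deliver.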
<edit2>
This means that certain operations could go over 100 times faster if we had a fast enough bus (100GB+/sec).
</edit2>