That's correct. My understanding is also that while SSE instructions are executing they block execution of normal (x87) floating-point instructions, so in theory the only thing you save is the number of instructions. That can be important, but then again the instructions you use have opcodes almost twice as long (x87 FADD on a double is "DC /0", while the SSE2 packed-double ADDPD is "66 0F 58 /r"), so you gain nothing there. I don't have any numbers on how this turns out in practice, though.

Rincewind42 said:
Correct me if I'm wrong, but the last time I heard about SSE and dealing with 128-bit vectors, it implemented the functionality by processing the two 64-bit vector halves in serial. So this would mean that even though SSE can handle 2 doubles in one instruction, it still works as if you're only processing one of them per cycle.
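
For anyone who wants to try this, here is a rough C sketch of the kind of packed add being discussed: one ADDPD-style operation per pair of doubles, with a plain scalar loop for the leftover element and for builds without SSE2. The function name add_doubles is made up for illustration, and the `__SSE2__` check assumes a GCC/Clang-style compiler. Nothing in the code shows whether the hardware splits the 128-bit operation into two 64-bit halves internally; you would have to time it on your own CPU to see that.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n doubles. */
void add_doubles(double *dst, const double *a, const double *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Two doubles per iteration; _mm_add_pd normally compiles to ADDPD. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);  /* unaligned 128-bit load */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(dst + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is not available). */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Timing this against a scalar-only build on a large array would give the practical numbers that are missing above.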