Sun Baked said:
Darwin's SMP did make multiprocessor Macs useful for everybody; under OS 9 the second processor sat around idle unless there was an actual routine written to take advantage of the multiprocessing API.
SMP is cool, but it does come at a cost on single-CPU systems. (It saps about 10-15% in some of the OS 9 vs. OS X benchmarks.)
---
IBM has implemented threading on the Power5 (and Power5Lite) -- this software trick will supposedly allow the CPU to handle twice the number of instructions as the PPC970, while imposing only a 25% transistor penalty.
Basically, something that can give you dual-PPC970-core performance but is only 25% bigger, instead of twice the size.
Ooh - yeah, the classic OS series was rather lacking under the hood... it's a good thing they went *nix.
I'm not so sure about the 10-15% penalty being attributed to SMP for Darwin vs. OS 9... there's a lot more going on under the hood in Darwin. I'd be surprised if the performance drop due to SMP alone even registers... Classic was closer to bare metal than Darwin, so by doing less in the OS overhead it could let a user process do more...
... of course you get some good stuff from Darwin... like real memory management, user-space protection, and a thread-aware scheduler... etc.
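That user-space protection point can be sketched with a toy example (assuming a POSIX system where `os.fork` is available): each process gets its own address space, so a child's write to a variable never leaks back into the parent.

```python
import os

x = 1
pid = os.fork()          # duplicate this process; each copy has its own memory
if pid == 0:
    x = 999              # child clobbers its private copy of x
    os._exit(0)          # child exits without touching the parent
os.waitpid(pid, 0)       # parent waits for the child to finish
print(x)                 # parent still sees 1
```

Under OS 9's cooperative model, by contrast, every application shared one address space, so a stray write like that could corrupt anything.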
Re: Power5
It's not a software trick. Multi-threading with a context switch is the software trick. SMT is more a way to keep the CPU pipeline full than a way to handle more instructions. Theoretical throughput remains the same; the hope is that you can parallelize instructions. Since threads are by definition parallel, let the CPU handle threads to better use its resources; look no further than Amdahl's law for speedup.
Look at Amdahl's law and you can see what they are trying to do with SMT. Throughput for a single thread under SMT is 1/n of the core's, which is where the misconception comes from that the processor can handle n times the work; it can't. It can run n hardware threads, with n being the SMT multiplier (2 in most cases), but it can still only process the same amount of work. Look at Amdahl's law and find the bottleneck: ah, that nasty serial-dependent stuff. SMT is an idea to reduce bubbles/stalls in the pipeline by allowing the processor to perform a pseudo context switch to other code that is theoretically parallel to the stalled code, so that it is always doing something rather than sucking up nops.
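For concreteness, Amdahl's law can be plugged with a few numbers (a minimal sketch; `amdahl_speedup` is just an illustrative name, and p is the parallelizable fraction of the work):

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
# p = fraction of the work that can run in parallel
# n = number of threads/processors working on the parallel part
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.5, 0.9, 0.99):
    print(f"p={p}: speedup with n=2 is {amdahl_speedup(p, 2):.2f}x")
```

Even with two threads, a 50%-serial workload only speeds up about 1.33x; the serial part is the bottleneck SMT is trying to dodge by filling stalls with work from another thread.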
The problem, or the problem Intel has found in its implementation of SMT (called Hyper-Threading), is that threads now have to fight internally over cache resources, and since SMT is an attempt to fill bubbles, the pipeline needs lots of bubbles for it to make a difference. Hence Prescott's doubled cache and roughly 10 extra pipeline stages (and Intel's theory that more pipeline stages = higher clock speeds). Now, with a longer pipeline, other problems come to light. Yikes... Let's hope IBM learns from Intel's foray and has some better ideas.
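The bubble-filling idea can be sketched with a toy model (purely illustrative, nothing like real issue logic): say each cycle a thread either issues an instruction or stalls, and a 2-way SMT core only wastes a cycle when both threads stall at once.

```python
# Toy model of 2-way SMT: True = the thread stalls that cycle (a "bubble").
# A lone thread wastes its stall cycles; the SMT core issues from the
# other thread instead, and only bubbles when BOTH threads stall.
def issued_cycles(stalls_a, stalls_b):
    single = sum(1 for a in stalls_a if not a)                # thread A alone
    smt = sum(1 for a, b in zip(stalls_a, stalls_b)
              if not (a and b))                               # A and B together
    return single, smt

a = [False, True, False, True, False]   # thread A stalls on cycles 2 and 4
b = [True, False, True, False, False]   # thread B stalls on cycles 1 and 3
print(issued_cycles(a, b))              # (3, 5): SMT fills both of A's bubbles
```

Note total work per thread hasn't changed; the core just stops wasting cycles -- and if the two threads' stalls line up (e.g., both waiting on the same contended cache), SMT buys you nothing, which is exactly the cache-fighting problem above.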
-Wyrm