As for the 2009 vs. 2008... the 2008 is the last of the FSB architecture, and good riddance to that. The FSB runs at 200MHz and is quad-pumped to achieve 800 MT/s. The FSB is used for interprocessor communications and also for both CPUs in a DP system to communicate with the memory controller. The 2008s also used an 800MHz memory bus with fully buffered DDR2, which can as much as double latency (source). The only real advantage to FB-DIMMs is that they allow lots of DIMM sockets, since trace lengths from the DIMM sockets to the controller are no longer relevant to the design (the data is fully buffered), but you sacrifice a lot of performance to get those plentiful DIMM slots.
Architecturally, the 2009 Nehalems bring a lot of advancements to the table, starting with the elimination of the FSB. Interprocessor communication now happens over the QuickPath Interconnect (QPI) at 8x the transfer rate of the old FSB (6.4GT/s), and the memory controller is now on-die, with each CPU having direct access to its own RAM bank and sitting one QPI link away from the RAM on the other processor. No more contention for memory access by CPUs competing for cycles on the already bottlenecked FSB that's in the 2008s. Add to this support for 1066MHz DDR3 in triple-channel mode, and the memory performance is almost on par with the bandwidth of the 2008 CPU's on-die cache!
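For anyone who wants to check the arithmetic behind those numbers, here is a rough Python sketch. The transfer rates are the ones quoted above; the bus widths (8 bytes per transfer for the FSB and for each DDR3 channel, 2 bytes per transfer per direction for QPI) are my own assumptions and are not stated in this post. These are theoretical peaks only, not measured throughput.

```python
# Back-of-the-envelope peak figures. Bus widths below are assumptions,
# not numbers taken from the post.

def peak_bandwidth_gb_s(mega_transfers, bytes_per_transfer, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mega_transfers * 1e6 * bytes_per_transfer * channels / 1e9

# FSB as described above: 200 MHz base clock, quad-pumped.
FSB_MT_S = 200 * 4            # 800 MT/s
# QPI as described above: 6.4 GT/s.
QPI_MT_S = 6400               # 6400 MT/s

# Ratio of transfer rates, the "8x" figure.
QPI_VS_FSB = QPI_MT_S / FSB_MT_S

# Assumed widths: 64-bit FSB data bus, 16 data bits per QPI direction,
# 64-bit DDR3 channels.
fsb_gb_s = peak_bandwidth_gb_s(FSB_MT_S, 8)
qpi_gb_s_per_direction = peak_bandwidth_gb_s(QPI_MT_S, 2)
ddr3_triple_gb_s = peak_bandwidth_gb_s(1066, 8, channels=3)

if __name__ == "__main__":
    print(f"FSB transfer rate:       {FSB_MT_S} MT/s")
    print(f"QPI vs FSB rate:         {QPI_VS_FSB:.0f}x")
    print(f"FSB peak:                {fsb_gb_s:.1f} GB/s (shared)")
    print(f"QPI peak per direction:  {qpi_gb_s_per_direction:.1f} GB/s")
    print(f"DDR3-1066 x3 peak:       {ddr3_triple_gb_s:.1f} GB/s per CPU")
```

Note that the 8x figure compares transfer rates, not bytes moved: under the width assumptions above, a QPI transfer carries fewer bytes than an FSB transfer, so the raw per-link bandwidth gap is smaller than 8x. The bigger win is that memory traffic no longer crosses a shared bus at all.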
Then there's Hyper-Threading, which uses stalled cycles on each core to process other threads that are queued up and ready to go, so clock cycles aren't wasted when the CPU is maxed out.
Last but not least, there's Turbo Boost, which provides a clock multiplier boost to one or two cores for lightly threaded apps when the other cores can be put into a lower power state.
Now, all this technological advancement is unfortunately untapped by most software applications, but highly threaded, memory-intensive software has been proven to perform better on the Nehalem architecture, and this gap will only increase over time as software is optimized for it.
Can everyone leverage the benefits of the 2009 architecture? Of course not. But everyone will benefit from at least some of these new enhancements at some point in their workflow.
Now the issues with the 2009 are largely overstated in my view. The commonly stated issues are:
- The audio temperature issue: solved
- The DIMM slots in the Quad: It costs more to populate memory using 4GB sticks but it's not insurmountable and probably impacts only a minority of users/workloads that only require 4 cores but greater than 16GB of RAM.
- The SATA throughput limit on the ICH: This is actually common to all recent Intel ICH chips going back at least to the ICH9 (2008) and even earlier, so it's not unique to the 2009 and only rears its ugly head when trying to use several high performance drives in RAID0.
- Price: Yes the 2009 is more expensive. Whether there's value there for any given user depends on their workload, business case, and budget.
The fact is that there are few, if any, technological merits to the 2008 architecture. Only its price/value is attractive, because most of our software (but not all) pitifully trails the capabilities of current hardware.