Where did this "triple channel mode" come from? Most of this thread is a bit whacked because the terminology is way off.
There is not a "triple channel mode memory controller" on the Nehalem and Westmere (5500 and 5600 series) Xeons. There are three memory controllers. Yes, plural; as in more than one. Each one of these controllers can be attached to one, two, or three banks of DIMM slots.
So on the Mac Pro there are two controllers attached to one bank/slot each and one controller that is attached to two banks/slots.
----- memory controller 1 --- [ 1 ] --- [ 4 ]
----- memory controller 2 --- [ 2 ]
----- memory controller 3 --- [ 3 ]
So if you do not put memory in slots attached to all three controllers then you will not get all three controllers involved. If you put just one DIMM into the first slot then you will get just one controller active. Likewise, if you only use the first two slots then you will only activate two of the controllers; one will be dormant because there is nothing attached to it.
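To make that concrete, here is a rough sketch of the slot wiring from the diagram above. The names (SLOT_TO_CONTROLLER, active_controllers, dimms_per_controller) are made up for illustration; the only thing taken from the diagram is which slot hangs off which controller.

```python
# Slot-to-controller wiring as drawn in the diagram above (4-slot Mac Pro).
# Hypothetical names; this is an illustration, not any real API.
SLOT_TO_CONTROLLER = {1: 1, 2: 2, 3: 3, 4: 1}


def active_controllers(populated_slots):
    """Return the sorted list of controllers that have at least one DIMM."""
    return sorted({SLOT_TO_CONTROLLER[slot] for slot in populated_slots})


def dimms_per_controller(populated_slots):
    """Count how many DIMMs sit behind each of the three controllers."""
    counts = {1: 0, 2: 0, 3: 0}
    for slot in populated_slots:
        counts[SLOT_TO_CONTROLLER[slot]] += 1
    return counts


if __name__ == "__main__":
    for slots in ([1], [1, 2], [1, 2, 3], [1, 2, 3, 4]):
        print(slots, active_controllers(slots), dimms_per_controller(slots))
```

Filling slots 1 through 3 is the first point where all three controllers show up; adding slot 4 puts a second DIMM behind controller 1.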
What Intel does is change the interleave (how the memory is laid out) and memory clock speed depending upon how you fill up the slots. Fill slots 1 through 3 and you get a 3 way interleave and the higher clock speeds. As you fill in more of the banks the clock speed drops. If you don't fill in groups of 3, the interleave drops below 3 way.
Since programs often ask for addresses in sequence, if "words" at addresses 4, 8, and 12 are located behind three different controllers the processor can start all of those requests in parallel (or at the very least in a pipelined fashion, since they take many cycles) because the work is delegated out to three different controllers. In a two way interleave (say one where <1> and <2> are paired and <3> and <4> are paired) you cannot get as many memory requests going in parallel. You could get 4 and 8, but if 12 is also assigned to memory controller 1 it will have to wait until 4 is dispatched before 12 can be dispatched.
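A toy model of that 4 / 8 / 12 example, in case it helps. Big assumptions: addresses are dealt out round-robin to controllers in 4-byte "words" (that granularity is just the one used in the example above, not something taken from Intel docs), and each controller handles one request at a time. The function names are made up.

```python
from collections import Counter


def controller_for(address, ways, granule=4):
    """Round-robin mapping: which controller (0-based) owns this address."""
    return (address // granule) % ways


def rounds_needed(addresses, ways, granule=4):
    """Number of back-to-back rounds if each controller serves one request at a time.

    This is simply the largest number of requests that land on any single controller.
    """
    load = Counter(controller_for(a, ways, granule) for a in addresses)
    return max(load.values())


if __name__ == "__main__":
    requests = [4, 8, 12]
    for ways in (3, 2, 1):
        owners = [controller_for(a, ways) for a in requests]
        print(f"{ways}-way: owners={owners} rounds={rounds_needed(requests, ways)}")
```

With 3 ways the three requests land on three different controllers; with 2 ways, 4 and 12 end up behind the same controller, so 12 has to wait.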
It appears the "single channel mode" and "triple channel mode" being talked about here are really single/triple/etc. interleave. If so, that misses an important point about "single mode". There are two types of interleave. One is at the 'micro' level (you can assign word addresses) so that a single core interacts with multiple controllers. The other "interleave" is that there are multiple threads/processes interacting with different, much larger regions of memory (each of which is behind a different controller). You can still get three memory controllers running in parallel if you have three different threads/processes accessing three different regions of memory. It is a smaller effect, but it is still present in most normal situations.
Also, no "new" optimizations are required for either interleave. Both will be leveraged by apps that "think" they are interacting with a single controller. There is no visible difference to them other than some memory requests coming back faster... which with dynamic execution on the Xeons doesn't really hurt anything.
You also lose interleave if you mismatch sizes. So if you have two 1GB DIMMs and one 2GB DIMM you cannot split things up 3 ways because the sizes don't match. Likewise one 8GB and one 4GB is worse than three 4GB DIMMs. While it theoretically looks like you are losing 30% by switching from three way to two way, in reality for real apps (as opposed to synthetic benchmarks, which are too synthetic and small) the real loss will be in the 2-10% range in memory bandwidth (not overall throughput).
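A quick sanity check on both points, as a sketch only. The helper names are invented, and the "matched" test is just the rule of thumb stated above (every populated controller holds the same capacity), not Intel's actual interleave logic. The theoretical figure for going from three controllers to two works out to one third, which is where the roughly 30% number comes from.

```python
def sizes_match(gb_per_controller):
    """True if every populated controller holds the same capacity (0 = empty)."""
    populated = [gb for gb in gb_per_controller if gb > 0]
    return len(populated) > 0 and len(set(populated)) == 1


def theoretical_loss(ways_before, ways_after):
    """Fraction of peak parallelism lost when dropping to fewer interleave ways."""
    return 1 - ways_after / ways_before


if __name__ == "__main__":
    print(sizes_match([1, 1, 2]))   # two 1GB + one 2GB: mismatched
    print(sizes_match([4, 4, 4]))   # three 4GB: matched
    print(sizes_match([8, 4, 0]))   # one 8GB + one 4GB: mismatched
    print(f"{theoretical_loss(3, 2):.0%} theoretical loss going 3-way -> 2-way")
```

That is the on-paper ceiling only; the 2-10% range quoted above is what to expect from real apps.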
For the 5500 (Nehalem) series Xeons once you add any memory to the second bank on any of the controllers the speed drops to 1066.
"As soon as you add a second DIMM to any memory channel the speed drops to 1066 MHz for all DIMMs "
http://www.delltechcenter.com/page/04-08-2009+-+Nehalem+and+Memory+Configurations?t=anon
So all the folks ranting about how you need to fill all possible slots and how Apple was lame for not supporting 1333 in the 2009 models were blowing lots of smoke. All the vendors drop down to 1066 if you fill more than three DIMM slots. The speed drop-off is even worse if you go to 3 banks of slots. 800 memory is better than no memory at all, so if you needed > 32GB of RAM you simply take the hit. It is still approx 10x faster than hitting anything on a SATA or SAS bus.
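The 5500 series speed rule boils down to a tiny lookup. Sketch only: the table and function name are made up for illustration, the numbers are just the ones quoted in this post (1333 with one DIMM per controller, 1066 with two, 800 with three), and it assumes the CPU and the DIMMs are rated for 1333 in the first place.

```python
# Speed for ALL DIMMs is set by the most heavily loaded controller.
# Values are the ones quoted in this post for the 5500 (Nehalem) series.
SPEED_BY_DEEPEST_BANK = {1: 1333, 2: 1066, 3: 800}


def memory_speed_5500(dimms_per_controller):
    """Return the memory speed given the DIMM count behind each controller."""
    deepest = max(dimms_per_controller)
    if deepest not in SPEED_BY_DEEPEST_BANK:
        raise ValueError("expected 1 to 3 DIMMs behind the fullest controller")
    return SPEED_BY_DEEPEST_BANK[deepest]


if __name__ == "__main__":
    print(memory_speed_5500([1, 1, 1]))  # slots 1-3 filled
    print(memory_speed_5500([2, 1, 1]))  # all four Mac Pro slots filled
    print(memory_speed_5500([3, 3, 3]))  # a 3-bank board packed full
```

So on the 2009 Mac Pro, filling the fourth slot is exactly the case where everything drops to 1066.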
For the 5600 (Westmere), one of the incremental improvements is that you can now, in certain configs, fill bank 2 and still retain 1333 memory speeds.
See figure at bottom of page 4.
https://globalsp.ts.fujitsu.com/dmsp/docs/wp-westmere-ep-memory-performance-ww-en.pdf
Note that it is still the case that if there were 3 banks present, filling any of the slots in the third bank will drop all the speeds back down to 800.
8GB modules will likely work since they worked on the 2009 models. Apple didn't provide an 8GB option because with their 30+% markup on memory prices they know that would put that upgrade into stratospheric pricing where extremely few folks would buy it. OWC has had 8GB modules for the 2009 models for a long time. Even if you had to go to 1066 8GB modules they'd be worth it versus the alternative of hitting a SAS/SATA channel. However, a couple of years into the future there is a decent likelihood you can get 1333 modules at reasonable prices.
The "use 4GB now and then 8GB a couple of years from now" approach is why the four slot configuration makes sense for a very wide spectrum of users. Sure, there is a subset of folks who need to pack the machine to the gills with DIMMs, but they are not the majority.
P.S. What is more lame is not that the $3,600 Mac Pro doesn't have more DIMM slots, but that there are no docs in Apple's knowledge base that cover this stuff. You get all kinds of funky quasi-disinformation floating around the internet when Apple should be writing this stuff up (like Dell, Fujitsu, and others do) so that you don't get voodoo filling in the blanks information-wise.
P.P.S. When the next round of Sandy Bridge era Xeons comes along and 6 cores are more mainstream, distributed throughout the core of the series' line-up, 4 memory controllers will be in place (just like in the current 6500/7500 series) and then the 4 slots will make even more sense and Apple would not have to change the design. Four will be the natural "bank" size and the Mac Pro will already have that.