32gb of RAM in 2009 Quads!


gugucom

macrumors 68020
May 21, 2009
2,136
0
Munich, Germany
Did you run some tests that show the tri-channel interleaving works in this configuration? Did I miss that?
I ran tests with 1 GB and 2 GB UDIMMs in my octad. Slots 1, 2, 5, 6 with 2 GB and 3, 4, 7, 8 with 1 GB versus 2 GB in slots 1, 2, 3, 5, 6, 7. I only did Geekbench but it is supposed to be bandwidth sensitive. As the Xeon 3500/5500 are specified for UDIMM and RDIMM I have no doubt that the principle will work for 4/8 RDIMMs as well.

I'm sure there would be a significant bandwidth loss if I could run my memory at 1333 MHz. My W5590s are well capable of it, but EFI will not let me. If I had the 10x multiplier, the bandwidth would drop when using two slots per channel instead of one. Since Apple has screwed that up anyway, I might as well take advantage of the lower price of the 1 GB UDIMMs or the 4 GB RDIMMs. I'm now running this mixed mode.
 

VirtualRain

macrumors 603
Aug 1, 2008
6,304
114
Vancouver, BC
I ran tests with 1 GB and 2 GB UDIMMs in my octad. Slots 1, 2, 5, 6 with 2 GB and 3, 4, 7, 8 with 1 GB versus 2 GB in slots 1, 2, 3, 5, 6, 7. I only did Geekbench but it is supposed to be bandwidth sensitive. As the Xeon 3500/5500 are specified for UDIMM and RDIMM I have no doubt that the principle will work for 4/8 RDIMMs as well.
Did you post those results? I'd be interested to compare.

I must admit that I think Geekbench is terrible at determining memory bandwidth. Here's a thread where I tried to benchmark my tri-channel setup with it and it reported a stream copy of 5GB/s :eek: compared to Sisoft (18GB/s) and Everest (14GB/s) under Windows...

http://forums.macrumors.com/showthread.php?t=729368

Theoretical memory bandwidth with tri-channel 1066 is 25GB/s so Geekbench doesn't come remotely close to saturating our memory architecture.
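
The 25GB/s figure can be reproduced with a quick back-of-the-envelope calculation. This is a minimal Python sketch, assuming a 64-bit (8-byte) data bus per DDR3 channel and the nominal 1066 million transfers per second; the function name is only for illustration.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes every channel moves bus_bytes per transfer with no overhead.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9

tri = peak_bandwidth_gb_s(1066, channels=3)
dual = peak_bandwidth_gb_s(1066, channels=2)

print(f"tri-channel 1066:  {tri:.1f} GB/s")
print(f"dual-channel 1066: {dual:.1f} GB/s")
```

These are upper bounds only; no benchmark in this thread is expected to reach them.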
 

gugucom

If I find the time I can run the test again with other benchmarks. It just seems pretty obvious that in the MacPro4,1 there will be little difference if you fit half the memory into each of the two slots 3 and 4 that connect to the memory channel 3. Why should it be slower than having the whole capacity in the slot 3?
 

VirtualRain

If I find the time I can run the test again with other benchmarks. It just seems pretty obvious that in the MacPro4,1 there will be little difference if you fit half the memory into each of the two slots 3 and 4 that connect to the memory channel 3. Why should it be slower than having the whole capacity in the slot 3?
It would be cool to get to the bottom of this. I don't think dual vs. triple channel makes much difference at all in real-world performance... it would only be measurable in benchmarks, and then only a few which can saturate this kind of architecture. However, it would be nice to determine how Intel's memory controller handles this situation. While your assumption makes a lot of sense, it's also possible that it simply defaults to dual channel mode no matter what the actual memory configuration is when all four DIMM slots are occupied.
 

nanofrog

macrumors G4
May 6, 2008
11,718
2
It would be cool to get to the bottom of this. I don't think dual vs. triple channel makes much difference at all in real-world performance... it would only be measurable in benchmarks, and then only a few which can saturate this kind of architecture. However, it would be nice to determine how Intel's memory controller handles this situation. While your assumption makes a lot of sense, it's also possible that it simply defaults to dual channel mode no matter what the actual memory configuration is when all four DIMM slots are occupied.
For most current software, it won't matter, as there's very little that can actually use enough bandwidth to need triple channel. But there is some. Most is server based (and there's not a massive amount here either), but for workstation use, it's in areas such as large scale simulations (medical, weather,...).

As far as filling the 4th DIMM, it's my understanding the IMC does default to dual channel mode. It would be interesting to find out if that's different though. :) I just don't have a 4th DIMM to test it myself right now. :eek:
 

AZREOSpecialist

macrumors 68020
Mar 15, 2009
2,103
881
Pricey, but to be expected with 8GB sticks, and RDIMM at that (can't be UDIMM, as it won't work with their own 4GB UDIMM's ;)).


Yes, but not like the standard non-ECC DDR3 though. There's just not as much made, as it's typically only sold to the enterprise market.
4 GB DIMMs were quite pricey when the 2009 Mac Pro was announced. Weren't 4 of them over $1,000 initially? By July and August, that price had fallen to around $600 for four modules. I think the same will happen here.
 

nanofrog

4 GB DIMMs were quite pricey when the 2009 Mac Pro was announced. Weren't 4 of them over $1,000 initially? By July and August, that price had fallen to around $600 for four modules. I think the same will happen here.
To some extent, yes. But there's less demand for the largest sticks. As the 8GB versions of RDIMM are currently the largest capacity, they aren't likely to fall quite as much until the 16GB sticks arrive (announced some time ago, but not shipping yet AFAIK). Those may not show until the Xeon 56xx based servers are available.
 

gugucom

As far as filling the 4th DIMM, it's my understanding the IMC does default to dual channel mode. It would be interesting to find out if that's different though. :) I just don't have a 4th DIMM to test it myself right now. :eek:
No, this is neither logical nor sensible. If you fill the slots 3 and 4 with the same DIMMs as the slots 1 and 2 you obviously unbalance the third channel and may cause the IMC to default to dual channel memory. I would agree with that.

If you fit only half the capacity into slots 3 and 4 you end up with exactly the same memory capacity for each channel. This is the same as leaving the slot 4 empty and fitting the same DIMMs to the slots 1-3. If there is any penalty at all it should be absolutely minimal compared to an unbalanced mode.
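
The capacity argument can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the slot layout described in this thread for one CPU (slots 1 and 2 alone on channels 1 and 2, slots 3 and 4 sharing channel 3); the helper names are made up for illustration, and equal capacity per channel does not by itself prove how the IMC interleaves.

```python
from collections import defaultdict

# Slot-to-channel mapping for one CPU, as described in this thread (assumption).
SLOT_TO_CHANNEL = {1: 1, 2: 2, 3: 3, 4: 3}

def capacity_per_channel(dimms_gb):
    """dimms_gb maps slot number -> DIMM size in GB; returns GB per channel."""
    totals = defaultdict(int)
    for slot, size_gb in dimms_gb.items():
        totals[SLOT_TO_CHANNEL[slot]] += size_gb
    return dict(totals)

def is_balanced(dimms_gb):
    """True when all three channels are populated with the same total capacity."""
    per_channel = capacity_per_channel(dimms_gb)
    return len(per_channel) == 3 and len(set(per_channel.values())) == 1

mixed = {1: 2, 2: 2, 3: 1, 4: 1}        # 2 GB + 2 GB + (1 GB + 1 GB)
three_equal = {1: 2, 2: 2, 3: 2}        # slot 4 left empty
four_equal = {1: 2, 2: 2, 3: 2, 4: 2}   # same DIMM in every slot

for name, config in [("mixed", mixed), ("three equal", three_equal), ("four equal", four_equal)]:
    print(name, capacity_per_channel(config), "balanced:", is_balanced(config))
```

The mixed and three-DIMM layouts both come out at 2 GB per channel, while four identical DIMMs put twice the capacity on channel 3.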

If you want me to run a test with the UDIMMs you need to tell me which free software you consider to be conclusive. I can run Win7 or OS X apps.
 

Spanky Deluxe

macrumors 601
Mar 17, 2005
4,850
358
London, UK
Is it really worth it??

Mac Pro Quad 2.66GHz $2499
Upgrade to 32GB of RAM $1979.99
Total: $4478.99

Mac Pro Octo 2.26GHz $3299
Upgrade to 32GB of RAM $1199.99
Total: $4498.99
 

nanofrog

No, this is neither logical nor sensible. If you fill the slots 3 and 4 with the same DIMMs as the slots 1 and 2 you obviously unbalance the third channel and may cause the IMC to default to dual channel memory. I would agree with that.

If you fit only half the capacity into slots 3 and 4 you end up with exactly the same memory capacity for each channel. This is the same as leaving the slot 4 empty and fitting the same DIMMs to the slots 1-3. If there is any penalty at all it should be absolutely minimal compared to an unbalanced mode.

If you want me to run a test with the UDIMMs you need to tell me which free software you consider to be conclusive. I can run Win7 or OS X apps.
I believe what happens is the interleaving is engaged for all three channels, even if only one has both slots filled (it can't selectively interleave just certain channels that have the additional DIMM/s, as some boards have more than a pair of slots per channel).
 

gugucom

I believe what happens is the interleaving is engaged for all three channels, even if only one has both slots filled (it can't selectively interleave just certain channels that have the additional DIMM/s, as some boards have more than a pair of slots per channel).
Let us restrict the discussion to Mac Pros. They all have only one slot per channel 1 and 2 and two slots per channel 3.

As I have previously pointed out there is also no multiplier penalty for using two slots per channel because Apple has already castrated the high performance IMCs to 1066 MHz.
 

nanofrog

Let us restrict the discussion to Mac Pros. They all have only one slot per channel 1 and 2 and two slots per channel 3.

As I have previously pointed out there is also no multiplier penalty for using two slots per channel because Apple has already castrated the high performance IMCs to 1066 MHz.
My thinking is that with the second DIMM in channel 3, interleaving is activated on all of the channels (even though there's no second DIMM on the channels for slots 1 & 2).
 

gugucom

I wonder why there should be interleaving at all. The memory controller is addressing exactly the same amount of memory cells with the same interface. The I/O process should be the same without an additional serialization.

Are you sure your concept of interleaving is actually happening in reality? I do not know enough about the architecture and the protocol of the memory channel to say it interleaves or not.

Let us say it does; there may still be enough slack in the protocol to let the whole process run in the same time frame. Let us assume we have an HP or a Sun workstation with two slots per channel. It would not have any bandwidth reduction with all six slots filled with 1066 MHz memory compared to three slots filled with 1066 MHz memory. Only with 1333 MHz memory would the bandwidth be reduced for six-slot versus three-slot use, because the controller would step the frequency down to 1066 MHz. At least that is what I read in the Intel literature about the 5500 IMC.
 

VirtualRain

No, this is neither logical nor sensible. If you fill the slots 3 and 4 with the same DIMMs as the slots 1 and 2 you obviously unbalance the third channel and may cause the IMC to default to dual channel memory. I would agree with that.

If you fit only half the capacity into slots 3 and 4 you end up with exactly the same memory capacity for each channel. This is the same as leaving the slot 4 empty and fitting the same DIMMs to the slots 1-3. If there is any penalty at all it should be absolutely minimal compared to an unbalanced mode.

If you want me to run a test with the UDIMMs you need to tell me which free software you consider to be conclusive. I can run Win7 or OS X apps.
I agree that what you say could work, and it's also how I would make it work if it were up to me... but it's not consistent with Intel's (albeit somewhat vague) documentation...

Intel's own X58 desktop single-socket reference motherboard also uses 4 DIMM slots, and here's how they describe the operation...

http://downloadmirror.intel.com/18128/eng/DX58SO_TechProdSpec.pdf (pg 16)

Memory Configurations
The Intel Core i7 Processor supports the following types of memory organization:

Tri/Dual channel (Interleaved) mode.

This mode offers the highest throughput for real world applications. Interleaving reduces overall memory latency by accessing the DIMM memory sequentially. Data is spread amongst the memory modules in an alternating pattern.

Three independent memory channels give two possible modes of interleaving:
• Tri-channel mode is enabled when identical matched memory modules are installed in each of the three memory channels (blue connectors).
• Dual channel mode is enabled when two of the blue memory connectors are populated with matched DIMMs.

Single channel (Asymmetric) mode.

This mode is equivalent to single channel bandwidth operation for real world applications. This mode is used when only a single DIMM is installed or the installed memory modules are not matched.
They seem to make it clear that Tri-channel mode is only engaged in the unique case of having three identically matched memory modules in each of three memory channels.

I'm not sure of the layout of the DIMM slots in the Mac Pro, but isn't it Channel A that has two DIMM slots while B and C have only one?

Finally, I would use Sisoft Sandra or Everest's memory bandwidth tests (in Windows) to determine the real single, dual, and tri-channel memory bandwidth, and then try mixing DIMMs like you do and see what performance you get with that combo... it should match one of the known single, dual or tri-channel measurements, thus removing any ambiguity.
 

gugucom

They seem to make it clear that Tri-channel mode is only engaged in the unique case of having three identically matched memory modules in each of three memory channels.
Leave out the words "only" and "unique" and your statement is true. Intel make no reference to the configuration I use.

I'm not sure of the layout of the DIMM slots in the Mac Pro, but isn't it Channel A that has two DIMM slots while B and C have only one?
No, the first two channels have one DIMM slot and the third channel has two. Read the manual.

Finally, I would use Sisoft Sandra or Everest's memory bandwidth tests (in Windows) to determine the real single, dual, and tri-channel memory bandwidth and then try mixing them like you do and see what performance you get with that combo... it should match one of the known single, dual or tri-channel measurements thus removing any ambiguity
I will do something along those lines in the next days.
 

gugucom

Test carried out with Windows7-64 Lavalys Everest home edition Ver. 2.20.405

Config1: 2GB UDIMMs in slots 1, 2, 5, 6 and 1 GB UDIMMs in slots 3, 4, 7, 8
Config2: 2GB UDIMMs in slots 1, 2, 3, 5, 6, 7

write C1: 6526, 6552, 6545 MB/s Av. 6541
write C2: 6534, 6615, 6535 MB/s Av. 6561 delta 0.31%

read C1: 10031, 10026, 10018 MB/s Av. 10025
read C2: 10076, 10095, 10065 MB/s Av. 10079 delta 0.54%

latency C1: 12.7 ns
latency C2: 12.7 ns

Let's examine those results. I ran the test three times in each configuration. With six 2 GB UDIMMs, writing is 0.31% and reading is 0.54% faster than with the mixed config of four 2 GB and four 1 GB UDIMMs. Both configs have latencies of 12.7 ns.

A bandwidth difference of half a percent is absolutely negligible under real-world conditions. The mixed mode is much better for upgrades because it lets you buy just 2 DIMMs of each kind for successive upgrades. In the case of the RDIMMs I do not expect to see different results when comparing the respective configurations. With RDIMMs it is particularly useful to buy only two of the expensive 8 GB RDIMMs.
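
The averages and deltas can be recomputed from the three runs per configuration. A short Python sketch using only the Everest figures posted in this thread (MB/s); the variable and function names are illustrative.

```python
from statistics import mean

# Three Everest runs per configuration, in MB/s, as posted in this thread.
runs = {
    "write": {"C1": [6526, 6552, 6545], "C2": [6534, 6615, 6535]},
    "read": {"C1": [10031, 10026, 10018], "C2": [10076, 10095, 10065]},
}

def delta_percent(c1_runs, c2_runs):
    """How much faster C2 is than C1, in percent of the C1 average."""
    return (mean(c2_runs) / mean(c1_runs) - 1) * 100

for test, configs in runs.items():
    avg_c1 = mean(configs["C1"])
    avg_c2 = mean(configs["C2"])
    delta = delta_percent(configs["C1"], configs["C2"])
    print(f"{test}: C1 avg {avg_c1:.0f}, C2 avg {avg_c2:.0f}, delta {delta:.2f}%")
```

With only three runs per configuration, run-to-run variation is of the same order as the delta, so the two configs are effectively indistinguishable in this test.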
 

nanofrog

I wonder why there should be interleaving at all. The memory controller is addressing exactly the same amount of memory cells with the same interface. The I/O process should be the same without an additional serialization.

Are you sure your concept of interleaving is actually happening in reality? I do not know enough about the architecture and the protocol of the memory channel to say it interleaves or not.

Let us say it does; there may still be enough slack in the protocol to let the whole process run in the same time frame. Let us assume we have an HP or a Sun workstation with two slots per channel. It would not have any bandwidth reduction with all six slots filled with 1066 MHz memory compared to three slots filled with 1066 MHz memory. Only with 1333 MHz memory would the bandwidth be reduced for six-slot versus three-slot use, because the controller would step the frequency down to 1066 MHz. At least that is what I read in the Intel literature about the 5500 IMC.
In this case, the interleaving is nothing more than a switch. But when active (more than 1 DIMM in any of the channels), they all have to engage to keep the data flow correct (properly synchronized). Unfortunately, it adds latency, which is why even in triple channel mode the memory throughput does slow down as additional DIMMs are added (up to 9, i.e. 3 per channel, are actually allowable in the IMC).
 

VirtualRain

Test carried out with Windows7-64 Lavalys Everest home edition Ver. 2.20.405

Config1: 2GB UDIMMs in slots 1, 2, 5, 6 and 1 GB UDIMMs in slots 3, 4, 7, 8
Config2: 2GB UDIMMs in slots 1, 2, 3, 5, 6, 7

write C1: 6526, 6552, 6545 MB/s Av. 6541
write C2: 6534, 6615, 6535 MB/s Av. 6561 delta 0.31%

read C1: 10031, 10026, 10018 MB/s Av. 10025
read C2: 10076, 10095, 10065 MB/s Av. 10079 delta 0.54%

latency C1: 12.7 ns
latency C2: 12.7 ns

Let's examine those results. I ran the test three times in each configuration. With six 2 GB UDIMMs, writing is 0.31% and reading is 0.54% faster than with the mixed config of four 2 GB and four 1 GB UDIMMs. Both configs have latencies of 12.7 ns.

A bandwidth difference of half a percent is absolutely negligible under real-world conditions. The mixed mode is much better for upgrades because it lets you buy just 2 DIMMs of each kind for successive upgrades. In the case of the RDIMMs I do not expect to see different results when comparing the respective configurations. With RDIMMs it is particularly useful to buy only two of the expensive 8 GB RDIMMs.
This does look encouraging... but just to remove any last bit of skepticism I would still encourage you to run this test with sticks only in 1, 2, 5, 6 just to make sure that dual-channel performance is 33% less than what you are seeing to ensure the test is accurately reflecting tri-channel performance in the first place.
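
As a rough guide to what that dual-channel check would look for, here is a small Python sketch. It assumes bandwidth scales linearly with the number of active channels, which is a simplification (real results depend on the benchmark and the controller), and it starts from the mixed-config (C1) averages quoted above.

```python
# Averages from the mixed config (C1) quoted above, in MB/s.
measured_c1 = {"write": 6541, "read": 10025}

def expected_dual_channel(tri_channel_mb_s):
    """Scale a tri-channel figure down to two channels (linear-scaling assumption)."""
    return tri_channel_mb_s * 2 / 3

for test, value in measured_c1.items():
    estimate = expected_dual_channel(value)
    print(f"{test}: tri {value} MB/s -> dual roughly {estimate:.0f} MB/s")
```

If sticks in slots 1, 2, 5, 6 alone land near these estimates, the Everest test is sensitive to the channel count; if they land near the original figures, it is not.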

Here's my result from a few months ago with 3x2GB in my quad. Note that my write is about 50% higher than yours but my latency is a LOT higher than yours. :confused: