
barefeats

Original poster
Jul 6, 2000
Last night Lloyd Chambers of DigLloyd.com tweaked his DigLloydTools app (DLT), which we use to test memory throughput. It's now more accurate. The bad news is that it showed that when we put 8 sticks of memory in the 8-core 2.26GHz Nehalem Mac Pro, our memory read/write (memmove) throughput dropped by 1/3. In effect, it turns a triple-channel memory bus into a dual-channel one. Arggghh.

Specifically, in our test, the combined read/write throughput dropped from 9261MB/s to 6195MB/s when we went from the 6x2GB to the 8x2GB configuration.
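As a quick check of the "dropped by 1/3" figure, this small Python snippet just redoes the arithmetic with the two measurements quoted above (the variable names are only for illustration):

```python
# memmove throughput measured with DigLloydTools (MB/s), as quoted above
six_sticks = 9261    # 6 x 2GB
eight_sticks = 6195  # 8 x 2GB

# Fraction of the 6-stick throughput lost when going to 8 sticks
drop = (six_sticks - eight_sticks) / six_sticks
print(f"throughput drop: {drop:.1%}")
print(f"8-stick / 6-stick ratio: {eight_sticks / six_sticks:.3f}")
```

The drop comes out at about 33%, so "1/3" is accurate.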

Now, don't panic. That doesn't necessarily affect real-world app performance unless the particular app you are running is saturating the memory bus. Which apps saturate it? I don't know yet. I'm running our complete real-world test suite, including Pro Apps and 3D games, in both the 12GB and 16GB configs. If I find anything that's significantly slowed by the 8x2GB config, I'll post it here as well as on Bare Feats.
 
It's also important to note that memory throughput isn't everything.
If you're actively using 14GB+ of memory, you're better served by 16GB than by 12GB for the simple reason that swapping is even slower ;)
 
Do you also have a quad-core model at hand for testing?
The question is whether 4GB sticks can be used in these models!

Keep up the good work! :)
 
It's also important to note that memory throughput isn't everything.
If you're actively using 14GB+ of memory, you're better served by 16GB than by 12GB for the simple reason that swapping is even slower ;)

Agree. In our After Effects CS4 tests with Total Benchmark, it gobbled up 13GB when I had it in the 16GB config. When running in the 12GB config, it only had 10GB available because 2GB is reserved for "other apps." So, as you can guess, it ran the benchmark 3 seconds faster with 16GB (117 vs. 120 seconds).
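For scale, here is the 3-second gap as a share of the run time. This Python check uses only the two Total Benchmark times quoted above; the variable names are illustrative:

```python
# After Effects CS4 Total Benchmark run times quoted above, in seconds
t_12gb = 120  # 6 x 2GB: about 10GB available to After Effects
t_16gb = 117  # 8 x 2GB: After Effects used about 13GB

saved = t_12gb - t_16gb
print(f"{saved} s faster ({saved / t_12gb:.1%} less time)")
print(f"speedup: {t_12gb / t_16gb:.3f}x")
```

So the extra capacity bought about 2.5% here, despite the lower memmove throughput of the 8-stick setup.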

It's a balancing act. Do I need memory capacity or speed or both?
If 4GB modules were not so expensive, the best way to go would be 6x4GB = 24GB.
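The capacity-vs-speed trade-off can be laid out per configuration. The sketch below is hypothetical. It assumes the 8-core model's 8 slots are split 4 per processor, that each processor has 3 memory channels, and that modules are split as evenly as possible between the two processors. It only flags whether a 4th slot per processor gets filled, which is the condition that caused the slowdown in the DLT test:

```python
from dataclasses import dataclass

CPUS = 2              # assumed: 8-core model = two quad-core processors
CHANNELS_PER_CPU = 3  # assumed: triple-channel memory controller per processor
SLOTS_PER_CPU = 4     # assumed: 8 slots total, 4 per processor


@dataclass
class Config:
    sticks: int   # total number of modules installed
    size_gb: int  # size of each module in GB

    def __post_init__(self) -> None:
        total_slots = CPUS * SLOTS_PER_CPU
        if not 0 < self.sticks <= total_slots:
            raise ValueError(f"stick count must be between 1 and {total_slots}")

    @property
    def capacity_gb(self) -> int:
        return self.sticks * self.size_gb

    @property
    def uses_fourth_slot(self) -> bool:
        # True when some processor must hold more than 3 modules
        # (assumes an as-even-as-possible split between processors)
        return self.sticks / CPUS > CHANNELS_PER_CPU


for cfg in (Config(6, 2), Config(8, 2), Config(6, 4)):
    note = ("4th slot per processor filled (slower in the DLT test)"
            if cfg.uses_fourth_slot else "at most one stick per channel")
    print(f"{cfg.sticks}x{cfg.size_gb}GB = {cfg.capacity_gb}GB: {note}")
```

Under these assumptions, 6x4GB is the only one of the three that gets both the most capacity and one stick per channel.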

As for the 4-core Nehalem with only 4 memory slots, OWC will be trying the 4GB modules out in the 4-core as soon as they get some in stock (probably this week). If it works, they will announce it on their site and their blog.
 
It's also important to note that memory throughput isn't everything.
If you're actively using 14GB+ of memory, you're better served by 16GB than by 12GB for the simple reason that swapping is even slower ;)

Now the logical thing to ask would be something like this: with 8 sticks, is there an automatic way to pretend that slots 4 and 8 "aren't there" when memory usage is below 12GB? Do we know for sure the Mac Pro does not already do that?
 
We tested that scenario. The 4th and 8th slot drag down the whole bus.
 
Apple sure did cause confusion with these new models: 2008 or 2009, quad or octo, 6 or 8 memory modules. They had avoided such things in the past, for example by not using 8-core processors until the 3GHz parts were available and only offering the 2.8GHz in single-processor configurations. I wonder whether the people who made such decisions have moved on, or how high up such decisions are actually made.
 
I'm sorry, but I don't understand what core counts and clock speed have to do with memory bus speed. Care to explain to a relative noob?
 
Duh. Why don't these machines have 6 and 9 (or, more likely, 12) slots, respectively?
 
How about 4x2GB? Are memory speeds similar to 8x2GB, since it is in dual-channel mode? I'd love to see this benchmark added to the DigLloydTools test.
 
I reckon for the octo models, get 12GB now (6x2) and later, when the 4GB chips are better priced, go to 24GB (6x4).

That's my plan, anyway.
 
I'll be going with 4x2GB from Apple and then upgrading with an additional 2x2GB modules (when OWC gets around to offering that option) for a total of 6x2GB. I'm curious what kind of speeds I'll be seeing until then.
 
. . .

As for 4-core Nehalem with only 4 memory slots, OWC will be trying the 4G modules out in the 4-core as soon as they get some in stock (probably this week). If it works, they will announce it on their site and their blog.

Lovely! I can't wait to find out! And thanks for posting about the 1/3 drop in speed when using all 4 slots (or all 8 slots). Presumably if you used up 4 slots of one CPU, and then only 2 slots of the other CPU (in the octo, of course), then the speed would still drop, right? Just curious.
 
Last night Lloyd Chambers of DigLloyd.com tweaked his DigLloydTools app (DLT) that we use to test memory throughput. It's now more accurate. The bad news is that it showed that when we put 8 sticks of memory in the 8-core 2.26GHz Nehalem, our throughput for memory read/write (memmove) dropped by 1/3. It turns a triple channel memory bus into a dual channel memory bus. Arggghh.

Specifically, in our test, the combined read/write throughput dropped from 9261MB/s to 6195MB/s when we went from 6x2GB to 8x2GB configuration.

Now, don't panic. That doesn't necessarily affect real world app performance unless the particular app you are running is saturating the memory bus. Which apps saturate? I don't know yet. I'm running our complete real world test suite including Pro Apps and 3D Games in both the 12G and 16G config. If I find anything that's significantly slowed by the 8x2G config, I'll post it here as well as on Bare Feats.

Are you sure adding a 4th stick affects the performance of all memory, or is it just the 4th stick that's dragging the overall score down?

My understanding is that regardless of the presence of the fourth stick, the first three will run in interleaved mode (striping data across the 3 channels). The 4th stick obviously has no counterparts to interleave data across, so it will run in SINGLE channel mode. Thus a memory benchmark that actually utilizes all available memory will suffer, and yes, the fourth stick is dragging the benchmark down, but that doesn't necessarily imply that the first 6GB are not fully interleaved and running at maximum speed.

Just a suggestion since I don't know how Apple implemented things nor do I know how your benchmark works.
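This hypothesis can be explored with a toy model. It is only a sketch of the idea in the post above, not how Apple or Intel actually map addresses. Per processor, it assumes 3 sticks (6GB) are striped across 3 channels and the 4th stick (2GB) sits on one channel by itself. It further assumes that every channel delivers the same hypothetical unit of bandwidth and that the benchmark sweeps all installed memory uniformly, so each region costs size / channels units of time:

```python
from fractions import Fraction


def effective_bandwidth(regions):
    """Bandwidth of one uniform pass over all regions: total size / total time.

    regions: list of (size_gb, channels). Each channel contributes one unit
    of (hypothetical, normalized) per-channel bandwidth.
    """
    total_size = sum(size for size, _ in regions)
    total_time = sum(Fraction(size, channels) for size, channels in regions)
    return total_size / total_time


# Per processor, under the hypothesis in the post above
six_stick = effective_bandwidth([(6, 3)])            # 3 x 2GB, all interleaved
eight_stick = effective_bandwidth([(6, 3), (2, 1)])  # plus a 4th 2GB stick alone
print(six_stick, eight_stick, float(eight_stick / six_stick))
```

Interestingly, for a full sweep this model also gives a 2/3 ratio, the same as true dual-channel operation. In principle, a benchmark confined to the first 6GB per processor would tell the two explanations apart, since only the dual-channel one predicts a slowdown there.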
 
AFAIK, X58 will run memory in triple-channel or dual-channel mode.
In other words, the fourth stick makes sure things get divvied up into pairs.
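Comparing theoretical peaks shows why going from three channels to two would cost about a third regardless of memory speed. The figures below assume DDR3-1066 and 64-bit (8-byte) channels, neither of which is stated in this thread. They are peak numbers, far above what a memmove benchmark measures, so only the ratio is meant to be compared with the measured 6195/9261:

```python
# Assumptions (not measured in this thread): DDR3-1066, 64-bit channels,
# theoretical peak bandwidth only.
TRANSFERS_PER_SEC = 1066e6  # nominal DDR3-1066 transfer rate
BYTES_PER_TRANSFER = 8      # 64-bit wide channel

per_channel_mb_s = TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e6
triple = 3 * per_channel_mb_s
dual = 2 * per_channel_mb_s

print(f"per channel: {per_channel_mb_s:.0f} MB/s")
print(f"triple: {triple:.0f} MB/s, dual: {dual:.0f} MB/s")
print(f"dual/triple = {dual / triple:.3f}")
print(f"measured 8-stick/6-stick = {6195 / 9261:.3f}")
```

The per-channel figure cancels out of the ratio, which is always 2/3, close to the measured 0.669.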
 
From what I remember, the first 3 RAM slots work in triple-channel mode and the 4th in single-channel mode, and the speed drops only when the 4th slot is in use.
 
That's not possible. The memory controller only has 3 channels available. Where would it get the 4th channel, and then why wouldn't it be able to run quad-channel memory?
 
I don't suppose you have access to Shake or Nuke to test core and memory usage, do you?
 