Discussion in 'Mac Pro' started by dknightd, Jun 12, 2010.
I thought they worked better with matched pairs. Maybe the matched pairs is old info?
BECAUSE GOD HATES YOU
Actually, the socket-1366 Nehalems have a tri-channel memory controller, so you want to use matched..err...triplets.....uh....install memory in sets of 3.
Depends on the CPU architecture. One of the big features with the release of Intel's Nehalem architecture was support for tri-channel memory for better memory bandwidth. The Bloomfield chips support it, while Lynnfield and lower-end Nehalem chips only support dual channel.
Intel's next major CPU architecture, Sandy Bridge, will support dual, tri and quadruple channel memory.
Because Apple is too cheap to configure it with 6 GB. Also, that chip can handle six modules, so putting four on the logic board is just upsetting.
But honestly, only 3 measly gig in a 'top' powerhouse workstation, and 2 gig in their bottom of the range non pro macbook.
Simply a case of if you want something done, do it yourself.
(so why the fleecing prices)???
Lucky you even get 3GB. Before the '09 models, Apple sold their workstations (like everyone else) with the minimum amount of memory. Users of these types of systems have very different requirements when it comes to memory and storage, so there is little reason to offer more than the minimum. Also, it's not like they would sell more systems if they came with 6GB of memory.
My 2008 2.8GHz 8 core only came with 2GB installed. I used it that way for the entire year that I owned it and sold it that way. Never once had a memory problem, but I never used it for memory intensive apps anyway.
My 2009 MBP has 4GB installed.
It's pretty silly.
The original Mac Pro only shipped with 1GB of RAM, even if you opted for the 8 core variant in 2007.
So, 3 DIMMs might actually be faster than using 4?
It's the cheapest way to get a triple channel configuration in either a Quad or Octad Nehalem system.
The architecture will allow for multiple DIMMs per channel (interleaving), but Apple didn't balance this out (2x DIMM slots per channel, for example), given the design choices they made (physical constraints).
In theory, yes.
But exceptionally little software can actually utilize it, which is why using the 4th DIMM per CPU for additional capacity outweighs the bandwidth advantage of triple channel.
Interesting, I never noticed that the base 2009 Mac Pro only includes 3GB of RAM. I got the 8-core 2009 Mac Pro, which includes 6GB of RAM.
The question of "why 3" is that there is a modest speed improvement thanks to triple-channel memory. If you use all four banks (or all 8 in the 8-core Mac Pro), the memory will run as dual-channel instead of triple-channel, for a modest speed decrease in terms of memory access. In terms of the scientific software that my company produces, the speed difference is about 5%. Whether this will be a significant difference for you will depend on the type of software you are using.
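For a sense of scale, here is a rough back-of-the-envelope sketch of theoretical peak bandwidth by channel count. It assumes DDR3-1066 (1066 MT/s) on a 64-bit (8-byte) channel; those figures and the function name are my assumptions, not numbers from this thread. These are ceilings only, and real-world differences (like the ~5% mentioned above) are far smaller.

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, channels, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (decimal GB).

    mega_transfers_per_sec: DIMM data rate in MT/s (assumed 1066 for DDR3-1066).
    channels: number of memory channels active in parallel.
    bus_bytes: width of one channel in bytes (64 bits = 8 bytes for DDR3).
    """
    return mega_transfers_per_sec * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} channel(s): {peak_bandwidth_gb_s(1066, n):.1f} GB/s peak")
```

The gap between two and three channels looks big on paper, but it only shows up in software that is actually limited by memory bandwidth.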
Never give away what people will pay for.
Did you even try "Configure your Mac Pro"?
$1350 for 12GB of RAM. Wow and ouch. I'll go to OWC and get it for $900 less.
Well.. the OP didn't mention 'cost', I figure WTH, he/she can afford it!
What? Where is the technical backup on this?
In a 4 DIMM slot setup, one of the memory channels is in an interleaved mode. The other 2 are not (or at least don't have to be, with proper support). Moving just one of the channels to interleaved mode minimizes the impact (versus moving all 3).
You still have triple channels. You only "lose" the triple channel effect when all the cores happen to pull data out of the same channel (e.g., all four cores work on a single 500 MB image which is purely allocated to just one DIMM or, in a setup with all 4 DIMM slots filled, purely allocated across the two DIMMs interleaved on the 3rd channel).
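To make that concrete, here is a toy model of the 4 DIMM case. Big caveat: the layout (four 1 GB DIMMs occupying consecutive physical address ranges, with the last two sharing one channel) and the names `channel_of` and `channels_touched` are made up for illustration. It is not how the real Nehalem memory controller maps addresses, just a sketch of the "all the data sits behind one channel" situation.

```python
from bisect import bisect_right
from itertools import accumulate

GB = 1024 ** 3
MB = 1024 ** 2

# Hypothetical layout: four 1 GB DIMMs; the last two share channel 2.
DIMM_SIZES = [1 * GB, 1 * GB, 1 * GB, 1 * GB]
DIMM_CHANNEL = [0, 1, 2, 2]
DIMM_ENDS = list(accumulate(DIMM_SIZES))  # exclusive end address of each DIMM


def channel_of(phys_addr):
    """Return the channel that serves a physical address in this toy layout."""
    if not 0 <= phys_addr < DIMM_ENDS[-1]:
        raise ValueError("address outside installed memory")
    return DIMM_CHANNEL[bisect_right(DIMM_ENDS, phys_addr)]


def channels_touched(start, length, page=4096):
    """Set of channels hit by a contiguous physical range, sampled once per page."""
    hit = {channel_of(addr) for addr in range(start, start + length, page)}
    hit.add(channel_of(start + length - 1))  # make sure the last byte is counted
    return hit


if __name__ == "__main__":
    print(channels_touched(0, 500 * MB))        # image entirely on the first DIMM
    print(channels_touched(2 * GB, 2 * GB))     # data only on the two shared-channel DIMMs
    print(channels_touched(512 * MB, 2 * GB))   # data that straddles all three channels
```

In this model the first two cases land on a single channel, so every core queues up behind it; the third spreads across all three.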
There is no change in the software. The software just says "get me memory at address 1234000"; it is the job of the CPU's MMU and memory subsystem to go get it. Which exact physical DIMM it is on is opaque to the software. The app doesn't even use physical memory addresses.
Apple supports 4 DIMM slots because it is the natural way to get to 8GB using 2GB DIMMs (and 16GB in the dual package setup). Most customers are currently going to want to avoid 4GB DIMMs because they are not quite as cost effective.
It isn't theory.
If you put in one 4GB DIMM versus four 1GB DIMMs, you'd see a difference on anything that was actually dependent upon memory transfers and large enough to force a spread across DIMMs (versus micro benchmarks that load everything into the L3 cache, or that have disk I/O waits that stall the cores so much that the number of no-ops is sky high).
In the first case you essentially have a front side bus. You know... the arch that AMD abandoned a long time ago and that Intel has also abandoned, with much fanfare, with Nehalem. To toss that difference off as a purely theoretical, occasional benefit... why have they both changed? Because it makes no difference? Numerous benchmarks state otherwise.
The triple channels are implemented in hardware. The software has no choice about whether to utilize them. The OS memory page allocation code may not be completely milking all of the advantage out of it, but you don't need to change application software to get the effect.
A very straightforward case is when an OS has a file buffer cache and copies data into the app's address space when it is read. One core can be doing look-ahead reads filling up the cache while another core copies into the app's address space. Both need to serialize when they access the buffer cache itself, but reads and writes that happen asynchronously can proceed in parallel, unblocked. With a single front side bus, all those memory accesses must be serialized and you get more slowdowns.
Another case is when, after the OS has allocated/deallocated the physical memory pages for the creation/destruction of several applications, the physical page allocations start to distribute somewhat randomly across DIMM boundaries (e.g., an app gets its last 20 pages on one DIMM and the rest on another). Even if the app thinks a data structure is localized, its accesses can still be parallelized.
Pragmatically, since the kernel code and data structures start up first, they tend to segregate onto a different DIMM than the application code. Similarly, multiple apps that periodically wake up and do things in the background also tend, on average, to disperse onto different DIMMs if you have a low-GB, single-DIMM-per-channel setup.
The effect tends to disappear as you put very large memory pools on each of a limited number of channels. Folks who jam as much memory as possible into the box and then run narrowly focused concurrent loads on relatively small problems (much smaller than the per-channel memory pool) will see much smaller effects.
There's not that much of a difference for most applications though. Once that DIMM slot is occupied and accessed, it's no longer able to run as quickly. That latency penalty isn't that bad in real world terms, especially for the trade-off in additional capacity.
But without the other 2x DIMM slots (2x per channel), the user loses the ability to add as much capacity as is possible on other boards using low cost DIMMs (i.e. 6x 2GB sticks). Under limited budgets (and capacity needs not exceeding 12GB, for example), that difference of 4GB could matter, assuming the application can utilize the additional capacity.
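The slot arithmetic behind that comparison, as a quick sketch. The helper names are made up, and which channel ends up with the extra DIMM is arbitrary here; the only inputs are the 4 vs 6 slots per CPU and the 2GB sticks discussed above.

```python
def dimms_per_channel(slots, channels=3):
    """Spread DIMM slots over channels as evenly as possible.

    Returns a list with the number of DIMMs on each channel; any leftover
    slots are assigned to the first channels (an arbitrary choice).
    """
    base, extra = divmod(slots, channels)
    return [base + 1 if i < extra else base for i in range(channels)]


def capacity_gb(slots, dimm_gb):
    """Total capacity with every slot filled with the same size of DIMM."""
    return slots * dimm_gb


if __name__ == "__main__":
    for slots in (4, 6):
        print(slots, "slots:", dimms_per_channel(slots),
              "DIMMs per channel,", capacity_gb(slots, 2), "GB with 2GB sticks")
```

Four slots gives an unbalanced 2/1/1 split and 8GB per CPU with 2GB sticks; six slots gives a balanced 2/2/2 split and 12GB, which is the 4GB difference in question.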
The Nehalem-EX (Becton), the 7500 and 6500 series (http://www.intel.com/p/en_US/products/server/processor/xeon7000), support 4 channels and they aren't Sandy Bridge microarchitecture. Adding and subtracting cores doesn't necessarily change the microarchitecture. That is more a change in the implementation (and perhaps layout). There is no addition/change in instructions. No new function units or logic implementations. No new interconnect network. Just more of the same units copied (and yes, hooked together, but if the arch already allowed for some expansion, that's not new).
Likewise, adding and subtracting memory controllers doesn't particularly change the microarchitecture. If they are primarily the same controllers as previously existed, it can be just a matter of replicating them on the die one more time. Perhaps you need one more connection to the L3 cache, but not necessarily a big change in protocols or implementation.
Another example would be making the L3 cache bigger. Again no substantive microarch change necessarily required. The implementation just sucks up more transistors on a die.
Really depends on workload. If you're running numerous processes which spread out, it will. It is only with multiple threads sharing smallish pieces of memory that this doesn't work. But yeah, it's not going to be a double digit percentage difference.
Only on that channel though. You don't have to be on that one channel, or be running a single app which hogs 80+% of all memory.
On a highly parallel app with algorithms that periodically have to fork/join the workload, yeah, sure, it's going to be performance bound to that slowest memory channel. So once you make one subset of the workload go slow, the overall app slows down.
This is also just a short-term constraint. As 4GB DIMMs fall into normal pricing the problem goes away. If you need them only later in the Mac Pro lifecycle, this isn't as big of a problem.
Apple makes a very similar tradeoff on PCI-e slots. The Mac Pro only has 3 and lots of "large footprint board" workstation models have 6 or 9. Sure, there are some folks with solutions that require lots of slots, and those don't work. However, how many folks are actually being left out in the cold? So a subset of a subset of folks can't leverage the box as well. It is a design judgment call. If Apple has gone out and gotten a broad range of real life deployed configs, and those stats show that "over 16GB" or "over 8GB" configs come out as a single digit percentage, then it isn't a bad call.
This high memory coupled to lower core count just seems like a strange demographic. What's being said here is that there is a significant number of folks who need the lowest core:memory ratio possible. It isn't very intuitive why that should be true or desirable. That sounds like a need for much better software, not different hardware.
Of course, but this is less common though.
It depends on the software, and unfortunately, as you keep trying to explain the difference between FSB and Nehalem, most software is still designed around FSB, and that's where such an unbalanced load occurs.
I keep recalling the level of recycled code that usually exists, and the need for backwards compatibility kicking in and ruining the party.
Yes, but ATM, it's still a valid complaint. Larger DIMMs don't come down in price as quickly as their smaller capacity counterparts do (cost increases aren't linear with additional memory chips, as the demand is smaller).
Down the road, 4 and 8GB sticks will become cheap enough that this is less of an argument (then it will increase again, as the trend typically goes).
For PCIe, 4x is a better trade off over the DIMM slots made available. 6x would have been just fine, and is the more common design used by other board makers (and for good reason).
Sure, compromises have to be made, especially on smaller boards. But IMO, the RAM capability was compromised to an extent. Yes, it could have been worse, but given that the video graphics industry seems to be a notable subset of the MP market, the additional DIMMs would have been in order (at least that's the area they've heavily marketed/wooed in the past, and obtained a loyal customer base as a result).
I'd love to see better software quality (optimization for its designed function), but it takes a while for it to be released (assuming the inevitable bugs don't detract so much that it fails to produce effective results).
What's available at the time of an architecture release is written for older processor designs.