Thanks for the info. With computer components changing so quickly, it's hard to keep up. Only question is: since each CPU has its own memory controller, how do they keep from trying to access the same address in memory? Does each CPU get its own bank of RAM? If that's the case, then sharing info between the CPUs would be hard, but not impossible, I guess, if you had a good OS.
Yes, in AMD's server architecture each chip has its own IMC and its own bank of RAM. Cache coherency is handled through dedicated HT links between the processors. With 3 HT links per chip it scales well to dual chips and pretty well to 4 chips. Barcelona will add a 4th HT link to allow "ideal" scaling to 4 chips, and there will also be a link-splitting ability so each chip can have 8 half links to allow for 8-chip setups. So far performance is great if each processor is working off its own RAM bank, but of course things get difficult when one chip needs to access data from another chip's banks, despite the HT links. I think XP doesn't have any NUMA support at all and Vista was supposed to add it, but so far the implementation seems poor since there isn't much performance increase.
 
I think XP doesn't have any NUMA support at all and Vista was supposed to add it
XP x64, Server 2003 and Vista have NUMA support.

It's a hard problem to solve, however. Windows takes the approach that when a thread allocates memory (at the system level, not at the "new" or "malloc" call in the application code), the memory is allocated from the NUMA node where the thread is running, if there is available memory on that node.

That's great if the thread continues to be scheduled on a CPU on that node, but if all the CPUs on that node are busy - what do you do? Typically, you'll decide that it's better to run on a "far" node with slower memory access than to not run at all.

Windows APIs have the functions to allow a program to control where memory is allocated, and to say that a thread would "prefer" to run where the memory is.
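
To make that concrete, here is a rough sketch of the kind of call involved, assuming a Vista-era SDK (VirtualAllocExNuma is the Vista addition; the preferred node 0 is just an example value):

```cpp
// Rough sketch: ask for memory with a *preferred* NUMA node on Windows.
// The OS may still fall back to another node if the preferred one is
// out of free pages.
#include <windows.h>
#include <iostream>

int main() {
    ULONG highestNode = 0;
    if (!GetNumaHighestNodeNumber(&highestNode))
        return 1;                       // not a NUMA-aware system/OS
    std::cout << "highest NUMA node: " << highestNode << "\n";

    // 1 MiB, preferring node 0 (an arbitrary example node).
    void* p = VirtualAllocExNuma(GetCurrentProcess(), nullptr, 1 << 20,
                                 MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE,
                                 0 /* preferred node */);
    if (p == nullptr)
        return 1;

    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}
```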

but so far the implementation seems poor since there isn't much performance increase.
Do you have some links to support that conjecture?
 
Only question is: since each CPU has its own memory controller, how do they keep from trying to access the same address in memory? Does each CPU get its own bank of RAM?

Each NUMA memory node has a unique base physical address for the RAM attached to that node.

For example, if you had a two-socket system with 2 GiB of RAM on each memory controller:

o Socket 1: RAM addresses 0x00000000 to 0x7fffffff (0 to 2GiB-1)
o Socket 2: RAM addresses 0x80000000 to 0xffffffff (2GiB to 4GiB-1)

It's not difficult at all. In essence, it's no different from having two DIMMs in a system - they don't have the "same" address, the system assigns each DIMM a unique offset.
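
If you wanted to express that mapping in code, it's just a range check - a toy sketch using the example values above:

```cpp
// Toy sketch of the layout above: which socket owns a physical address?
// Uses the 2 GiB-per-socket example values; nothing more than a range check.
#include <cstdint>
#include <iostream>

int socket_for(uint32_t phys) {
    return (phys < 0x80000000u) ? 1 : 2;   // socket 1 owns the low 2 GiB
}

int main() {
    std::cout << socket_for(0x12345678u) << "\n";  // prints 1 (low half)
    std::cout << socket_for(0x9abcdef0u) << "\n";  // prints 2 (high half)
    return 0;
}
```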
 
We don't need to sit in their strategic planning sessions. We already know

  1. They removed "computers" from their corporate name
  2. The keynote speech at Macworld did not even talk about Macs
  3. They took key people off the Leopard project so they could work on a phone

Taken together we can see what's going on.

You are drawing conclusions from insufficient data. You are also weighting it questionably. And you are taking a small snapshot in time and attempting to use it as a forecast point of reference. There is nothing even remotely conclusive in this.

Removing 'computer' from their name means nothing. It is marketing. Think about it. They now have iPhone, iPod and iTV. Those are not normally associated with computers. Just because they are diversifying their products, does not mean they are dropping other ones. Why go through the cost and headaches of moving to the Intel processor, if they were going to move out of computers?

I expect Apple will be doing a major overhaul of their Minis and iMacs soon. There is another thread going where Apple is getting hammered for doing just an incremental upgrade to the Mac Pros. If they were to do a minor upgrade to the other machines, they would get hammered for that too. To do something exceptional, they have to wait for some advances from other vendors. The bottom line is, far too many people are acting like spoiled kids. No matter what they do, someone whines.

Now, there are professional whiners, of the MS disinformation/propaganda ilk. This board is just crawling with them. Those can be discounted. They have the collective IQ of an amoeba and no sense of honor. However, many people here set themselves up for disappointment. They create fantasies of what they want, then cry when the reality falls short.
 
Do you have some links to support that conjecture?
http://www.techreport.com/reviews/2007q1/quad-core/index.x?pg=13

It's more of an observation. When Quad FX was released, it performed poorly against Kentsfield, despite speculation that the dual-die approach was starved for bandwidth and would be crippled by cache coherency. It was generally thought that Quad FX was hobbled by the NUMA handling in XP and would gain on Kentsfield once Vista was released. Recent testing has shown that Vista doesn't seem to let Quad FX gain ground on Kentsfield, so the implication is that either Quad FX is already going all out, or Vista's NUMA implementation still doesn't expose all of Quad FX's potential.
 
Windows APIs have the functions to allow a program to control where memory is allocated, and to say that a thread would "prefer" to run where the memory is.

I think the trick is to set your thread/core affinity before you start allocating memory. The challenge is to set the affinity right when you don't know what other apps are doing. Or can you? It would be easier if the OS did it.
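
Something like this is what I mean - a rough Win32 sketch, assuming logical CPU 0 is the one you want (error handling mostly omitted):

```cpp
// Rough sketch: pin the thread first, allocate and touch second, so the
// node-local allocation policy puts the pages near the CPU doing the work.
#include <windows.h>
#include <cstdlib>
#include <cstring>

int main() {
    // Restrict this thread to logical CPU 0 (bit 0 of the affinity mask).
    if (SetThreadAffinityMask(GetCurrentThread(), 1) == 0)
        return 1;                                   // call failed

    const size_t size = 64u * 1024 * 1024;          // 64 MiB working set
    char* buf = static_cast<char*>(std::malloc(size));
    if (buf == nullptr)
        return 1;
    std::memset(buf, 0, size);  // first touch: pages come from CPU 0's node

    // ... run the memory-hungry work on this same thread ...

    std::free(buf);
    return 0;
}
```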
 

Unfortunately, this story criticizes XP/Vista, without offering an example of an operating system that *can* do well with NUMA.

NUMA is very hard - "Non-Uniform Memory Architecture" - some RAM is fast, some RAM is much slower. "Fast" and "slow" are relative to the CPU on which a thread is scheduled.

An optimum memory layout goes to hell if you're scheduled on a different processor.

One can write an application that hardwires itself to the NUMA topology - and get the best possible performance. This is great for a scientific cluster app where you can give one app complete control of the machine - but for a general purpose multiprocessing desktop it's not so simple. In fact, "cluster" is very appropriate - the program needs to treat the NUMA system like a collection of boxes, and control how the boxes are scheduled and allocated.
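
As a rough sketch of what "hardwiring" starts with, a program can query the topology like this on Windows (assuming the XP/2003-era NUMA query APIs; the output format is my own):

```cpp
// Rough sketch: enumerate the NUMA nodes and the CPUs attached to each,
// the first step for an app that wants to hardwire itself to the topology.
#include <windows.h>
#include <iostream>

int main() {
    ULONG highest = 0;
    if (!GetNumaHighestNodeNumber(&highest))
        return 1;

    for (ULONG node = 0; node <= highest; ++node) {
        ULONGLONG cpus = 0;
        // Which logical CPUs live on this memory node?
        if (GetNumaNodeProcessorMask(static_cast<UCHAR>(node), &cpus))
            std::cout << "node " << node << ": CPU mask 0x"
                      << std::hex << cpus << std::dec << "\n";
    }
    return 0;
}
```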

One can look at this data and come up with either the conclusion that "Windows sucks" or "NUMA sucks". Without giving an example of a system that performs much better than Windows, concluding that "Windows sucks" shows a bit of bias.

It's all academic to MacRumors, though, since Apple won't let OSX run on any AMD system.
 
I think the trick is to set your thread/core affinity before you start allocating memory. The challenge is to set the affinity right when you don't know what other apps are doing. Or can you? It would be easier if the OS did it.
If *you* don't know what the other apps are doing, how will the OS determine it?

How will the OS determine that you'll be launching Photoshop 10 minutes after starting a video render job? Any attempt to set affinity is likely to hurt performance if the load changes.

If you want a shocking look at the complexity involved in this, take a peek at the VMware ESX documentation. VMware's NUMA support includes the automatic migration of memory to the NUMA node where the VM is running - it's better to copy memory between NUMA memory controllers than to access "far" memory!

(see http://www.vmware.com/pdf/esx2_NUMA.pdf)
 
Unfortunately, this story criticizes XP/Vista, without offering an example of an operating system that *can* do well with NUMA.
I guess it would be hard to do a legitimate comparison of NUMA between OSes, since there would be so many other differences. Anyway, just looking at SPECfp_rate2000 results, which are supposed to be bandwidth-intensive I believe, systems with SuSE Linux tend to do better than those running Windows Server 2003 Enterprise, even though they are identically configured.

http://www.spec.org/osg/cpu2000/results/res2006q4/cpu2000-20060918-07372.html

http://www.spec.org/osg/cpu2000/results/res2006q4/cpu2000-20060918-07373.html

And I believe Solaris achieves the highest scores for Opterons in SPECfp_rate2000. I guess the question is how applicable SPECfp_rate2000 is to testing NUMA across various OSes.
 
Drawn into mediocrity

Admittedly, the new Apple laptops are convenient to work with, although most of the time I'm still using my 5-year-old PB G4, which still suits me for many things.

However, since the switch to Intel, the design hasn't improved, and the pro notebooks just got uglier; all in all, Apple has grown content to adopt the PC-ish improvements that Intel delivers.

The times are over when a company like IBM would just pull out a real performance leap for Apple.

Look at the PlayStation 3. It took some time to get it running, but people are now gradually taking advantage of the multi-core architecture, and whoever has seen it in action must admit this was a jump forward.

Adopting the Cell would have forced Apple to build up software know-how in efficiently programming multi-core platforms, but also to put the appropriate hardware in place.

If you want to stay on top, you have to go with the best, with those who, year by year, keep pushing forward the limits in many fields of computer science.

The Intel move is just convenient in the short run, which makes you lazy and content. That's it.
 
I guess it would be hard to do a legitimate comparison of NUMA between OSes, since there would be so many other differences. Anyway, just looking at SPECfp_rate2000 results, which are supposed to be bandwidth-intensive I believe, systems with SuSE Linux tend to do better than those running Windows Server 2003 Enterprise, even though they are identically configured.

http://www.spec.org/osg/cpu2000/results/res2006q4/cpu2000-20060918-07372.html

http://www.spec.org/osg/cpu2000/results/res2006q4/cpu2000-20060918-07373.html

And I believe Solaris achieves the highest scores for Opterons in SPECfp_rate2000. I guess the question is how applicable SPECfp_rate2000 is to testing NUMA across various OSes.

Not quite identical - it's 32-bit Windows 2003 and 64-bit SuSE!! :eek: (But since the compiler uses 128-bit arithmetic most of the time, this is not as major as it sounds.)

Also note the following from the reports:

Win2k3:
Other Configuration Notes
The start /b /wait /affinity command is used to bind CPU(s) to processes.

Suse:
Other Configuration Notes
Taskset utility used to bind process to CPU(s)

So, Linux had the advantage of running in 64-bit mode, and both systems were running with hard affinity to eliminate HyperTransport memory references.

It should give AMD fans pause when they realize that the people running these benchmarks did everything possible to *prevent* any memory traffic from going over the HyperTransport links.... To get the best performance, they had to convert the NUMA SMP system into the equivalent of 8 single CPU systems, bypassing HT and the NUMA topology.

Note that SPECrate is not a multi-threaded application - multiple independent copies of the application are run. In this case, the 8 copies were locked to the individual CPUs of the system, not being scheduled by the OS.
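
For the curious, the Taskset binding boils down to a sched_setaffinity() call before the exec; a rough sketch, where ./benchmark stands in for the actual SPEC binary:

```cpp
// Rough sketch of what `taskset` does before launching a benchmark copy:
// restrict the process to one CPU, which as a side effect keeps its
// first-touched memory on that CPU's NUMA node. Linux/glibc assumed.
#include <sched.h>    // sched_setaffinity, CPU_ZERO, CPU_SET
#include <unistd.h>   // execl
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                      // CPU 0 - one copy per CPU

    // pid 0 = the calling process; the mask is inherited across exec.
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        std::perror("sched_setaffinity");
        return 1;
    }
    execl("./benchmark", "./benchmark", (char*)nullptr);  // hypothetical binary
    std::perror("execl");                  // reached only if exec failed
    return 1;
}
```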
 
If *you* don't know what the other apps are doing, how will the OS determine it?

I didn't mean it in that sense; I meant can you get other processes' affinity settings from the API? If so, you might steer clear of processors that have been adopted by another app. If you have the luxury.

The OS would do it by being given some relationship info about the threads by each app. The scheduler would then factor that data in with the realtime info that it has in deciding what threads to run and where.
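
I imagine something like this would do the lookup on Windows, if I'm reading the API docs right (the pid is a made-up example):

```cpp
// Rough sketch: read another process's affinity mask on Windows.
// A real app would enumerate processes instead of hard-coding a pid.
#include <windows.h>
#include <iostream>

int main() {
    const DWORD pid = 1234;                // hypothetical process id
    HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (h == nullptr)
        return 1;                          // no such pid, or access denied

    DWORD_PTR processMask = 0, systemMask = 0;
    if (GetProcessAffinityMask(h, &processMask, &systemMask))
        std::cout << "process mask 0x" << std::hex << processMask
                  << ", system mask 0x" << systemMask << "\n";

    CloseHandle(h);
    return 0;
}
```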
 
I didn't mean it in that sense; I meant can you get other processes' affinity settings from the API? If so, you might steer clear of processors that have been adopted by another app. If you have the luxury.

There's a second issue - what do you do when the memory that you need is on a CPU that's been "adopted" by another app?

And note that it's "thread affinity" that's important for multi-threaded applications - all the threads are within one process. (That's true for Windows and Linux 2.6, can anyone confirm for OSX?)

And thread affinity means a shared memory space, so if you schedule threads of a process you might find some threads are close to the memory, and some threads are "far".

NUMA is a very difficult issue to get right in the general case. It's easy if you can lock independent processes to cores, but very hard for a dynamic mix of multi-threaded applications.
 
Huh? There's nothing to say the hardware has to be in existence before the software is optimized for it. There's plenty of mathematics/genetics software made to scale up to 64 or 128 processors.

Also, remember that the original Clovertown upgrade was performed months ago. So it's not like it would be impossible to have built an 8-core Mac Pro before so you have something to test on.

True. But programmers need to be able to test their code, and it's unrealistic to expect them to hack their system to do so. Certainly a programmer could make an app that should be able to use a non-shipping future number of cores, but they may also be hesitant to ship code without being able to test and optimize it on the hardware in question.

Logic developers said there won't be any Logic 7 update using 8 cores.
I didn't see benchmarks, but they also said the quad Mac Pro does a little better than the octo Mac Pro in Logic... :(

Where did they say that? Musikmesse? These days it's rare to hear anything from Logic developers - where did this info come from, or is it just a rumor?
 
benchmark

I find this conversation confusing and full of claims that are not substantiated.

An 8-core Mac is approximately 195% faster than a 4-core on some applications.


Luxology ran a benchmark of their Modo 3D program, which uses a true multithreaded, multicore rendering engine, and it showed a 195% speed increase over their 4-core.


Also, one reason you want multiple cores is to run multiple programs at decent speeds.

Games and other software may not see as much of a speed increase because they were designed to work on computers with fewer CPUs.

Memory will always be a bottleneck no matter how far into the future we go. You need 4 GB of memory to take advantage of so many CPUs, so if you use newer memory the price would be outrageous.
 
Huh? There's nothing to say the hardware has to be in existence before the software is optimized for it. There's plenty of mathematics/genetics software made to scale up to 64 or 128 processors.

Also, remember that the original Clovertown upgrade was performed months ago. So it's not like it would be impossible to have built an 8-core Mac Pro before so you have something to test on.

You have no idea what goes into software development. Threads in C++ and Java have to be expressed explicitly. So if I set up a multithreaded application, I must specify exactly which portions of the class are allowed to use threads. If I write software with too many threads in an application that is going to run on one processor, I am bottlenecking my own program.

I would like to know which software uses 64 or 128 processors. Is it meant to run on a mainframe?

Most programs switch between single-threaded and multithreaded processing, so a program that uses one thread would not benefit from multiple processors beyond letting you run multiple programs. Because programs keep switching between one thread and multiple threads, you only get 45% to 60% more speed as you add processors. When you write multiple threads, you also have to keep track of multiple processes, which can introduce bugs.
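
For example, here is a minimal sketch of what "expressing" threads means (Win32 shown; pthreads would look similar, and matching the thread count to the CPU count is my own choice):

```cpp
// Rough sketch: parallelism in C++ must be written out explicitly, and
// the thread count should match the CPU count to avoid over-subscription.
#include <windows.h>
#include <iostream>

DWORD WINAPI work(LPVOID arg) {
    const int id = static_cast<int>(reinterpret_cast<INT_PTR>(arg));
    // ... only code explicitly routed through a thread runs in parallel ...
    std::cout << "thread " << id << " running\n";
    return 0;
}

int main() {
    SYSTEM_INFO si;
    GetSystemInfo(&si);                    // how many CPUs do we have?
    DWORD n = si.dwNumberOfProcessors;
    if (n > MAXIMUM_WAIT_OBJECTS)          // WaitForMultipleObjects limit
        n = MAXIMUM_WAIT_OBJECTS;

    HANDLE threads[MAXIMUM_WAIT_OBJECTS];
    DWORD created = 0;
    for (DWORD i = 0; i < n; ++i) {
        HANDLE h = CreateThread(nullptr, 0, work,
                                reinterpret_cast<LPVOID>(static_cast<INT_PTR>(i)),
                                0, nullptr);
        if (h != nullptr)
            threads[created++] = h;
    }
    if (created > 0)
        WaitForMultipleObjects(created, threads, TRUE, INFINITE);
    for (DWORD i = 0; i < created; ++i)
        CloseHandle(threads[i]);
    return 0;
}
```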


BeOS actually used multiple threads throughout its system libraries, which made it quite a unique operating system.
 
While the gaming benchmarks are interesting in themselves, I gotta wonder who would consider this as a serious part of the purchase decision.

Then again, if you ARE buying this for gaming, you'd be a girly-man for getting less than two 30" monitors.

Funny you write this... today I just put in an order for the 8-core MP and two 30" Apple displays... woohoo... but for music and film production... are there games that utilize both screens? I may have to ditch my Xbox plans for that lol
 
Update to 8 core Mac?

Probably 'nothing' to do with the debates here, but I've just had my order for one of these beasties delayed slightly, because I'm told they've changed/updated a part.
 
Update to 8 core Mac?

No, I didn't get a chance to ask, but they did say I would be getting a better machine.
 
That's interesting - I was wondering why mine had slipped back a few days - it would be nice to know what is being changed, though!?

Adam
 
Probably 'nothing' to do with the debates here, but I've just had my order for one of these beasties delayed slightly, because I'm told they've changed/updated a part.
Did they happen to mention which part?
No, I didn't get a chance to ask, but they did say I would be getting a better machine.
That's interesting - I was wondering why mine had slipped back a few days - it would be nice to know what is being changed, though!? Adam.
Prolly just the motherboard. ;) :eek: :confused:

Now you've unleashed a new mystery, Susan & Adam. These kinds of unannounced updates are one of the reasons some of us are still waiting a little while longer. Anyway, can you try and find out what's changing, Susan? Inquiring minds need to know. :)
 
Prolly just the motherboard. ;) :eek: :confused:

Now you've unleashed a new mystery, Susan & Adam. These kinds of unannounced updates are one of the reasons some of us are still waiting a little while longer. Anyway, can you try and find out what's changing, Susan? Inquiring minds need to know. :)

Yes indeed. I just ordered myself a quad 3 GHz yesterday... I would like to know as soon as possible if whatever is being changed will be implemented on the quads too, so I know whether or not to cancel my order, or something.
 