Intel sure does its best to keep these things confusing! In the old days, the chips in each CPU family started at a low number, and when an improved chip came out the number went up or a letter was added.

68000 cpu => 68010 => 68020

601 cpu => 603 => 604

G3 cpu => G4 => G5


Now with Intel we have

Clovertown (= Core 2?) => Penryn? (Core 2 Duo?) => Nehalem?

I'm sure that last progression of CPUs is completely incorrect, and that's my point. There is absolutely zero logical sense in how Intel names their CPUs.
Core 2 is Core 2 Duo, Core 2 Solo, and Core 2 Quad. Think of it as the G5 in single-core, dual-core, and quad-core versions.

From Core 2, the next step would most likely be called Core 3. There is no more Pentium (which should have ended with the P3, as the P4 was totally different).

I'm a little confused: these chips will go in the Montevina platform, correct? So what are people guessing, no update until these are released? Or are we still hoping for an update mid-year, with a later update to these chips?
Nope, Nehalem has a new platform that isn't Montevina (which only applies to notebooks anyway). Nehalem for mobiles won't be out till next year, but for servers and desktops (note that the iMac doesn't use desktop CPUs) the chips will be out this year.
 
The complexity is such that I would still like to see a diagram. In a two-Nehalem system, do the QPIs connect to the Northbridge or to each other? If the memory is "3 Ch", does that mean that each Nehalem has three banks of memory? If I have one application running on both Nehalems, on which Nehalem's memory does the program code reside? Could it sometimes reside on both? Can OS X handle that? Is it possible, and perhaps easy, to create a four-Nehalem system? Would that require a different Northbridge chip? Etc.

Tri-channel is similar to dual-channel but with three sticks versus two. So instead of having four sticks of RAM (or two in the iMac and MB/P/A) you have six (or three). The code would reside with whichever core is working on it, most likely in L3 cache where all the other cores can see it, which leads to a potential problem: cache poisoning is going to become a big deal again. OS X should run on it, but I am not sure how efficient it would be. This may accelerate 10.6, as I don't see a kernel change coming out in a 10.5.x release, but then again anything is possible, since 10.4 had an arch change.
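
Back-of-the-envelope, the third channel is where the extra bandwidth comes from. A minimal sketch of the peak numbers, assuming DDR3-1066 with a 64-bit (8-byte) bus per channel (the memory speed is my assumption, not a confirmed spec):

#include <stdio.h>

/* Peak bandwidth = transfers/s x bytes per transfer x channels.
   DDR3-1066 and the 8-byte channel width are assumed figures. */
int main(void) {
    const double transfers_per_s = 1066e6;
    const double bytes_per_transfer = 8.0;
    double per_channel = transfers_per_s * bytes_per_transfer / 1e9;
    printf("per channel : %.1f GB/s\n", per_channel);
    printf("dual channel: %.1f GB/s\n", 2 * per_channel);
    printf("tri channel : %.1f GB/s\n", 3 * per_channel);
    return 0;
}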

[attachment: 2.jpg - block diagram of 2- and 4-socket Nehalem systems]

Basically, Intel is copying AMD, with the exception of the SMT thing.
 
Has anyone seen a block diagram of how, for example, a two-chip Mac Pro would be created? I assume two blocks of main memory. So where does the I/O get attached? Would OS X need to be enhanced for NUMA memory, etc.?
I don't think there's any chance Intel or Apple will do a NUMA machine. It's a little too niche for them. Call SGI if that's what you need.
 
Not just strike from underground but from the air, as this recent job description makes out...
"
Duties will include:

•Controlling Skynet 4 and Skynet 5 Spacecraft in accordance with authorised procedures.
•Working as part of a team on a 2 Days, 2 Nights, 4 Off Shift Cycle.
•Contributing to the drafting of Operational Documentation and development of Test Plans in support of Spacecraft Operations.

The successful candidate must be a:

•A Spacecraft Professional with previous practical exposure in Spacecraft Control operations.
•Previous exposure in the SKYNET programme is highly desirable.
•Security Clearance is a requirement for this position. "


link: http://www.justengineers.net/vacancies/vacancy-details.asp?id=552303


:)

Spacecraft too?? ........ There's nowhere to hide!! Nowhere!

Terminators on the ground, spaceinators in space...... there's only one thing left to do....... build a second skynet!! Clearly, this will allow them to destroy each other because of competition! How could it possibly go wrong?

Get started, Johnson. And better rename Skynet 2 to something sexy, like... Skyforce or...... Megasky..... or maybe even SkyHook.... uh oh... does that mean SkyHook is Skynet 2? Are they both already here?? ......They're working together aren't they.

We're so screwed.

And all this because Intel decided to make faster and better processors year over year.
 
Here come all the "waiting for Nehalem" threads. Unfortunately, people don't see the Q2 '09 for notebooks and iMacs; they just see late 2008 and think, "Oh no, and I held out for a Penryn MacBook Pro." Good thing all that waiting did. We tried to get you to update sooner and told you that the Penryn update was almost negligible in terms of performance and really only offered better battery life. If anything you lost some L2 cache. But no, you really had to have it. Had you updated six months earlier, your machine would be that much older and more due for an update next summer, when a more significant leap occurs.

Too bad it was an awesome bump, and if you ordered the 2.5 you gained L2 cache. Stop hating on the waiters. Did you wait? No? Then why do you care so much?

Anyway, always happy to see developments. I hope these things really do amazing stuff.
 
I don't think there's any chance Intel or Apple will do a NUMA machine. It's a little too niche for them. Call SGI if that's what you need.
Don't two Nehalems, each with its own memory bank, equal NUMA? Or even four Nehalems, as in the diagram in the previous message?
 
Tri-channel is similar to dual-channel but with three sticks versus two. So instead of having four sticks of RAM (or two in the iMac and MB/P/A) you have six (or three). The code would reside with whichever core is working on it, most likely in L3 cache where all the other cores can see it, which leads to a potential problem: cache poisoning is going to become a big deal again. OS X should run on it, but I am not sure how efficient it would be. This may accelerate 10.6, as I don't see a kernel change coming out in a 10.5.x release, but then again anything is possible, since 10.4 had an arch change.

[attachment: 2.jpg - block diagram of 2- and 4-socket Nehalem systems]

Basically, Intel is copying AMD, with the exception of the SMT thing.

Thanks for the diagram. It appears one Nehalem can have four QPIs, while each Northbridge can have two QPIs. Cool.
 
Thanks for the diagram. It appears one Nehalem can have four QPIs, while each Northbridge can have two QPIs. Cool.

The diagram may be slightly misleading though: each chip has access to the same pool of RAM, but in the picture it would be pretty hard to show that. There would be lines connecting all of the processors to each other and to the pool of RAM. For a more accurate view you should put the RAM in between, not only in the two-CPU diagram but in the four-CPU one as well. Each CPU knows where its code is in RAM because the CPUs are able to talk to one another directly. It is a very slick interface, and I was, quite honestly, surprised Intel didn't move to it when it created Core (the first one). It has the potential to allow a crazy number of CPUs to be put on one system board (really, the trace lines are your main limit, other than power).
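
On the "crazy number of CPUs" point: if every socket links directly to every other socket, the link count grows as n(n-1)/2, which is exactly why trace lines become the limit. A quick sketch of that arithmetic (the fully connected mesh is a simplifying assumption; real boards can also route through a hub):

#include <stdio.h>

/* Point-to-point links for a fully connected mesh of n sockets. */
int main(void) {
    for (int n = 2; n <= 8; n *= 2)
        printf("%d sockets -> %2d links\n", n, n * (n - 1) / 2);
    return 0;
}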
 
Sadly, L2 is shrinking in the next microarch. L3 will become the next big thing, I guess.

That is a bit of a misunderstanding. The names "L1", "L2", "L3" stand for first-level, second-level, and third-level cache; they say nothing about the implementation. It's like having a first, second, and third gear in your car.

So the thing that is called "L3 cache" on Nehalem is exactly the same thing that is now called "L2 cache". What is called "L2 cache" on Nehalem is something that doesn't exist on the current CPUs: a bit of cache that is faster than the current L2 cache but not as big, and on the other hand bigger than the current L1 cache but not as fast.
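
For what it's worth, you can actually see these levels from software with a dependent pointer chase: latency per load steps up each time the working set falls out of a cache level. A rough sketch in plain C (nothing Nehalem-specific; the sizes and hop count are arbitrary choices of mine):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Nanoseconds per dependent load when chasing pointers through a
   working set of `bytes`. Small sets stay in L1/L2; larger ones
   spill to L3 and finally to RAM, and the latency steps up. */
static double chase_ns(size_t bytes, long hops) {
    size_t n = bytes / sizeof(void *);
    void **buf = malloc(n * sizeof(void *));
    size_t *order = malloc(n * sizeof(size_t));
    if (!buf || !order) exit(1);

    for (size_t i = 0; i < n; i++) order[i] = i;
    for (size_t i = n - 1; i > 0; i--) {       /* Fisher-Yates shuffle */
        /* combine two rand() calls, since RAND_MAX may be only 15 bits */
        size_t j = (((size_t)rand() << 15) ^ (size_t)rand()) % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    /* Link the slots into one random cycle: each load's address
       depends on the previous load, defeating the prefetcher. */
    for (size_t i = 0; i < n; i++)
        buf[order[i]] = &buf[order[(i + 1) % n]];

    void **p = &buf[order[0]];
    clock_t t0 = clock();
    for (long i = 0; i < hops; i++) p = (void **)*p;
    clock_t t1 = clock();

    volatile void *sink = p; (void)sink;       /* keep the loop alive */
    free(order); free(buf);
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)hops;
}

int main(void) {
    for (size_t kb = 16; kb <= 64 * 1024; kb *= 4)
        printf("%6zu KB: %5.1f ns/load\n", kb, chase_ns(kb * 1024, 5000000));
    return 0;
}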
 
Nehalem will change everything. There is already a huge performance advantage for Core 2 over AMD's Phenom/Athlon 64s... just wait and see that gap widen even further.

The two advantages AMD had over Intel with its design, starting originally with the Opterons/Athlon 64s in 2003, were an integrated memory controller (vs. Intel's FSB) and HyperTransport... With Nehalem, Intel will have all the goodies that made Core 2 great plus all the goodies that made the Opterons great. QuickPath, Intel's marketing name for CSI (Common System Interface), is an extremely high-bandwidth 'pipe' used to DIRECTLY connect one CPU to another. It replaces the ancient FSB, which showed its weaknesses and bottlenecks mainly when you added more than one CPU, e.g. in a 2-socket Mac Pro. And lastly, the integrated memory controller will reduce latency, because the cores can now talk directly to the RAM without first having to go through the aging FSB and then the northbridge (system controller). This also reduces the complexity of motherboard design while improving the layout of the board itself.

Now a lot of people won't understand why, but L2 cache, and cache in general, will be reduced overall compared to this generation/Core 2. Why? Intel used huge L2 caches to make up for performance lost to the FSB. Instead of fetching data from memory, and thus having to go through the FSB, Intel relied heavily on fetching data from large caches (they're connected directly to the CPU). So basically, large caches were used to offset the pitfalls of not having an internal memory controller. Now that Nehalem has its memory controller integrated, it no longer has an FSB, nor the need for one, and talks to the RAM directly. There is less need for large caches in this case, and they take up a lot of die space anyway. You will see Nehalem use a large L3 cache, which is kind of an external cache that links up all the cores...
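
To put rough numbers on that trade-off: average access time is about the cache hit time plus the miss rate times the trip to RAM, so a slow FSB path makes every point of miss rate hurt more, and a big cache (lower miss rate) buys it back. A toy calculation; every latency figure here is made up for illustration:

#include <stdio.h>

/* AMAT = hit time + miss rate x memory latency.
   All latencies below are assumed round numbers. */
int main(void) {
    const double hit_ns = 5.0;        /* assumed L2 hit time       */
    const double fsb_ns = 120.0;      /* assumed FSB path to RAM   */
    const double imc_ns = 60.0;       /* assumed on-die IMC to RAM */
    for (double miss = 0.02; miss < 0.11; miss += 0.04)
        printf("miss %2.0f%%: FSB %5.1f ns, IMC %5.1f ns\n",
               miss * 100, hit_ns + miss * fsb_ns, hit_ns + miss * imc_ns);
    return 0;
}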

I'm putting things really simply and could go on forever about why Nehalem will be so great... Just check out these slides from Intel's presentation. They will answer a lot of your questions, especially how the CPUs link up in 2- and 4-socket systems. http://www.dvhardware.net/article25978.html
 
Yes. So what would replace the System Controller in the diagram on this page:

http://www.apple.com/macpro/technology/processor.html

One processor chip is directly connected to one set of RAM chips. Nothing in between, a straight connection between processor and RAM, so the connection is as fast as it can possibly be. The other processor chip is connected to a second set of RAM chips. And there is a very wide connection between the two processor chips.

So whenever a processor needs to read from RAM, it determines which processor chip is directly connected to that RAM, and then either reads the data itself or asks the other processor chip to do it.

You'd want the connection between the processor chips to be so fast that reading from the "wrong" set of memory chips doesn't make much difference in speed.
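
That "right bank vs. wrong bank" distinction is what NUMA-aware code worries about. OS X doesn't expose node placement, but on Linux you can pin an allocation to a node with libnuma; a minimal, purely illustrative sketch (node 0 and the 64 MB size are arbitrary):

#include <numa.h>      /* Linux libnuma; link with -lnuma */
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support here\n");
        return 1;
    }
    size_t len = 64UL * 1024 * 1024;
    /* Ask for 64 MB backed by node 0's local RAM bank. */
    char *buf = numa_alloc_onnode(len, 0);
    if (!buf) return 1;
    memset(buf, 0, len);   /* touch the pages so they really land */
    printf("nodes 0..%d available\n", numa_max_node());
    numa_free(buf, len);
    return 0;
}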
 
That is a bit of a misunderstanding. The names "L1", "L2", "L3" stand for first-level, second-level, and third-level cache; they say nothing about the implementation. It's like having a first, second, and third gear in your car.

So the thing that is called "L3 cache" on Nehalem is exactly the same thing that is now called "L2 cache". What is called "L2 cache" on Nehalem is something that doesn't exist on the current CPUs: a bit of cache that is faster than the current L2 cache but not as big, and on the other hand bigger than the current L1 cache but not as fast.

Um, L2 cache isn't going away with Nehalem. According to Intel, L1 is the same as Core 2's, L2 is smaller than Core 2's, and L3 is larger but shared amongst all the cores (versus how each has its own pool now). So basically just like AMD's Phenom.
 
All the CPUs have access to one pool of RAM. Intel's diagram is somewhat misleading.

You lost me at "pool of RAM." I don't see any infrastructure for pooling the RAM outside of the Nehalems and their QPI interconnects as shown in the diagram.
 
Um, L2 cache isn't going away with Nehalem. According to Intel, L1 is the same as Core 2's, L2 is smaller than Core 2's, and L3 is larger but shared amongst all the cores (versus how each has its own pool now). So basically just like AMD's Phenom.
That's what I said: L2 has been renamed L3, and there is a new level with size and speed between the old L1 and the old L2/new L3, which is called L2.

All the CPUs have access to one pool of RAM. Intel's diagram is somewhat misleading.

It isn't misleading at all. Each CPU socket connects directly to one set of RAM chips, which gives you the fastest access time. The other CPU sockets can still access the same RAM, but they have to go through the CPU that is connected directly.
 
... Each CPU socket connects directly to one set of RAM chips, which gives you the fastest access time. The other CPU sockets can still access the same RAM, but they have to go through the CPU that is connected directly.
So in the four-Nehalem diagram, each memory block is at most one QPI hop removed from each processor. Cool. But I'm still wondering if OS X can manage it.
 
Nehalem will change everything. There is already a huge performance advantage for Core 2 over AMD's Phenom/Athlon 64s... just wait and see that gap widen even further.

The two advantages AMD had over Intel with its design, starting originally with the Opterons/Athlon 64s in 2003, were an integrated memory controller (vs. Intel's FSB) and HyperTransport... With Nehalem, Intel will have all the goodies that made Core 2 great plus all the goodies that made the Opterons great. QuickPath, Intel's marketing name for CSI (Common System Interface), is an extremely high-bandwidth 'pipe' used to DIRECTLY connect one CPU to another. It replaces the ancient FSB, which showed its weaknesses and bottlenecks mainly when you added more than one CPU, e.g. in a 2-socket Mac Pro. And lastly, the integrated memory controller will reduce latency, because the cores can now talk directly to the RAM without first having to go through the aging FSB and then the northbridge (system controller). This also reduces the complexity of motherboard design while improving the layout of the board itself.

Now a lot of people won't understand why, but L2 cache, and cache in general, will be reduced overall compared to this generation/Core 2. Why? Intel used huge L2 caches to make up for performance lost to the FSB. Instead of fetching data from memory, and thus having to go through the FSB, Intel relied heavily on fetching data from large caches (they're connected directly to the CPU). So basically, large caches were used to offset the pitfalls of not having an internal memory controller. Now that Nehalem has its memory controller integrated, it no longer has an FSB, nor the need for one, and talks to the RAM directly. There is less need for large caches in this case, and they take up a lot of die space anyway. You will see Nehalem use a large L3 cache, which is kind of an external cache that links up all the cores...

I'm putting things really simply and could go on forever about why Nehalem will be so great... Just check out these slides from Intel's presentation. They will answer a lot of your questions, especially how the CPUs link up in 2- and 4-socket systems. http://www.dvhardware.net/article25978.html

So what will we get with SSE4.2? I have some PC MCAD apps that will eventually take advantage of this update.
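
For the curious: SSE4.2 on Nehalem is mainly the new string/text-compare instructions plus a hardware CRC32 instruction (POPCNT arrives alongside it). A minimal sketch of the CRC32 intrinsic; note it computes the CRC32-C (Castagnoli) polynomial, not the zip/PNG one:

#include <nmmintrin.h>   /* SSE4.2 intrinsics; compile with -msse4.2 */
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *msg = "hello nehalem";
    unsigned int crc = 0xFFFFFFFFu;            /* conventional seed */
    for (size_t i = 0; i < strlen(msg); i++)
        crc = _mm_crc32_u8(crc, (unsigned char)msg[i]);
    printf("crc32c = 0x%08x\n", crc ^ 0xFFFFFFFFu);
    return 0;
}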
 
You lost me at "pool of RAM." I don't see any infrastructure for pooling the RAM outside of the Nehalems and their QPI interconnects as shown in the diagram.

The link in Tenk's post is quite good. For a mid-level machine, you would have the following items:

1. Two four-core CPUs (possibly two eight-core CPUs).
2. Each CPU has three channels to connect to RAM.
3. Each channel can handle three RAM chips.
4. Both CPUs are connected with QuickPath.

So each CPU has three banks of memory, with up to three chips per bank. Nine chips per CPU, 18 chips total for the two CPU system. You'll reach the fastest speed by using 6 chips, one for each channel of each CPU. So each CPU can read from three RAM chips simultaneously. Then you can add more chips as needed (so a typical system would have 12 GB of RAM, with up to 36 GB using 2GB chips).

Each CPU can still access all the data, that's what the fast QuickPath connection is for. The CPU just has to pass the request to the other CPU through QuickPath, and get its results back quickly.

The only problem is that bandwidth depends on exactly where the data is. If you need to read six data items, then if you are lucky they are all in different RAM chips, and they can be read simultaneously. If you're out of luck, they are all in the same chip, so the six items have to be read one after the other. Most likely the data will be randomly distributed between chips or in such a way that sequential data is always on different chips.
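
As a toy model of that interleaving; the 64-byte line granularity and the simple modulo mapping are my assumptions, not Intel's documented scheme:

#include <stdio.h>

/* Map an address to one of three channels by cache line:
   sequential lines rotate across channels, so a streaming read
   can pull from all three at once. */
static int channel_of(unsigned long addr) {
    return (int)((addr / 64) % 3);
}

int main(void) {
    for (unsigned long a = 0; a < 6 * 64; a += 64)
        printf("addr %4lu -> channel %d\n", a, channel_of(a));
    return 0;
}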
 
So in the four-Nehalem diagram, each memory block is at most one QPI hop removed from each processor. Cool. But I'm still wondering if OS X can manage it.

The operating system doesn't even have to know about this. When a program needs to read memory from RAM, the processor figures out where the memory is automatically. Mac OS X isn't involved in this at all.
 
That's what I said: L2 has been renamed L3, and there is a new level with size and speed between the old L1 and the old L2/new L3, which is called L2.
Somehow I didn't read it that way. :eek: Too bad they don't mention how fast the L3 is going to be. But yeah, basically the same design as the Phenom.

It isn't misleading at all. Each CPU socket connects directly to one set of RAM chips, which gives you the fastest access time. The other CPU sockets can still access the same RAM, but they have to go through the CPU that is connected directly.
Yeah, I realized my mistake there. Wouldn't this design be similar to NUMA?
 
The link in Tenk's post is quite good. For a mid-level machine, you would have the following items:

1. Two four-core CPUs (possibly two eight-core CPUs).
2. Each CPU has three channels to connect to RAM.
3. Each channel can handle three RAM chips.
4. Both CPUs are connected with QuickPath.

So each CPU has three banks of memory, with up to three chips per bank. Nine chips per CPU, 18 chips total for the two CPU system. You'll reach the fastest speed by using 6 chips, one for each channel of each CPU. So each CPU can read from three RAM chips simultaneously. Then you can add more chips as needed (so a typical system would have 12 GB of RAM, with up to 36 GB using 2GB chips).

Each CPU can still access all the data, that's what the fast QuickPath connection is for. The CPU just has to pass the request to the other CPU through QuickPath, and get its results back quickly.

The only problem is that bandwidth depends on exactly where the data is. If you need to read six data items, then if you are lucky they are all in different RAM chips, and they can be read simultaneously. If you're out of luck, they are all in the same chip, so the six items have to be read one after the other. Most likely the data will be randomly distributed between chips or in such a way that sequential data is always on different chips.

3 channels, with 3 sticks of RAM per channel. So a single-socket system could reference up to 18GB of RAM (with 2GB sticks).
That is gonna be nice... I wonder where 9 RAM slots are gonna fit...
 
This sounds great. I'm glad I just upgraded to a MacBook Pro. I'd have been waiting until some time in 2009 had I not jumped in now. By the time I need to upgrade again, the next design of the MacBook Pro will be well-oiled and well away from its problematic Rev A. And I'll have one of these chips, or a successor to it.

At least Apple has something new to put in its machines every few months.


Well, I'm on the mid-'06 MBP, and I'll be perfectly ready to jump on that Nehalem MBP, finally with an easy-to-replace HD, in January '09.

Bring it on, Apple...
 
Intel sure does its best to keep these things confusing! In the old days, the chips in each CPU family started at a low number, and when an improved chip came out the number went up or a letter was added.

68000 cpu => 68010 => 68020
Now with Intel we have
Clovertown (= Core 2?) => Penryn? (Core 2 Duo?) => Nehalem?

I'm sure that last progression of CPUs is completely incorrect, and that's my point. There is absolutely zero logical sense in how Intel names their CPUs.

There is a reason for this. Some years ago there was a court decision that said you cannot copyright or trademark a part number like "68000". So after that, Intel started with names rather than numbers. The first named CPU was "Pentium". It was the follow-on to the 486 and would have been called the "586", hence the "Penti-" prefix.
 
Can Intel and AAPL really balance the gap between power saving and CPU power? I mean, with lithium-ion batteries, isn't this the main obstacle? Just like laptop speed can't have a breakthrough without resolving the hard drive speed problem.

In the mobile world, this is my major issue. Given the choice, I'd rather make my Macbook or iPhone's battery last twice as long instead of making it twice as fast. I'm constantly having to charge my iPhone and I really don't feel like I use it that much. I'm definitely more annoyed by having to charge it all the time than I am about its processor speed (which is more than adequate.)

There are two ways to attack the problem: better batteries, or hardware that uses less power. The industry really seems focused more on keeping up with Moore's Law than on increasing battery life. The market is willing to settle for 2-5 hours of battery life, so the industry develops faster hardware while keeping the battery life (and prices) about the same.
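
The underlying arithmetic is simple: runtime is capacity divided by average draw, so halving the draw buys as much as doubling the battery. A sketch with made-up figures:

#include <stdio.h>

/* Runtime (hours) = battery capacity (Wh) / average draw (W).
   The capacity and draw numbers are assumptions for illustration. */
int main(void) {
    const double capacity_wh = 55.0;
    const double draws_w[] = { 25.0, 15.0, 10.0 };
    for (int i = 0; i < 3; i++)
        printf("%4.1f W draw -> %.1f hours\n",
               draws_w[i], capacity_wh / draws_w[i]);
    return 0;
}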
 
There is a reason for this. Some years ago there was a court decision that said you cannot copyright or trademark a part number like "68000". So after that, Intel started with names rather than numbers. The first named CPU was "Pentium". It was the follow-on to the 486 and would have been called the "586", hence the "Penti-" prefix.

If Pentium was the "586", then was Core the "686", Core 2 the "786", and Nehalem the "886"?

Just curious... I still remember when I had my 8086 and then the 486, then P3, P4, then the big change to Mac... G4, G5, and now back to Intel with a C2D... hehe, good times...
 