Intel's strategy... is it flawed? (eg. Gulftown)

VirtualRain · Oct 16, 2009

Intel's strategy with Nehalem (and its 32nm derivative Gulftown) appears to be to develop an amazing server microarchitecture and then push that technology down into other segments such as the mainstream (Lynnfield) and mobile markets (Clarksfield).

The other interesting element to their strategy is that with the die shrink accompanying Gulftown, they've opted to use some of the added die real estate by upping the core count from 4 to 6 and upping the L3 cache from 8MB to 12MB (which is a complete waste of silicon if you ask me). What's also notable is that early reports of Gulftown have them clocked at a very conservative 2.4GHz.

While this might seem like a suitable approach to take, particularly if you are focused on making an ideal server chip, it doesn't seem to make much sense for mainstream computer users (or even workstation users), who will be inheriting this technology as the chip of the day in about 12-18 months.

I personally think that this strategy on increasing core count and cache and neglecting clock speed is going to ultimately cause Intel grief. It's already to the point where very few can benefit by upgrading their computer. If you are an average PC user running a Core2 processor, there's very little benefit to upgrading to Penryn or even Nehalem. Gulftown's just going to climb further up the diminishing returns curve.

If you are a pro user, such as those here, there's no compelling reason to upgrade from Penryn to Nehalem... and it looks like there will be even less reason to upgrade to Gulftown. Sure it's a 50% increase in core count, but how often is that actually useful and what are the tangible benefits?

I think Intel is going down a difficult path. In the old days, going from a 400MHz CPU to an 800MHz CPU made a huge difference to your computing experience. Today, going from a 2-core system to an 8-core system is virtually unnoticeable for most. Even a hard-core user really has to think long and hard to justify an octo-core over a quad because the situations where it will provide any benefit at all are fairly rare.
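
One rough way to put numbers on that question is Amdahl's law, which caps the speedup from extra cores by the share of the work that can actually run in parallel. The snippet below is only a sketch: `amdahl_speedup` is a hypothetical helper and the parallel fractions are made-up illustrations, not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over a single core when only part of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative parallel fractions only (assumed, not benchmarked).
    for fraction in (0.25, 0.50, 0.95):
        quad = amdahl_speedup(fraction, 4)
        hexa = amdahl_speedup(fraction, 6)
        gain = hexa / quad - 1.0
        print(f"{fraction:.0%} parallel: 4 cores {quad:.2f}x, "
              f"6 cores {hexa:.2f}x, extra gain {gain:.1%}")
```

Under this model a workload that is half serial gets 1.6x from four cores and only about 1.71x from six, so the 50% increase in core count shows up as a single-digit percentage gain unless the workload is almost entirely parallel.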

I see a period now where most users really have no compelling reason to upgrade their computer. This is not good for Intel.

If it was up to me, I would offer a line of Intel processors with lower core counts and less cache with higher clocks. Surely if Intel can make a 6 core processor with 12MB of L3 at 2.4GHz, they could make a dual-core processor with 2MB of L3 at a clock of 5GHz and maybe a quad-core with 4MB of cache that runs close to 4GHz. Which would interest you more?

dukebound85 · Oct 16, 2009

you assume software will not take advantage of these cores

that's the direction software is taking

bruinsrme · Oct 16, 2009

You have to think much further than incremental steps.

Intel is going through a strange transition, where the new immersion lithography scanners are ahead of chip design. Therefore their design teams can literally put the cutting-edge chips into production sooner than anticipated; they don't have to wait.
Consumers will be offered a more powerful chip than before, pushing processor power further ahead of the applications that take advantage of it.
Pushing forward will mean more die per wafer, lower cost per die, being 2 to 3 generations ahead of any competition, ramping the newer factories up to full production and getting them paid off, and really starting to focus on 18-inch (450mm) wafer processes.

One thing you forgot is that today's fabs making the state-of-the-art processors will be chipset plants 2 to 3 years from now.
Remember when we all thought a 1GB hard drive was a lot of space?
Same with processors: multicore will start being better utilized, and before we know it 16 cores will be common talk.
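
The "more die per wafer" point can be sketched with a common back-of-the-envelope approximation: gross wafer area divided by die area, minus a correction for partial dies lost around the edge. Everything here is an assumption for illustration: `dies_per_wafer` is a hypothetical helper, the 260 mm² die area is a placeholder rather than a published figure, and yield and scribe lines are ignored.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate whole dies per wafer, ignoring yield and scribe lines."""
    gross = math.pi * (wafer_diameter_mm / 2) ** 2 / die_area_mm2
    # Edge-loss term: partial dies along the circumference are unusable.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return max(0, math.floor(gross - edge_loss))


if __name__ == "__main__":
    die_45nm = 260.0                       # placeholder die area in mm^2
    die_32nm = die_45nm * (32 / 45) ** 2   # ideal shrink of the same design
    for label, diameter in (("300mm", 300.0), ("450mm", 450.0)):
        print(label,
              "45nm-sized die:", dies_per_wafer(diameter, die_45nm),
              "| 32nm-sized die:", dies_per_wafer(diameter, die_32nm))
```

An ideal 45nm to 32nm shrink roughly halves the area of an unchanged design, so the die count per wafer about doubles before any extra cores or cache are added back in.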

VirtualRain · Oct 16, 2009

dukebound85 said:
you assume software will not take advantage of these cores

that's the direction software is taking

I agree, but let's be honest... almost all software that lends itself to multi-threading is already there.

The only unrealized potential gains to be had are those that leverage OpenCL and the hundreds of cores in the GPU. But OpenCL actually makes the core count in the CPU less relevant. A high-clock dual-core processor with a high-end GPU and OpenCL enabled software would be more valuable than a 6-core CPU.

shadow1 · Oct 17, 2009

Higher clock speed can be an issue. I think 3.5 GHz is the limit for clockspeed in a processor. Remember the NetBurst microarchitecture that made clockspeed the main selling point. But by 3.8 GHz the chips were too hot and chewed up loads of energy. AMD's lower-clockspeed chips performed better than the higher-clocked Pentium 4 chips.

dukebound85 · Oct 17, 2009

shadow1 said:
Higher clock speed can be an issue. I think 3.5 GHz is the limit for clockspeed in a processor. Remember the NetBurst microarchitecture that made clockspeed the main selling point. But by 3.8 GHz the chips were too hot and chewed up loads of energy. AMD's lower-clockspeed chips performed better than the higher-clocked Pentium 4 chips.

lol what?

clockspeed has gone well past 3.5 ghz and 3.8ghz

Umbongo · Oct 17, 2009

VirtualRain said:
If it was up to me, I would offer a line of Intel processors with lower core counts and less cache with higher clocks. Surely if Intel can make a 6 core processor with 12MB of L3 at 2.4GHz, they could make a dual-core processor with 2MB of L3 at a clock of 5GHz and maybe a quad-core with 4MB of cache that runs close to 4GHz. Which would interest you more?

Everything you have written appears to be based on the assumption that with the move to 32nm it will all be 6 core processors. You are incorrect. There will be dual core and quad core. Intel will push processor speeds as high as they need to, while retaining hardware stability and appropriate thermal characteristics, to compete in the marketplace and satisfy the customers who make up the major part of their business (hint: corporations). They could have had 4GHz Penryn and they even suggested as much early on, but AMD could not compete so what need was there?

Just because they can do something, and the hardware would be better suited to some things, doesn't mean it makes sense to do so from a business perspective.

gugucom · Oct 17, 2009

I think that Intel's strategy makes a lot of sense. It is classic tick tock and they execute it very nicely in my view.

2009 was a tock year with the introduction of the Nehalem micro architecture. I see Intel's innovation with the increase in bandwidth quite impressive. It is essentially Apple with their poor implementation of the RAM and PCIe design who messed up the product quality from Intel. You can buy competitive systems with 18 RAM sockets and four 16 lane PCIe slots.

2010 will be a tick year dedicated to the 32nm fab standard. So under normal conditions you would not expect to see radical changes. That Intel manages to increase the core count by 50% is a nice bonus in my view. I agree that the Gulftowns will most probably come in different clock speeds up to 3,3 GHz even if the initial clock speed is only 2,4 GHz.

The new technology will then come in 2011 with the Sandy Bridge micro architecture.

frimple · Oct 17, 2009

gugucom said:
I agree that the Gulftowns will most probably come in different clock speeds up to 3,3 GHz even if the initial clock speed is only 2,4 GHz.

I also agree with this. Initial testing results show it's possible to take it up to nearly 6GHz. I'm sure Intel is well aware of just how fast the processor can run within its tolerances, so the clock speed on the initial ES chips doesn't mean that's what it's going to be when they actually produce them.

I think everyone's getting a little ahead of themselves with these 6 core Gulftowns....

Tesselator · Oct 17, 2009

VirtualRain said:
Intel's strategy... is it flawed? (eg. Gulftown)

Which would interest you more?

The direction is determined by a fairly simple equation where the variables are probably more revealing than anything else. Simply speaking it's just cost, supply, and demand, within a publicly available technology. Each of cost, supply, and demand is actually elaborate and vast in scope. So is the topic of "publicly" released or "available" technology but that's such a speculative and expansive topic. 😉

Anyway, to answer your second question about what would interest me personally, it would be different technology altogether first and foremost. After that though, being stuck with the current tech base, I want to see both parallelism and clock speed progress significantly and incrementally. I want very much to have processors available with 64 physical cores operating at 6 or 8 gigahertz per core. 🙂

nanofrog · Oct 17, 2009

Umbongo said:
Everything you have written appears to be based on the assumption that with the move to 32nm it will all be 6 core processors. You are incorrect. There will be dual core and quad core. Intel will push processor speeds as high as they need to, while retaining hardware stability and appropriate thermal characteristics, to compete in the marketplace and satisfy the customers who make up the major part of their business (hint: corporations). They could have had 4GHz Penryn and they even suggested as much early on, but AMD could not compete so what need was there?

Just because they can do something, and the hardware would be better suited to some things, doesn't mean it makes sense to do so from a business perspective.

Unfortunately, this is the case. They will only produce what makes financial sense towards the business/enterprise market. They won't develop totally separate technology for the consumer side, as it's too expensive. It's far more cost effective to apply systems engineering to the existing parts to develop a less expensive part family/ies. So the result isn't ideally suited.

The one thing that came out of Nehalem that will matter IMO, is the move to both QPI and the IMC when it is downlined to every family. We need to solve bottlenecks as much as increases in clock speed, as little software is dependent solely on the core (clock speed).

Tesselator said:
The direction is determined by a fairly simple equation where the variables are probably more revealing than anything else. Simply speaking it's just cost, supply, and demand, within a publicly available technology. Each of cost, supply, and demand is actually elaborate and vast in scope. So is the topic of "publicly" released or "available" technology but that's such a speculative and expansive topic. 😉

Anyway, to answer your second question about what would interest me personally, it would be different technology altogether first and foremost. After that though, being stuck with the current tech base, I want to see both parallelism and clock speed progress significantly and incrementally. I want very much to have processors available with 64 physical cores operating at 6 or 8 gigahertz per core. 🙂

It's complicated in a sense, but most of it is the development of a system that aims at the heart of their profit base: the business/enterprise market. That's why Nehalem got things like QPI and the IMC to begin with. It's a way to speed up the data streams. Eventually it will trickle down to the lowliest of parts, but not yet, as the Core i5 family proves (still using DMI as its interface). Cost and family separation are the primary reasons IMO.

It's going to be awhile before we leave semiconductors it seems. I do recall a sub 10 nm process has been developed, and I'd be totally amazed if they scrap it, and move to another technology, such as optical. Those systems will have to be developed for the enterprise market first (really high end, and almost certainly first release as a supercomputer), before it ever trickles down to the more common systems, even in the enterprise lines. 🙁

voyagerd · Oct 17, 2009

I am definitely interested in more cores. Upping the clockspeed instead would just take us back to Intel’s days of the Pentium.

nanofrog · Oct 17, 2009

voyagerd said:
I am definitely interested in more cores. Upping the clockspeed instead would just take us back to Intel's days of the Pentium.

Only if the core count is reduced. If core count stays the same or increases, as well as an increase in clock speeds, then you end up with a faster chip for both single and multi-threaded applications. 😀

When it happens, you pay through the nose though. For example, look at the SP and DP Xeons at the lowest to highest clocks in the Nehalem family. Ouch.
$284 - $999 for the SP versions, and it's worse for the DP models. 🙄 😛

voyagerd · Oct 17, 2009

nanofrog said:
Only if the core count is reduced. If core count stays the same or increases, as well as an increase in clock speeds, then you end up with a faster chip for both single and multi-threaded applications. 😀

When it happens, you pay through the nose though. For example, look at the SP and DP Xeons at the lowest to highest clocks in the Nehalem family. Ouch.
$284 - $999 for the SP versions, and it's worse for the DP models. 🙄 😛

Clock speed increases are always nice still. They usually come with process size decreases, like 45nm to 32nm. The problem is that higher clock speeds mean more power consumption and more heat produced. There is already a lot of heat produced because of all the cores.

goMac · Oct 17, 2009

I actually just sat in on a presentation by an Intel engineer on stuff like Nehalem and Gulftown the other day...

To answer your question...

It is going to be physically impossible to keep pushing GHz. Intel tried to go for 4 GHz, and without extreme cooling methods, it just won't happen. Increasing GHz creates exponential heat increases, while increasing cores creates linear heat increases, so cores are the best option moving forward.
Yes, software has trouble taking advantage of cores, and eventually we'll likely hit a wall in the number of cores software can take advantage of, which leads me on to the next point...
The biggest holdup in speed increases isn't actually the processor. Memory speeds have not increased as fast as processor speeds, and the speed penalties for going out to memory or GPU are awful and can stall CPU processing.
Intel already took the first step of dealing with this in Nehalem. On Nehalem the speed penalty for going out to memory is greatly reduced because the memory controller is now on CPU. Now instead of hopping over multiple buses to get to memory, memory is just a stone's throw away from the processor. This is also why on the new Mac Pros, the memory has been moved physically closer to the processor. The length of the connections on the motherboard can make an impact, and they were reduced to lower time to memory.
The next problem is the GPU. In the future Intel wants to move the GPU onto the CPU for the same reason. The time for the processor to get over the PCI bus to your graphics card can be extreme in computer science times. The solution is to move your GPU onto the CPU so that they are right next to each other and the talk time is almost 0.
In the future, when you buy your 16 core CPU, it might come with one or two cores configured not as CPUs, but as GPUs. So you'd have a 14 core CPU/2 core GPU (with the GPU cores having many processing sub-cores as part of their silicon, much like a normal GPU.)
This does bring up the question of whether it's nicer to have a separate discretely upgradable GPU, but that's another topic....
Intel is talking about having a full feature GPU core on your CPU, not like Intel Integrated graphics.
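
For what it's worth, the heat argument above can be sketched with the usual first-order model for dynamic power, P ≈ C·V²·f. Assuming (and it is only an assumption) that voltage has to rise roughly in step with frequency, power grows with about the cube of the clock, which is steep rather than literally exponential, while adding cores at a fixed clock and voltage grows power roughly linearly. `relative_power` and its parameters are hypothetical names for illustration; leakage and real voltage/frequency curves are ignored.

```python
def relative_power(freq_scale: float,
                   core_scale: float = 1.0,
                   voltage_tracks_frequency: bool = True) -> float:
    """Dynamic power relative to a baseline chip, using P ~ cores * V^2 * f."""
    # Assumption: supply voltage scales one-for-one with frequency when enabled.
    voltage_scale = freq_scale if voltage_tracks_frequency else 1.0
    return core_scale * voltage_scale ** 2 * freq_scale


if __name__ == "__main__":
    print("2x clock, same cores:", relative_power(2.0))
    print("same clock, 2x cores:", relative_power(1.0, core_scale=2.0))
    print("1.5x clock, 1.5x cores:", relative_power(1.5, core_scale=1.5))
```

In this simplified model doubling the clock costs about 8x the power, while doubling the cores costs about 2x, which is the gist of why core count is the cheaper knob to turn.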

Anyway, hope this helps a bit.

nanofrog · Oct 17, 2009

voyagerd said:
Clock speed increases are always nice still. They usually come with process size decreases, like 45nm to 32nm. The problem is that higher clock speeds mean more power consumption and more heat produced. There is already a lot of heat produced because of all the cores.

I know. Heat is the biggest reason why the clocks aren't higher, even when we see articles/posts about how far they can be overclocked. It's too expensive for cooling (from what most are willing to pay), and it wouldn't comply with programs such as Energy Star either. Corporations actually watch their power bills with computers, and wouldn't be that willing to buy substantial quantities if it were available.

So it's up to individuals to do so if they wish. Intel at least is offering that possibility without intentional hindrances, and the board makers are willing to cooperate as well (allowing access to such settings in the consumer market). As there's not nearly the interest in such endeavors in the enterprise market, the boards (particularly the DP boards) aren't OC friendly. No access to the settings, and so far, there's been no release of an enthusiast board for such systems (DP workstations for example, as the SkullTrail was for the 54xx parts).

goMac · Oct 17, 2009

nanofrog said:
So it's up to individuals to do so if they wish. Intel at least is offering that possibility without intentional hindrances, and the board makers are willing to cooperate as well (allowing access to such settings in the consumer market).

It's not power bills that are the problem, it's the cooling. You just can't cool those chips enough to keep them at 4 ghz in any sort of supported configuration. It's just not possible.

I know overclockers can do it, but there is no way Intel can put out enough chips that can be supported in a 4 ghz configuration. And usually overclockers have much more advanced cooling systems. And again, at 4 ghz, you're going to be going way faster than your RAM can keep up with, which is why they're already having to increase cache sizes.

Power bills are honestly the smallest issue with going 4 ghz.

VirtualRain · Oct 17, 2009

Nice insights goMac!

However, I still think Intel's strategy (whether by choice or forced on them by physical limitations) is providing users with little reason to upgrade.

It's hard to believe, but for the average computer user, the move from 65nm to 45nm to 32nm will have netted very little increase in performance. The only ones benefiting from the last couple generations of Intel processors and likely the next couple generations are those with hard-core rendering/encoding workloads.

I used to upgrade my PC with every new Intel processor as did a lot of other computer enthusiasts. Now, few are upgrading. Sadly, a 6-core Gulftown at 2.4GHz is not an upgrade to someone running a 65nm Q6600 quad! 😱

Anyway, Apple can create compelling reasons to upgrade their computer line without significant improvements in CPU's but I feel sorry for the other PC manufacturers who are faced with trying to market this stuff as much improved. 🙁

EDIT: goMac... I think it's important to point out that Nehalem's move to an IMC was huge from an architectural perspective, but the real-world gains are negligible... here's why... Back when the memory controller was on the NB and accessed across the slow FSB, Intel opted to work around this high-latency situation by using massive L2 cache on the Core2 line of processors (8MB for quad cores). This meant that a cache miss was a rarity, masking the slow memory architecture. Now enter Nehalem with its IMC, something AMD had been doing for a couple of generations. While this dramatically improves memory access, its real advantage is that you can free up silicon that was previously spent on massive cache. However, for some odd reason I've yet to understand, Intel continues to spend large amounts of silicon real estate on ridiculous L3 cache sizes: 8MB with current CPUs and 12MB with Gulftown! (Whereas I think AMD runs something like 1MB/core.) That's insane for an IMC. It's proven that you can run your memory in single-channel, dual-channel, or tri-channel, and it won't make any difference... because the massive cache is masking any memory latency issues. Why Intel continues to use huge L3 cache sizes is a mystery to me.

Apparently cache errors are also one of the key factors in limiting clock speeds... and why on Nehalem, the cache runs on a separate power and clock plane from the cores. Thus smaller cache sizes would enable higher clocks.
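
The cache-versus-IMC tradeoff can be put in rough terms with the textbook average memory access time formula: hit time plus miss rate times miss penalty. This is only a sketch under stated assumptions: `average_access_time_ns` is a hypothetical helper and every latency and miss rate below is a made-up placeholder, not a measured Core2 or Nehalem figure.

```python
def average_access_time_ns(hit_time_ns: float,
                           miss_rate: float,
                           miss_penalty_ns: float) -> float:
    """Average memory access time: cache hit cost plus the expected miss cost."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the shape of the tradeoff.
    scenarios = {
        "big cache, slow memory (FSB-style)": (10.0, 0.02, 100.0),
        "big cache, fast memory (IMC-style)": (10.0, 0.02, 60.0),
        "small cache, fast memory (IMC-style)": (10.0, 0.10, 60.0),
    }
    for name, (hit, miss_rate, penalty) in scenarios.items():
        print(f"{name}: {average_access_time_ns(hit, miss_rate, penalty):.1f} ns")
```

With a low enough miss rate the memory latency barely moves the average, which is the masking effect described above; how far the cache could shrink before misses dominate depends entirely on the workload's real miss rates.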

goMac · Oct 17, 2009

VirtualRain said:
EDIT: goMac... I think it's important to point out that Nehalem's move to an IMC was huge from an architectural perspective, but the real-world gains are negligible... here's why... Back when the memory controller was on the NB and accessed across the slow FSB, Intel opted to work around this high-latency situation by using massive L2 cache on the Core2 line of processors (8MB for quad cores). This meant that a cache miss was a rarity, masking the slow memory architecture. Now enter Nehalem with its IMC, something AMD had been doing for a couple of generations. While this dramatically improves memory access, its real advantage is that you can free up silicon that was previously spent on massive cache. However, for some odd reason I've yet to understand, Intel continues to spend large amounts of silicon real estate on ridiculous L3 cache sizes: 8MB with current CPUs and 12MB with Gulftown! (Whereas I think AMD runs something like 1MB/core.) That's insane for an IMC. It's proven that you can run your memory in single-channel, dual-channel, or tri-channel, and it won't make any difference... because the massive cache is masking any memory latency issues. Why Intel continues to use huge L3 cache sizes is a mystery to me.

Apparently cache errors are also one of the key factors in limiting clock speeds... and why on Nehalem, the cache runs on a separate power and clock plane from the cores. Thus smaller cache sizes would enable higher clocks.

Yeah, this is a good point, for a lot of operations, 8 MB is more than enough cache. But if you're doing stuff like... working with HD video all off of memory (which is going to start happening more often once After Effects CS5 and whatever 64 bit Final Cut is going to be hit), 8 MB of cache is nothing. 🙂 The CPU is going to have to keep hitting main memory for more frames, and then ship them to the GPU.

If you're playing back off disk, like most consumers are, it's kind of a moot point since disk is so much slower than main memory. Main memory speeds are going to be the least of your problems. But, as consumers start getting more and more RAM in their computers, and RAM speeds still can't keep up, the disparity is going to get worse. If you have a consumer machine with 32 gigs of RAM in 5 years, a CPU with 8 megs of cache is going to be struggling badly to keep up with everything you've got in RAM.

This of course assumes that we're going to have software that actually fills up 32 gigs of RAM, but considering I started with Macs back when the entire system fit in 2 megabytes, I wouldn't be surprised. Microsoft seems to be adept at finding ways to get Word for Mac to eat up more and more memory. 🙂

Also, 16 cores fighting over 8 megs of cache doesn't sound fun either.

But you're right, Harpertown is still very competitive with Nehalem. I still recommend Harpertown Mac Pros to people who are interested in buying a Mac Pro. I have a Harpertown at home and a Nehalem at work, and while the Nehalem is a bit better in dealing with virtual machines, my Harpertown is every bit as fast. But Nehalem is really Intel banking on the future, when there are a lot more cores fighting over a cache that's trying to handle a lot more memory.

IIRC the new i7's have 12 megabytes of cache, but the quad i7's only have 8 megs of cache because there wasn't enough room on the die, which could be another problem. More cores might mean less room on the die for cache, which means we'll need faster memory access.

nanofrog · Oct 17, 2009

goMac said:
It's not power bills that are the problem, it's the cooling. You just can't cool those chips enough to keep them at 4 ghz in any sort of supported configuration. It's just not possible.

I know this. See the above post. 😉

BTW, to me, even the cooling in an enterprise environment is tied to power usage (HVAC runs harder, requiring more power, or even an upgrade to larger unit/s, using additional power). It all comes down to financial practicality.

Single systems (larger cases, not crammed in a rack) have more options, and even those are left up to the individual. There are possibilities, such as direct liquid cooling to the processors, as with IBM's newest Power chips (constructed into supercomputers), but it's not cheap enough yet. It would take a massive economy of scale to get it into end user systems. Again, cost is the primary limitation, and Intel really doesn't have enough incentive yet. Maybe if they hit a wall, and nothing else is ready tech wise. Timing will be critical, but I doubt such exotic cooling systems will ever make it to the mainstream.

goMac said:
I know overclockers can do it, but there is no way Intel can put out enough chips that can be supported in a 4 ghz configuration. And usually overclockers have much more advanced cooling systems. And again, at 4 ghz, you're going to be going way faster than your RAM can keep up with, which is why they're already having to increase cache sizes.

This is where I prefer to think tackling the system's bottlenecks is worth more to end users right now than many realize. Even for clock speeds to remain in the range they're at now. Feed the chips as fast as they can accept the data streams. It would speed things up, and not require any adjustments to cores. Interface OTOH... 😀 QPI and the IMC were a step in the right direction. If we can just get a drive tech fast enough at really low cost, the results will be amazing.

VirtualRain said:
I used to upgrade my PC with every new Intel processor as did a lot of other computer enthusiasts. Now, few are upgrading. Sadly, a 6-core Gulftown at 2.4GHz is not an upgrade to someone running a 65nm Q6600 quad! 😱

There's been less and less benefit from it. Even with faster chips, the bottlenecks haven't been addressed. QPI and the IMC were a big improvement. It could still use work, particularly in non-volatile storage (drive tech).

VirtualRain said:
EDIT: goMac... I think it's important to point out that Nehalem's move to an IMC was huge from an architectural perspective, but the real-world gains are negligible... here's why... Back when the memory controller was on the NB and accessed across the slow FSB, Intel opted to work around this high-latency situation by using massive L2 cache on the Core2 line of processors (8MB for quad cores). This meant that a cache miss was a rarity, masking the slow memory architecture. Now enter Nehalem with its IMC, something AMD had been doing for a couple of generations. While this dramatically improves memory access, its real advantage is that you can free up silicon that was previously spent on massive cache. However, for some odd reason I've yet to understand, Intel continues to spend large amounts of silicon real estate on ridiculous L3 cache sizes: 8MB with current CPUs and 12MB with Gulftown! (Whereas I think AMD runs something like 1MB/core.) That's insane for an IMC. It's proven that you can run your memory in single-channel, dual-channel, or tri-channel, and it won't make any difference... because the massive cache is masking any memory latency issues. Why Intel continues to use huge L3 cache sizes is a mystery to me.

IMC and QPI are the right way to go, but it needs work, and the software still needs to catch up anyway. No need to tweak it just yet. Besides, it'll end up as part of the "improvements" made to future parts. 😛

With current parts, cache works out to 2MB/core. Now take into consideration Hyper Threading. That's where they derived the value from. Same core count, but 2x the threads, so additional cache is needed. It doesn't do much ATM, and HT seems to be too buggy to be worth using as it exists now (from what I've seen and read). But down the road, it will.
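
The per-core arithmetic is easy to check using the L3 sizes mentioned in this thread (8MB on the current quads, 12MB on Gulftown). `cache_share_mb` is just a hypothetical helper for the division, and an even split is an assumption: a shared L3 is not actually partitioned per core or per thread.

```python
def cache_share_mb(l3_mb: float, cores: int, threads_per_core: int = 1) -> float:
    """Even split of a shared L3 across hardware threads (a simplification)."""
    return l3_mb / (cores * threads_per_core)


if __name__ == "__main__":
    print("Quad, 8MB L3, per core:", cache_share_mb(8, 4))
    print("Gulftown, 12MB L3, per core:", cache_share_mb(12, 6))
    print("Quad, 8MB L3, per thread with HT:", cache_share_mb(8, 4, threads_per_core=2))
```

Both parts come out at 2MB per core, or 1MB per thread once Hyper Threading doubles the thread count.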

TheStrudel · Oct 17, 2009

goMac said:
This of course assumes that we're going to have software that actually fills up 32 gigs of RAM, but considering I started with Macs back when the entire system fit in 2 megabytes, I wouldn't be surprised. Microsoft seems to be adept at finding ways to get Word for Mac to eat up more and more memory. 🙂

If I had my druthers, all of the Pro Apps could use up to all but 2 GB of your RAM when set, but that's partly a 32-bit issue. That said, I think anything that uses media and lots of horsepower should be able to fill up that 32 GB of RAM.

Yes, I was one of those people who complained about CS4 not being 64-bit.

"Nobody works with file sizes larger than 3 GB anyway."

Nobody who doesn't have access to a printer that goes up to 54" wide, that is.

At any rate, good thread. I have to imagine that SSDs are getting close to the point where the storage bottleneck stops applying and we can worry about other things instead. Right now, everybody who isn't using OCZ Vertex or Intel X-25M drives is at the storage bottleneck.

I also kind of like the idea I heard before about a 2.5" or 1.8" SSD right on the logic board large enough to hold the OS for fast access. Maybe we'll see it in future chipset designs.

goMac · Oct 17, 2009

nanofrog said:
This is where I prefer to think tackling the system's bottlenecks is worth more to end users right now than many realize. Even for clock speeds to remain in the range they're at now. Feed the chips as fast as they can accept the data streams. It would speed things up, and not require any adjustments to cores. Interface OTOH... 😀 QPI and the IMC were a step in the right direction. If we can just get a drive tech fast enough at really low cost, the results will be amazing.

Sure, but the technical problem is that you're also going to hit a limit to how much data you're going to be able to stream from memory into one core. 🙂 The silicon has limits where you are going to start hitting data corruption.

The solution is to have multiple cores, each with their own memory bus. Just like with CPU's, it's easier to increase your RAM access speeds by adding buses in parallel instead of just pumping up the frequency of one memory bus.

So still, even if you say "make memory faster!", it still leads you to needing multiple cores. There isn't any escape. 🙂
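
As a sketch of the parallel-bus point: theoretical peak memory bandwidth is the transfer rate times the bus width times the number of channels, so adding channels scales the peak without raising the frequency of any one bus. The 1066 MT/s rate is an assumed example speed, the 8-byte width assumes a standard 64-bit channel, and `peak_bandwidth_mb_s` is a hypothetical helper; sustained real-world bandwidth will be lower.

```python
BYTES_PER_TRANSFER = 8  # assumes a 64-bit wide memory channel


def peak_bandwidth_mb_s(mega_transfers_per_s: int, channels: int = 1) -> int:
    """Theoretical peak bandwidth in MB/s for identical channels in parallel."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels


if __name__ == "__main__":
    for channels in (1, 2, 3):
        print(f"{channels} channel(s) at 1066 MT/s:",
              peak_bandwidth_mb_s(1066, channels), "MB/s")
```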

And it's not like it's THAT bad. Even if software is written as single threaded on OS X (which is very rare these days), most of the underpinnings of the system are multithreaded anyway, which means you're probably still getting multicore performance out of a single core app. Hell, I wrote an app a few years back that I didn't do much in the way of explicit threading in, and with all the system libraries it had 16 threads.

nanofrog said:
There's been less and less benefit from it. Even with faster chips, the bottlenecks haven't been addressed. QPI and the IMC were a big improvement. It could still use work, particularly in non-volatile storage (drive tech).

Drives are never going to be even able to remotely compete with RAM. Too complex of a design, too many buses in between. Even NVRAM is no competition to DRAM in terms of speed.

nanofrog said:
IMC and QPI are the right way to go, but it needs work, and the software still needs to catch up anyway. No need to tweak it just yet. Besides, it'll end up as part of the "improvements" made to future parts. 😛

I don't think anything about QPI needs work. The hardware is all ready to go.

nanofrog said:
With current parts, cache works out to 2MB/core. Now take into consideration Hyper Threading. That's where they derived the value from. Same core count, but 2x the threads, so additional cache is needed. It doesn't do much ATM, and HT seems to be too buggy to be worth using as it exists now (from what I've seen and read). But down the road, it will.

HyperThreading isn't really buggy at all. Because of the pipelining HyperThreading brings to the table you will see some performance enhancement, at least in my experience. It isn't an extreme performance enhancement, but any extra performance is a good thing.

HyperThreading has been around a long long time. Back in 2005 when I was playing with the Intel Mac Developer boxes they all had HT. I really doubt it's got bugs at this point. 🙂

Also, HyperThreading doesn't necessarily need more cache. Ideally, you're pipelining instructions working with the same data, so you shouldn't need to go back to memory. Of course this is a concern really for the developer. I don't know if stuff like GCD takes cache into account, but eventually the developer probably won't have to worry about this sort of thing as much.
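For reference, the 2MB/core figure quoted above can be checked against the L3 sizes and core counts given at the start of the thread (8MB over 4 cores, 12MB over 6). This is only a rough sketch: the L3 is shared, so an even split per core or per thread is a simplification, and the function and dictionary names are made up for illustration.

```python
# L3 sizes and core counts as stated at the start of the thread.
CHIPS = {
    "Nehalem (4 cores)": {"l3_mb": 8, "cores": 4},
    "Gulftown (6 cores)": {"l3_mb": 12, "cores": 6},
}

# Hyper-Threading exposes two hardware threads per core.
THREADS_PER_CORE = 2


def cache_share(l3_mb, cores, threads_per_core=1):
    """Even split of a shared L3 across cores (or hardware threads).

    Assumption: the cache is divided equally, which real workloads won't do.
    """
    return l3_mb / (cores * threads_per_core)


for name, chip in CHIPS.items():
    per_core = cache_share(chip["l3_mb"], chip["cores"])
    per_thread = cache_share(chip["l3_mb"], chip["cores"], THREADS_PER_CORE)
    print(f"{name}: {per_core:.1f} MB/core, {per_thread:.1f} MB/thread with HT")
```

On these numbers the per-core share stays the same going from 4 to 6 cores, and halves per thread once HT is counted.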

TheStrudel said:
If I had my druthers, all of the Pro Apps could use up to all but 2 GB of your RAM when set, but that's partly a 32-bit issue. That said, I think anything that uses media and lots of horsepower should be able to fill up that 32 GB of RAM.

Yes, I was one of those people who complained about CS4 not being 64-bit.

"Nobody works with file sizes larger than 3 GB anyway."

Nobody who doesn't have access to a printer that goes up to 54" wide, that is.

Yeah, wish I could say more, but I have an NDA (apparently.) 🙂

TheStrudel said:
At any rate, good thread. I have to imagine that SSDs are getting close to the point where the storage bottleneck stops applying and we can worry about other things instead. Right now, everybody who isn't using OCZ Vertex or Intel X-25M drives is at the storage bottleneck.

I also kind of like the idea I heard before about a 2.5" or 1.8" SSD right on the logic board large enough to hold the OS for fast access. Maybe we'll see it in future chipset designs.

Yeah, as I said above, even with SSD, drives are still going to be a huge huge bottleneck. RAM is currently at 68264 megabits/sec transfer rate. An SSD is about 400 megabits at max. Still a huge huge disparity. 🙂

(Feel free to let me know if my above numbers are wrong. I just went to Wikipedia.)
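As a rough check on the size of that gap, here is the arithmetic using the two figures above taken at face value (they are the numbers from this post, not measurements). The names are just for illustration.

```python
# Figures quoted in the post, both in megabits per second.
RAM_MBIT_S = 68264
SSD_MBIT_S = 400


def mbit_to_mbyte(mbit_per_s):
    """Convert megabits/s to megabytes/s (8 bits per byte)."""
    return mbit_per_s / 8


# How many times faster RAM is than the SSD on these figures.
ratio = RAM_MBIT_S / SSD_MBIT_S

print(f"RAM: {mbit_to_mbyte(RAM_MBIT_S):.0f} MB/s")
print(f"SSD: {mbit_to_mbyte(SSD_MBIT_S):.0f} MB/s")
print(f"RAM is roughly {ratio:.0f}x faster on these numbers")
```

If the SSD figure is too low, the ratio shrinks accordingly, but it would take a very large correction to close the gap.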

nanofrog · Oct 17, 2009

goMac said:
Sure, but the technical problem is that you're also going to hit a limit to how much data you're going to be able to stream from memory into one core. 🙂 The silicon has limits where you are going to start hitting data corruption.

Not that much. It comes down to the controller design/s (IMC or QPI+core controller).

goMac said:
The solution is to have multiple cores, each with their own memory bus. Just like with CPUs, it's easier to increase your RAM access speeds by adding buses in parallel instead of just pumping up the frequency of one memory bus.

Increasing the clock on NVRAM is part of it, but it does have a limit. I'm thinking more in terms of parallelism of serial data. For example, the Flash cards (PCIe devices) are using what are essentially smaller drives run in parallel off of multi channel controllers. The Colossus is such an example as a SATA drive. It can be done both ways. But the PCIe bus in the X58 chipset contains more lanes (= greater throughput potential, but will depend on the exact chipset and/or if another controller is added, such as an nF200) than the 6x SATA ports in the ICH10R for example.

The busses still have limits, and can't be exceeded, so the semiconductor drives are fixed to the bus max. If you hit the limit (where things end up shared), the chipset has to switch it. This could be addressed, but likely won't due to fixed transistor counts/cost per unit to keep yields high.
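A minimal sketch of that idea, assuming throughput scales linearly with the number of parallel channels until the shared bus is saturated. The per-channel rate is invented for illustration, and the bus cap borrows the ~660MB/s ICH10R figure mentioned further down in this post; real controllers won't scale this cleanly.

```python
def aggregate_throughput(channels, per_channel_mb_s, bus_limit_mb_s):
    """Throughput of parallel channels, capped by the bus they share.

    Assumption: channels scale linearly until the bus limit is reached.
    """
    return min(channels * per_channel_mb_s, bus_limit_mb_s)


# Hypothetical per-channel rate; bus cap is the ~660MB/s figure from the post.
PER_CHANNEL_MB_S = 100
BUS_LIMIT_MB_S = 660

for channels in (1, 2, 4, 8, 16):
    total = aggregate_throughput(channels, PER_CHANNEL_MB_S, BUS_LIMIT_MB_S)
    print(f"{channels:2d} channels -> {total} MB/s")
```

Past the point where the channels add up to the bus limit, adding more of them buys nothing, which is why the lane count of the bus matters as much as the drive itself.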

goMac said:
And it's not like it's THAT bad. Even if software is written as single threaded on OS X (which is very rare these days), most of the underpinnings of the system are multithreaded anyway, which means you're probably still getting multicore performance out of a single core app. Hell, I wrote an app a few years back that I didn't do much in the way of explicit threading in, and with all the system libraries it had 16 threads.

It will depend on the app though. From what I'm familiar with, most of it (in general, not necessarily for a specific OS), is still single threaded. Either it wasn't written for it, and SL could potentially help it out, or it can't be at all. I certainly wouldn't want such an API system to multi-thread things like word processor documents, or .pdf files (data sheets for example). 😛

Everything I've ever used that was multi-threaded, was developed that way.

goMac said:
I don't think anything about QPI needs work. The hardware is all ready to go.

Think of the ICH10R. 😉 Currently, it's throttled to ~660MB/s on the Nehalem systems. I had been hoping it was just the SP systems, but apparently it seems the DP units are affected as well. But both also only use a single chipset. Perhaps the second chipset would solve it, as QPI is doubled. So what I'm referring to, is just widening QPI to prevent such issues. Die size may have been the limiting factor, but Gulftown is slated to use the same chipsets without a redesign (makes financial sense for Intel).

goMac said:
HyperThreading isn't really buggy at all. Because of the pipelining HyperThreading brings to the table you will see some performance enhancement, at least in my experience. It isn't an extreme performance enhancement, but any extra performance is a good thing.

HyperThreading has been around a long long time. Back in 2005 when I was playing with the Intel Mac Developer boxes they all had HT. I really doubt it's got bugs at this point. 🙂

It seems to be having issues with the Nehalems from what I've run into, and what I've seen posted around. People are opting to shut it off if possible. I'm guessing, as I've never coded it, but I presume that the software containing HT isn't tailored for the new architecture (same name for the tech, but not exactly the same implementation in the hardware as the P4).

goMac said:
Also, HyperThreading doesn't necessarily need more cache. Ideally, you're pipelining instructions working with the same data, so you shouldn't need to go back to memory. Of course this is a concern really for the developer. I don't know if stuff like GCD takes cache into account, but eventually the developer probably won't have to worry about this sort of thing as much.

IIRC, the HT in the Nehalem can also allow separate instructions (say different programs) to a single core when it's not loaded. This is the hardware difference from the P4 implementation that I seem to recall. They were trying to find ways to use cores that weren't loaded heavily (assuming all were running something).

goMac said:
Yeah, as I said above, even with SSD, drives are still going to be a huge huge bottleneck. RAM is currently at 68264 megabits/sec transfer rate. An SSD is about 400 megabits at max. Still a huge huge disparity. 🙂

(Feel free to let me know if my above numbers are wrong. I just went to Wikipedia.)

SSD can hit over 400Mb/s. Way over actually. There's been an SSD developed that can hit 8000Mb/s. And that's on current NAND Flash. There's other types in the works that haven't hit the supply chain yet, that will make current Flash look like granny stuck in super glue. 😱 😛 DRAM is a possible solution now, but it's expensive and it requires a battery (not that trustworthy IMO).

Keep in mind, that the disparity can also be mitigated by running multiple channels, and is what the faster SSD drive controllers are already doing. At least those from Intel, Indilinx, and Samsung are.

There are solutions, but they've not gotten the costs down enough to allow for widespread acceptance as of yet. The SSD market still needs time to mature. This is more of the point I was referring to: when the average computer ships with SSD as standard, for example, the enthusiast/performance users can actually obtain drive throughputs that can blow our minds (compared to what we're used to now).

VirtualRain · Oct 18, 2009

While there's no arguing that there are a ton of bottlenecks elsewhere in the system to be addressed, the key issue at hand here, is whether there's much in the last couple of CPU offerings or the next couple coming down the pike that will compel people to upgrade their systems.

I maintain that central processing has hit a plateau and stalled and I'm not hearing any arguments to the contrary. We need more clocks. Or at least more aggressive turbo boost.

A quote from Anandtech...

If there's anything Clarksfield shows us, it's that we really want Arrandale - sooner rather than later. Quad-core processors simply aren't a major need for the vast majority of laptop users, and the higher prices and higher maximum TDP make them less desirable. Arrandale should do a great job at addressing both of those shortcomings, and the same goes for Clarkdale on the desktop. Lower power dual-core processors with Hyper-Threading will still have the ability to run four simultaneous threads, and they should also be able to run at higher clock speeds most of the time.

From their Clarksfield preview.

They want less cores and more clocks too.
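To put rough numbers on the cores-versus-clocks point, here is a small sketch using Amdahl's law, which estimates the speedup from extra cores when only part of a workload can run in parallel. The parallel fractions below are hypothetical, not measured from any real application, and the function name is just for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads: fraction of run time that can use extra cores.
for parallel_fraction in (0.25, 0.50, 0.90):
    quad = amdahl_speedup(parallel_fraction, 4)
    hexa = amdahl_speedup(parallel_fraction, 6)
    gain = (hexa / quad - 1.0) * 100
    print(f"{parallel_fraction:.0%} parallel: 4 -> 6 cores gains {gain:.1f}%")
```

Under this model, a 50% increase in core count only approaches a 50% gain when nearly all of the work is parallel. A clock bump, by contrast, would speed up the serial part too, at least as long as memory and storage aren't the limiting factor.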

nanofrog · Oct 18, 2009

VirtualRain said:
While there's no arguing that there are a ton of bottlenecks elsewhere in the system to be addressed, the key issue at hand here, is whether there's much in the last couple of CPU offerings or the next couple coming down the pike that will compel people to upgrade their systems.

I maintain that central processing has hit a plateau and stalled and I'm not hearing any arguments to the contrary. We need more clocks. Or at least more aggressive turbo boost.

For consumer parts, yes. Enterprise, no, as they're aimed at this market (i.e. racks full of VM systems).

Also, that article is aimed at the mobile market, not desktop or enterprise. 😉 The dual vs. quad core for a desktop is arguable, as some do more multi tasking than others, and possibly a few run multi-threaded apps often enough it makes sense for them, without a battery to worry about (run times). I do think most can hum happily along on a dual core, but there are some exceptions. Laptops are another issue however, as there is always battery life to contend with, and compromises must inevitably be made.
