Off that topic, you're saying that a Harpertown (2008) chip is clock-for-clock slower than the 2009 Nehalems? That very much interests me. Describe that a bit for my education if you can...

http://realworldtech.com/page.cfm?ArticleID=RWT040208182719 has all the info you want and more. Nehalem makes quite significant changes to the cache structure, instruction latencies, memory system, and other parts of the chip, so it's understandably significantly faster.

As has been mentioned before, there's very little there (or in most chips) that an operating system can be "tuned" to take advantage of. Hyperthreading-aware scheduling and the few cases where SSE4.2 is relevant are the only ones I can think of. Mostly it just runs the same code faster.
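To make the SSE4.2 point concrete, since it's about the only place software can actually require the new chip: Nehalem's SSE4.2 adds a hardware CRC32 instruction. Here's a minimal sketch (the file name and string are my own invention, not anything from Apple) that uses it through the compiler intrinsic:

/* crc32_sse42.c -- minimal sketch of an SSE4.2-only code path.
 * Build with GCC or Clang: gcc -O2 -msse4.2 crc32_sse42.c
 * The CRC32 instruction is new in Nehalem, so this binary dies
 * with an illegal instruction on Core 2 / Harpertown. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <nmmintrin.h>   /* SSE4.2 intrinsics */

int main(void) {
    const char msg[] = "hello, nehalem";
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < sizeof msg - 1; i++)
        crc = _mm_crc32_u8(crc, (uint8_t)msg[i]);   /* one hardware CRC step per byte */
    printf("crc32c = 0x%08x\n", crc ^ 0xFFFFFFFFu);
    return 0;
}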
 
The pipelines are significantly different (based on your link). Compilers could easily be tuned to take advantage of larger speculative buffers (i.e., reorder buffer, retirement register file, etc.). I'm a former AMD (K8) CPU designer, so I don't know the ins and outs of the microarchitecture of the various Intel chips, but typically what you'd find in our stuff is that from uArch to uArch we'd change:

- the number of things that, at peak, could be done simultaneously. For example, we might have three ALUs in one chip, and three-and-a-half in the next. Or we might increase the number of registers, the depth of re-order buffers, etc.

- the number of cycles that it takes to do something. This is simplest to understand in terms of math functions: in an earlier chip it might take 6 cycles to do a 32-bit integer multiply, and in the next we might do a 64-bit integer multiply in 4 cycles. Another place this often showed up was in memory latency, cache miss penalties, etc. (There's a rough sketch of measuring this after the list.)

- the quality of speculation. Branch prediction algorithms would change, cache algorithms would be tweaked, etc., to try and increase the likelihood that data would be where we wanted it, and decrease the likelihood of having to pay a big penalty. Trace caches, etc. also fall into this category. Somewhat related is eliminating the cost of bad speculation. HT is one example.

- the amount of time it takes to do one cycle - i.e.: clock speed.
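Here's the rough sketch of measuring the cycles-per-operation point from user code. Everything in it (the constant, the iteration count) is made up for illustration, and rdtsc doesn't count core cycles identically on every part, so treat the number as approximate:

/* mul_latency.c -- rough sketch: measure dependent-multiply latency.
 * Each multiply consumes the previous result, so the out-of-order
 * core can't overlap them and loop time ~= multiply latency.
 * (The loop counter's add/compare run in parallel and hide.) */
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() */

int main(void) {
    const uint64_t N = 100000000ULL;
    uint64_t x = 12345;                  /* odd start keeps the chain nonzero */
    uint64_t t0 = __rdtsc();
    for (uint64_t i = 0; i < N; i++)
        x *= 0x9E3779B97F4A7C15ULL;      /* serial dependency chain */
    uint64_t t1 = __rdtsc();
    /* print x so the compiler can't delete the loop */
    printf("x=%016llx  ~%.2f cycles/multiply\n",
           (unsigned long long)x, (double)(t1 - t0) / (double)N);
    return 0;
}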

Nehalem seems to have changed all of these things, and thus I would expect it is quite possible to tweak an OS to take advantage of it. A recompile, alone, would probably have a 10% effect. (CPU architects always say everything has a 10% effect :)
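(The recompile experiment itself is trivial once compilers grow a Nehalem target; these flag spellings are my guess based on later GCC releases, where Nehalem support first showed up as -march=corei7:)

gcc -O2 -march=core2  app.c -o app.core2
gcc -O2 -march=corei7 app.c -o app.nehalem   # Nehalem target in later GCC
time ./app.core2 ; time ./app.nehalem        # compare on the same Nehalem box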
 
So there could be a Nehalem-specific advantage to running Snow Leopard (not on Harpertowns) that isn't chalked up to the newer chips being more efficient across the board? ... something software-based that justifies the assumption that SL will be optimized for Nehalem chips?
 
(edit: this is in response to cmaier)

Yeah, I was oversimplifying obviously, but a 10% win for recompiling with -march=nehalem or whatever the flag is for it vs -march=core2 still strikes me as really unlikely. Many of Nehalem's changes make it more permissive of suboptimal code (removing a lot of the unaligned load penalty, and decreasing memory latency, for example).
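On the unaligned-load point, here's a rough sketch of the kind of microbenchmark where Core 2 gets hurt and Nehalem mostly doesn't. The buffer size and single-pass timing are arbitrary; a real measurement would repeat and take a minimum:

/* unaligned.c -- sketch: timing unaligned SSE loads.
 * Build: gcc -O2 unaligned.c   (add -msse on 32-bit)
 * On Core 2, a movups that crosses a cache line is expensive;
 * Nehalem made the unaligned form nearly free, which speeds up
 * existing sloppy code with no recompile at all. */
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* SSE intrinsics and __rdtsc() */

static float buf[4096 + 4] __attribute__((aligned(16)));

static uint64_t sweep(const float *p, int n) {
    __m128 acc = _mm_setzero_ps();
    uint64_t t0 = __rdtsc();
    for (int i = 0; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));  /* movups */
    uint64_t t1 = __rdtsc();
    float out[4];
    _mm_storeu_ps(out, acc);
    if (out[0] == 42.0f) puts("unlikely");   /* keep acc live */
    return t1 - t0;
}

int main(void) {
    for (int i = 0; i < 4096 + 4; i++) buf[i] = (float)i;
    printf("aligned:    %llu cycles\n", (unsigned long long)sweep(buf, 4096));
    printf("misaligned: %llu cycles\n", (unsigned long long)sweep(buf + 1, 4096));
    return 0;
}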

Also I doubt they'd do that anyway in most cases, since it would result in either less-than-optimal code for the more common Core 2 or, with -msse4, code that wouldn't even run on it.
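The usual way around that tradeoff is runtime dispatch: build the SSE4.2 routine into the binary but only call it when cpuid says it's safe. The sketch below uses __builtin_cpu_supports, which is a later GCC/Clang feature (so an anachronism for this thread), but the cpuid check underneath is the same idea shipping code uses:

/* dispatch.c -- sketch: pick the code path at run time so one
 * binary runs everywhere. __builtin_cpu_supports is GCC 4.8+ and
 * Clang; raw cpuid works the same way on older toolchains. */
#include <stdio.h>

int main(void) {
    __builtin_cpu_init();                      /* populate the feature flags */
    if (__builtin_cpu_supports("sse4.2"))
        puts("Nehalem path: use the SSE4.2 routines");
    else
        puts("Core 2 path: use the SSE3/SSSE3 fallback");
    return 0;
}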

So, for the record, I do not expect that the speed ratio between SL and Leopard will vary all that much in most cases between Core 2 and Nehalem. If, with hyperthreading disabled, SL is an additional 10% faster on Nehalems, I'll be very surprised.
 