Power5 with hyperthreading

spinner · Feb 26, 2003

Taken from Intel's website:

Hyper-Threading Technology has arrived on the desktop. Look for systems with the new Intel® Pentium® 4 Processor with HT Technology logo which your system vendor has verified utilize Hyper-Threading Technology. Performance will vary depending on the specific hardware and software you use. To carry the Intel® Pentium® 4 Processor with HT Technology logo, systems must have:

The Intel Pentium 4 processor at 3.06 GHz or higher
An Intel® chipset that supports HT Technology
System BIOS supports HT Technology and has it enabled
An operating system that includes optimizations for HT Technology

I guess I was under the impression that this technology runs into the same problem that some Mac apps do: apps that are not multithreaded or MP-aware do not see the extra speed advantages that they could. I am not an expert, so if this is wrong, please explain why so that we can all know.

DharvaBinky · Feb 26, 2003

quite right...

Originally posted by spinner
I guess I was under the impression that this technology runs into the same problem that some Mac apps do: apps that are not multithreaded or MP-aware do not see the extra speed advantages that they could. I am not an expert, so if this is wrong, please explain why so that we can all know.

You're pretty much right. But when they say "Operating system optimized to use HT" they're really talking about "Operating system that supports MultiProcessing". This rules out Windows 95, 98, 98SE, and Me. Windows NT4, 2000 (all versions), and XP (all versions) support at least 2 processors.

This is on one of my production servers. It's a Dell PowerEdge 2650, which supports dual P4 Xeons (the original P4s to have HT).

Notice in the above image that the Dell ServerManager software correctly identifies that there are two P4 Xeons at 2.4 GHz in the system. Nice, right? Now look at the graph below it. That's a small capture from the Microsoft Performance Monitor. Notice that it charts 4 processors. This is because Windows *assumes* that every available "thread" means there is another processor to run it. Win2k3 is supposed to solve this problem by correctly identifying HT processors.
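
As a rough sketch of why the monitor charts four processors: an OS that is not SMT-aware simply counts logical processors, which is sockets times cores per socket times hardware threads per core. The function name and parameters below are made up for illustration and are not taken from any Windows or Dell tool.

```python
def logical_processors(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of processors an SMT-unaware OS would enumerate (illustrative helper)."""
    return sockets * cores_per_socket * threads_per_core

# Dual P4 Xeons: 2 sockets, 1 core each, 2 hardware threads per core with HT on.
ht_on = logical_processors(sockets=2, cores_per_socket=1, threads_per_core=2)
# The same box with HT disabled in the BIOS.
ht_off = logical_processors(sockets=2, cores_per_socket=1, threads_per_core=1)

print(ht_on, ht_off)  # 4 2
```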

So yes... earlier versions of Windows have the same single-thread limitation that Mac OS 9 and earlier have. OS X, though, would be able to support HT with no modification. However, as I mentioned before, Apple would do well to prepare OS X ahead of time for SMT technologies (Simultaneous MultiThreading is the generic name for the technology; HT is Intel's trademarked marchitecture name) to prevent the same headaches I am experiencing right now from Wintel's slipshod adoption.

The big hit, for us, has been in the pocketbook. Since Windows incorrectly identifies the dual HT Xeons as quad processors, it messes with our licensing. All of our server products and components are licensed for unlimited users, so the big difference is the number of *processors* that they are licensed to run on. The specific OUCH for us has been the licensing costs for SQL Server, which is licensed per processor. MS has stated that HT "virtual processors" count as processors for the licensing of SQL Server 2000. OUCH, that doubled our projected database costs...
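
To make the doubling concrete, here is a minimal sketch of the arithmetic. The price is a placeholder I invented (not a real SQL Server price), and the function and its parameters are hypothetical.

```python
def license_cost(physical_cpus: int, threads_per_cpu: int,
                 price_per_processor: float, count_logical: bool) -> float:
    """Per-processor licensing cost, billing either physical or logical processors."""
    billed = physical_cpus * (threads_per_cpu if count_logical else 1)
    return billed * price_per_processor

PRICE = 1000.0  # hypothetical placeholder, NOT an actual license price

projected = license_cost(2, 2, PRICE, count_logical=False)  # what we budgeted for
actual = license_cost(2, 2, PRICE, count_logical=True)      # HT threads billed as CPUs

print(actual / projected)  # 2.0
```

Whatever the real per-processor price is, the ratio stays at 2 for a chip with two hardware threads.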

On the upside, though... We're noticing *significant* performance gains from the HT. Database access and script parsing/execution lend themselves well to multi-threading, especially since we run 22 "virtual instances" of our web app on each of our WebFarm machines (accessing a centralized database). If you turn off HT (which is possible to do in the BIOS), we lose about 30% of our speed. No small potatoes.

Look to see PPCs with HT showing up in server apps first, where they'll get the most benefit. End users will get *some*, but since most end-user apps (word processing, games) are single-threaded apps, don't look to get too much extra bang from these systems when they make it to the consumer line... (this goes for Dell using P4s with HT, too)

Dharvabinky

ddtlm · Feb 26, 2003

strider42:

The 970 will be faster not because of AltiVec and higher clocks (although those certainly help), but because it's not making compromises for reliability. IBM themselves have talked about this.

You should share where IBM said this, because until you can do that I'll go on believing you're restating rumors. Others and I have already told you that on 130nm tech, a Power4+ can run at 1.45 GHz. Having the PPC-970 run at 1.8 GHz sometime down the road most certainly does not imply that any of these unspecified changes were made to the core as you claim. For example, Intel's big-cache Xeon MP is available at only up to 2.0 GHz despite the fact that it has the same core and same manufacturing tech as P4s clocking a full 50% higher. Is it really so hard for you to believe that adding large amounts of L2 and a second core, plus more conservative clock speed safety margins, could account for the small difference between 1.45 GHz and 1.8 GHz?
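
For anyone who wants the gaps as numbers, a quick sketch of the arithmetic using only the clock speeds quoted above (the helper name is my own):

```python
def percent_faster(new_ghz: float, old_ghz: float) -> float:
    """How much higher new_ghz is than old_ghz, as a percentage."""
    return (new_ghz / old_ghz - 1.0) * 100.0

# PPC-970 at 1.8 GHz versus Power4+ at 1.45 GHz.
ppc970_vs_power4 = percent_faster(1.8, 1.45)

# A P4 clocked "a full 50% higher" than the 2.0 GHz Xeon MP.
p4_clock = 2.0 * 1.5

print(round(ppc970_vs_power4, 1))  # 24.1
print(p4_clock)                    # 3.0
```

So the PPC-970 gap is roughly 24%, about half of the gap Intel shows between two chips with the same core.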

Now, I suppose you could claim that IBM did some "things" to make the PPC-970 faster in ways other than clock speed, because of the nebulous idea that not as many unspecified "compromises for reliability" were made. One of these things would be the AltiVec unit. However, IBM makes no mention of any other core changes or improvements in the sole official document I am aware of:

http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/A1387A29AC1C2AE087256C5200611780

Reliability has a direct effect on speed. The Power4s are made to be absolutely bulletproof. I forget all the technical details, but it absolutely affects how fast a processor is. IBM wasn't going for the fastest chip when they made the Power4. They were going for the best server chip, which includes being about the most reliable chip on the market. This makes it more expensive and sacrifices speed. This is absolute fact. My argument is sound.

Your arguments have no value without proof.

Yes, but IBM has stated that the 970 would be faster than the power4 in some instances, I believe because of the priorities of design in the chip.

Priorities that allowed the higher clock speed and the AltiVec units.

MacRETARD:

The Intel XEON chips also have HT.

It's Xeon, not XEON.

ffakr:

8K of additional L1 cache (btw, I'd expect this to be instruction not data cache.. but what do I know)

The P4 does not have an instruction cache as normal chips would, instead it has a "trace cache" where decoded instruction streams are kept. The size of this is not clearly defined.

The Power4 is also designed to be a dual-core processor, and it keeps up with the 970 with only one of its cores. It is important to note that any Power4 (unless intentionally crippled by IBM) would post nearly twice the scores of the 970 due to the dual cores.

Only in the right task.

ktlx · Feb 26, 2003

Re: quite right...

Originally posted by DharvaBinky
You're pretty much right. But when they say "Operating system optimized to use HT" they're really talking about "Operating system that supports MultiProcessing". This rules out Windows 95, 98, 98SE, and Me. Windows NT4, 2000 (all versions), and XP (all versions) support at least 2 processors.

Mostly true. Windows XP Home does not support multiple processors while Windows XP Professional does. It is one of the roughly three significant differences between the two OSes.

Dont Hurt Me · Feb 26, 2003

I see it now: Apple announces the new G5! A 7457 G4 this summer, and all that talk of the 970 goes out the window. Now what?

Telomar · Feb 26, 2003

Originally posted by ddtlm
You should share where IBM said this, because until you can do that I'll go on believing you're restating rumors. Others and I have already told you that on 130nm tech, a Power4+ can run at 1.45 GHz. Having the PPC-970 run at 1.8 GHz sometime down the road most certainly does not imply that any of these unspecified changes were made to the core as you claim. For example, Intel's big-cache Xeon MP is available at only up to 2.0 GHz despite the fact that it has the same core and same manufacturing tech as P4s clocking a full 50% higher. Is it really so hard for you to believe that adding large amounts of L2 and a second core, plus more conservative clock speed safety margins, could account for the small difference between 1.45 GHz and 1.8 GHz?

In an article published in Microprocessor Reports in 1999, IBM described its use of thicker gate oxides in the POWER4 processor to obtain a failure rate that is two orders of magnitude better than comparable processors from most other manufacturers. The cost of the thicker oxides is the reduced drive current of the transistor and consequently slower switching speed of the transistors on the POWER4 processor. In the case of the PowerPC 970, the processor does not need to meet similar reliability requirements as the POWER4 processor, and as a consequence, circuit and process technology can be tweaked to obtain higher performance by trading away the near-absolute reliability required by the POWER4 processor.

Just because you don't know it or because it isn't made easily available doesn't mean it isn't true.

ddtlm · Feb 26, 2003

Telomar:

Just because you don't know it or because it isn't made easily available doesn't mean it isn't true.

Not a very useful approach for separating fact from rumors.

Anyway, your quote of that microprocessor report fails to address the issue. I have never claimed that the Power4 is not conservatively designed and manufactured, I have claimed that there is no evidence that the PPC-970 has not inherited the same conservative features. To be clear: your quote says "can". It appears to be speculation.

RIP · Feb 26, 2003

WOW

Originally posted by ozubahn
The Power4/5 and derivatives are nice, but I'm going to take the really long view. I'm eagerly awaiting the day when we can laugh at Intel users with their 15GHz Pentiums, not because they are running at 15GHz, but because they still use system clocks. That's right, I want a nice new asynchronous Mac. A FleetZero based PowerBook, for instance, would be an excellent start.

(No, I am not waiting until we get asynchronous Macs before I upgrade, but I do think that's where we will be eventually.)

WOW

ddtlm · Feb 26, 2003

Telomar:

To continue my previous post, which I think is very unclear: even if it is true that the PPC-970 is designed and produced without the thick oxides and whatever, my whole argument is still standing. Each time that I speak about the occasional performance edge of a PPC-970 over a Power4, I list clock speed as an advantage, which is covered by your quote. (Sure hope that was more clear, I can't talk for some reason.)

Telomar · Feb 26, 2003

Originally posted by ddtlm
Anyway, your quote of that microprocessor report fails to address the issue. I have never claimed that the Power4 is not conservatively designed and manufactured, I have claimed that there is no evidence that the PPC-970 has not inherited the same conservative features. To be clear: your quote says "can". It appears to be speculation.

It doesn't really bother me one bit whether you take it for speculation, but that actually comes from comments made at the Microprocessor Forum. I can tell you it's quite true, and what strider42 said is in fact quite correct; that's just the only current public information (to my knowledge).

There are changes between the POWER4 and 970 that although not always major will allow for better clock speeds and scaling of the chip. You can either trust the fact that I have access to significantly more information that makes me certain on this or you can continue to choose to be arrogant and naive. But it'd be nice if you stopped misleading the boards with your ignorance on this matter. Sometimes you can get "proof" that is completely incorrect and sometimes you can't get public information on things that are true. That's just the way the world is.

ddtlm · Feb 26, 2003

Telomar:

You can either trust the fact that I have access to significantly more information that makes me certain on this or you can continue to choose to be arrogant and naive. But it'd be nice if you stopped misleading the boards with your ignorance on this matter.

Needless to say, I am surprised to find myself accused of arrogance when I demand proof on forums that are every day flooded by wild speculation, hearsay, and plain wrong notions.

There are changes between the POWER4 and 970 that although not always major will allow for better clock speeds and scaling of the chip.

Interestingly you will find that the original point of the argument between strider42 and myself was not about scaling at all, for example strider42 said:

The 970 will be faster not because of AltiVec and higher clocks (although those certainly help), but because it's not making compromises for reliability. IBM themselves have talked about this.

Now, as you can see, no one has provided anything that IBM said about this, and all the advantages you speak of are clock speed related. So, where does that leave you and your accusations about me?

I can tell you it's quite true, and what strider42 said is in fact quite correct; that's just the only current public information (to my knowledge).

Are you sure that you know what strider42 is arguing? If you are, then perhaps you should address that, and not address clock speed scaling.

Additionally, there are other issues I have mentioned, such as the very narrow clock speed advantage that the rather small PPC-970 is projected to have over a Power4+ (which is shipping at 1.45 GHz right now). I would be interested to know where all the clock speed scaling advantages are, should you happen to know.

ewinemiller · Feb 26, 2003

Re: Re: quite right...

Originally posted by ktlx
Mostly true. Windows XP Home does not support multiple processors while Windows XP Professional does. It is one of the roughly three significant differences between the two OSes.

I think I read somewhere that while XP Home won't support two physical processors, it does support the two logical processors that hyperthreading presents. So you can install XP Home on your 3.06 and it will use the hyperthreading.

ktlx · Feb 26, 2003

Re: Re: Re: quite right...

Originally posted by ewinemiller
I think I read somewhere that while XP Home won't support two physical processors, it does support the two logical processors that hyperthreading presents. So you can install XP Home on your 3.06 and it will use the hyperthreading.

Ooops, you are correct. Windows XP Home will support two logical processors. My mistake.

KingArthur · Feb 26, 2003

Originally posted by Catfish_Man
In reply to a few of the posts here:
The 970 is single core without hyperthreading (or symmetric multithreading to be more accurate, hyperthreading is Intel's name for SMT). When the 970 goes to a .09 micron manufacturing process, dual core may become practical (.13 -> .09 cuts the transistor size in half). A POWER5 derivative seems fairly likely, and would have hyperthreading. The 4x performance boost from HT is widely regarded as bull**** marketing claims, but it should give a fairly good boost (especially if the POWER4 isn't using its execution resources effectively. The Alpha EV8 was going to get a huge boost because it had way more execution units than it could normally use, and HT allowed it to use them more effectively). Neither the POWER4 nor the POWER5 will be used in Macs (the POWER4 costs several thousand dollars per chip, I've heard $7000-$8000). The 970 seems almost certain to be used in Macs (targeted at the desktop, has AltiVec, etc...).

Just FYI, Hyperthreading is Simultaneous Multithreading, not symmetric multithreading. I believe you were mixing up the term with symmetric multiprocessing (what the dual processors do).

Also, the Intel Xenon chip is actually a P4 processor with unique features not added to the regular P4. Hyperthreading was one of these, but now the P4 utilizes this, so there is one less reason to get a Xenon processor.

Simultaneous Multithreading (hyperthreading) is both a good thing and a bad thing. Tests done with the Xenon differ greatly in performance with SMT turned on and off. The problem with SMT is that sometimes one of the two programs being processed consumes too much of the cache resources. Then, the other program is starved of resources. The Xenon/P4 chips only put limits on the processor's execution resources, not memory resources. Therefore, a process can only use up half of the processor's queue slots, but all of the memory registers. Thus, one process has nothing in the cache and can't fill the other execution slots (which is the advantage of SMT). Because the operating system thinks of them as two processors, it doesn't budget the "time-slicing". "Time-slicing" gives a process so many cycles of the processor's undivided attention and memory resources, then removes all of it from the processor's memory registers and puts it into a higher level of memory, and then brings in the next process, which has undivided attention, etc. With SMT, though, the OS thinks that, since there are two processors, each processor must have its own resources. The processor isn't designed to budget the memory resources, and one process, although constantly looking through the memory registers for its next part to process, never receives anything to process because the other is a hog.

Another thing, someone posted that programs would have to be rewritten to take advantage of SMT. Yes and No. The only reason a program would have to be written to take advantage of SMT is if it is wanting to process two of its own threads simultaneously. The WinXP OS is designed to Hyperthread two applications. Applications would have to be written if they want to Hyperthread themselves.
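
A minimal sketch of what "written to hyperthread itself" means in practice: the application has to split its own work into two threads. Python is my choice here, not something from the thread, and the function names are made up. Note that CPython's global interpreter lock keeps these two threads from executing Python code in parallel, so this shows the structure of the change, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """Sum the integers in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(range(start, stop))

def two_thread_sum(n):
    """Split one job into two halves and hand each half to its own thread."""
    mid = n // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        halves = list(pool.map(partial_sum, [(0, mid), (mid, n)]))
    return sum(halves)

total = two_thread_sum(1_000_000)
print(total == sum(range(1_000_000)))  # True
```

An app that only ever calls the single-threaded version leaves the second logical processor to whatever else the OS wants to schedule there.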

Long story short, unless a hyperthreaded Power4 derivative is designed to budget all resources, there will be the same shortcomings as the Xenon/P4.

Also, someone earlier posted that the 970 and Power5 have not been produced. That is a false statement. The 970 is already being prepared for distribution. The Power5 has been produced and already been used in a test machine. That isn't to say it is complete, but it has been produced and is being tested to work bugs out of it.

That is my 50cents worth (or whatever value you think what I had to say has). For more info on Hyperthreading, go to http://arstechnica.com/paedia/h/hyperthreading/hyperthreading-1.html , although I do not know if everyone will be able to comprehend what is said. Ars Technica is a valuable resource for people wanting to learn how processors work from simple explanations to complex ones.

ffakr · Feb 27, 2003

Originally posted by KingArthur
Just FYI, ...Also, the Intel Xenon chip is actually a P4 processor ....

FYI... It's a Xeon, not a Xenon. Xenon is a gas used in lighting. ;-)

Catfish_Man · Feb 27, 2003

Originally posted by KingArthur

That is my 50cents worth (or whatever value you think what I had to say has). For more info on Hyperthreading, go to http://arstechnica.com/paedia/h/hyperthreading/hyperthreading-1.html , although I do not know if everyone will be able to comprehend what is said. Ars Technica is a valuable resource for people wanting to learn how processors work from simple explanations to complex ones.

Agreed. I've read almost everything they've published, and it's all been useful (aside from a few old CPU specific things like overclocking 300MHz Celerons that I skipped).

GeneR · Feb 27, 2003

How will Hyper-Threading help rendering?

I use FCP and After Effects most of the time. I do a lot of archiving of stock footage from DV, and most of it has some tweaking aspect to it. You could say I'm building a stock DV library for future shoots. But the rendering for all the different shots takes so darn long that I am often rather put off by the fact that my Mac is tied up for hours or (more often) days.

And that really sucks.

I am hoping that this will change if there are faster chips, but with all the talk about 970-this and 970-that I really do not know. How does it compare to, say, 1 GHz G4? Or even a 1.4 GHz Dual G4? I'm still waiting for the day when I can shoot footage, go back to the desktop, load the stuff in, archive it and move forward.

Any thoughts about speed performance in respect to rendering would be appreciated.

ffakr · Feb 28, 2003

Re: How will Hyper-Threading help rendering?

Originally posted by GeneR
I am hoping that this will change if there are faster chips, but with all the talk about 970-this and 970-that I really do not know. How does it compare to, say, 1 GHz G4? Or even a 1.4 GHz Dual G4?

Hard to say... until we get one.
Based on SPEC (which the G4 runs poorly), the 1.8 GHz 970 should be about as fast as a new Dual G4... that would be if you were running code that took full advantage of both processors.

I'd guess and estimate that a single 970 would probably keep up with a new dual (faster in some things.. especially where the bus was concerned).
It could be quite a bit faster based on what IBM is alluding to, based on the architecture and the new, higher clock announcements.

Now, what Apple really needs to do is release at least one dual 970 box at the introduction. It may be too much to expect all duals since the chips will be scarce at first.
Dual processor x86 boxes fetch a premium. If Apple can get a dual 970 out, they'd probably beat wintel (amd-tel) on price/performance and quite possibly raw performance.... one can only hope.

ffakr.

mathiasr · May 24, 2003

The latest news related to the POWER5.

The chip is still expected around 1.5 GHz in the first half of next year:
http://www.eweek.com/article2/0,3959,1093901,00.asp

Early samples boot under LinuxPPC64:
http://marc.theaimsgroup.com/?l=linux-ppc&m=104791921406815&w=2

IBM has already produced a multithreaded chip, the RS64 IV:
http://www.research.ibm.com/journal/rd/446/borkenhagen.html

A good chronology of the POWER, RS and PowerPC families:
http://www.rootvg.net/column_risc.htm

Cubeboy · May 25, 2003

Hyperthreading (Simultaneous Multi-Threading) follows the same general rules as dual processors (since it is essentially two processors, except they share one set of integer-math, floating-math, SSE and code-decode units): the better threaded an application is, the greater the performance increase between a hyperthreaded processor and its non-hyperthreaded variant.
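
One way to put rough numbers on "the better threaded, the greater the increase" is Amdahl's law. That is my own addition, not something cited in this thread, and it models an ideal second processor; since the two HT threads share execution units, real hyperthreading gains would come in below these figures.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup when only parallel_fraction of the work can be spread out."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

# Upper bounds for 2 processors at different degrees of threading.
for fraction in (0.0, 0.5, 0.9):
    print(fraction, round(amdahl_speedup(fraction, 2), 2))
# 0.0 1.0
# 0.5 1.33
# 0.9 1.82
```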

Cubeboy · May 25, 2003

Regarding Prescott

The advantages of Prescott over Pentium 4 are as follows:
1) Higher clock speeds (5+ GHz compared to 3.2 GHz)
2) Twice the L1 Data Cache (16 KB compared to 8 KB)
3) Larger L1 Trace Cache (16 KB compared to 12 KB)
4) Trace Cache Bandwidth (4 uOps/cycle vs 3 uOps/cycle)
5) Twice the L2 Cache (1024 KB compared to 512 KB)
6) Better branch predictor (Better prefetch, higher IPC)
7) Lower Latency (Overall better performance)
8) Improved Hyperthreading (reduced collision of threads)
9) SSE3 (better performance in gaming, media, scientific apps)
10) Additional Write Combining Buffers

There is also a good possibility of a faster system bus: the current P4's bus is already clocked at 800 MHz FSB, and Prescott's bus will probably be clocked at either 1066 MHz or 1200 MHz.

mathiasr · May 26, 2003

Re: Regarding Prescott

Originally posted by Cubeboy
The advantages of Prescott over Pentium 4 are as follows:
1) Higher clock speeds (5+ GHz compared to 3.2 GHz)
2) Twice the L1 Data Cache (16 KB compared to 8 KB)
3) Larger L1 Trace Cache (16 KB compared to 12 KB)
4) Trace Cache Bandwidth (4 uOps/cycle vs 3 uOps/cycle)
5) Twice the L2 Cache (1024 KB compared to 512 KB)
6) Better branch predictor (Better prefetch, higher IPC)
7) Lower Latency (Overall better performance)
8) Improved Hyperthreading (reduced collision of threads)
9) SSE3 (better performance in gaming, media, scientific apps)
10) Additional Write Combining Buffers

There is also a good possibility of a faster system bus: the current P4 already has an 800 MHz FSB, and it will probably be clocked at either 1066 MHz or 1200 MHz.

Who cares? AMD and Apple marketing hype will focus on just one point: their chips are 64 bits

mathiasr · May 26, 2003

Originally posted by Cubeboy
Hyperthreading (Simultaneous Multi-Threading) follows the same general rules as dual processors (since it is essentially two processors, except they share one set of integer-math, floating-math, SSE and code-decode units): the better threaded an application is, the greater the performance increase between a hyperthreaded processor and its non-hyperthreaded variant.

Here are some pages about the why and how of multithreading:
http://www.slcentral.com/articles/01/6/multithreading/index.php

SMT is not as efficient as MP (multiple full-blown CPUs) or CMP (multiple cores on the same die, often sharing L2+ caches).
It's rather a way to keep all execution units busy, and actually do something while some units wait on data that is missing in L1 or L2 caches.

Cubeboy · May 26, 2003

Re: Re: Regarding Prescott

Originally posted by mathiasr
Who cares? AMD and Apple marketing hype will focus on just one point: their chips are 64 bits

Considering the arguments in this thread and others about the Prescott, quite a few people.

Cubeboy · May 26, 2003

Originally posted by mathiasr
Here are some pages about the why and how of multithreading:
http://www.slcentral.com/articles/01/6/multithreading/index.php

SMT is not as efficient as MP (multiple full-blown CPUs) or CMP (multiple cores on the same die, often sharing L2+ caches).
It's rather a way to keep all execution units busy, and actually do something while some units wait on data that is missing in L1 or L2 caches.

No, SMT is not as efficient; IBM estimates that, at most, it can only be 80% as efficient as MP. From an architectural standpoint, Hyperthreading (SMT) is a parallel set of registers and logic that allows a pair of virtual processors to operate independently on a single chip. As I've mentioned before, this is like having 2 cores except they share the same cache and set of integer math, floating math, SSE and code-decode units (which would explain why it isn't as efficient as MP or CMP). That's why we see hyperthreaded variants of the P4 listed as having 2 logical processors while older P4s without hyperthreading are listed with a single logical processor.
