Originally posted by DharvaBinky

I can't see IA64 making a break for consumer space any time soon.

Intel's roadmap includes Deerfield, a desktop "value" IA-64 processor based on Itanium-3. That doesn't mean that Deerfield will replace the P4 (I doubt it will, especially as IA-64 and IA-32 programs can't mix, and Banias, an IA-32-based mobile processor, is planned to be alive and kicking for a few years). I don't see anything on Intel's roadmap for IA-32 beyond next year's Prescott/P5 and Banias. So, if Deerfield doesn't fulfill its mission, Intel is going to have to revise its roadmaps.


There is a lot of resistance to the instruction set in the developer community, since it not only requires different instructions but a whole different "kind" of programming. The IA64 ISA is a brute-force approach to processing; to quote one analyst, "Smart Compiler, Dumb Processor".

Yes, that is the problem with EPIC/VLIW and compiler-driven out-of-order execution (OoOE): it places much more responsibility on the compiler than ever before. Some say compilers will never be smart enough to do the job. On the other hand, if the P4 can do it in hardware with a very short code visibility window, there is no reason a compiler shouldn't be able to do as good a job given a near-infinite theoretical visibility window. And that's just the first step. In theory at least, the compiler should be able to do a much better job of determining which instructions can be put out of order than the processor ever could.
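
To make the point concrete, here is a trivial C sketch (nothing IA-64-specific about it, purely an illustration):

[code]
#include <stddef.h>

/* Two independent dependency chains in one loop. A P4 can only spot
   this independence through a small hardware window of decoded
   instructions; a compiler sees the whole function and, on IA-64,
   could emit the two chains in the same explicitly-parallel bundles. */
double dot_and_sum(const double *a, const double *b, size_t n,
                   double *sum_out)
{
    double dot = 0.0; /* chain 1 */
    double sum = 0.0; /* chain 2, independent of chain 1 */
    size_t i;

    for (i = 0; i < n; i++) {
        dot += a[i] * b[i]; /* touches a[i], b[i], dot */
        sum += a[i];        /* touches a[i], sum: no dependence on dot */
    }
    *sum_out = sum;
    return dot;
}
[/code]

On IA-64 the compiler would encode that discovered parallelism explicitly in the instruction stream, rather than leaving the hardware to rediscover it a few instructions at a time.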


Additionally, I wouldn't look to see IA64 go anywhere outside of the high-end (for Intel) server market, ever. There are confirmed rumors of a project at Intel called "Yamhill" which seeks to produce 64-bit extensions to the IA32 ISA, similar to the approach AMD is taking. Intel, ever conflicted about goals beyond pocketbook padding, has stated that AMD's approach (64-bit CPUs that are backward compatible with 32-bit apps) is undesirable, since you don't gain 100% of the benefits of 64-bit addressing.

Yes, the "halfway approach" is undesirable to Intel. The main reason is that IA-32 is an albatross around its neck and Intel would love to cast it off as cleanly as Apple did the 6800x0 instruction set. Remember that IA-32 is an extension of IA-16 which is an extension of IA-8. The chain has to end somewhere, and someone's going to be a bit pissed when it does. Designing IA-64 with a clean seamless IA-32 compatibility would have drastically reduced the potential life of IA-64. So Intel went with a "pure" 64-bit effort to develop what we know as IA-64/Itanium. AMD countered with the shorter-sighted Hammer extensions, which might be enough to get the job done today but likely won't be able to compete a decade from now.


However, they can see that software vendors' resistance to *completely* rewriting their code (there is no simple "porting" in the Itanic world) may eventually cause them to change their minds.

Now, I haven't ported anything to IA-64, but my understanding is that the port is fairly straightforward, although today's compilers aren't smart enough to get the OoOE efficiency P4s obtain. The programmer should not have to define "safe" simultaneous streams; that is the compiler's job.

That, of course, assumes that eventually (and before your product X needs to be ported) compilers come around to enough sophistication that they can produce semi-efficient IA-64 instructions.

Personally, I write cross-platform software. At the C-and-above levels of programming, there is no difference between coding for a RISC processor and a CISC instruction set, and there should not be any difference for an EPIC/VLIW instruction set with explicit OoOE. Granted, any instruction set is going to look different at the assembler/machine levels, but very few people have to be proficient at those levels for a given platform (i.e., the compiler people and the game engine people; any C-level programmer should be able to learn enough about the underlying assembly/machine language to tune a bottleneck if absolutely necessary).
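
A trivial example of what I mean (a made-up function, obviously):

[code]
#include <stddef.h>

/* Nothing here assumes word size, endianness, or instruction set.
   The same source compiles unchanged for IA-32, PowerPC, or IA-64;
   only the generated machine code differs. */
size_t count_matches(const int *values, size_t n, int target)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (values[i] == target)
            count++;
    }
    return count;
}
[/code]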

Once there is an EPIC/VLIW compiler that does a better job than the P4 does in hardware, "porting" software from IA-32 to IA-64 will be relatively easy for developers. Compared to recent efforts (Y2K, Internet, OS 9 to OS X, etc), this hardly registers as a blip on the screen of programming effort.

On the consumer side, however, one will have to expect consumers to re-buy everything for the new platform. While Apple has a machine-specific packaging and install procedure well in place with OS X, on the Windows side the good folks at InstallShield will have to configure their installers to put 32-bit code on IA-32 machines and 64-bit code on IA-64 machines so that developers can release dual-platform software. If developers don't release dual-platform software (i.e., both versions install from a single disc, bought at the same price as the old IA-32 software), a consumer revolt is not just likely but almost assured.
 
Originally posted by kenohki


Why would you totally need to rewrite your code? If you're working in a high-level language and using the Win32 APIs or writing portable *nix code, it seems to me that you'd just need tweaking, optimization, rewriting any assembler (which you'd need to do with any architecture change), and a recompile, not *completely* rewriting your code. Add in the .Net framework and you're further abstracted. The Win32 API set was meant to be portable from the start; thus the versions for Alpha, MIPS, and PowerPC. And Windows exists on IA64, as do Linux and Monterey. So it seems to me like the majority of the hard work has already been done.

The problem isn't in getting your C code to EPIC instructions; it is in getting it to efficient EPIC instructions. While the P4 does a lot of on-the-fly optimizations to the execution stream, EPIC/VLIW as implemented on the IA-64 does not: all out-of-order-execution decisions are left to the compiler. That makes the processor much simpler and allows for complexity in other areas of the processor (or more L1 cache), but it means that compiler writers have to do a lot of extra thinking.

Yes, without doing any real hard thinking on the matter, one can write a straightforward C-to-assembly compiler for IA-64. And, I strongly suspect, that is just a bit below what is out there right now. But taking full advantage of explicit execution ordering in the compiler will take time. Current compilers might take C-level programmer "hints" as well, which shouldn't be required but would make getting a compiler out the door a much quicker process. If this is the case, then, yes, the C programmer has to think about the fact that they are not in the IA-32 world any more.
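
As a sketch of what such a "hint" can look like, C99's restrict qualifier is one example that already exists (whether Intel's IA-64 compiler actually leans on it, I don't know, so treat this as illustrative):

[code]
/* The programmer promises that dst and src never overlap, so the
   compiler may prove the iterations independent and reorder, unroll,
   or software-pipeline them into parallel instruction groups. Without
   the hint, it must assume aliasing and keep loads/stores in order. */
void scale(float * restrict dst, const float * restrict src,
           int n, float factor)
{
    int i;

    for (i = 0; i < n; i++)
        dst[i] = src[i] * factor;
}
[/code]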

As for Win32 being portable ... have you looked at MS's scant non-IA-32 documentation? The shell of Win32 is available on other platforms, but there are enough "gotchas" and missing pieces that for a Windows developer to rely on Win32 to provide all hardware abstraction is downright foolish. Granted, those pieces might be filled in and the "gotchas" largely eliminated, but I don't think that MS is putting that much effort into Win32 on different platforms. Which, yes, is where .Net comes in.
 
Re: Re: Re: What if it's a completely new pro product?

Originally posted by Catfish_Man
I'm not so sure that this chip is good for the XServe. There's a reason why x86 1Us are still using P3s. The PPC970 is NOT being designed for low power consumption (according to everything I've heard), so for now it's only going to be useful in the PowerMacs. A little later, when they transition it to .09 micron... we might start seeing TiBooks or XServes with it (they're pretty similar, actually, and have pretty similar requirements).
With a Celeron-size die, a 1.8GHz clock, and the manufacturing processes IBM will be using, I will be surprised if this chip has power requirements much beyond the G4's.
 
Originally posted by jettredmont


The problem isn't in getting your C code to EPIC instructions; it is in getting it to efficient EPIC instructions. While the P4 does a lot of on-the-fly optimizations to the execution stream, EPIC/VLIW as implemented on the IA-64 does not: all out-of-order-execution decisions are left to the compiler. That makes the processor much simpler and allows for complexity in other areas of the processor (or more L1 cache), but it means that compiler writers have to do a lot of extra thinking.

Yes, without doing any real hard thinking on the matter, one can write a straightforward C-to-assembly compiler for IA-64. And, I strongly suspect, that is just a bit below what is out there right now. But taking full advantage of explicit execution ordering in the compiler will take time. Current compilers might take C-level programmer "hints" as well, which shouldn't be required but would make getting a compiler out the door a much quicker process. If this is the case, then, yes, the C programmer has to think about the fact that they are not in the IA-32 world any more.

And not having directly worked with IA64, I'm ignorant of how efficient the compiler is. I had assumed it was pretty whiz-bang from all the whitepapers I've seen on it. I mean, it does speculative loading and predication, so I figured it was probably bleeding edge and had been enhanced by both Intel and HP during all the delays of the first Itanium. One would think that the compiler would be a major focal point of the project.

As for Win32 being portable ... have you looked at MS's scant non-IA-32 documentation? The shell of Win32 is available on other platforms, but there are enough "gotchas" and missing pieces that for a Windows developer to rely on Win32 to provide all hardware abstraction is downright foolish. Granted, those pieces might be filled in and the "gotchas" largely eliminated, but I don't think that MS is putting that much effort into Win32 on different platforms. Which, yes, is where .Net comes in.

Yes, I know how bad it is. But they also don't support any architectures other than IA32 anymore, so the documentation has stopped; Alpha was the last one to be dropped. However, one of the original design goals of NT was to provide a portable environment. Back in the early days of NT and Win32 (when other architectures were supported), there was a big push to make things portable between processors. I'm sure this was because at that time RISC was shiny and new, and MS didn't want to be left out of the game had Alpha maintained its lead or PowerPC lived up to the promise of the PPC 620.

In today's environment, though, I would assume MS is making sure that code is portable from Win32 to Win64, or whatever they're calling that version of Advanced Server.
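
The classic hazard in that particular port is pointer width: Win64 keeps long at 32 bits while pointers grow to 64 (the LLP64 model), so any code that stashes a pointer in a long silently breaks. A minimal sketch of the bug class:

[code]
#include <stdint.h>

int main(void)
{
    int x = 42;
    void *p = &x;

    /* Fine on Win32, where long and pointers are both 32 bits wide.
       Broken on Win64 (LLP64): pointers grow to 64 bits but long
       stays at 32, so this cast silently drops the top half of the
       address. */
    long bad_cookie = (long)p;
    (void)bad_cookie; /* shown only for the cast */

    /* The portable spelling: an integer type sized to hold a pointer. */
    uintptr_t good_cookie = (uintptr_t)p;

    return ((void *)good_cookie == p) ? 0 : 1;
}
[/code]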
 
Originally posted by kenohki


And not having directly worked with IA64, I'm ignorant of how efficient the compiler is.

I likewise have no direct knowledge. However, getting the sort of optimisations that Intel require has long been very difficult. To this end, though, it's worth noting that Intel purchased a company called Kuck & Associates a couple of years ago. These dudes are some of the best at performing optimisations, mainly as source-to-source transformations for simple compilers to handle. Strip mining, now a common technique for avoiding data cache thrashing, is something they brought to the market after hiring the dude (Wolfe?) who did his PhD in this area, following up with Optimizing Supercompilers for Supercomputers as a book.
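
(For anyone who hasn't seen the technique, a rough C sketch of strip mining, with the strip size an assumption standing in for whatever fits your cache:)

[code]
#include <stddef.h>

#define STRIP 1024 /* assumption: sized so one strip fits in L1 cache */

/* Naive version: two full passes over x. If x is larger than the
   cache, the second pass re-fetches every element from memory. */
void two_pass(float *x, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) x[i] *= 2.0f;
    for (i = 0; i < n; i++) x[i] += 1.0f;
}

/* Strip-mined version: both passes run over one cache-sized strip
   before moving on, so each element is fetched from memory once. */
void strip_mined(float *x, size_t n)
{
    size_t i, j, end;
    for (i = 0; i < n; i += STRIP) {
        end = (i + STRIP < n) ? (i + STRIP) : n;
        for (j = i; j < end; j++) x[j] *= 2.0f;
        for (j = i; j < end; j++) x[j] += 1.0f;
    }
}
[/code]

That's exactly the sort of source-to-source transformation K&A's tools could do for you, leaving a simple back-end compiler with cache-friendly code.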

In short, I don't think Intel are there yet, but they've got a lot of the right bits to stand a chance of getting there.

(alex_ant - anything you can chip in here re K&A, etc? :eek:)
 
Originally posted by nixd2001
I likewise have no direct knowledge. However, getting the sort of optimisations that Intel require has long been very difficult. To this end, though, it's worth noting that Intel purchased a company called Kuck & Associates a couple of years ago. These dudes are some of the best at performing optimisations, mainly as source-to-source transformations for simple compilers to handle. Strip mining, now a common technique for avoiding data cache thrashing, is something they brought to the market after hiring the dude (Wolfe?) who did his PhD in this area, following up with Optimizing Supercompilers for Supercomputers as a book.

In short, I don't think Intel are there yet, but they've got a lot of the right bits to stand a chance of getting there.

(alex_ant - anything you can chip in here re K&A, etc? :eek:)
I think you would know much more about the Itanium than me... I used to call it the Sh-itanium, but with these latest McKinley SPEC scores, I feel humbled. Whatever Intel/HP is up to, their compiler must be a work of art.

(Isn't the Microprocessor Forum happening today? Where's all the juicy news already?!?! :D)
 
Originally posted by alex_ant
I think you would know much more about the Itanium than me...
I was thinking of Kuck & Associates. But don't worry - minor part of the thread.
 
Re: Re: Great, 3 years too late

Originally posted by arn


Please read the rest of this thread.

Quick question: would you rather have a higher MHz number, or faster performance?

Remember, the IBM PowerPC is a different architecture... you can't compare the new chip to current Motorola G4s. You certainly can't extrapolate MHz.

The 1.3 GHz IBM Power4 benches close to the 2.8GHz Pentium, according to numbers cited in this thread.

arn

So, the 1.3 GHz Power4, which is really 2 processors each running at 1.3 GHz, running effectively at 2.6 GHz benches close to the Pentium 2.8GHz chip? One would hope.
 
Re: Re: Re: Great, 3 years too late

Originally posted by bacon


So, the 1.3 GHz Power4, which is really 2 processors each running at 1.3 GHz, running effectively at 2.6 GHz benches close to the Pentium 2.8GHz chip? One would hope.

If you're thinking of the SPEC benchmarks, I thought those specifically used only one processor core, so no, not 1.3*2=2.6....
 
IBM POWER4-based processor - PPC 970

Target process - 0.13um SOI, 8-layer Cu interconnect
Target frequency - 1.4 ~ 1.8 GHz
Target sample date - 2Q03
Target ship date - 2H03
Power - 42W @ 1.8 GHz, 1.3V (low-power mode @ 1.1V)

Est. SPEC INT 937 @ 1.8 GHz
Est. SPEC FP 1051 @ 1.8 GHz

Chip features: elastic unidirectional point-to-point interconnect between CPU and "companion chip" (i.e. memory controller/northbridge). The elastic link may run "UP TO" 900 MHz, offering 6.4 GB/s of memory bandwidth.

POWER4 internals. Max of 8 inst fetch per cycle, 8 inst issue per cycle, and 5 (4 + branch) inst dispatch per cycle.

32 KB L1 Dcache, 64 KB L1 Icache.
512 KB L2.

32 64-bit GPRs (general purpose registers)
32 64-bit FPRs (float)
32 128-bit VRFs (vector)

Note that the vector unit uses separate registers and is AltiVec-compatible.

Or you can just go here. There is more info out there, but I am too lazy to grab it. There is a good thread over at Ars Technica.

*waits for the complaints to start*
 
Not bad at all. 90%+ of the number-crunching ability of the Power4 (not including AltiVec!) at 1/3 the power draw. I'm sure this is in the realm of what people were expecting. It will fit in an Xserve in pairs. Cut the energy requirements of the 1.4GHz version by 1/3 and it will easily work in a TiBook.

Maybe it's just that, compared to the G4, anything seems fantastic. :) At the very least, it's a fresh start of sorts. I'm not gonna complain.
 
Originally posted by Telomar
IBM POWER4 based processor. - PPC 970
.....
*waits for the complaints to start*

What speed gain is likely to be had from moving from 0.13 to 0.09 micron (Fishkill eventually being built to do 0.09, ISTR)?

Either way, two of those in a quality hardware design with a quality operating system and quality applications will probably keep me content. Especially if each processor can maintain its own memory bandwidth (now that needs a lot more details to know whether it's possible, coherency etc. aside).

It may be that a PC user can get more useful operations done per second (that remains to be seen), but I'll probably get my task done first.
 
Opteron is faster than the PPC 970!?!

Wow, I just saw the scores for the 2GHz AMD Opteron.

Running on a mobo with dual-channel PC2700 DDR memory (that's 333MHz DDR, boys and girls), the AMD obtained a SPEC CPU 2000 INT score of 1202 and a SPEC CPU 2000 FP score of 1170. Apparently, when running with a 64-bit-clean compiler, the scores are about 20% higher.

Crazy. They just HAD to beat us by one step. ;)
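
Putting those next to the PPC 970 estimates posted earlier in the thread (rough arithmetic):

1202 / 937 ≈ 1.28, about 28% ahead on integer
1170 / 1051 ≈ 1.11, about 11% ahead on floating point

And that's before the claimed 20% from a 64-bit-clean compiler.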
 
Originally posted by jadam
and who reported this story to you MR ARN!$@?????

Wow, you are an idiot.

This is the second time today you've said something stupid that I've seen (the first being your claim that the smarter a person is, the more money they make, and that teachers are stupid for making only $30,000 a year. My mother's a teacher. And just because you go to a good school, that doesn't make you smart.) This information is everywhere. Has been for days.

And you do know ARN is one of the administrators here, don't you? What was that about you being so smart?

Dumbass. :mad:
 
Est. SPEC INT 937 @ 1.8 GHz
Est. SPEC FP 1051 @ 1.8 GHz
These scores mean the PPC970 is expected to get about 520 SPECint per GHz, compared to 620 per GHz for a full Power4. On floating point, the PPC970 is apparently expected to get about 580 SPECfp per GHz, compared to 920 for a full Power4.
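
(Checking the arithmetic against the estimates quoted above:

937 / 1.8 ≈ 520 SPECint per GHz, and 1051 / 1.8 ≈ 580 SPECfp per GHz

versus the 1.3GHz Power4's published scores of roughly 804 int and 1202 fp, i.e. 804 / 1.3 ≈ 620 and 1202 / 1.3 ≈ 920 per GHz.)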

If these figures are true, it is unlikely that the PPC970 will have higher SPEC scores than the top P4s or Athlons when it first becomes available, since the integer score is already behind and the float score leads the current top P4 by only perhaps 12%. Well, it's a lot better than a G4, anyway.
 
Originally posted by rice_web
The one thing that I am hopeful for is the bus speed. IBM's G5 has been rumored to support 6.4GB/s of throughput, while their G3 supports 3.2GB/s. Using some math (and assuming a 64-bit, 8-byte-wide data bus)....

If 3.2GB/s = 8 bytes x 200MHz x 2 (200MHz with DDR), that's an effective 400MHz

Then 6.4GB/s = 8 bytes x 200MHz x 2 x 2 (200MHz with DDR and double-pumped), an effective 800MHz

AMD Hammer: 19.2 GB/s!

Q: How many HyperTransport™ links does the "Hammer" architecture support?
A: The "Hammer" architecture is designed to support up to three HyperTransport links. The combined peak bandwidth of the HyperTransport links is 19.2GB/sec.
from: Here
 
sturm375:

Yes, but that is only for multiprocessor Hammers (the Opterons), which cost more than normal ones, will have big L2 caches, and are generally designed for workstations and servers rather than consumer desktops.

Not to mention that all those HT links go to different places, whereas the IBM bus goes to one place. When transferring any single thing, the Hammer has no advantage in raw bandwidth.

Also note that the Hammer's HT links do not go to memory; they go to other processors and to things like AGP bridges. The Opteron has dual-channel DDR, presumably DDR333, so it would have something like 5GB/s of memory bandwidth, less than the PPC970 when getting things from memory. (This is somewhat more complicated in multiprocessor Opteron systems, where each chip has its own memory and a single chip can in theory get more than 6.4GB/sec of bandwidth.)
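
For the record, the arithmetic behind that figure (assuming standard dual-channel DDR333 moving 8 bytes per channel per transfer):

2 channels x 8 bytes x 333 million transfers/sec ≈ 5.3GB/s

versus the 6.4GB/s quoted for the 970's elastic interconnect.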
 