Anand smacks down Cell and Xenon processors


macrumors 65816
Original poster
Oct 28, 2003

Right now, from what we’ve heard, the real-world performance of the Xenon CPU is about twice that of the 733MHz processor in the first Xbox. Considering that this CPU is supposed to power the Xbox 360 for the next 4 - 5 years, it’s nothing short of disappointing. To put it in perspective, floating point multiplies are apparently 1/3 as fast on Xenon as on a Pentium 4.
Now I know that the 970 and the cores used in the Cell and Xenon are different, but I am feeling much better about the switch to Intel.
In the end, you get what you pay for, and with such a small core, it’s no surprise that performance isn’t anywhere near the Athlon 64 or Pentium 4 class.
All you guys complaining about how IBM was screwing over Apple and giving MS 3.0GHz no longer need to worry. It was for the best.

And for the icing:
The most ironic bit of it all is that according to developers, if either manufacturer had decided to use an Athlon 64 or a Pentium D in their next-gen console, they would be significantly ahead of the competition in terms of CPU performance.


macrumors 604
Mar 17, 2004
Posted this a while ago, here.

But anyway, here's an alternate link to the article. The original was taken down to protect AnandTech's developer sources, who broke an NDA.

The Anandtech article quoted in the original post:
AnandTech Article Mirror

We knew this a while ago, though. Ars Technica pretty much agreed in their article here:

As you can see, the PowerPC 970 has double the number of floating-point, integer, and load-store execution units as its PowerPC forerunner. This enables the 970 to execute twice as many floating-point, integer, and memory access instructions at the same time. Executing more instructions at once means that programs run faster and performance is increased.
Each XBox 360 core can execute only half as many instructions as a G5 at any point in time...

Then there's the fact that the XBox 360 lacks out-of-order execution.

The basic idea behind both Cell and Xenon is to make the execution core less complex by stripping out hardware that's intended to optimize instruction scheduling at runtime. Neither the Xenon nor the Cell have an instruction window, which means that these two processor designs largely forget about instruction-level parallelism. Instead, instructions pass through the processor in the order in which they're fetched, with the twist that two adjacent, non-dependent instructions are executed in parallel where possible.

This static execution scheme is pretty much the same one used in older, less complex designs, like the original Intel Pentium. Static execution is simple to implement and takes up much less die space than dynamic execution, since the processor doesn't need to spend a lot of transistors on the instruction window and related hardware. Those transistors that the lack of an instruction window frees up can be used to put more actual execution units on the die.

Of course, you can't just eliminate the instruction window and replace it with more execution units without doing something to make up for the lost instruction window. The whole point of an instruction window is to complement an increased number of execution units by extracting more instruction-level parallelism. So in order to take advantage of a large number of execution units without an instruction window, you have to rethink how you organize the processor.
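
A toy sketch of the point in the quotes above (not from either article; the instruction format, register names, and window size are made up for illustration). It counts cycles for a strictly in-order, dual-issue core versus one that can pick two ready instructions out of a small window, assuming every instruction takes one cycle and each register is written only once:

```python
# Each instruction is (dest_register, source_registers). Hypothetical format.

def depends(later, earlier):
    """True if `later` reads the register that `earlier` writes."""
    return earlier[0] in later[1]

def in_order_cycles(program):
    """Static dual issue: pair two adjacent instructions only if independent."""
    cycles, i = 0, 0
    while i < len(program):
        if i + 1 < len(program) and not depends(program[i + 1], program[i]):
            i += 2  # adjacent and independent: both issue this cycle
        else:
            i += 1  # dependent neighbour (or last instruction): issue one
        cycles += 1
    return cycles

def windowed_cycles(program, window=8):
    """Dynamic issue: pick up to two ready instructions from a window."""
    pending = list(range(len(program)))  # not yet issued, in program order
    cycles = 0
    while pending:
        issued = []
        for idx in pending[:window]:
            # Ready only if it reads nothing from an older unissued instruction
            # (this includes instructions being issued in this same cycle).
            older = [j for j in pending if j < idx]
            if all(not depends(program[idx], program[j]) for j in older):
                issued.append(idx)
            if len(issued) == 2:
                break
        pending = [j for j in pending if j not in issued]
        cycles += 1
    return cycles

# Two independent dependency chains, written one after the other.
program = [
    ("a1", ()), ("a2", ("a1",)), ("a3", ("a2",)),
    ("b1", ()), ("b2", ("b1",)), ("b3", ("b2",)),
]

print("in-order dual issue:", in_order_cycles(program), "cycles")
print("windowed dual issue:", windowed_cycles(program), "cycles")
```

On this sample the in-order model takes 5 cycles and the windowed model takes 3. Interleave the two chains by hand (or have the compiler do it) and the in-order model also finishes in 3, which is the kind of scheduling work a design without an instruction window pushes onto software.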


This TLP strategy will work extremely well for tasks like procedural synthesis that can be parallelized at the thread level. However, it won't work as well as an old-fashioned wide execution core + large instruction window for inherently single-threaded tasks. In particular, three types of game-oriented tasks are likely to suffer from the lack of out-of-order processing and core width: game control, artificial intelligence (AI), and physics. Each of these three tasks tends to be relegated to a single thread, and each tends to have some inherent ILP that can be exploited by a large instruction window and a robust, multipipeline integer execution unit. But I'll say more on this point in the article's conclusion.

Then the X360 suffers from deep pipelining...

Pipelining isn't free, because increased pipeline depths normally increase die sizes in other ways. There's the added control logic overhead associated with deeper pipelining, but this is minor compared to the amount of logic taken up by those other two hallmarks of a deeply pipelined machine: elaborate branch prediction logic and large caches.

For reasons I've explained in detail here, the more deeply a machine is pipelined, the greater an impact branch mispredicts and cache misses have on performance. Branch mispredicts and cache misses cause the pipeline to stall, thereby eliminating pipelining's performance-enhancing advantages. Deeply pipelined processors invariably include fairly large caches and very elaborate, highly accurate branch prediction schemes aimed at preventing pipeline stalls. Large caches take up large amounts of die space and draw a correspondingly large amount of power. Similarly, complex branch prediction schemes like that of the PPC 970, which use multiple tables to store branch histories and branch targets, also carry a high cost in terms of die space and power consumption. All of these factors can contrive to make deep pipelining just as inefficient as a wide execution core.
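
A back-of-the-envelope version of the mispredict argument, using the usual textbook approximation (again not from the article, and every number below is invented for illustration): each mispredicted branch flushes the pipeline and costs a refill penalty that grows with pipeline depth.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once mispredict flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty

# Hypothetical workload: 1 in 5 instructions is a branch, ideal CPI of 1.0.
shallow = effective_cpi(1.0, 0.20, 0.05, flush_penalty=10)
deep = effective_cpi(1.0, 0.20, 0.05, flush_penalty=20)
deep_weak_predictor = effective_cpi(1.0, 0.20, 0.10, flush_penalty=20)

print(f"shallow pipeline:             {shallow:.2f} CPI")
print(f"deep pipeline:                {deep:.2f} CPI")
print(f"deep pipeline, weak predictor: {deep_weak_predictor:.2f} CPI")
```

With these made-up numbers, doubling the flush penalty takes the core from about 1.1 to 1.2 cycles per instruction, and doubling the mispredict rate on top of that takes it to about 1.4, which is why deep pipelines lean so heavily on accurate branch prediction.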

Then there's the cache...

Because the Xenon's caches are tuned for streaming media applications, when it comes to streaming media code they should perform quite well. Unfortunately, however, the "streaming media" application profile doesn't fit every part of the average game engine. In spite of the mitigating factors I've discussed above, the small size of Xenon's L2 is going to hurt branch-intensive game logic, AI, and physics code. Just how much it'll hurt is open to dispute, but I'll take up this topic in more detail in my conclusions.
So it's probable that the PPE's branch mispredict rate is higher than that of the 970, and that this higher mispredict rate, when combined with the PPE's relatively small caches, will hurt performance on branch-intensive code.

Now for the conclusion,

Rumors and some game developer comments (on the record and off the record) have Xenon's performance on branch-intensive game control, AI, and physics code as ranging from mediocre to downright bad. Xenon will be a streaming media monster, but the parts of the game engine that have to do with making the game fun to play (and not just pretty to look at) are probably going to suffer. Even if the PPE's branch prediction is significantly better than I think it is, the relatively meager 1MB L2 cache that the game control, AI, and physics code will have to share with procedural synthesis and other graphics code will ensure that programmers have a hard time getting good performance out of non-graphics parts of the game.

Furthermore, the Xenon may be capable of running six threads at once, but the three types of branch-intensive code listed above are not as amenable to high levels of thread-level parallelization as graphics code. On the other hand, these types of code do benefit greatly from out-of-order execution, which Xenon lacks completely; a decent amount of execution core width, which Xenon also lacks; branch prediction hardware, which Xenon is probably short on; and large caches, which Xenon is definitely short on. The end result is a recipe for a console that provides developers with a wealth of graphics resources but that asks them to do more with less on the non-graphical side of gaming.

And since the PS3's Cell uses the same kind of PPE core as the XBox 360 (one of them instead of three), with a bunch of SPEs attached:

At any rate, Playstation 3 fanboys shouldn't get all flush over the idea that the Xenon will struggle on non-graphics code. However bad off Xenon will be in that department, the PS3's Cell will probably be worse. The Cell has only one PPE to the Xenon's three, which means that developers will have to cram all their game control, AI, and physics code into at most two threads that are sharing a very narrow execution core with no instruction window. (Don't bother suggesting that the PS3 can use its SPEs for branch-intensive code, because the SPEs lack branch prediction entirely.) Furthermore, the PS3's L2 is only 512K, which is half the size of the Xenon's L2. So the PS3 doesn't get much help with branches in the cache department. In short, the PS3 may fare a bit worse than the Xenon on non-graphics code, but on the upside it will probably fare a bit better on graphics code because of the seven SPEs.

I don't know why all you guys made a big deal about the PowerPC processors in the XBox 360; they're nowhere near a G5 ;)

My quotes were from the Ars Technica article. Definitely read the AnandTech one linked at the top of my post.


macrumors P6
Jun 4, 2003
StokeLee said:
I've read that three times, and I've still no idea what it meant.
I'm still going to buy an Xbox 360 as I love the LIVE service.
Oh well :D
And for non-technical people such as yourself, that's perfectly fine, no need to worry. Some people just like a certain type of game, or the LIVE service as you mentioned, or simply like the platform "just cuz" and couldn't care less about the actual performance of what's under the hood. As I say, nothing wrong with that at all. :)

And thanks GFLPraxis for the detailed synopsis. :cool: