
BenRacicot (original poster)
There's a lot of discussion on x86 vs Arm already, but in addition to the perf-per-watt war I'm not understanding the value of the Mac Pro vs Studio featuring the same M2 Ultra setup.
  1. Where's the value in the Mac Pro unless the clock speeds were pushed higher than the Studio's version?
  2. Do power limitations exist in the Arm architecture that keep it from competing with x86?
  3. If Arm chips begin clocking the same speeds as Intel chips wouldn't Arm pull ahead with a large lead?
 
1. Mostly internal PCIe expansion.
2. Nope. Take a look at Ampere CPUs or ARM's own Neoverse architecture.
3. Not necessarily. It all depends on how the caches are designed/sized, the lithography used, performance-per-watt, pipeline depth, and how well the CPU branch-predicts (among many other factors). It's not all about how fast the CPU clocks anymore; that trope ended in the early/mid-2000s (although theoretically ARM has an advantage in being less complex than x86, but Intel has announced x86-S to combat that).
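To put a rough number on the branch-prediction point: below is a toy first-order pipeline model (all constants are assumptions picked for illustration, not figures for any real core) showing how predictor accuracy alone moves effective throughput.

```python
# Toy model: effective cycles-per-instruction (CPI) when every
# mispredicted branch flushes the pipeline. All numbers are assumed.

def effective_cpi(base_cpi, branch_fraction, accuracy, flush_penalty):
    miss_rate = 1.0 - accuracy
    return base_cpi + branch_fraction * miss_rate * flush_penalty

# Assume ~20% of instructions are branches and a 15-cycle flush penalty.
for accuracy in (0.90, 0.95, 0.99):
    cpi = effective_cpi(0.5, 0.20, accuracy, 15)
    print(f"{accuracy:.0%} predictor accuracy -> {cpi:.2f} CPI "
          f"({1 / cpi:.2f} instructions/cycle)")
```

In this toy model, going from a 90% to a 99% accurate predictor is worth roughly 50% more throughput on its own, without touching the clock at all.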
 
For the Mac Pro to have an M2 Extreme chip would necessitate an ~$11k SKU price point.

The reason Apple has the efficiency edge mainly comes down to the node they use, their PDN (power delivery network) tech, and their packaging.

Their ARM cores are actually more complex than the x86 competitors': significantly wider, with larger resources for out-of-order execution and speculation. Most people assume there is some kind of "magic" that makes ARM better than x86, but that is not the case. The ISA has little impact on overall power consumption given the same microarchitectural resources.

Apple uses their larger/more complex cores to their advantage by running them at a slower clock rate while doing more work per clock cycle. This allows them to operate at the frequency/power sweet spot for their process. One has to note that power consumption increases significantly (way more than linearly) as frequency goes up.
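A minimal sketch of that superlinear relationship, using the textbook dynamic-power relation P ≈ C·V²·f; the linear V(f) curve and all constants below are assumptions for illustration, not real device data.

```python
# Dynamic power scales with C * V^2 * f, and near the top of the curve
# the voltage itself must rise with frequency, so power grows much
# faster than linearly with clock. V(f) below is an assumed toy curve.

def dynamic_power(freq_ghz, capacitance=1.0, v0=0.6, k=0.15):
    voltage = v0 + k * freq_ghz  # assumption: V rises linearly with f
    return capacitance * voltage**2 * freq_ghz

baseline = dynamic_power(3.0)
for f in (3.0, 4.0, 5.0):
    print(f"{f:.1f} GHz: {f / 3.0:.2f}x the clock for "
          f"{dynamic_power(f) / baseline:.2f}x the power")
```

Even with these made-up constants, pushing the clock 67% higher costs nearly 3x the power, which is why sitting at the sweet spot pays off.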

Here is where the PDN technology comes into play. Apple uses the most advanced technology to distribute power and keep all the functional units fed, which requires the ability to supply a lot of instantaneous power. To do so, Apple uses a 3D-stacked architecture of two dies: one for the logic, and another one on top (or bottom, depending on how you look at it) to distribute the power. In contrast, almost everyone else has to use the same die to do logic and distribute power.

The irony is that a simpler/smaller ARM core would have to be clocked faster in order to compete with Intel/AMD cores. And it would end up consuming the same high power.

Apple also has a very good SoC design, meaning they integrate most of the system on a single die: the CPUs, the GPU, the NPU (AI accelerator), the codecs (video processing), the camera block, I/O (USB, WiFi, Ethernet, PCIe/TB, etc.), and the memory controller.

For some workloads like AI and video encoding, having custom silicon handle it is far, far more efficient than running it on general-purpose cores.

Lastly, it also comes down to packaging. Apple not only integrates the SoC on a single die, it also puts the memory chips on the same package. This allows them to use low-power mobile DDR chips, and since those are on-package, it also significantly reduces the power that memory transactions would otherwise burn traveling across the system's PCB.
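As a back-of-envelope illustration (the pJ/bit figures below are assumed ballpark values for the sake of the argument, not measurements of any specific part):

```python
# Energy cost of moving data to memory: on-package LPDDR has short,
# low-capacitance traces, while socketed DDR drives long PCB traces.
# Both pJ/bit figures are assumptions for illustration only.

PJ_PER_BIT = {
    "on-package LPDDR": 4.0,
    "DDR over the PCB": 15.0,
}

gigabytes_moved = 100
bits = gigabytes_moved * 8e9

for kind, pj in PJ_PER_BIT.items():
    joules = bits * pj * 1e-12
    print(f"{kind}: ~{joules:.1f} J to move {gigabytes_moved} GB")
```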

So it's a combination of Apple using a single package where Intel/AMD laptops require multiple packages wired through their PCBs to support the same functionality, and of Apple having access to better overall fabrication technology for that single package than AMD/Intel have for theirs.

The trend seems to be that it is becoming more efficient for mobile vendors to scale up their products into laptops, than it is for desktop vendors to scale down their products into laptops.

There is also a key difference in business models: Apple is a systems vendor, meaning they sell the finished product, not just the processors, so they can use some parts of the vertical stack to subsidize others. In this case, Apple can afford to make very good SoCs because they don't sell those chips elsewhere, meaning they are not as pressured to make them "cheap" in terms of area, for example, since they're going to recoup the profit elsewhere in the product.

In contrast, AMD and Intel sell their processors to OEMs, so they only profit from the processor, not the finished system. They have to prioritize cost by optimizing their designs for area first and power second. This is why both AMD and Intel use smaller cores, which allow for smaller dies but have to be clocked faster to compete on performance; unfortunately, that also increases power.

This is probably the key difference: Apple can afford the larger design that is more power-efficient for the same performance, whereas AMD/Intel have to aim for the smaller design that is less power-efficient for the same performance.
 
1. Mostly internal PCIe expansion.
2. Nope. Take a look at Ampere CPUs or ARM's own Neoverse architecture.
3. Not necessarily. It all depends on how the caches are designed/sized, the lithography used, performance-per-watt, pipeline depth, and how well the CPU branch-predicts (among many other factors). It's not all about how fast the CPU clocks anymore; that trope ended in the early/mid-2000s (although theoretically ARM has an advantage in being less complex than x86, but Intel has announced x86-S to combat that).
Wow, if PCIe expansion (sans video) is the main benefit, then the value is not there for the Mac Pro at all.
If you need expansion, you can get that for hundreds of dollars on a PC motherboard, not thousands in a Mac Pro.
These facts even signal that there won't be value in its future either.

It really seems like the death of a product line to me, unless the Extreme chips are coming soon to renew it.
 
Please note that what I write here is my uninformed amateur opinion; there are plenty of semiconductor industry experts here who are much better qualified to discuss these matters.

1. Internal expansion (but I get that it's a bit disappointing to see a 150W SoC in that chassis)

2. What exactly do you mean by "power limitations"? Apple in particular is fairly competitive with x86 in performance while consuming much less power for similar results. It is true that current x86 CPUs are capable of very high peak clocks, giving them a slight edge in absolute per-core performance (but at an extreme power cost: we are talking 15% higher peak performance at 7-10x the power usage). This is a consequence of specific processor core design rather than which ISA it runs.

3. This is not about ARM or Intel, but about chip design and architectural tradeoffs. Most contemporary ARM architectures (Apple included) decided to operate at lower clocks in order to achieve lower power consumption. And when I write "lower clocks" I mean that these CPUs are physically incapable of operating at high clocks. I am not an engineer, so I don't know exactly how it works, but it was explained to me that logic designs are sensitive to timing issues: circuits can apparently be designed to operate with very low power, but if you try to push the frequency past a certain threshold, the signals will desync and you'll get errors. In x86 land, they use circuit designs less sensitive to timing, which allows them to reach much higher clocks, but the baseline power consumption is much higher even at lower clocks.

Now, limiting your hardware to low clocks only is obviously bad news for performance, so current ARM designs compensate by using a wider execution backend (more compute units per core) and executing more instructions per cycle. For example, Apple can do four independent floating-point operations per cycle, while x86 processors can do only two. It is not clear to me why x86 CPUs won't use such wide backends as well; maybe it's more difficult to do when you aim for high clocks, or maybe x86 is more limited by instruction decode. Intel did recently make their cores wider (with Alder Lake), but it's still much narrower than Apple's.
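Quick arithmetic on the tradeoff in point 2, using only the rough numbers given there: a 15% peak-performance edge bought with 7-10x the power is a big loss in performance per watt.

```python
# Performance-per-watt cost of chasing peak clocks, using the rough
# 15% / 7-10x figures from the post above.

perf_edge = 1.15
for power_ratio in (7, 10):
    ppw = perf_edge / power_ratio
    print(f"{power_ratio}x power for +15% performance -> "
          f"{1 / ppw:.1f}x worse performance per watt")
```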
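And a minimal sketch of the width-vs-clock tradeoff in point 3: the 4-wide vs 2-wide FP figures come from the post above, while the clock speeds are assumed round numbers for illustration.

```python
# Per-core FP throughput = operations per cycle * clock frequency.
# A wide, slow core can match or beat a narrow, fast one.

def fp_ops_per_second(ops_per_cycle, clock_ghz):
    return ops_per_cycle * clock_ghz * 1e9

wide_slow = fp_ops_per_second(4, 3.2)    # 4-wide at an assumed 3.2 GHz
narrow_fast = fp_ops_per_second(2, 5.5)  # 2-wide at an assumed 5.5 GHz

print(f"4-wide @ 3.2 GHz: {wide_slow:.2e} FP ops/s per core")
print(f"2-wide @ 5.5 GHz: {narrow_fast:.2e} FP ops/s per core")
```

With these assumed clocks, the wide/slow core edges out the narrow/fast one while sitting much lower on the power curve.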
 
Wow, if PCIe expansion (sans video) is the main benefit, then the value is not there for the Mac Pro at all.
If you need expansion, you can get that for hundreds of dollars on a PC motherboard, not thousands in a Mac Pro.

It really seems like the death of a product line to me, unless the Extreme chips are coming soon to renew it.

Apple identified these use cases for the Mac Pro
At most 20% will go Dell/HP/Lenovo or delay replacing a Mac Pro as they approach the 1.5TB unified memory limit.

Pro desktop market size is ~75,000/year for both Mac Studio and Mac Pro.
 
Wow, if PCIe expansion (sans video) is the main benefit, then the value is not there for the Mac Pro at all.
If you need expansion, you can get that for hundreds of dollars on a PC motherboard, not thousands in a Mac Pro.
These facts even signal that there won't be value in its future either.

If you only need expansion, sure. But if you also need a CPU with a high core count and advanced vector processing, or a GPU with a large amount of RAM, that's a bit more difficult. For example, an HP workstation with specs similar to the M2 Ultra will set you back around $8,000-9,000, so the Mac Pro is actually a good deal. Of course, with an HP workstation you can also get much more powerful hardware (for much more money).
 
Wow, if PCIe expansion (sans video) is the main benefit, then the value is not there for the Mac Pro at all.
If you need expansion, you can get that for hundreds of dollars on a PC motherboard, not thousands in a Mac Pro.
These facts even signal that there won't be value in its future either.

It really seems like the death of a product line to me, unless the Extreme chips are coming soon to renew it.

Just because video was not mentioned at WWDC does NOT mean video is permanently off the table. There's more than enough bandwidth across the PCIe bus in the Mac Pro to easily handle a high-performance video card. The other consideration is that the majority of PC motherboards (that aren't built by OEMs like HP, Lenovo, or Dell) have at most two PCIe x16-capable slots and two x1 slots, whereas the Mac Pro has FOUR PCIe x8 Gen 4 slots and two PCIe x16 Gen 4 slots, in addition to the PCIe x4 Gen 3 slot used for the I/O card. The Mac Pro also has two SATA ports above the PCIe slots, along with one USB port and some other port I do not recognize, so there are multiple expansion options.

For heavy video production, users could add video capture cards, SSD cards (ASUS has one that can hold up to 21 SSDs on an x16 card), or anything else they feel is needed.
 
Just because video was not mentioned at WWDC does NOT mean video is permanently off the table.

Apple is very clear that they don’t support GPU expansion.

There's more than enough bandwidth across the PCIe bus in the Mac Pro to easily handle a high-performance video card.

Sure, but not much else. All the PCIe expansion slots are wired via a switch to a single x16 interface on the chip. So even if their policy were open to third-party GPUs (and it's certainly not!), the Mac Pro would struggle with one.
 
Intel has announced x86-S to combat that).
A problem with x86-S is that removing their "competitive moat" of legacy hardware/software support puts them on more even footing with Apple and Android SoCs.

This will increase the cost of x86 for legacy users and slow down refresh cycles due to lower demand for legacy x86.

I can foresee a day when it's as relevant as mainframes are today.
 
It is not clear to me why x86 CPUs won't use such wide backends as well; maybe it's more difficult to do when you aim for high clocks, or maybe x86 is more limited by instruction decode. Intel did recently make their cores wider (with Alder Lake), but it's still much narrower than Apple's.

I remember an Intel exec stating publicly a few years back that pushing x86 beyond a certain number of simultaneous decodes actually caused more problems than it fixed. I think there are two factors at play there. First, since x86 instructions are variable-length rather than fixed-length, there are additional resource costs from the CPU continually checking for the end of one instruction and the start of another. Second, out-of-order execution (OoOE) requires higher cache utilization because of the varying instruction sizes. Either way, it seems to be an issue inherent to x86 as a whole.

With ARM-based SoCs, instructions are a fixed length, so the CPU can be hard-wired to that length instead of having to include logic for finding instruction boundaries. This allows ARM designs to run wider decoders, because CPU resources are not being dedicated to the aforementioned checks for where the next instruction starts.
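A toy illustration of that boundary-finding problem (the one-byte length prefix is an invented encoding, and real x86 decoders use length-prediction tricks; this just shows the serial dependency):

```python
# With fixed 4-byte instructions, every boundary is known up front, so
# decoders can all start in parallel. With variable lengths, the start
# of the next instruction depends on decoding the current one first.

fixed_stream = bytes(32)  # 8 instructions, 4 bytes each
fixed_starts = [4 * i for i in range(len(fixed_stream) // 4)]

# Toy variable-length encoding: the first byte holds the length.
var_stream = bytes([2, 0, 3, 0, 0, 1, 4, 0, 0, 0])
var_starts, pos = [], 0
while pos < len(var_stream):
    var_starts.append(pos)
    pos += var_stream[pos]  # must decode this one to find the next start

print("fixed-length starts:   ", fixed_starts)
print("variable-length starts:", var_starts)
```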
There's a lot of discussion on x86 vs Arm already, but in addition to the perf-per-watt war I'm not understanding the value of the Mac Pro vs Studio featuring the same M2 Ultra setup.
  1. Where's the value in the Mac Pro unless the clock speeds were pushed higher than the Studio's version?
  2. Do power limitations exist in the Arm architecture that keep it from competing with x86?
  3. If Arm chips begin clocking the same speeds as Intel chips wouldn't Arm pull ahead with a large lead?

Clock speeds are largely irrelevant, as others have stated in this thread. What matters more is instructions per cycle, or IPC. With a higher IPC, a processor can run at a slower clock speed and still process more instructions per unit of time than a higher-clocked processor with a lower IPC.

Think of it this way: Factory A can build 2000 widgets per hour on each of its two production lines, for a total of 4000/hour. Factory B can only build 1500 per hour on a line, but it has four lines running simultaneously. While each line in Factory B only produces 75% of Factory A's per-line output, Factory B as a whole produces 6000 widgets per hour, 50% more than Factory A. That's how IPC works in the computer industry.
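The factory analogy in code (a trivial sketch; throughput = per-line rate x number of lines, exactly as instructions/second = IPC x clock):

```python
# Throughput is rate-per-line times the number of lines, just as a
# CPU's instructions/second is IPC times clock frequency.

def widgets_per_hour(rate_per_line, lines):
    return rate_per_line * lines

factory_a = widgets_per_hour(2000, 2)  # fewer, faster lines
factory_b = widgets_per_hour(1500, 4)  # more, slower lines

print(f"Factory A: {factory_a}/hour")
print(f"Factory B: {factory_b}/hour "
      f"({factory_b / factory_a - 1:.0%} more than A)")
```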
 
Apple uses a die-on-die approach; it's more like 2.5D than true 3D stacking. By this I mean the main die has the usual layout and uses its metal layers mostly for signaling. Then, on the other side of the substrate, they attach another thin die with no logic; there they put most of the capacitors and use the metal layers for power routing, which is fed vertically into the other die (the one with the actual logic layout).

Intel is trying to do the capacitance on an extra layer on top of the metal layers for their dies.

Basically, Apple gets to do away with most of the capacitors on their package by putting them straight onto the other side of their die. This makes for a better PDN (their cores, ironically, use a lot of power, but in short bursts) and reduces system cost.
 
There's a lot of discussion on x86 vs Arm already, but in addition to the perf-per-watt war I'm not understanding the value of the Mac Pro vs Studio featuring the same M2 Ultra setup.
  1. Where's the value in the Mac Pro unless the clock speeds were pushed higher than the Studio's version?
  2. Do power limitations exist in the Arm architecture that keep it from competing with x86?
  3. If Arm chips begin clocking the same speeds as Intel chips wouldn't Arm pull ahead with a large lead?
The 1990s called and want their benchmark back :). While there is thermal throttling in laptops compared to desktops, clock speed is no longer a reliable indicator of real-world performance; it's a lot more complex than that today.
 
A problem with x86-S is that removing their "competitive moat" of legacy hardware/software support puts them on more even footing with Apple and Android SoCs.

This will increase the cost of x86 for legacy users and slow down refresh cycles due to lower demand for legacy x86.

I can foresee a day when it's as relevant as mainframes are today.
Very likely. The vast majority of financial transactions today are still processed on mainframes.
 
Very likely. The vast majority of financial transactions today are still processed on mainframes.

It's counterintuitive unless you understand how levels of integration work.

There is an emotional component to our thinking. ;-)

When Apple released the M1 (5nm) in November 2020, they were a few process nodes ahead (past 10nm and 7nm) of Intel (stuck on 14nm from 2014-2020). The shrink in area is quadratic.
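Taking the node names at face value (they stopped mapping cleanly to physical feature sizes years ago, so treat this as the marketing-number version of the argument):

```python
# If feature size shrinks linearly, the area of the same logic shrinks
# with its square. Node names are nominal, so this is only indicative.

for old_nm, new_nm in [(14, 10), (14, 7), (14, 5)]:
    area = (new_nm / old_nm) ** 2
    print(f"{old_nm}nm -> {new_nm}nm: same logic in ~{area:.0%} of the area")
```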

What we are witnessing is the same thing that happened when the microprocessor took over the mainframe/supercomputers.

The perception was that the system that took up a whole room and had lots of blinking lights had to be the more powerful one. What was actually happening was that the microprocessor guys were integrating the same functionality that took lots of separate boards in a mainframe down to a few chips.

There were some very specific use cases where the mainframe had the edge, but for the other 99% of applications, we ended up with systems on our desktops that were faster than a computer that took up a whole room. Heck, you can now buy a GPU for $1k that is more powerful than the fastest supercomputer from 2000, which cost millions of dollars, took up an entire floor of a datacenter, and used almost a megawatt.

The microprocessor vendors also had access to larger economies of scale, which meant they could spend more money on developing their designs/tech, so they were able to overtake the old large-system vendors, who had slower development cycles and smaller revenues.

The same thing is now happening with SoCs. They have higher levels of integration, so they can fit a whole PC into a single chip, which means things run faster, with less power, and at lower cost. And they are leveraging the mobile/embedded markets, which are larger and growing faster than the traditional PC/datacenter business.

The SoC vendors are the ones with access to the largest economies of scale today, so they are developing things faster.

How large? Consider the SoC vendors who make up 100% of all smartphones shipped:

Android (all price points)

- 2021: 1.124 billion units
- 2022: 0.979 billion units

Vs

iPhone ($429-1599)

- 2021: 235.8 million units
- 2022: 226.4 million units

As compared to all x86 vs Apple Silicon personal computers shipped:

Windows (all price points)

- 2021: 322.2 million units
- 2022: 263.7 million units

Vs

Mac ($999 & up for laptops + $599 & up for desktops)

- 2021: 27.9 million units
- 2022: 28.6 million units

I'll add the SoC vendors who make up 100% of all tablets shipped:

Android/Windows (all price points)

- 2021: 110.5 million units
- 2022: 101 million units

vs

iPad ($449-2399)

- 2021: 57.8 million units
- 2022: 61.8 million units

Below are the total units shipped of Macs, iPads & iPhones

- 2021: 321.5 million units
- 2022: 316.8 million units

vs

Windows (all price points)

- 2021: 322.2 million units
- 2022: 263.7 million units

Apple devices outshipped all Intel/AMD PCs combined. Apple only caters to the top ~20% of any market they enter. Apple leveraged iPhone & iPad SoC R&D to create >90% of Apple Silicon; the <10% of R&D for Mac-specific requirements is paid for by Mac revenue.

[Image: aapl-1q23-line.jpg]


Which is why you end up with a mobile chip trading blows with a whole PC.

So you will see mobile SoCs getting more and more powerful at a faster rate than desktop microprocessors. And once they pass the inflection point, the desktop processor starts to actually lag in performance and can't catch up.

[Image: perf-trajectory.png]


This has happened several times (mainframes -> minicomputers -> microcomputers -> SoCs...), and it's usually correlated with jumps in levels of integration.

BTW, you still have old guys from the mainframe era who are in denial about how a PC could possibly be faster ;-)

Do not be surprised that workstation desktop users who insist on PCIe expansion slots will eventually be like them.

Hope this makes sense.
 
Sorry, my comment was somewhat snarky. It's not really about speed or technology, it's about legacy code and reliability - that's why mainframes still exist. It's cost-prohibitive to replace the code - 60 years of undocumented embedded business logic. And 5 9's isn't enough for those systems - the Z in Z-series is 'Zero downtime'.

Don't get me wrong, there's no chance I'd put new systems onto mainframes today. But Intel vs ARM? I don't see the latter replacing the former any time soon. If anything, we'll see HALs (hardware abstraction layers) that make the underlying chip irrelevant.
 
Do power limitations exist in the Arm architecture that keep it from competing with x86?
I read somewhere that it could be that Apple Silicon systems are specifically designed to run at the frequencies they're released at... there's not an option to just clock the CPU as high as power will allow, because it's not just a CPU, it's a CPU PLUS all the subsystems. Subsystems that, in other PCs, have the ability to be clocked separately for stability.

Also, considering that Apple Silicon is designed to be as performant per watt as possible to start with, some OTHER ARM architecture designed to run hot might overtake Intel at some point, but I doubt Apple Silicon, with its focus on the much larger consumer market, ever will.
 
I read somewhere that it could be that Apple Silicon systems are specifically designed to run at the frequencies they're released at... there's not an option to just clock the CPU as high as power will allow, because it's not just a CPU, it's a CPU PLUS all the subsystems. Subsystems that, in other PCs, have the ability to be clocked separately for stability.

More likely it's because the CPU circuits were not designed to run at high frequencies. Apple certainly uses different clocks for different subsystems, and one of the points of Apple Silicon is that processors, and even parts of processors, can be powered down.
 
I read somewhere that it could be that Apple Silicon systems are specifically designed to run at the frequencies they're released at... there's not an option to just clock the CPU as high as power will allow, because it's not just a CPU, it's a CPU PLUS all the subsystems. Subsystems that, in other PCs, have the ability to be clocked separately for stability.

Also, considering that Apple Silicon is designed to be as performant per watt as possible to start with, some OTHER ARM architecture designed to run hot might overtake Intel at some point, but I doubt Apple Silicon, with its focus on the much larger consumer market, ever will.
The big performance gap is in raw GPU power. It may be more performant per watt, but Nvidia still runs circles around the on-board GPU (even with unified memory). Until and unless Apple ups their GPU game, Apple Silicon will remain second best.
 
The big performance gap is in raw GPU power. It may be more performant per watt, but Nvidia still runs circles around the on-board GPU (even with unified memory). Until and unless Apple ups their GPU game, Apple Silicon will remain second best.
I think once Apple aligns iPhone chip cores with Mac chip cores, we'll see better performance from the GPU cores, among other things.

There was a rumor that last September's iPhone chips were supposed to have ray tracing.
 
I think once Apple aligns iPhone chip cores with Mac chip cores, we'll see better performance from the GPU cores, among other things.

There was a rumor that last September's iPhone chips were supposed to have ray tracing.
How? A15 chips aren't even remotely close to Nvidia's performance, and the benchmarks show them lagging the M2.

The best option would be to add eGPU capability back to Macs, but they've gone all-in on unified memory and still have sour grapes over some old hurt with Nvidia (that's why even the Intel Macs are stuck with inferior AMD GPUs). This is one of those cases where being proprietary really does hurt users.
 
How? A15 chips aren't even remotely close to Nvidia's performance, and the benchmarks show them lagging the M2.

The best option would be to add eGPU capability back to Macs, but they've gone all-in on unified memory and still have sour grapes over some old hurt with Nvidia (that's why even the Intel Macs are stuck with inferior AMD GPUs). This is one of those cases where being proprietary really does hurt users.
I am speaking of core generation tech.

iPhone chips get refreshed annually.

Mac chips every 19.5 months.

As time goes on, their Mac chips' core generation tech gets more and more delayed.

If you scale up this September's iPhone chip cores to those of Mac chips, then you will see performance improvements.

eGPU is a short-term solution. The long-term solution is core generation alignment.

The performance trajectory of Mac GPU cores will surpass those of flagship RTX cards before 2030.

The RTX 4090 draws 450W, plus power spikes. The non-binned Mac Studio M2 Ultra has a max power draw of 215W for the whole system.

And the Ultra is a distant 2nd place to the 4090.

Nvidia created material financial damage to Apple, reputational damage from faulty Macs due to Nvidia parts, and IIRC Nvidia leaked Apple's future products.

So their being "butt hurt" is an understatement.

Anyone who wants an i9 & 4090 is better served buying a PC that costs a fraction of a $7k base Mac Pro.
 
I am speaking of core generation tech.

iPhone chips get refreshed annually.

Mac chips every 19.5 months.

As time goes on, their Mac chips' core generation tech gets more and more delayed.

If you scale up this September's iPhone chip cores to those of Mac chips, then you will see performance improvements.

eGPU is a short-term solution. The long-term solution is core generation alignment.

The performance trajectory of Mac GPU cores will surpass those of flagship RTX cards before 2030.

The RTX 4090 draws 450W, plus power spikes. The non-binned Mac Studio M2 Ultra has a max power draw of 215W for the whole system.

And the Ultra is a distant 2nd place to the 4090.
Exactly: Apple wins on performance per watt, but at the price of substantially worse absolute performance. If you want/need top-level raw performance, Apple simply can't compete. That's not going to change until they come out with a non-power/thermally-constrained desktop-only chip. But they are clearly on a 'one chip to rule them all' architecture with a primary design goal of performance per watt, so the need to run the same chip in a laptop will continue to cripple the desktops.

There's not much point in buying a Mac Pro over a Studio with the current design, unless you need very specialized PCIe hardware. That's a shame; Apple is going to miss out on a lot of the LLM/AI/ML developer market because the machines are going to be a distant second to Linux/Windows Intel systems.
 
How? A15 chips aren't even remotely close to Nvidia's performance, and the benchmarks show them lagging the M2.

Performance per GPU shader cluster is the same (in fact, Apple might even do better, since their architecture seems to be less prone to stalls, but I'm still not 100% sure about the details); it's just that Nvidia has much more die space to put compute logic on. Plus, they are more aggressive about reusing shared hardware functionality, which gives them more effective compute per mm². For example, AD102 (the die that powers the 4090) is just 20% larger than the M2 Max, but Nvidia can fit almost 4x as many shader clusters on it!
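Rough density arithmetic from those two figures (1.2x the die area, ~4x the shader clusters, both from the paragraph above):

```python
# Shader-cluster density ratio implied by the figures above.

area_ratio = 1.2     # AD102 die area vs M2 Max, per the post
cluster_ratio = 4.0  # shader clusters, per the post

print(f"~{cluster_ratio / area_ratio:.1f}x more shader clusters "
      f"per unit of die area")
```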
 