
ian87w · Original poster · Feb 22, 2020
We've reached 5nm, and even the ARM Cortex-X1 is still on ARMv8.2-A instruction-wise. Are we already close to the limit of ARM? I mean sure, there's still 3nm and 2nm, but there's a literal physics limitation, no?

So what's next beyond ARM? It seems the ceiling will be reached quite fast.
 

Not sure what you mean by “limits of ARM”. There are certainly physical limitations to semiconductor technology, but there are also a lot of tricks one can use to get better performance and power efficiency out of a CPU. I think Apple's CPU success story is a great illustration of that.
 
It would be interesting to hear some of the experts on this forum say what the limit is for feature size on silicon. I understand that 5nm is mainly a marketing number, but somewhere there will be a physical limit, even if it is in picometers. Of course, this should apply to any foundry and is not specific to ARM. TSMC seems to be leading over Intel for now.
 
Sure, AS is a great example. But then, in the context of Apple Macs, they're only making the transition now, at 5nm. I mean, what else can they do to improve performance once the physical limit is near? From a layperson's perspective, it seems like there are only a few ways to improve performance.
- Higher clock speed: so far this has been helped by shrinking the silicon, which seems to be closing in on its ceiling.
- Instruction set: it seems we're just seeing iterations of ARMv8, even for the upcoming Cortex-X1.
- Other co-processors like the GPU: I guess this is another route, with Apple buffing up the GPU and adding other co-processors like the ISP, the Neural Engine, etc.

Wondering if the ARM route can give Apple at least another 15 years (like Intel did), or whether there will be something more lucrative to transition to sooner?
 
Sure, AS is a great example. But then in context of Apple Macs, they just do the transition now, at 5nm. I mean what else they do to improve performance once the physical limitation is already near? From a lay person perspective, it seems like there are only a few ways to improve performance.
- Higher clockspeed: so far, this is helped by the silicon size, which is closing in to the ceiling seemingly.
- Instruction set: seems like we are just seeing iterations of ARMv8, even for the future Cortex X-1.
- Other co-processors like GPU: I guess this is another route, with Apple buffing up the GPU and adding on other co-processors like the ISP, the Neural engine, etc.

Wondering if the ARM route can give Apple at least another 15 years (like intel did), or will there be another more lucrative thing to transition to sooner?

I am definitely not an expert on CPU design (maybe someone like @cmaier can chime in), but aside from simply increasing clocks (which is primarily limited by node size), I can see the following potential ways to increase performance:

- further widening the architecture (increasing the number of execution units and the front-end width), no idea how feasible this is on an already very wide architecture such as Apple Silicon — there is only so much ILP one can exploit (see the sketch after this list)...

- scaling up the number of cores; this is the path Intel and AMD have been going down for the last couple of years, and Apple has tremendous potential here (it won't help with single-threaded performance, though)

- optimizing the architecture to reduce execution latency — no clue what the limits are here
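
To make the ILP point concrete, here is a minimal C sketch (purely illustrative, not tied to any particular CPU). The first loop is one long dependency chain, so a wider core gains nothing; the second splits the work into four independent chains, which is exactly the kind of parallelism that extra execution units can actually use, and also roughly where the "only so much ILP" ceiling comes from.

```c
#include <stddef.h>

/* One dependency chain: each add must wait for the previous one,
 * so extra execution units in a wide core sit idle. */
float sum_serial(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];                  /* every iteration depends on s */
    return s;
}

/* Four independent accumulators: an out-of-order core can keep up to
 * four floating-point adds in flight at once, because the chains do
 * not depend on each other. */
float sum_ilp(const float *x, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)              /* leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```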

Overall, I am sure that linear performance gains will continue for a while. Apple has yet to make a high-core-count design, and given their performance-per-watt metrics, such a design is likely to be very performant. Single-threaded performance is probably going to plateau very soon, however. But then again, there is Nuvia... a startup founded by Apple's former chief CPU designer, who claims they can deliver significant single-core performance increases... so who knows.

And once there are no tricks left up the sleeve... no idea. I doubt a different instruction set is going to be a panacea. ARM already makes things fairly easy for the CPU, and by throwing away the 32-bit compatibility mode Apple gets valuable die space for more important things. People mention RISC-V, but I don't really see how RISC-V is "better" than ARM except for being open source. One would probably need different computing paradigms, and I don't have enough of an overview of the current state of research to speculate. But we are still a decade or two away from that. Something will turn up, as it usually does :)
 
If I remember correctly, Manu (at Nuvia) used to work on blocks I owned at AMD. (I think he was one of the folks helping me on the Opteron integer unit, but it was so long ago I could be misremembering what he did.) I think he was also a unit lead while I was there and when I moved on to run the CAD team. Glad to see he’s had success.


I think your summary is about right. Apple will always have an advantage so long as they continue to use the design philosophy they got from the heritage of their CPU designers, which came from Exponential, EVSX, and DEC. Don’t rely too heavily on automated design, so expert designers can take full advantage of what the process node gives you. AMD had the same philosophy.

The other ARM design teams in the world tend to come from the GPU or ASIC world, and don’t share that philosophy.

I don’t think single-thread performance will plateau for them any time soon, though I think the 20 percent per year gains will slowly decrease.

I also think they will slowly modify the instruction set, dropping instructions they don’t need and adding instructions that are uniquely relevant to their own needs, which will help. They don’t need to follow an ISA used by anyone else, after all. They control the compiler, the OS, the boot firmware, Rosetta, etc.

And, of course, other units (gpu, neural, something new) are always increasing in importance and there is lots of headroom for improvement there.
 
One of the things ARM is great for is heterogeneous computing. It’s relatively easy to combine some ARM cores with specialised cores for other things on the same die. Way back in the day when I was at Uni (20+ years ago), this included hardware JPEG encode/decode blocks and things like that. These days we see high-performance GPU cores, Apple’s “Neural Engine”, etc. on the same die. So perhaps we will see a plateau in main CPU performance, but this can be offset with dedicated hardware for specific tasks improving overall system performance.

Or just add more cores!

Or a pocket quantum computer?
 
So what's next beyond ARM? It seems the ceiling will be reached quite fast.

One of the side-effects of this transition should be a further reduction in the amount of application software that contains CPU-specific code or optimisations. Hopefully developers will replace x86-only code with calls to standard Mac OS frameworks wherever possible, rather than adding lovingly hand-optimised ARM code. That way, firstly, those applications should be able to benefit from future acceleration technologies added to Apple Silicon (see @robbieduncan's comment on heterogeneous computing) and, secondly, if/when the time comes to move to RISC VI, WeHaven'tThoughtOfItYet(tm), or "Whups, Intel's got their act back together" we shouldn't have this great gnashing and wailing of teeth over software compatibility.
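
As a concrete (and purely illustrative) sketch of that framework-first approach: instead of hand-written SSE or NEON intrinsics, an element-wise add can go through Accelerate's vDSP, which ships with macOS and is tuned by Apple for whatever CPU the binary ends up running on. The function below is my own example, not anything from the App Store toolchain.

```c
#include <stddef.h>
#include <Accelerate/Accelerate.h>   /* vDSP is part of the Accelerate framework */

/* Element-wise c = a + b without any CPU-specific intrinsics.
 * The same source builds for x86-64 and arm64; Apple's framework
 * picks the best implementation for the machine it runs on. */
void add_arrays(const float *a, const float *b, float *c, size_t n) {
    vDSP_vadd(a, 1, b, 1, c, 1, (vDSP_Length)n);
}
```

Build it with clang -framework Accelerate and the exact same source compiles unchanged for Intel and Apple Silicon Macs.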

...I believe the App Store already has the ability to distribute apps as "bitcode" which gets translated to match the target processor on delivery, although that seems to be more about coping with different flavours of ARM64. Meanwhile, Windows .net/common language runtime is based on bytecode, as is the whole Java ecosystem and large parts of Android. In the future, only the writers of the lowest-level OS code should need to know what the CPU is (esp. when the CPU is mainly acting as glue for a collection of GPUs, vector processors, neural engines etc.).

The fact that Apple can have this sort of clean break every decade or so whereas Windows is condemned to support 25 year old Win32 binaries for the foreseeable future should be their biggest advantage over the PC world. It's one reason not to cry into our beer too much over the loss of Bootcamp...
 
...I believe the App Store already has the ability to distribute apps as "bitcode" which gets translated to match the target processor on delivery, although that seems to be more about coping with different flavours of ARM64.

I believe this has already been used in anger on the Watch: before the Series 4, the Apple Watch had a 32-bit CPU. Series 4 and on are 64-bit and, as per other Apple OSes, watchOS on Series 4 and later demands 64-bit. Apple had demanded bitcode from the get-go for watchOS, so all apps instantly became 64-bit and just worked. Pretty amazing. I think it may still be optional on the iOS/iPadOS App Stores. Not sure it’s supported on macOS yet...
 
We've reached 5nm, and even the ARM Cortex-X1 is still on ARMv8.2-A instruction-wise. Are we already close to the limit of ARM? I mean sure, there's still 3nm and 2nm, but there's a literal physics limitation, no?

So what's next beyond ARM? It seems the ceiling will be reached quite fast.

Yes, soon semiconductor feature size will reach a limit. You can't just keep making stuff smaller forever. And you can't simply raise the clock speed to make it go faster either. But there is a lot that can be done...

1) Obviously you can add more cores. Cores use power, but you can turn them on and off as needed.
2) If a core can only do so many clock ticks per second, then you make it do more each time the clock ticks. For years now processors have been doing this with techniques like speculative execution. I remember years (decades) ago hand-coding assembly on a CDC mainframe. That CPU had 10 functional units and we, the programmers, had to keep all of them fed with something to do. This idea (from the 1960s) can be greatly expanded. We can unroll loops on a CPU if there are enough arithmetic units. In short, we have a LONG way to go in this direction.
3) You can add more different kinds of cores into the mix. Already we see special neural network blocks and encryption blocks. What these do is move functionality off the ARM CPUs and into hardware (see the sketch after this list).
4) What if Apple added an FPGA to Apple silicon? Then software apps could (optionally) include a bit file for the FPGA and in effect create custom hardware functional blocks, moving anything the developer wants from CPU to hardware.
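
As a small, hedged illustration of point 3: application code usually reaches those dedicated blocks through a system library rather than directly. The sketch below is my own example, not from the post; it encrypts a buffer with Apple's CommonCrypto, and whether the call lands on the ARMv8 AES instructions, some other accelerator, or plain C code is the platform's decision, not the app's.

```c
#include <stddef.h>
#include <stdint.h>
#include <CommonCrypto/CommonCryptor.h>  /* Apple's system crypto library (macOS/iOS) */

/* AES-128-CBC encrypt via the system library instead of a hand-written loop.
 * How the work is accelerated is an implementation detail of the OS. */
int encrypt_buffer(const uint8_t key[16], const uint8_t iv[16],
                   const uint8_t *plain, size_t plain_len,
                   uint8_t *out, size_t out_cap, size_t *out_len) {
    CCCryptorStatus st = CCCrypt(kCCEncrypt, kCCAlgorithmAES, kCCOptionPKCS7Padding,
                                 key, kCCKeySizeAES128, iv,
                                 plain, plain_len,
                                 out, out_cap, out_len);
    return st == kCCSuccess ? 0 : -1;
}
```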

Apple has total control of both software and hardware now, and they can do what they want. Note that up until the early 1980s this was the normal situation. IBM, DEC, CDC, Perkin-Elmer, and others all had their own CPU architectures, their own OS, and so on. Apple is going back to this model. It worked well, as it gave each company a way to differentiate themselves from others.
 
You need to remember that all Apple uses is the ISA - the microarchitecture of Apple Silicon is completely different from Cortex or any other ARM design. Those who have looked at it have said the only thing it resembles at all is the Intel Core 2 microarchitecture.

As to process, Apple has exclusive access (by buying up the capacity) to TSMC's 5nm process, so assume the as-yet-unrevealed Mac family will be on 5nm. Apple can easily scale up the core counts and cache sizes from the A-series to give them what they want for Macs. The basic core designs are already desktop class, with short, wide pipes and super-accurate branch prediction.

Longer term, Apple may at some point just go to their own ISA and stop paying ARM money. They already have their own microarchitecture so putting in their own ISA is not as big a step.
 
We've reached 5nm, and even the ARM Cortex-X1 is still on ARMv8.2-A instruction-wise. Are we already close to the limit of ARM? I mean sure, there's still 3nm and 2nm, but there's a literal physics limitation, no?

So what's next beyond ARM? It seems the ceiling will be reached quite fast.

Beyond ARM?? lol, just wondering... Apple marketing would be good at teasing this without giving away any surprises, like they usually do in code.

Intel has been doing this for a while. Apple's a newcomer to Mac chips, so who knows what they're capable of.
 
I also think they will slowly modify the instruction set, dropping instructions they don’t need and adding instructions that are uniquely relevant to their own needs, which will help. They don’t need to follow an ISA used by anyone else, after all. They control the compiler, the OS, the boot firmware, Rosetta, etc.

They are already adding instructions - A13 has AMX, an ISA extension for accelerating matrix multiplication. However they aren't providing documentation or tooling to support use of AMX outside Apple-provided math acceleration libraries. As far as iOS developers are concerned, the A13 is a standard AArch64 CPU.
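
For illustration, the supported way to reach whatever matrix hardware an A-series chip has is through Apple's own libraries. Here is a minimal sketch using the BLAS interface that ships inside Accelerate; whether a given call is actually dispatched to AMX is not publicly documented, so treat the routing as an assumption.

```c
#include <Accelerate/Accelerate.h>   /* BLAS/LAPACK ship inside Accelerate */

/* C = A * B for row-major MxK and KxN single-precision matrices.
 * Apple's library decides how to execute it (NEON, AMX, ...);
 * the ISA extension itself stays private. */
void matmul(const float *A, const float *B, float *C, int M, int N, int K) {
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f, A, K,    /* lda = K for row-major A */
                B, N,          /* ldb = N */
                0.0f, C, N);   /* ldc = N */
}
```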

I expect the public facing side of Apple's CPUs to continue being standard ARM. While they do control all of the things you mention, they get value from other companies also working on AArch64 tooling, and this is an area where Apple's been un-Apple for a long time. LLVM, clang, and Swift are all open source projects where Apple has conspicuously welcomed outside contributors, and has even tried to avoid being an absolute dictator of future direction. They also benefit when open source libraries are more easily ported to macOS because macOS supports standard ARMv8.x. They could decide to go stand on their own island, isolated from the rest of the world, but why? Even from a cynical point of view, the more standard they are, the more free labor they get from other companies and individuals contributing to their tools.

There's also this to consider: according to what I've heard, AArch64 was an Apple-ARM joint project. Meaning that Apple funded it, and the two companies worked closely together on the ISA spec. That's just hearsay and I don't know if the person I heard it from is authoritative, but it would explain how Apple was able to ship 64-bit ARM silicon about a year ahead of anyone else. And, if true, it would mean they're already happy with the baseline AArch64 ISA.
 
I doubt Apple would have made the switch to ARM if they didn't have a long roadmap planned out. Even if they can't shrink the process any more, they would likely continue to expand, incorporating more dedicated coprocessors (I'm not sure if that's the correct term; this is far from my comfort zone) to make things more efficient and parallelised. And then there's always the GPU side waiting to be tackled.
 
They are already adding instructions - A13 has AMX, an ISA extension for accelerating matrix multiplication. However they aren't providing documentation or tooling to support use of AMX outside Apple-provided math acceleration libraries. As far as iOS developers are concerned, the A13 is a standard AArch64 CPU.

I expect the public facing side of Apple's CPUs to continue being standard ARM. While they do control all of the things you mention, they get value from other companies also working on AArch64 tooling, and this is an area where Apple's been un-Apple for a long time. LLVM, clang, and Swift are all open source projects where Apple has conspicuously welcomed outside contributors, and has even tried to avoid being an absolute dictator of future direction. They also benefit when open source libraries are more easily ported to macOS because macOS supports standard ARMv8.x. They could decide to go stand on their own island, isolated from the rest of the world, but why? Even from a cynical point of view, the more standard they are, the more free labor they get from other companies and individuals contributing to their tools.

There's also this to consider: according to what I've heard, AArch64 was an Apple-ARM joint project. Meaning that Apple funded it, and the two companies worked closely together on the ISA spec. That's just hearsay and I don't know if the person I heard it from is authoritative, but it would explain how Apple was able to ship 64-bit ARM silicon about a year ahead of anyone else. And, if true, it would mean they're already happy with the baseline AArch64 ISA.


They could always stay on their own isolated island and continue to control everything... but I don't really see Apple rushing to move further into open source until they can guarantee their security won't be compromised.

This is probably why they move in steps... and are more reluctant than others would be. There would be a reason: not because they fear open source, but because it could undermine their whole security model...

With the move to ARM, where they control everything across all their hardware products, they can manage all of this much more easily...

Users want their stuff to be somewhat open, and while that would be a benefit, they would not want to compromise on security, depending on where they extend to.
 
I expect the public facing side of Apple's CPUs to continue being standard ARM. While they do control all of the things you mention, they get value from other companies also working on AArch64 tooling, and this is an area where Apple's been un-Apple for a long time. LLVM, clang, and Swift are all open source projects where Apple has conspicuously welcomed outside contributors, and has even tried to avoid being an absolute dictator of future direction. They also benefit when open source libraries are more easily ported to macOS because macOS supports standard ARMv8.x.

This! Apple gets huge benefits from the ARM ecosystem. Changing the ISA would be like shooting themselves in the foot. I could name a few more examples, like .NET/Mono (including all the frameworks sitting on top of .NET, such as the popular Unity engine), Node.js along with Electron, the Java runtime (JRE), etc.
 
These would simply be recompiled. There's not a lot of ARM assembly language in the source code for any of the things you’ve listed.
 
They could always stay on their own isolated island and continue to control everything... but I don't really see Apple rushing to move further into open source until they can guarantee their security won't be compromised.

Not sure what you are saying here. Apple's OS is built upon an open source foundation, Apple has been using (and investing in) open source since, well, forever, and Apple's low-level OS code is published as open source.
 
I doubt Apple would have made the switch to ARM if they didn't have a long roadmap planned out. Even if they can't shrink the process any more, they would likely continue to expand, incorporating more dedicated coprocessors (I'm not sure if that's the correct term; this is far from my comfort zone) to make things more efficient and parallelised. And then there's always the GPU side waiting to be tackled.
TSMC is displaying confidence that they can get to 2 nm by 2024. Even if that is off by a year or two, the day when Apple can’t rely on process shrink is still years off. It is probably more cost than physics that will bring the end of die shrinks.
 
SVE2 (Scalable Vector Extension 2) would be a great extension for the Macs. It is a SIMD extension with scalable vectors from 128-bit up to 2048-bit (https://community.arm.com/developer...chnologies-for-the-arm-a-profile-architecture).
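
To show what "scalable" means in practice, here is a rough sketch using the Arm C Language Extensions for SVE (which SVE2 builds on). The same source, and even the same binary, works on hardware with any vector length from 128 to 2048 bits; nothing in it is tied to one implementation. This is purely illustrative — at the time of writing no shipping Apple chip exposes SVE/SVE2.

```c
#include <stdint.h>
#include <arm_sve.h>   /* ACLE SVE intrinsics; compile with SVE/SVE2 enabled */

/* c[i] = a[i] + b[i], written once for any SVE vector length.
 * svcntw() reports how many 32-bit lanes this particular CPU has;
 * the predicate from svwhilelt handles the loop tail automatically. */
void add_arrays(const float *a, const float *b, float *c, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_s64(i, n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, c + i, svadd_f32_x(pg, va, vb));
    }
}
```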

DynamIQ is another technology for putting different types of CPU cores in an on-chip cluster (https://www.anandtech.com/show/11213/arm-launches-dynamiq-biglittle-to-eight-cores-per-cluster). One "leak" suggests this might already be the case for the first ARM Macs (https://gadgetcrutches.com/gadgets/...arm-macbook-will-come-with-8-12-and-16-cores/).
 
DynamIQ is another technology for putting different types of CPU cores in an on-chip cluster (https://www.anandtech.com/show/11213/arm-launches-dynamiq-biglittle-to-eight-cores-per-cluster). One "leak" suggests this might already be the case for the first ARM Macs (https://gadgetcrutches.com/gadgets/...arm-macbook-will-come-with-8-12-and-16-cores/).

I don’t believe that Apple will use these kinds of configurations. First of all, they don’t need them: their high-performance cores are already faster than anything ARM and co. can put out. Second, I don’t see much point besides benchmark gimmicks. Qualcomm and co., for example, are reportedly going for a 1+3+4 configuration or something like that, but the only reason behind it is that they can claim to be as fast as Apple in single-core benchmarks.

The 8- and 12-core rumors most likely refer to 4+4 and 8+4 configurations.
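
A side note on how software actually targets asymmetric configurations like these: on Apple's platforms you generally don't pick a core, you label the work, and the scheduler decides whether it belongs on a performance or an efficiency core. Below is a minimal Grand Central Dispatch sketch; the calls are standard libdispatch API, but how the QoS classes map onto specific cores is entirely up to the OS, so treat that mapping as an assumption rather than something from this thread.

```c
#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void) {
    dispatch_group_t group = dispatch_group_create();

    /* Latency-sensitive work: eligible for the high-performance cores. */
    dispatch_group_async(group, dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
        printf("user-initiated work\n");
    });

    /* Background maintenance: the scheduler may keep this on efficiency cores. */
    dispatch_group_async(group, dispatch_get_global_queue(QOS_CLASS_BACKGROUND, 0), ^{
        printf("background work\n");
    });

    /* Wait for both blocks; which core ran them is the kernel's decision. */
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    return 0;
}
```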
 