
leman

macrumors Core
Original poster
Oct 14, 2008
Maynard Handley (author of the M1 explainer mega-document) has been doing some more digging and is speculating that Apple has far-reaching plans for its AMX (Apple Matrix coprocessor) hardware. You can read the original post and discussion here, with a short summary below: https://www.realworldtech.com/forum/?threadid=206023&curpostid=206023

- AMX currently accelerates large matrix multiplication, with a focus on throughput (a lot of data processed per unit of time, though you might need to wait a bit for the results to be returned)
- some patents suggest that AMX might support more sophisticated operations in the future (which would be very helpful for ML applications)
- further patents suggest that Apple aims to turn AMX into a throughput-oriented general-purpose vector processor (think AVX-512, streaming SVE, or RISC-V vectors)

Mind, this is all just speculation, but it does paint an interesting picture for data-parallel processing on Apple Silicon. If it is correct, we are heading towards the following:

- latency-oriented "short" (128-bit) SIMD in the CPU itself (NEON, maybe SVE2), optimised for smaller problem sizes where you need to mix SIMD operations with "normal" operations
- throughput-oriented "long" SIMD in a dedicated coprocessor, optimised for large, specialised problems such as ML and numeric operations on relatively large bodies of data, most likely hidden behind APIs, with no direct access to low-level instructions
- throughput-oriented GPU kernels for general-purpose processing of very large datasets

I think this would be in line with Apple's current approach of replicating functionality so that they can offer an optimal tool for many problem domains. Extending AMX in this way would let Apple close the functionality gap with both Nvidia (Tensor Cores) and workstation-class Intel (AVX-512), without any noteworthy drawbacks. And there is still the NPU for low-power machine learning inference and the GPU for everything else.

P.S. If they are indeed headed this way, I can see them skipping SVE/ARMv9 altogether.