I assume you are a practitioner. Can you answer the following question:
It *seems* that, even on the M1, all the major ML frameworks are bifurcating into inference on the Apple NPU and training on the Apple GPU. So what's the role of AMX (introduced for ML purposes, and with the first few patent updates very much ML-focussed) in all this?
Answers I could imagine (but I don't know):
- AMX is useful in the exploratory phases of designing a network, when you don't yet know what will work, and 32-bit (even 64-bit) precision is helpful and easy to get. Once the design has stabilized, you know how much of it, and where, can move to 16-bit or even 8-bit, and the GPU is the better fit.
- we run the first phases of training on the GPU to get close to convergence, then run the final phases at higher precision on AMX. (I've suggested doing this, in a different context, for large matrix computations like eigenvalue problems or solving elliptic PDEs; I sketch the idea at the bottom of this post.)
- neural networks have many layers, and some of them are best handled on a GPU, some on AMX, some even on a CPU?
- or perhaps AMX is not really part of the ML story anymore?
(Which is not to say it's worthless! The latest patents, to my eyes, suggest it is being turned into a kind of super-AVX512: still the matrix functionality, but also a lot of the FP compute capability of AVX512 without the downsides. So it's a great facility for math/sci/eng use cases -- especially once it can become a direct compiler target, which I expect is coming. The current AMX design has been in such flux, with so many new features added each year, that it's clear the initial instruction set has not grown gracefully into the new functionality. So I expect that at some point, when the pace of change slows down, they will fix/redesign the instruction set and perhaps, at that point, expose it to the compiler.)
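To make the math/sci/eng point concrete: as far as I can tell, the only sanctioned route to AMX today is indirect, through Accelerate's BLAS/LAPACK (which, by most public accounts, is what gets dispatched to the AMX units), not a compiler target. Here's a minimal sketch of what that looks like from Python, assuming a stock macOS Accelerate framework and taking the "this lands on AMX" part as reported rather than documented:

```python
import ctypes
import numpy as np

# Load Apple's Accelerate framework. Its BLAS is, by most public accounts,
# what gets dispatched to the AMX units; there is no instruction-level
# access or compiler target, so a library call is the interface.
accelerate = ctypes.CDLL(
    "/System/Library/Frameworks/Accelerate.framework/Accelerate"
)

CblasRowMajor, CblasNoTrans = 101, 111  # standard CBLAS enum values


def sgemm(A, B):
    """C = A @ B in float32 via Accelerate's cblas_sgemm."""
    A = np.ascontiguousarray(A, dtype=np.float32)
    B = np.ascontiguousarray(B, dtype=np.float32)
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=np.float32)
    accelerate.cblas_sgemm(
        CblasRowMajor, CblasNoTrans, CblasNoTrans,
        m, n, k,
        ctypes.c_float(1.0),                                   # alpha
        A.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), k,   # A, lda
        B.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), n,   # B, ldb
        ctypes.c_float(0.0),                                   # beta
        C.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), n,   # C, ldc
    )
    return C


A = np.random.rand(256, 512).astype(np.float32)
B = np.random.rand(512, 128).astype(np.float32)
print(np.allclose(sgemm(A, B), A @ B, atol=1e-3))
```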
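And to make the second bullet concrete (cheap low-precision passes first, a high-precision polish at the end), here is the kind of thing I have in mind, written as plain NumPy mixed-precision iterative refinement rather than anything Apple-specific; the function name and the float32/float64 split are just mine, for illustration:

```python
import numpy as np


def solve_with_refinement(A, b, refine_steps=2):
    """Solve Ax = b cheaply in float32, then polish the answer in float64.

    The bulk of the work (the solves) happens in 32-bit; a few
    residual-correction steps carried out in 64-bit recover most of the
    accuracy a straight 64-bit solve would have given.
    """
    A32 = A.astype(np.float32)
    b32 = b.astype(np.float32)
    x = np.linalg.solve(A32, b32).astype(np.float64)  # cheap first pass

    A64 = A.astype(np.float64)
    b64 = b.astype(np.float64)
    for _ in range(refine_steps):
        r = b64 - A64 @ x                             # residual in full precision
        # A real implementation would reuse the float32 factorization here;
        # re-solving keeps the sketch short.
        dx = np.linalg.solve(A32, r.astype(np.float32))
        x = x + dx.astype(np.float64)
    return x


rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
b = rng.standard_normal(500)
x = solve_with_refinement(A, b)
print(np.linalg.norm(A @ x - b))  # should be near float64-level residual
```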