I've always been a bit fearful of AI and so have had my head in the sand a bit, but can anyone with chip knowledge explain how a chip enhances AI? I thought it was all done on request and return by supercomputers guzzling water? I also have no idea what 'neural engines' on a chip do. Is enhanced AI different to that? If someone could help an old man out here?
So there are two phases for modern-style AI (AI isn't a single thing but a collection of techniques; most people mean deep learning or machine learning nowadays).

The first phase is training a model, which is insanely energy- and compute-intensive, and whatever machine you have isn't big enough (because if you build it bigger, well, then you just add more parameters to your model!). When I do some model training I use all 24 cores on my Intel box along with the 11,000+ CUDA cores on my NVIDIA GPU, since ML training is a massive matrix operation and GPUs are massive matrix-crunching cards.

Once a model is trained, though, it takes way, way less horsepower to use the model to classify inputs. Think of your brain: learning to do a complex task (say surgery) is a very long, intensive process, but once you've learned it, performing surgery as a trained surgeon is much, much easier [as a doctor I speak from experience]. Unlike our brains, most AI models are fixed once trained, so they are not continuously learning from their ongoing use; some do, but most don't, which greatly reduces the compute load on the end user's computer.

Another load folks often ignore (well, not the people doing it) during the training phase, beyond the brutal matrix operations, is feature selection, which is where the algorithms pick out the important distinguishing features of the training data. For instance, think of teaching your kids how to tell the difference between a cat and a dog. In many features they're pretty similar (quadrupeds, furry, sharp canine teeth, good hearing, good sense of smell, claws), but clearly they are different species, and you could select features such as diet (cats are obligate carnivores, while dogs are omnivores, but more carnivorous), or that dogs cool via panting while cats cool by stealing souls via cold stares, etc., etc. These features may take more or less computation to pull out, and so sometimes the hardware acceleration is on the feature detector.
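If you like seeing it rather than reading it, here's a minimal sketch of those two phases. This assumes PyTorch and a toy made-up "cat vs dog" model with fake data (none of this is any particular product's code): training loops over the data many times doing batched matrix math forward and backward, while classification is a single forward pass with gradients switched off.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"  # use the GPU's CUDA cores if present

# Toy classifier: 64 made-up input features, 2 output classes (cat / dog).
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# --- Training phase: the expensive part ---
x = torch.randn(10_000, 64, device=device)          # fake training examples
y = torch.randint(0, 2, (10_000,), device=device)   # fake labels (0 = cat, 1 = dog)
for epoch in range(20):                              # many passes over the data
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)                      # big batched matrix multiplies
    loss.backward()                                  # ...and more matrix math for gradients
    optimizer.step()

# --- Classification (inference) phase: the cheap part ---
model.eval()
with torch.no_grad():                                # no gradients, no backward pass
    one_sample = torch.randn(1, 64, device=device)
    prediction = model(one_sample).argmax(dim=1)     # a single small forward pass
```

The asymmetry is the whole point: the training loop above is where the arctic ice caps come in, while the last three lines are all your laptop or phone has to do when it uses the finished model.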
So the LLM models we all see in every marketing-hype post took an arctic-ice-cap-melting(tm) amount of compute to create, but you can use many of the generative AI systems on a hefty desktop to create content on your machine. As you can imagine, with many algorithms you can find steps where you can design specific hardware to do them faster than a general-purpose computer can (such as using GPUs in the training part, which is kind of a "misuse" of their intended purpose), so Apple is jamming into its chips a bunch of hardware functions that its AI libraries need to do the classification step. A lot of AI code assumes a lot of parallel processing capability (what you need to do big matrix operations). There are of course other steps which are possible to accelerate in the pipelines (on both the training and classification side).
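To make the "dedicated silicon for the classification step" idea concrete, here's a hedged sketch that reuses the toy model from the earlier snippet and just dispatches its matrix math to whatever accelerator is on hand. Note the assumption: PyTorch's "mps" backend runs on the Apple-silicon GPU; Apple's actual Neural Engine is normally reached through its own Core ML libraries instead, but the principle (offload the parallel matrix ops from the general-purpose CPU cores to purpose-built hardware) is the same.

```python
import torch

# Pick the best available accelerator; fall back to the general-purpose CPU.
if torch.backends.mps.is_available():           # Apple-silicon Macs (Metal GPU)
    accel = torch.device("mps")
elif torch.cuda.is_available():                  # NVIDIA cards
    accel = torch.device("cuda")
else:
    accel = torch.device("cpu")

model_on_accel = model.to(accel)                 # 'model' is the toy net trained earlier
model_on_accel.eval()
with torch.no_grad():
    sample = torch.randn(1, 64, device=accel)
    print(model_on_accel(sample).argmax(dim=1))  # the classification step runs on the accelerator
```

That's really all a "neural engine" is from the software's point of view: another place to send the same matrix operations, one that's been wired specifically to chew through them quickly and at low power.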