My guess is that it focuses mainly on AI and thermal improvements.
There is a bit of room on the thermal side, but it isn't massive: maybe 10% if they stay at the same power - which would be hard to sell.
If there are further AI improvements, those have to come from either another groundbreaking change of the architecture (because the M4 is already crazy good at that perf-per-watt-wise) or from memory. The latter would point to them using SoIC for some kind of stacked memory, not unlike what some of AMD's Epyc chips use, but if so then possibly "under" the actual die instead of "on" the die - for thermal reasons.
But that would be relatively expensive, hence I would be rather surprised if they did it for the standard M-series variant. Remember: AMD prices that ONE small SoIC die on the 9900X3D vs the 9900X at about $300 extra. That doesn't mean it costs them $300 more to make, not even close. But it means it certainly isn't cheap - and that's on N4P, meaning refined 5nm. On the other hand, yes, THAT would really fix a lot of the problems the M4 has with AI stuff, and frankly also with general GPU tasks, and it would allow more power to go towards actually computing stuff and less towards moving bits around.
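To put the "less power for moving bits around" point into numbers, here is a rough back-of-the-envelope sketch. Every figure in it (package power, bandwidth, energy per bit) is a made-up placeholder for illustration, not a measured value for the M4 or any other real chip, and the function names are mine.

```python
# Back-of-the-envelope: how much of a fixed power budget is eaten by
# moving data, depending on the energy cost per transferred bit.
# ALL numbers below are hypothetical placeholders, not real chip data.

def data_movement_watts(bandwidth_gb_s: float, picojoules_per_bit: float) -> float:
    """Power spent on memory traffic: bits per second times energy per bit."""
    bits_per_second = bandwidth_gb_s * 1e9 * 8
    return bits_per_second * picojoules_per_bit * 1e-12


def compute_budget_watts(package_watts: float,
                         bandwidth_gb_s: float,
                         picojoules_per_bit: float) -> float:
    """What is left for actual computing once data movement is paid for."""
    return package_watts - data_movement_watts(bandwidth_gb_s, picojoules_per_bit)


if __name__ == "__main__":
    package_watts = 20.0      # assumed total power budget
    bandwidth_gb_s = 100.0    # assumed sustained memory traffic
    # Assumed energy per bit for two hypothetical memory setups.
    for label, pj_per_bit in [("off-die DRAM", 10.0), ("stacked memory", 2.0)]:
        moving = data_movement_watts(bandwidth_gb_s, pj_per_bit)
        left = compute_budget_watts(package_watts, bandwidth_gb_s, pj_per_bit)
        print(f"{label}: {moving:.1f} W moving bits, {left:.1f} W left for compute")
```

The only point is the shape of the trade-off: at the same bandwidth and the same total power, a lower energy cost per bit leaves more of the budget for compute. How big the effect would really be depends entirely on numbers I don't have.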
The reason I don't see them stacking a GPU, a CPU or even an NPU on top of or underneath the SoC, period: for technical reasons you can't stack dies of the same node on top of each other. And since every architecture since M3 is on N3, that would mean they would have to put some compute logic into 5nm structures. And even for Apple, backporting any of those logic blocks to 5nm isn't "cheap", not even talking about losing the advantages gained with 3nm.
What could make hella sense, but would be hella ambitious, and hence again I don't see it on the standard M5: put I/O (SerDes stuff like TB), accelerators (de/encoders) and the DRAM interface in a 5nm slab at the bottom, and stack a 3nm slab with GPU, NPU and CPU on top - and have both slabs use the additional space gained for big caches. Like the Mall Cache Mx already has, just (much) bigger.
If they actually put that into the standard M5 ... that would run circles even around the M4. The question is: how are they gonna put that into a 600 dollar computer then and make any profit? ^^