So the performance increase between M4 and M5. Is it due to architectural upgrades (the new cores for example) or is it the chiplets?

Because they are still using 3nm so that shouldn’t account for any of the uplift?
 
So the performance increase between M4 and M5. Is it due to architectural upgrades (the new cores for example) or is it the chiplets?
Surely it is both? Apple's philosophy is to push forward, on all fronts. [I'm hoping they will let Anand Shimpi talk about Fusion Architecture, similar to his last appearance, in between the M2 Pro/Max and the M2 Ultra.] It appears there is a sort of parallel or bidirectional process as the architecture moves between the A-series and the M-series -- by the time the platform makes it from A18 Pro in the iPhone 16 Pro to M5 Pro/Max in the MacBook Pro, it is tried-and-true. The A19 Pro, at the same time, benefits from that experience and builds on it.

The introduction of SoIC (integrated chips, or "chiplets") fits into this -- the M series gets it first, but only with the Pro/Max, while the A-series may never need it -- SoC lives on.
Because they are still using 3nm so that shouldn’t account for any of the uplift?
It's still a process-node refinement, from N3E to N3P, so that contributes -- TSMC has advertised a 5% performance increase with a 5%-to-10% power-efficiency improvement. [Tom's Hardware] So it helps, but it's not the heart of the matter.
 
  • Like
Reactions: Antony Newman
@DrWojtek @tenthousandthings I don't see how chiplets can have any part in improving performance. What am I missing?
I just mean the new framework ("Fusion Architecture") may allow for changes that weren't possible when it all had to fit onto a single die. The new core structure probably isn't just a change in names, more die space means more design flexibility, with different limits.
 
Thanks. I am simply trying to forecast next step -> M6.

If Apple indeed are planning to slim down the devices, I wouldn’t be surprised if they use the entire node shift to bring down power consumption, exclusively. The Max apparently throttles in the M5 14” enclosure, just imagine if it was even thinner.
 
  • Like
Reactions: Populus
The M5 Pro and M5 Max have separate dies for both the CPU and the GPU, right? Do you think they will be encapsulated together?

Do you think they will engineer two heatpipes or dissipators, one for the CPU and a bigger one for the GPU? Maybe that’s not possible for the MacBook Pro, but for the Mac mini and Mac Studio…?

The M1-3 Ultra is two dies and had one. The gap between the dies is far less than a mm. To put this into proper context, here is Apple's picture of the M1 Ultra package.

Apple-M1-Ultra-chipset-220308_big.jpg.large_2x.jpg


The two chips basically touch each other, so regardless of whatever is put on top, you are still going to get thermal transfer between the two dies.

The next-generation TSMC SoIC technology comes in two forms. One layers some dies on top of others. That makes it even less feasible to position two different spreaders (and means even more thermal coupling between the dies). The other is horizontal, like the older technique used in the Ultras so far (there is a small 'bridge' die and the two are stacked on top of that). The CPU and GPU split points very heavily to the horizontal setup.

In the picture above if the 'top' die is 50% smaller and the lower die was made 50% bigger it is the same thermal problem. Just slightly reshuffling the 'deck chairs' on the same boat.


Never mind that the spreader likely needs to have a larger 2D footprint than what. Again, even if you have two spreaders, if they touch each other in any substantive way (or leak heat into one another), you still have a significant thermal coupling problem.

I mean, with those neural accelerators, the 20/40 cores GPU will get very hot in machine learning tasks… and if they share the heatpipe, the heat could be transferred to the CPU, which is usually not as hot, right?

If it is one large monolithic die, how is the problem any different? There are 'hot spots' there too under corner-case loads that isolate certain subsections.

If the CPU and GPU 'chiplets' were twice as far away as the RAM modules in the above picture, then there would be two more isolated contexts. But the whole point of 'Fusion' is to make one bigger (and easier to make) chip out of two relatively smaller dies. They are extremely tightly coupled back together after splitting them apart.

Even AMD Ryzen, which isn't microbump-bridge packaged in the same way, isn't individualized per chiplet.

1413018-am5-ryzen.jpg


The immediate thermal path should be more focused on 'up and away' than on 'side to side'. But once 'up', some amount does want to spread out as far as possible in both axes, as long as it keeps moving toward the next (even larger) exit from there.
 
  • Like
Reactions: Populus and Basic75
I think WWDC is our best chance of seeing one this year, but timing is tight if they stick to what they've done in the past.

MacBook Pro M1 Max -> Mac Studio M1 Max/Ultra = 4 months, 23 days
MacBook Pro M2 Max -> Mac Studio M2 Max/Ultra = 4 months, 20 days
MacBook Pro M4 Max -> Mac Studio M4 Max/M3 Ultra = 4 months, 4 days

Anytime they've released a Max chip in a MBP and followed it up with a refresh for the Studio with that same Max chip, it's been about 4-5 months. WWDC would be closer to 3 months. I suppose there could be a late June/early July press release since it's just supposed to be a chip upgrade, no design changes.

Apple could do a 'soft launch' / 'sneak peek' as they have done with the Mac Pro before. They talk about what it is going to be, then gauge non-hard-commit demand, and then set up the ramp to suit.

One reason for the delay above is that Maxes are used to make the Ultras. If your laptops with Maxes are completely selling out, what do you make the Ultras with? So one of the issues is how much chiplet overlap there is going to be. With the laptop Pro and Max both likely sharing a CPU chiplet, if the demand for Pro is unexpectedly high you don't have 'extra' CPU chiplets for even more Maxes, let alone for the Ultra (if the Ultra used the same chiplet). If the Ultra CPU chiplet was different (e.g., a desktop CPU in some fashion on I/O) then the 'laptop' Pro demand doesn't have much of an impact. There is a similar issue if the Max GPU chiplet is shared with the Studio's Max and demand for Max laptops is past forecasts.

If the 'Ultra' isn't just two dies then it may need different packaging facilities and probably different bridge die(s).

Of course, Apple's free to change their timing on things whenever they please,

It likely isn't just whatever Apple wants ... a substantive component of this is what the contract manufacturers can build and deliver. Products that have twice as much base-capacity RAM at this point need more of a scarce part.
If the "fusion" process is more complicated... who else is in line for that more complicated packaging? If it is some AI 'drunken sailor budget' competitor, they may have bought up a large portion of production slots.

Apple could 'jump the queue' to get to "Ultras"/"Max" if their AI servers with M2-gen stuff are now woefully underpowered for the job at hand.


but I just don't see an October release for a Studio. I think if we don't see one by Summer, it'll be pushed to 2027, but both October and 2027 seem crazy far away when it seems like inventory of the Studio is drying up now. So to me, WWDC is looking really good.

Depends. If the "Ultra" variant is entangled in some CoreAI library upgrade that only comes with OS27, then it is stuck waiting on a part that won't show until about October.
 
While I do not know the exact mechanisms, the fact that it does is obvious, is it not? If it weren't improving performance, why do it at all?
Smaller dies give better yields, getting more functional dies from a given wafer and lowering the overall cost per die.

It also means you can mix and match foundry nodes depending on the part as not everything scales linearly.
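
To put rough numbers on the yield point, here is a minimal sketch using the simple Poisson yield model (yield = exp(-area * defect density)). The die areas and the defect density are made-up illustrative values, not Apple or TSMC figures, and the model ignores packaging cost, wafer-edge losses and partial-die salvage.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Expected fraction of dies with zero defects under a Poisson model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def silicon_per_good_product(die_area_mm2: float, dies_per_product: int,
                             defects_per_mm2: float) -> float:
    """Wafer area (mm^2) consumed per working product.

    Assumes each die is tested before assembly, so only good dies are paired.
    """
    good_fraction = poisson_yield(die_area_mm2, defects_per_mm2)
    return dies_per_product * die_area_mm2 / good_fraction


if __name__ == "__main__":
    D0 = 0.001  # hypothetical defect density: 0.1 defects per cm^2
    mono = silicon_per_good_product(400.0, 1, D0)      # one 400 mm^2 die
    chiplets = silicon_per_good_product(200.0, 2, D0)  # two 200 mm^2 dies
    print(f"monolithic:   {mono:.0f} mm^2 of wafer per good product")
    print(f"two chiplets: {chiplets:.0f} mm^2 of wafer per good product")
```

With the same total silicon per product, the split version wastes less wafer because a defect only scraps half as much area. How big the gap is depends entirely on the assumed defect density.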
 
  • Like
Reactions: tenthousandthings
~
The M1-3 Ultra is two dies and had one. The gap between the dies is far less than a mm. To put this into proper context, here is Apple's picture of the M1 Ultra package.

Apple-M1-Ultra-chipset-220308_big.jpg.large_2x.jpg

I wrote this for a different thread, but it applies here:

TSMC goes out of its way to say that SoIC ("Fusion Architecture" in Apple's implementation) is compatible with UltraFusion ("InFO-L" in TSMC's current terminology, if I understand that correctly, they have a tendency to change their nomenclature without public comment).

The hope, as I see it, without any expertise other than having good reading-comprehension skills and a bit of time on my hands, is that Apple could introduce an Ultra-only secondary SoIC chip (to be paired with a standard Max) that replaces the CPU die with a second GPU die. To my mind, that fits Apple's criteria -- it doesn't require much additional R&D other than with regard to UltraFusion's local ("L") silicon interconnect, which is presumably already part of the Ultra budget.

This theory assumes that the CPU and GPU dies are similar in size, which I don't think is something we know at this point. I don't recall seeing any representations of Fusion Architecture -- the last time Apple did the CGI imagery was M3 (press release) -- the M4 (press release) doesn't use CGI and doesn't show relative sizes (at least not directly), but it does show graphical representations of the layouts, so an improvement overall, IMHO. But unless I've missed something, the M5 (press release) has none of that, neither CGI nor layouts, which makes me think more is to come when the Ultra launches. Perhaps there's something there they don't want to reveal at this point?

[On the other hand, the M3 Ultra (press release) was the same, so maybe I'm reading too much into the absence of any graphics.]
The next-generation TSMC SoIC technology comes in two forms. One layers some dies on top of others. That makes it even less feasible to position two different spreaders (and means even more thermal coupling between the dies). The other is horizontal, like the older technique used in the Ultras so far (there is a small 'bridge' die and the two are stacked on top of that). The CPU and GPU split points very heavily to the horizontal setup.
The use of a horizontal layout inside the IC is probably confirmed now, since the original rumor from Ming-chi Kuo (screen shot of X post from December 2024 attached, now confirmed to have been accurate with regard to SoIC, at least) specified something called "SoIC-mH" (where "H" = horizontal) -- unlike Gurman, Kuo is more rigorous and does not mix-and-match things that he hears. So I think it's a safe bet.

For those who don't know what @deconstruct60 is referring to, see this diagram of different internal SoIC layouts from TSMC (see here for the original context):

TSMC SoIC heterogheneous integration schema.png
 

Attachments

  • Screenshot 2026-03-14 at 11.13.33 AM.png
    Screenshot 2026-03-14 at 11.13.33 AM.png
    271.1 KB · Views: 56
  • Like
Reactions: DrWojtek
While I do not know the exact mechanisms, the fact that it does is obvious, is it not? If it weren't improving performance, why do it at all?
The only thing obvious to me is that chiplets are a technology to improve yields and flexibility, not performance, that's why I'm asking.
 
Off the top of my head and not being specific:

* Separation of ’tiles’/chiplets -> lower temperatures -> higher performance potential

* Higher yield rate due to smaller chips may allow for more complex, better architecture/design, which lowers yields, but since yields are up due to chiplets, it evens out

* Same argument but regards to new fab tech -> Apple might be more willing to try bleeding edge nodes, since chips are smaller -> higher performance

* Separation of chiplets may allow for faster architecture designs - for example cache and RAM could become more easily placeable and accessible -> higher performance
 
~

I wrote this for a different thread, but it applies here:



The hope, as I see it, without any expertise other than having good reading-comprehension skills and a bit of time on my hands, is that Apple could introduce an Ultra-only secondary SoIC chip (to be paired with a standard Max) that replaces the CPU die with a second GPU die. To my mind, that fits Apple's criteria -- it doesn't require much additional R&D other than with regard to UltraFusion's local ("L") silicon interconnect, which is presumably already part of the Ultra budget.


Two 'GPU' dies ends up with zero CPUs. How is that going to be a useful 'Ultra'? The CPU die likely contains the I/O (Thunderbolt / USB), the SSD, the Secure Enclave. If you drop those too, how useful a Mac will you have?


From Apple's description there is an issue. Their new fusion is "two dies" joined. That makes it very likely that each of these has only one "fusion edge" (just like the 'twin' Max die Ultra packages). If there are only two dies it seems pretty unlikely that either of these is 'pure' GPU or CPU. There is other stuff on the die also. And that has substantive impacts on how you can mix-and-match them and still get a working SoC system.

So two of either of these don't make a system. It is likely that the GPU die has the memory I/O attached to it (since it is the much bigger bandwidth 'hog'). So two 'CPU' chiplets get you no memory, which doesn't go far. And two GPUs get you no I/O, which isn't very useful as a personal computer system. (It doesn't even get you a useful GPU PCI-e chip either, because the PCI-e was on the other chiplet. You might be able to shovel a third smaller die between the two to get that back plus some other stuff. But that is past 'two dies'.)


Apple could produce a bigger-than-'Max' package by attaching a bigger GPU die (with more memory I/O also) to the CPU die. Additionally, they could replace both, with a bigger CPU die too (with more I/O if the laptop version is stripped down). It wouldn't be double the number of GPU cores but it could be 'more'. (More cores added with more memory bandwidth would likely have higher performance. Along with being more expensive. 🙂)

But you end up with some chiplets with a much lower unit volume. One speculative hope is that with SoIC might come either stacked I/O or stacked cache to keep the 2D footprint smaller. If a chiplet is lower volume but much smaller, those trade off against one another to keep costs down.



This theory assumes that the CPU and GPU dies are similar in size, which I don't think is something we know at this point.

Focused chiplet design isn't about having same-size chiplets between the variants. If doing focused function decomposition you actually want different sizes. Two very large identical twins is more 'lowering design budget' than 'chiplet' design. You want to end up with at least one chiplet that you can use across package products (e.g., reusing the CPU + I/O chiplet with both Pro and Max; that way costs get spread out over more systems that use it).

Size is more of an issue in that there is usually some limit on how big a package SoIC or InFO-LSI or CoWoS-LSI can be applied to. The technology that Apple used for the M1-M2 era was limited to one reticle. SoIC likely has reticle-multiple limits also.

If the M5 Max was split (Fission) because it got too big, to get better yields and ease of production, then a problem could appear if you try to 'double it' and still use the same technology to 'fusion' it back together again.

From the numbers that have appeared I suspect what Apple is calling P cores are bigger than E cores (and smaller than Super). But that makes the CPU section larger. Similarly, the GPUs getting more NPU-like compute seem likely to have gotten bigger also. It has upside in that it gets closer to an Ultra without having to spend twice as much silicon. However, the doubling on top of that might have slipped away (at least in a more affordable way).



The use of a horizontal layout inside the IC is probably confirmed now, since the original rumor from Ming-chi Kuo (screen shot of X post from December 2024 attached, now confirmed to have been accurate with regard to SoIC, at least) specified something called "SoIC-mH" (where "H" = horizontal) -- unlike Gurman, Kuo is more rigorous and does not mix-and-match things that he hears. So I think it's a safe bet.

Yeah two 'hot' dies of CPU and GPU stacked vertically probably doesn't work at this point.

It could be possible to bury some of the I/O and cache but that would be two different dies each for the Pro and the Max, which probably would drive up costs, not lower them (at least for laptops). The M1 Pro and Max used a shared design. Pretty good chance that is just being done with chiplets this time.
 
The only thing obvious to me is that chiplets are a technology to improve yields and flexibility, not performance, that's why I'm asking.

If chiplets allow you to bust through the reticle limit then you get more performance on local data grid workloads that scale with cores. More cores, more work done, more performance. It won't help as much with single-threaded drag racing, but that too isn't all of 'performance'.
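
As a rough illustration of why extra cores help parallel workloads far more than single-threaded ones, here is a small Amdahl's law sketch. The core counts and parallel fractions are hypothetical, picked only to show the shape of the curve; they are not measurements of any Apple chip.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical: doubling from 16 to 32 cores by going past one reticle.
    for p in (0.0, 0.5, 0.99):
        gain = amdahl_speedup(p, 32) / amdahl_speedup(p, 16)
        print(f"parallel fraction {p:.2f}: doubling cores gives x{gain:.2f}")
```

A fully serial job (fraction 0.0) gains nothing from the extra cores, while a job that is almost entirely parallel gets most of the doubling.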

AMD Ryzen through Epyc is mostly a matter of more of the same chiplet coupled to higher levels of I/O to keep up with the increased performance. [It isn't maximum perf/Watt though, so you will see more monolithic designs and fewer chiplets in laptop variations.]

display output GPU tends not to work quite as well.
 
Off the top of my head and not being specific:

* Separation of ’tiles’/chiplets -> lower temperatures -> higher performance potential

Chiplet intercommunication doesn't lower power consumption. It is higher. You can offset that by using smaller connections at absolute minimal distances, but the style that AMD has used on Ryzen/Epyc for the first couple of generations isn't lowering it. [In laptops in those generations there are no chiplets.]


* Higher yield rate due to smaller chips may allow for more complex, better architecture/design, which lowers yields, but since yields are up due to chiplets, it evens out

Errr. More so, chiplets allow the option to use two different fab processes: one more dense that is kept smaller (so it can go more complex) and another better suited to something that doesn't require maximum density (like I/O). Most of the time something that is 2-3 years mature has higher yields than something that is 1 (or less) years old. It is more about when you can use it than that the more complex is worse (forever).

Doing a 'complex' design which is really a mismatch with the fab process design kit is going to get you lower yields. However, that is more a learning curve issue. In general "better architecture/design" should at least in part mean fewer mistakes. The most Rube Goldberg complex microarchitecture probably isn't 'better'.


* Same argument but regards to new fab tech -> Apple might be more willing to try bleeding edge nodes, since chips are smaller -> higher performance

Bleeding edge usually comes with relatively lower yields. What you are limiting here is the impact of the defects to a smaller area (so you may be able to salvage the die into something useful). The defects are not disappearing. What you are doing with chiplets is limiting the scope of impact (less collateral damage to stuff that didn't have the defect).
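
A toy sketch of that 'scope of impact' point: the same defects land on the same silicon, and the only thing that changes is how finely it is cut up. The strip length, die counts and defect count below are invented for illustration, and the model scraps any die with a defect (no salvage or binning).

```python
import random


def scrapped_area(defect_positions, strip_length, n_dies):
    """Area lost when any die containing a defect is thrown away.

    The silicon is modeled as a 1-D strip cut into n_dies equal dies.
    """
    die_length = strip_length / n_dies
    bad_dies = {min(int(x // die_length), n_dies - 1) for x in defect_positions}
    return len(bad_dies) * die_length


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    strip = 4000.0          # hypothetical strip of silicon, arbitrary units
    defects = [rng.uniform(0.0, strip) for _ in range(5)]

    big = scrapped_area(defects, strip, 10)    # ten large dies
    small = scrapped_area(defects, strip, 20)  # twenty half-size chiplets
    print(f"defects: {len(defects)} in both cases")
    print(f"scrapped with large dies: {big:.0f}")
    print(f"scrapped with chiplets:   {small:.0f}")
```

The defect count is identical in both runs. Because each large die is exactly two chiplets here, the chiplet cut can never scrap more area than the large-die cut.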


* Separation of chiplets may allow for faster architecture designs - for example cache and RAM could become more easily placeable and accessible -> higher performance

Stacking cache on a CPU chiplet isn't 'easier'. But it can cut down the route to where the data has to go without blowing out the reticle limit.
 
  • Like
Reactions: Basic75
Chiplet intercommunication doesn't lower power consumption. It is higher. You can offset that by using smaller connections at absolute minimal distances, but the style that AMD has used on Ryzen/Epyc for the first couple of generations isn't lowering it. [In laptops in those generations there are no chiplets.]




Errr. More so, chiplets allow the option to use two different fab processes: one more dense that is kept smaller (so it can go more complex) and another better suited to something that doesn't require maximum density (like I/O). Most of the time something that is 2-3 years mature has higher yields than something that is 1 (or less) years old. It is more about when you can use it than that the more complex is worse (forever).

Doing a 'complex' design which is really a mismatch with the fab process design kit is going to get you lower yields. However, that is more a learning curve issue. In general "better architecture/design" should at least in part mean fewer mistakes. The most Rube Goldberg complex microarchitecture probably isn't 'better'.




Bleeding edge usually comes with relatively lower yields. What you are limiting here is the impact of the defects to a smaller area (so you may be able to salvage the die into something useful). The defects are not disappearing. What you are doing with chiplets is limiting the scope of impact (less collateral damage to stuff that didn't have the defect).




Stacking cache on a CPU chiplet isn't 'easier'. But it can cut down the route to where the data has to go without blowing out the reticle limit.
Thanks!
 
Two 'GPU' dies ends up with zero CPUs. How is that going to be a useful 'Ultra'? The CPU die likely contains the I/O (Thunderbolt / USB), the SSD, the Secure Enclave. If you drop those too, how useful a Mac will you have?
There would still be a CPU die, in the first Max. You would only drop the CPU in the second Max, replacing it with another GPU.

The classic Ultra (2x Max) would still be an option, but there would be an option for an Ultra configuration with 1x CPU and 3x GPU. That’s all I’m suggesting.
Their new fusion is "two dies" joined. That makes it very likely that each of these has only one "fusion edge" (just like the 'twin' Max die Ultra packages). […]
I’m not sure I understand you. TSMC says plainly that SoIC (which Apple is calling Fusion Architecture SoC) can still be interconnected via advanced packaging like InFO-LSI (UltraFusion). They (Fusion and UltraFusion) aren’t the same thing. As I understand it, InFO-LSI doesn’t require symmetry — the interconnected units don’t have to be identical — so it seems to me it could be used to do what I’m describing.
 
  • Like
Reactions: eldho
There would still be a CPU die, in the first Max. You would only drop the CPU in the second Max, replacing it with another GPU.

The classic Ultra (2x Max) would still be an option, but there would be an option for an Ultra configuration with 1x CPU and 3x GPU. That’s all I’m suggesting.

I was not thinking of putting two "Fusion" connectors on the GPU chiplet. One to get back to a Max function and another to connect to another whole Max. Each one of these "Fusion" connectors is overhead in terms of die space.
If the Max is largely being 'sectioned' because it is getting 'too large', then more 'overhead usage' may not be at the top of the list of things to do.

Also, once again, this is possibly in a space where the laptop version is carrying 'dead weight' function. Apple could make another dual sized connector chiplet for desktop, but if making another chiplet why not just make it larger and keep the same single connector?

I have some doubts that second connector that spans two larger Mac packages is going to be same tech.

Certainly for the 1x CPU and 3x GPU. That probably busts the reticle limit (with Max variant GPU chiplets).

If instead of using the other edge for a second 'chiplet connector' that space was used for more memory I/O , then the bandwidth (and memory capacity ) would go up over the Max.


I’m not sure I understand you. TSMC says plainly that SoIC (which Apple is calling Fusion Architecture SoC) can still be interconnected via advanced packaging like InFO-LSI (UltraFusion). They (Fusion and UltraFusion) aren’t the same thing. As I understand it, InFO-LSI doesn’t require symmetry — the interconnected units don’t have to be identical — so it seems to me it could be used to do what I’m describing.

InFO-LSI has reticle limits (about 1x). TSMC made no big hype about SoIC greatly expanding those limits. If the Max splits because it got 'too big' then there is a pretty good chance two "new fusion" packages will be larger than old Maxes. If your packages, mono or chiplet-fused, get bigger, then the system used to 'fuse' these changes (CoWoS land). SoIC doesn't replace CoWoS.

SoIC differs from InFO-LSI in that the individual 'bumps'/'connection points' used to link dies are substantially smaller.

SoC-Exceptional-scalability.png


That smaller bonding pitch means it takes up less space if stacking these vertically. That part does not seem a likely use from Apple's description.

Being restricted to the horizontal placement form means that they could either implement the UltraFusion connection of
" ... Apple’s innovative UltraFusion uses a silicon interposer that connects the chips across more than 10,000 signals, providing a massive 2.5TB/s of low latency, inter-processor bandwidth — ..."
https://www.apple.com/newsroom/2022...s-most-powerful-chip-for-a-personal-computer/

as a connector about half the size, or perhaps keep about the same connector width and crank that up to 30,000 signals and 7TB/s. The CPU and GPU count probably isn't getting any smaller so the chip is likely going to be about just as big (and wide).
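
For a sense of scale on those figures, here is a back-of-the-envelope sketch. The 10,000 signals and 2.5TB/s come from the Apple press release quoted above; the 30,000 and 7TB/s are the speculative numbers in this post. It assumes decimal units and that bandwidth scales linearly with signal count at an unchanged per-signal rate, which a real link need not do.

```python
def per_signal_gbit_s(total_tb_s: float, signals: int) -> float:
    """Average rate per signal in Gbit/s (decimal units, 1 TB = 1e12 bytes)."""
    return total_tb_s * 1e12 * 8 / signals / 1e9


def total_tb_s(signals: int, gbit_s_per_signal: float) -> float:
    """Aggregate bandwidth in TB/s for a signal count and per-signal rate."""
    return signals * gbit_s_per_signal * 1e9 / 8 / 1e12


if __name__ == "__main__":
    rate = per_signal_gbit_s(2.5, 10_000)  # UltraFusion figures from the quote
    print(f"UltraFusion average per signal: {rate:.2f} Gbit/s")
    print(f"30,000 signals at that rate: {total_tb_s(30_000, rate):.1f} TB/s")
    needed = per_signal_gbit_s(7.0, 30_000)
    print(f"per-signal rate needed for 7 TB/s: {needed:.2f} Gbit/s")
```

So tripling the signal count at the same per-signal rate lands in the same ballpark as the 7TB/s guess, without needing faster signalling.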

So not exactly the same but structurally not all that different. It is still a connector and an interposer if doing it horizontally.

There is no 'free lunch'. Doing these smaller and smaller bumps requires more precise placement of dies and of the material used to connect the 'bumps'. That means using equipment with chip-drawing precision... which is reticle limited.

If the combined chiplet Maxes are more than 1/2 the reticle limit then joining two Maxes would likely start to fall into the CoWoS zone of putting them together. If those are bigger bumps then you would need a different connector and/or an adapter to bridge the transitions. That can be done but it is going to be more expensive. (Plus the AI data center chips will pay megabucks for CoWoS production slots ... so more 'tax' on that, if you can even get a production slot.)
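
The half-reticle argument can be sketched as a simple area check. This assumes 'one reticle' means the usual 26 mm x 33 mm (858 mm^2) lithography exposure field and takes the roughly 1x packaging limit from the discussion above. The chiplet areas are invented for illustration, since no M5 die sizes have been given in this thread, and the check ignores the bridge die and die-to-die spacing.

```python
RETICLE_MM2 = 26.0 * 33.0  # 858 mm^2, assumed single-exposure field


def fits_within(areas_mm2, limit_in_reticles=1.0):
    """True if the summed die area stays within the given reticle multiple."""
    return sum(areas_mm2) <= limit_in_reticles * RETICLE_MM2


if __name__ == "__main__":
    # Hypothetical 'Max': a 200 mm^2 CPU chiplet plus a 300 mm^2 GPU chiplet.
    one_max = [200.0, 300.0]
    two_maxes = one_max * 2

    print(f"one Max, {sum(one_max):.0f} mm^2, fits 1x: {fits_within(one_max)}")
    print(f"two Maxes, {sum(two_maxes):.0f} mm^2, fits 1x: {fits_within(two_maxes)}")
```

With these made-up sizes a single package is already past half a reticle, so doubling it no longer fits a 1x limit. That is the situation where a different (and pricier) packaging flow would be needed.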

The Ultra is already a 'fringe' SoC in terms of volume. If it gets substantively more expensive, that probably just gets more 'fringe'.

I think part of the reason the Pro and Max share a CPU chiplet is to better share costs. The objective is not to make the most exotically expensive chip possible. It is to make chips that Apple can afford to churn faster (get to payoff quicker). There is limited "hand me down" system placement for the Pro and Max. And even less for the Ultra. The MBP churning these about every 12-14 months is a 'problem'.

Maybe the "Apple server" chip is bound for CoWoS and there are some shared subsystems they can use between the Ultra and that. That would spread the cost out over more units (although the "AI Server" chip isn't making any direct revenues). [I wouldn't bet on that. I suspect more likely a CPU chiplet that dumps all the desktop I/O. Broadchip for at least networking I/O and maybe some compute, but swapping space to keep the sizes down and stay away from CoWoS-sized issues to solve.]


Longer term the evolution along SoIC is to make the packages more vertical than horizontal. Start folding the system level cache and maybe memory I/O underneath and putting more cores in the same amount of space (instead of doubling off horizontally). But walk before run on this iteration.
 
  • Like
Reactions: Basic75
 
Have we gotten any cache size information yet? I'm concerned about going from 12 M4 P cores to this odd mishmash. Not hypothetically, I have a machine to purchase soon.
 