The Blender render score for the 76-core M2 Ultra is 3,220.
The Blender render score for the Nvidia 4090 is 11,200.


And the score for the M3 Max is 3485.6 points. Look at where the puck is going, not where it is.

Will M4 Max reach parity with the 4090? No way in hell. There is a huge difference in power consumption, and Apple is not going to close that gap with smart technology. They could, however, improve their GPU performance by 30-50% with just a few simple architectural tweaks. The foundation is already there.
 
I will say this: yes, it makes sense for Apple to stop building the M3 Ultra out of two M3 Max dies.
What I don't agree with is that the M3 Ultra chip will be monolithic. I still think it will use UltraFusion, even if it's a newer version.
I personally could see two dies, each with 16 P-cores and 64 GPU cores, being connected for the M3 Ultra.
 
Given that the M4 generation will likely have big modifications to the Neural Engine to support AI, and likely Thunderbolt 5 / USB4v2, I would be very wary of an M3 Ultra chip if it doesn't have those features. In fact, I would skip it.

"Given that iPads will likely have OLED, I would be very wary of any MacBook if it doesn't have that. In fact, I would skip it." 😐

Apple isn't going to iterate on everything in one go.
 
(I'm not a chip designer.) I wonder if the M3 Ultra could have the UltraFusion connection on both sides, allowing a configuration of three or more dies.

Three equally sized, Max-like, relatively large dies (~400 mm²)? Probably not.

The TSMC packaging technology that Apple uses (InFO-LSI) caps out at about the reticle limit (~800 mm²). 3x 400 is likely too big. [ There is another packaging technology at TSMC, CoWoS, but all of that capacity is gone; the AI mania has consumed it all. And I'm skeptical that Apple would get into an escalating bidding 'war' to try to wrestle capacity away from others with even fatter profit margins to throw at it. Also even more skeptical that Apple is trying to make the most expensive solution possible (as the RAM/SSD capacity charges are already quite high). ]
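As a rough sanity check on that area math, here is a minimal back-of-envelope sketch in Python. The ~400 mm² die size and ~800 mm² InFO-LSI ceiling are just the approximate figures from the paragraph above, not official TSMC numbers:

```python
# Back-of-envelope check of the packaging math above (illustrative numbers only;
# the ~400 mm^2 die size and ~800 mm^2 InFO-LSI ceiling are the rough figures
# quoted in this post, not official TSMC specs).

INFO_LSI_CAP_MM2 = 800   # assumed practical ceiling for InFO-LSI packaging
DIE_AREA_MM2 = 400       # assumed 'Max-like' compute die area

for n_dies in (2, 3, 4):
    total = n_dies * DIE_AREA_MM2
    verdict = "fits" if total <= INFO_LSI_CAP_MM2 else "exceeds the cap"
    print(f"{n_dies} x {DIE_AREA_MM2} mm^2 = {total} mm^2 -> {verdict}")

# Two dies (800 mm^2) land right at the assumed limit; three or more clearly do
# not, which is why the post argues a stack of full-size Max-like dies would
# need CoWoS-class packaging instead.
```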

Apple could do something like decompose the I/O off the die and make the rest more compute-focused. Really do chiplets and decompose into different sizes. The 'problem' with the M1-M3 Max is that the 'top' of the die is full of I/O functionality, which pragmatically needs to be on the edge of the die. If you substitute in UltraFusion there, where does the I/O go?


[ I/O die ] -- PCIe, Thunderbolt, USB, possible display controllers, etc. (UltraFusion on only one side)
[ Compute die ] -- P-core clusters, E-core clusters, NPU, GPU, AV fixed function, and memory controllers (and system cache). (UltraFusion on two sides: 'top' / 'bottom')

The compute die (CD) would be incrementally smaller than a Max (minimally, by pruning off the I/O; Apple could also prune off some GPU clusters if it wanted even smaller, e.g., old M1 Pro vs old M1 Max). The I/O die would be a lot smaller and doesn't necessarily have to be on a bleeding-edge fab process (but to keep the gap with the laptop version of the Max minimal, it could stay on the same one).



Then could do something like

[ I/O die ]
[ CD ] == A 'Desktop' Max-like. Consistent 6 Thunderbolt ports on all Mac Studios.
[ I/O die ]

[ I/O die ]
[ CD ] == A 'Desktop' Ultra.
[ CD ]
[ I/O die ]


[ I/O die ]
[ CD 1 ]
[ CD 2 ] == A 'Desktop' More-than-Ultra ( but likely pushing the limits of InFO-LSI )
[ CD 3 ]
[ I/O die ]


The distance between the middle compute die and the I/O dies is longer (more latency), but I/O off the package is going to have higher latency anyway (relatively, it is already much slower than the intra compute-compute traffic). Also, the I/O isn't scaling up past two dies, so the intra-chip data-network bandwidth pressure isn't changing between configurations.
Probably will lose some traction in the gap between CD 1 and CD 2. If the CDs are 'as big as possible', even more so. If a bit smaller than a Max, then a bit less so.
[ Apple could manage this by only turning on the GPU clusters in adjoining pairs (e.g., only CD1-2 or CD2-3) when the workload isn't excessively 'embarrassingly parallel'. ]

But adding another compute die to the 'layer cake' would likely start to introduce latency problems.

[CD 1 ]
[CD 2 ]
[CD 3 ]
[CD 4 ]

CD 1 and CD 4 are relatively much farther apart (e.g., their respective pools of memory are farther away, so getting data takes longer and there is more traffic clogging the intra-chip 'data highway'). It also pushes the I/O dies further away from CD 2 and CD 3. Scaling in only one direction means things get farther and farther apart. It also gets more and more expensive to make with LSI interconnects.
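To make the 'farther and farther apart' point concrete, here is a toy Python model of hop distance in a one-dimensional stack of compute dies. The hop counts are purely illustrative; real latency depends on the mesh and packaging details, not just the number of die crossings:

```python
# Toy model of a 1-D 'layer cake' of compute dies: each adjacent pair of dies is
# one UltraFusion-style crossing ("hop"). Illustrative only.

def worst_case_hops(n_dies: int) -> int:
    """Hops between the two most distant compute dies in a linear chain."""
    return n_dies - 1

def avg_hops(n_dies: int) -> float:
    """Average hop count over all ordered pairs of distinct dies in the chain."""
    dists = [abs(i - j) for i in range(n_dies) for j in range(n_dies) if i != j]
    return sum(dists) / len(dists)

for n in (2, 3, 4):
    print(f"{n} compute dies: worst case {worst_case_hops(n)} hop(s), "
          f"average {avg_hops(n):.2f} hop(s)")

# 2 dies -> at most 1 hop (the M1/M2 Ultra situation).
# 4 dies -> up to 3 hops between CD 1 and CD 4, and the average distance (plus
# the traffic funneled across the middle links) grows, which is the latency and
# congestion concern described above.
```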

The M3 Pro is a bit 'dialed back' in terms of memory bandwidth, so if the desktop Max-like came in a bit lower than the laptop Max, that probably wouldn't impact keeping a gap between a Mini with an M3 Pro and a Studio desktop Max.

Amortizing the design over all Mac Studio and Mac Pro sales would spread out the costs. A relatively large 'fork' off the Max just for 'Ultra Studios' and Mac Pros is likely dubiously low volume. Apple could make some exotic, even more expensive stuff than the above ... but then who is going to buy it? Not sure why Apple would want to scale past 3 dies if looking at return on investment. Pretty good chance it isn't there.
 


Apple's M3 Ultra chip may be designed as its own, standalone chip, rather than be made up of two M3 Max dies, according to a plausible new theory.


The theory comes from Max Tech's Vadim Yuryev, who outlined his thinking in a post on X earlier today. Citing a post from @techanalye1 which suggests the M3 Max chip no longer features the UltraFusion interconnect, Yuryev postulated that the as-yet-unreleased "M3 Ultra" chip will not be able to comprise two Max chips in a single package. This means that the M3 Ultra is likely to be a standalone chip for the first time.

This would enable Apple to make specific customizations to the M3 Ultra to make it more suitable for intense workflows. For example, the company could omit efficiency cores entirely in favor of an all-performance core design, as well as add even more GPU cores. At minimum, a single M3 Ultra chip designed in this way would be almost certain to offer better performance scaling than the M2 Ultra did compared to the M2 Max, since there would no longer be efficiency losses over the UltraFusion interconnect.

Furthermore, Yuryev speculated that the M3 Ultra could feature its own UltraFusion interconnect, allowing two M3 Ultra dies to be combined in a single package for double the performance in a hypothetical "M3 Extreme" chip. This would enable superior performance scaling compared to packaging four M3 Max dies and open the possibility of even higher amounts of unified memory.

Little is currently known about the M3 Ultra chip, but a report in January suggested that it will be fabricated using TSMC's N3E node, just like the A18 chip that is expected to debut in the iPhone 16 lineup later in the year. This means it would be Apple's first N3E chip. The M3 Ultra is rumored to launch in a refreshed Mac Studio model in mid-2024.

Article Link: M3 Max Chip Has Hidden Change That Could Affect Future 'M3 Ultra' Chip
Maybe Apple has found a way to not waste money on an interconnect that will never be used. Only the chips used to build the "Ultra" will have it. After all, they will be built using an entirely new process.

Or maybe there will never be an M3 Ultra

Or maybe they will use some other interconnect technology. After all, the huge server farms use Ethernet to connect computers that are not in the same chassis. Perhaps Apple will use (I don't know) PCIe as a chip interconnect. Yes, the chips would not be as tightly coupled and each would need its own local copy of macOS. But with a bus-like interconnect you can connect dozens or even thousands of computers.

However, it is a general rule that physically shorter interconnects run faster. If it is on-die it can be very fast.

The best solution for people using Macs to do things like render video is a server farm. They just need to rewrite FCP to do that. Then people who need tons of computing power can simply buy a dozen Mac Studios and a big 10GbE switch.
 
Not to mention the inherent security flaw (GoFetch) in all M-series chips. The M3-only mitigation is not sufficient because it requires developers to opt in and change their code when performing specific operations, and Apple didn't even disclose this functionality existed until last week. JavaScript can execute an attack that steals data. I am not satisfied that Apple is only disabling the prefetcher when performing encryption operations (which only works on M3 chips), and it also doesn't (so far?) seem like a bug bounty was paid out for this, which is incredible given how significant the flaw is.

Unless the Ultra has some logic redesign specifically addressing this I would absolutely not spend a ton of money on a decked-out pro machine which was my plan. I'll either hobble along with my old Mac for another year or get a base level M3 Studio and replace it as soon as M4 ones are available.

I completely agree with your points as well; the Neural Engine is likely to be massively upgraded in the M4.

Edit: since people are challenging this (which is understandable, it is a complicated issue) I'll attach the conclusion of the research work. You can also see my follow-up posts in this thread for a bit more (admittedly vague) background. The most severe issue is getting security keys, which is rightly what news coverage has focused on because it is the biggest problem, but the scope of the flaw is not limited to that.

This part in particular, which is not in the screenshot below but is on page 1 in the abstract, elucidates my concern:

"Undergirding our attacks is a new understanding of how DMPs behave which shows, among other things, that the Apple DMP will activate on behalf of any victim program and attempt to “leak” any cached data that resembles a pointer."
They only tested this against third-party security libraries and not Apple's first-party security library; that should be a sign 🤪
 
Exactly what I thought because it doesn’t have the fusion thing.

I also predicted Apple will skip M3 Ultra because of this.
 
The M3 Ultra, if it doesn't take advantage of the current economies of scale (which means making a chip that's guaranteed to sell fewer than 200,000 units a year cost-effective), will be quite expensive indeed!
 
If the current M3 die doesn't include UltraFusion, it would cost an extra $1B to tape out a special M3 Ultra and pass that cost onto (relatively) few customers.

Developing a new M3 Ultra on N3B (the current TSMC fab process) with far more area dedicated to the Neural Engine seems unlikely given its 55% yield (i.e., rather low compared to the 80%+ yield of N3E).

Developing a new M3 Ultra on N3E would also be 'odd' because it would require $$$ to be spent on new design rules (N3E is not rule-compatible with N3B), and it would be the only M3 on N3E, with the iPhone's A18 core coming out only FOUR months later.

Random observation: "M3 Max [Nov '23] outperforming the M2 Ultra [Jun '23]" means that for 7 months of the year the ULTRA desktop would have been SLOWER than the top laptop. Is this a bit of a bitter pill for premium users to keep down?

Perhaps what Apple will do this year is FIX this anomaly by releasing the "M4 Ultra" in Nov '24 on TSMC N3E, and then release the ULTRA every 2 years in Q4.

AJ
 

The theory comes from Max Tech's Vadim Yuryev, who outlined his thinking in a post on X earlier today. Citing a post from @techanalye1 which suggests the M3 Max chip no longer features the UltraFusion interconnect, Yuryev postulated that the as-yet-unreleased "M3 Ultra" chip will not be able to comprise two Max chips in a single package. This means that the M3 Ultra is likely to be a standalone chip for the first time.



This would enable Apple to make specific customizations to the M3 Ultra to make it more suitable for intense workflows. For example, the company could omit efficiency cores entirely in favor of an all-performance core design, as well as add even more GPU cores.

That 'theory' doesn't make much sense. If you traded away the die space for the two UltraFusion connectors, then the space the E cores are soaking up matters even less. You are not going to fit substantially more P-core clusters (with their required associated cache) or GPU clusters (again, with required associated cache) into the 'connector space' you just 'saved' (both the size/area and the shape of the space). To get those you'd have to make the chip bigger. However, if you already made it 'bigger' by subsuming another whole Max (with I/O, storage, memory controllers, system cache and what not), it is already bigger.

If the Twitter pics of the M1-M3 Max dies are all to the same scale, then M3 size > M2 size > M1 size. The basic Max has been on a bloat path. In that context, UltraFusion going away was no "savings" in size ... it got 'eaten' by the computational blob of the Max (which the process shrink (N5P -> N3) didn't save at all).

The M1 Max started off as a "not so great chiplet" ... the M2 Max got bigger, sliding even deeper into that zone. So if the M3 Max is even bigger ... it is deeper still into the dubious chiplet zone.

At minimum, a single M3 Ultra chip designed in this way would be almost certain to offer better performance scaling than the M2 Ultra did compared to the M2 Max, since there would no longer be efficiency losses over the UltraFusion interconnect.

That "better performance" is doubtful. The UltraFusion connector basically joins the intra-chip meshes of the two Maxes together into a bigger mesh. If the problem is that the remote / 'other die' memory controller is too far away from the compute cluster, then it will be just about as far away on a much bigger, reticle-limit-sized die.

UltraFusion probably uses incrementally more power (which is a thermal thing to manage), but it really does not introduce any credible difference-making distance. (It is a 3D connection where the bridge is beneath the two dies, so you can't really make the two dies any closer.)


Dumping UltraFusion would increase perf/watt, which Apple has generally chased with the M-series. It could turn out cheaper depending upon the tradeoffs between packaging costs/availability and lower yields on wafers. Smaller chips are cheaper to make, but cobbling them back together into a monolithic package costs money.


Furthermore, Yuryev speculated that the M3 Ultra could feature its own UltraFusion interconnect, allowing two M3 Ultra dies to be combined in a single package for double the performance in a hypothetical "M3 Extreme" chip. This would enable superior performance scaling compared to packaging four M3 Max dies and open the possibility of even higher amounts of unified memory.

Pick a side ... UltraFusion is a significant barrier to scale, but an even bigger UltraFusion connection won't be?
Two reticle-sized chips will have components even farther away from each other than the less-than-reticle M1 Ultra did.

UltraFusion uses the 'short' edge of the Max die. The 'long' edge is primarily used by the "poor man's HBM" memory subsystem. The major problem with an "even bigger" UltraFusion connector is that if it gets to the point where it "needs" the long edge, then they would have to "rob Peter to pay Paul".

I suspect folks might be hand-waving at Blackwell B200, but that package uses real HBM, not "poor man's HBM", to save die edge space.


Little is currently known about the M3 Ultra chip, but a report in January suggested that it will be fabricated using TSMC's N3E node, just like the A18 chip that is expected to debut in the iPhone 16 lineup later in the year. This means it would be Apple's first N3E chip. The M3 Ultra is rumored to launch in a refreshed Mac Studio model in mid-2024.


N3E would be even more dubious if shifting out the E-cores and massively uplifting the P/GPU cores. Just keeping the current cache sizes on N3E will take more area, since SRAM backslides to N5P sizes. N3E exchanges increased die space consumption for cheaper fab costs. So an M3 Max that got bigger even with the N3B shrink would get even larger on N3E. There is a path to a "monolithic" Ultra chip there, but it would likely come in at less than 2x an N3B Max, not more than doubling. The push could be that it 'fell out' of scope for InFO-LSI because even a double Max was too big. A monolithic N3E die would be at least as much about "cheaper to make" as about "performance".

N3E just for a relatively hyper-low-volume "Ultra only" package doesn't make much sense. You have to redesign for a new design library (N3B is not compatible with N3E) and then not make very many of them. That is a lot of cost for a die that has been completely decoupled from Max unit volumes.

The iPhones have the volume to pay for a respin by themselves. In part because it is not just iPhones; those A18s will be 'handed down' into iPads and other products. Throwing a boat-anchor (no 'hand me down' system targets), fringe Mac chip on top of that is kind of playing with fire.
 
"Given that iPads will likely have OLED, I would be very wary of any MacBook if it doesn't have that. In fact, I would skip it." 😐

Apple isn't going to iterate on everything in one go.
No. Look at the competitive landscape. Qualcomm's Snapdragon X Elite is coming, bringing an ARM powerhouse to PC laptops for the first time. It will have a pretty robust NPU: 40+ TOPS. Same with the upcoming Zen 5. And at some point Intel will probably step its game up in the NPU arena, likely not with Arrow Lake, but with the next Lake.

Which means that Apple is at risk of being perceived as falling behind in the AI arena. They have shareholders to answer to. We also know Apple is going to make a big push on AI at WWDC. So to me, this is a strong clue that the NPU in Apple silicon will get an upgrade. Likely sooner rather than later.

We also know Thunderbolt 5 has been announced (Intel Barlow Ridge), and at some point Intel will embed a TB5 controller into its CPUs.

I believe it is highly likely the M4 generation will introduce those features for competitive reasons, with a slight chance that the M3 Ultra/Extreme will have those features as well. But if it doesn't have them, by the time the M4 Ultra rolls around, it should have them then. So proceed with caution. Because remember, Apple Silicon is not upgradable.
 
I’ve been expecting the M3 Ultra to be a dedicated chip. All the M3 chips are a unique design so it makes sense that would carry over to the Ultra.

I think everything else mentioned is wild speculation. I don’t expect a chip higher than Ultra, and I don’t expect the Ultra chip to do something dumb like leave out all the efficiency cores.
 
M1 Max 32 GPU cores ... M2 Max 38 GPU cores ... M3 Max 40 GPU cores. With the Max versions it seems like the GPU core count differences are getting smaller with each revision. Fair to think the M4 Max will only have 41 cores or be the same as the M3 Max? And Apple will advertise faster speeds as marketing? Running out of silicon space?
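For what it's worth, the generation-over-generation growth in those core counts works out as follows (quick Python arithmetic on the numbers quoted in the post above):

```python
# Generation-over-generation growth in top-end Max GPU core counts
# (core counts as quoted in the post above).
max_gpu_cores = {"M1 Max": 32, "M2 Max": 38, "M3 Max": 40}

names = list(max_gpu_cores)
for prev, curr in zip(names, names[1:]):
    a, b = max_gpu_cores[prev], max_gpu_cores[curr]
    print(f"{prev} -> {curr}: +{b - a} cores ({(b - a) / a:.1%})")

# M1 Max -> M2 Max: +6 cores (18.8%)
# M2 Max -> M3 Max: +2 cores (5.3%)
# So the per-generation growth in core count really has been shrinking, although
# per-core architectural changes matter at least as much as raw counts.
```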
 
which windows+nvidia RTX GPU is just miles ahead.
It's Linux + Nvidia RTX or pro acceleration cards: >$20K H100s and the new Blackwell (OMG, look at Blackwell). Also in the 2D areas (motion animation, image processing), AI gets more and more important. So today I use the Mac to open a remote shell to my Linux workstation.

But look at the fireworks Nvidia is setting off in many key areas, while Apple adds a new camera and new color options to the next iPhone ...

The M series is great. But many top-level chip designers left Apple, and the M-series design has come to an end.
 
And the score for the M3 Max is 3485.6 points. Look at where the puck is going, not where it is.

Will M4 Max reach parity with the 4090? No way in hell. There is a huge difference in power consumption, and Apple is not going to close that gap with smart technology. They could, however, improve their GPU performance by 30-50% with just a few simple architectural tweaks. The foundation is already there.

Since the M2 Ultra has a 2x higher result than the M2 Max, it follows that the M3 Ultra would be 2x faster than the M3 Max in Blender, with a score of 6383. That's more than half the 4090. An M3 Extreme with two M3 Ultras would be faster than the 4090, at 12767.
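A minimal Python sketch of that doubling logic, treating the M3 Max baseline as a parameter (the 3485.6 score quoted earlier and the ~3191 baseline implied by the 6383/12767 figures above come from different Blender result sets), and assuming roughly linear scaling, which real chips rarely quite hit:

```python
# Back-of-envelope extrapolation assuming Blender GPU scores scale roughly
# linearly with GPU size, as M2 Max -> M2 Ultra approximately did. This is an
# assumption, not a guarantee; real scaling usually lands somewhat below 2x.

RTX_4090_SCORE = 11200  # figure quoted earlier in the thread

def extrapolate(m3_max_score: float) -> None:
    ultra = 2 * m3_max_score   # hypothetical M3 Ultra (2x Max)
    extreme = 2 * ultra        # hypothetical "M3 Extreme" (2x Ultra)
    print(f"Max {m3_max_score:.0f} -> Ultra ~{ultra:.0f} "
          f"({ultra / RTX_4090_SCORE:.0%} of a 4090) -> Extreme ~{extreme:.0f} "
          f"({extreme / RTX_4090_SCORE:.0%} of a 4090)")

extrapolate(3191.5)   # baseline implied by the 6383 / 12767 figures above
extrapolate(3485.6)   # baseline quoted earlier in the thread
```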
 
If the current M3 die doesn't include UltraFusion, it would cost an extra $1B to tape out a special M3 Ultra and pass that cost onto (relatively) few customers.

It isn't $1B. And not necessarily few customers if they roll it out to the whole Studio + Mac Pro lineup. Confined to just >$4K products, there is a different unit volume than if it dropped down to $2K.


Developing a new M3 Ultra on N3B (the current TSMC fab process) with far more area dedicated to the Neural Engine seems unlikely given its 55% yield (i.e., rather low compared to the 80%+ yield of N3E).

That 55% N3B figure is based on really old reports and doesn't shed much light; the current gap isn't that large. And if concerned about costs, a re-spin to N3E would add costs also (pragmatically reducing the supposed cost gap).
The Ultra already gets "far more area" for Neural because it would double the number of cores (and aggregate bandwidth) over the Max. The Neural cores already got an uplift with the M3 generation. "More gas on the fire sooner" isn't going to do anything substantially better.
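To put rough numbers on the yield argument, here is an illustrative Python sketch of cost per good die. The wafer price and dies-per-wafer are invented placeholders; only the 55% and ~80% yield figures come from the post being quoted, and even those are unconfirmed:

```python
# Illustrative cost-per-good-die comparison. Wafer cost and dies-per-wafer are
# made-up placeholders; only the 55% vs 80% yield figures come from the post
# above, and even those are unconfirmed.

WAFER_COST_USD = 20000   # assumed N3-class wafer price (placeholder)
DIES_PER_WAFER = 120     # assumed candidate dies per wafer for a large Max-class die

def cost_per_good_die(yield_rate: float) -> float:
    good_dies = DIES_PER_WAFER * yield_rate
    return WAFER_COST_USD / good_dies

for label, y in (("N3B @ 55%", 0.55), ("N3E @ 80%", 0.80)):
    print(f"{label}: ~${cost_per_good_die(y):,.0f} per good die")

# N3B @ 55%: ~$303 per good die
# N3E @ 80%: ~$208 per good die
# A real difference, but one that has to be weighed against the one-time cost
# of redoing the design for N3E's incompatible design rules, as argued above.
```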

Developing a new M3 Ultra on N3E would also be 'odd' because it would require $$$ to be spent on new design rules (N3E is not rule-compatible with N3B), and it would be the only M3 on N3E, with the iPhone's A18 core coming out only FOUR months later.

The premise is that the A18 would be on N3E too. For the "iPhone" it isn't just the iPhone coming out in 4 months. First, the iPhone will be sold for several years. Second, the A18 will be dribbled out into more products over time (iPad, perhaps Apple TV, etc.). The "Ultra" for generation 'n' typically dies when the Studio/MP is replaced; its lifecycle is shorter. In short, the A18 has more than several years to get the money back, at a rate of several million units per year the entire time.

Random observation: "M3 Max [Nov '23] outperforming the M2 Ultra [Jun '23]" means that for 7 months of the year the ULTRA desktop would have been SLOWER than the top laptop. Is this a bit of a bitter pill for premium users to keep down?

That is for corner cases. It isn't generally true (e.g., for CPU-focused workloads).

Perhaps what Apple will do this year is FIX this anomaly by releasing the "M4 Ultra" in Nov '24 on TSMC N3E, and then release the ULTRA every 2 years in Q4.

It is likely delusional to think the Ultra is going to run at the same iteration frequency that the plain Mn SoCs will. The Ultras have no "hand me down" systems to go to, which likely means they will stay in products longer. The plain Mn can be handed down to the iPad lineup and sold for years longer. Those chips have a much easier time recovering costs because they have a much longer "new product" service life. The larger the chip/package, the more costs there are to recover. A technically arbitrary 12-month cycle makes no sense there. It is a double-edged sword of Apple silicon having one, and only one, 'customer' (Apple products). There are no other system vendors to sell the Ultra to when the lead Mac product dumps it.
 
Ultra, Extreme, Max ... What matters today is the number of ML/CUDA cores.

If you want to know what innovation looks like:
Oh please. You would be the first to say, "Apple has strayed from its core technology and has no business making robots. Massive fail! And it should be ⅓ the price anyhow!"
 
I'm skeptical. The M3 Max die is already huge! They're going to have a 2× bigger single die for M3 Ultra? That seems crazy.

There are a number of options that could pan out.
* The interconnect is there but not clearly visible in this blurry image.

It is an off-chip connector (which needs bigger elements, not smaller ones), so the possibility that it isn't visible when other, smaller elements are is pretty low. It isn't that fuzzy. Collectively, UltraFusion is large.

" .. Apple’s innovative UltraFusion uses a silicon interposer that connects the chips across more than 10,000 signals, ..."

There are more than several thousand connectors there. Even a very small size multiplied by 10,000 isn't going to be a small number anymore. And if Apple went to 15K, 25K, 35K it would be even harder to 'hide'.


Is it "photoshopped" (like Apple's die shots of the Max prior to the Ultra reveal)? Maybe, but a bigger problem is that (if to the same scale) the M3 Max area > M2 area > M1 area. The M3 is big. Doubling up an even bigger die at the same cost effectiveness is problematic.

* There is a different variation of the chip that has the interconnect that we haven't seen yet.

Part of the problem here, if the photos are all at the same relative resolution, is that the M3 Max has to 'eat' the area occupied by the connector to hold the "more stuff" that comes with the M3 generation. If it had to tack UltraFusion on top of that, the die would get even bigger.

If Apple is aiming to keep this inside the reticle-limit size, then it would likely need a new die that had "less stuff" so that it could put the interconnect back in.



* M3 Ultra will be significantly scaled back from what we expected based on M1/M2 Ultra, being a more modest upgrade over M3 Max.

Yeah, if die area is being used up faster than the process change (N5P -> N3B) is giving you savings in space, then ... yeah, you have a bloat problem if you are going to use exactly the same baseline design criteria.

However, the primary competition for the M3 Ultra is the M1 Ultra (and previous Intel models), not the M3 Max. People are replacing whole systems, not SoCs. The M3 generation's NPU and GPU cores got significant improvements. For folks who need "more than Max" performance, whether that is 30% or 80% more, both are still more. It is more a question of whether Apple prices the "more" correctly to line up with the customer's "pay more to get more" valuation.


* M3 Ultra isn't even going to make an appearance this generation.

Even if the M3 Ultra was going to show up, it makes increasingly less sense to attach an utterly useless UltraFusion connector to the Max chips that are heading for MBPs, where it will never be used. As wafer costs go higher and higher each generation, Apple is throwing away more and more space. If the Max MBPs are selling at much higher volume than expected, it makes even less sense.

It isn't a large area on an individual chip, but if you multiply it by 10, 20, 40 million units, in aggregate it adds up.
 
Folks who render for a living don't care about power consumption.
They care about speed.
Let's extrapolate that forward a few generations. Unless the GPU industry puts real effort into breakthroughs in perf/watt, people are going to have to upgrade their home circuit breakers to handle the load.

Meanwhile, Apple has a massive amount of headroom to dial up the power, but they’re a long-term thinking company and have spent years focusing on making things as efficient as possible.

NVIDIA/AMD/etc. are going to hit a ceiling for professional users in the not-too-distant future unless people are willing to install server-grade power systems…
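To put the circuit-breaker remark in perspective, here is a rough Python extrapolation. The starting wattage and per-generation growth rate are assumptions for illustration, and a standard US 15 A / 120 V branch circuit is used as the ceiling:

```python
# Rough extrapolation of whole-system power draw for a top-end GPU workstation.
# The ~600 W starting point and 25% per-generation growth are assumptions for
# illustration, not measured figures.

CIRCUIT_LIMIT_W = 15 * 120 * 0.8   # 15 A US branch circuit at the usual 80% continuous-load rule

power_w = 600.0                     # assumed current top-end GPU system draw
for gen in range(1, 6):
    power_w *= 1.25                 # assumed 25% increase per generation
    flag = " <-- exceeds a standard 15 A circuit" if power_w > CIRCUIT_LIMIT_W else ""
    print(f"Generation +{gen}: ~{power_w:,.0f} W{flag}")

# At these assumed growth rates the draw passes the ~1,440 W continuous limit of
# a common household circuit within a handful of generations, which is the
# perf/watt point being made above.
```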
 
Hasn't TSMC been working on their own interposer "fabric"? Who's to say Apple hasn't adopted theirs because it makes the fab process easier?

 