Any diagrams or details on how the M3 Ultra is made? They must have changed the original design of the M3 Max somehow because even two M3 max added together would not add up to 512 GB RAM.

Apple uses semi-custom RAM packages in all the variants. Just stack the RAM twice as high and use ranking so that the 'higher' stack uses the same memory channel as the lower stack (the controller can only talk to one half at any one instant). It is like when older Mac Pros had 3 memory controllers and 4 DIMM slots, or 4 memory controllers and 8 DIMM slots. Only it isn't standards-based packaging.

The possible negative side effect is that the System Level Cache doesn't get any bigger if you do all of this externally. So the percentage of total RAM being covered by the cache goes down. That is true even if you don't "double stack" the memory (96GB would get a higher percentage than 128GB or 256GB).
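
A rough sketch of that cache-coverage point in Python. The cache size here is a made-up placeholder (the thread never gives the real System Level Cache size), so only the trend matters, not the absolute numbers:

```python
# Hypothetical placeholder: the thread does not state the real SLC size.
ASSUMED_SLC_MB = 64


def slc_coverage_percent(ram_gb: int, slc_mb: int = ASSUMED_SLC_MB) -> float:
    """Return the share of total RAM (in percent) a fixed-size cache can hold."""
    ram_mb = ram_gb * 1024
    return slc_mb / ram_mb * 100


# RAM sizes mentioned in the thread; the cache size stays fixed throughout.
for ram_gb in (96, 128, 256, 512):
    print(f"{ram_gb:>3} GB RAM: cache covers {slc_coverage_percent(ram_gb):.4f}% of memory")
```

Whatever the real cache size is, doubling the RAM behind a fixed cache halves the fraction it covers.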

And somehow they added Thunderbolt 5 to the machine (M3 Max chip only did Thunderbolt 4) - either on the die OR they had to use a separate controller chip off the die to add Thunderbolt 5.

The 'external, discrete controller' option is extremely unlikely. TBv5 could have been 'broken' or uncertified in the M3 Max.
You have to be certified by Intel to use the name 'Thunderbolt'. Or it was an unfinished version run at TBv4 because a few sub-elements didn't work correctly for v5.

However, some early die analysis indicated that there was no UltraFusion subsystem on earlier M3 Max dies at all. In that case the most likely solution is that this is really a Max+ die ( version 1.2; same overall general die with some modifications and narrow tweaks). Thunderbolt 5 could have been added along with the UltraFusion connector.

The timing here seems to indicate that the Max+ dies didn't start until approximately the plain/laptop version production wound down to a halt.


I am just curious the extent of redesign needed beyond just fusing two M3 Max together.

If they are going to eventually use the M3 Max+ in a Mac Pro, there probably should be improvements in the PCI-e backhaul they provision out to the external switch. Either PCI-e v5 or just two symmetric x16 PCI-e v4 links, but something to match the slots and give a more distinctive 'value add' to help differentiate the product.
 
That’s a pretty meager increase in a benchmark, however it’s not a real-world task. I would expect to see bigger improvements in the M3 Ultra’s actual performance in real applications people use, like say 50% faster than M4 Max, in some video & 3D rendering times, gaming, AI, etc. Not twice as fast, but definitely a lot more than a measly 10% faster.

But there’s no denying the M4 cores are better than M3 cores, and an M4 Ultra would be an even better performer than this.
 
Others have said that Geekbench 6 multi-core CPU is actually fairly representative of multi-core performance in many mainstream workloads, since most software is not optimized to take proper advantage of 32 CPU cores. Those who can properly utilize those 32 CPU cores are a small minority.

Probably a bigger group of customers that can benefit from the M3 Ultra are those who need one or all of the below:

- Double the number of media encoders
- Very high memory bandwidth
- A bazillion GPU cores
- Tons of RAM
 
Yeah I should have been more clear, I wasn’t talking about the mainstream, I was talking about the minority. M4 Max will be a better choice for “mainstream” applications due to fastest single core etc.

I meant to say the M3 Ultra will be a lot more than 10% faster in “the real (niche, pro) applications that people (who really need an Ultra) actually use.” We’re not sitting around running Geekbench all day :) And we’re not paying 2X the cash for 10% more performance either!
 
Yeah, this is why I ended up selling the M1 Ultra and then the M2 Ultra, and I won't be buying any more Ultras. The next gen Max chip kept coming out soon after. So really this comes down to if you need more memory, higher memory bandwidth or more GPU cores. Software engineers should just stick with the M4 Max unless they have a specific use case for the extra AI prowess.
Yep exactly my disappointment with Apple. Pretty shameful how they can mess up their pro line this bad.
 
Anecdotal results shouldn't be trusted.
This has already been proven with M1 and M2 Ultras. The M-next Max is very close and sometimes outright beating the Ultras. Seems like the flagship is not the top end which is a giant shame.
 
Because that’s not how professional chip releases work. Not for Apple, and not for Intel. I’m less sure about others. The reason is that it takes more work and more engineering to get these mega chips ready.
Intel never had their N+1 laptop chips beat the top of the line N desktop chips.
 
1x or 100% = the same speed.
1.5x = 50% faster.
1.5x faster = 150% faster.
I am aware that we are splitting hairs here, but nobody would use the statement "1x faster" to mean that something is the same speed as before.
The use of "faster" after a multiplication factor is clunky, and IMO the best language is the one actually used by Apple in their announcements, i.e. "up to 1.5x the performance of ..."
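
For anyone who wants the two readings side by side, here is a minimal Python sketch. The function names are my own, and it simply encodes the strict reading argued above, where "N x the performance" and "N x faster" are different statements:

```python
def times_performance_to_percent_faster(multiplier: float) -> float:
    """'1.5x the performance' -> 50 (percent faster)."""
    return (multiplier - 1.0) * 100.0


def times_faster_to_times_performance(multiplier: float) -> float:
    """Strict reading: '1.5x faster' adds 1.5x on top of the baseline -> 2.5x the performance."""
    return 1.0 + multiplier


for m in (1.0, 1.5, 2.0):
    as_percent = times_performance_to_percent_faster(m)
    strict = times_faster_to_times_performance(m)
    print(f"{m}x the performance = {as_percent:.0f}% faster; "
          f"'{m}x faster' read strictly = {strict}x the performance")
```

Under the strict reading, "1x faster" would mean twice the performance, which is why the "N x the performance" wording is less ambiguous.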
 
The best M4 Max configuration of the Mac Studio (16-core CPU, 40-core GPU, 128GB memory and 8TB SSD) is the same configuration as I have in my M4 Max 16" MacBook Pro, which also has the nano-texture display.

So Apple's best M4 Max configuration laptop and the best M4 Max configuration Mac Studio should have the same performance numbers, right?

In a couple of weeks we should have some real numbers on both Mac Studio configurations.
 
The Studio is supposed to slightly outperform it, due to thermal and power headroom, and perhaps because it has less internal hardware to run, especially without the MBP screen. But this only happens under the heaviest stress for a prolonged period of time.
 
To be fair Intel Xeons were always a few generations behind the desktop chips. What made them faster were the extra cores. In this case the M4 is just that much faster than the M3 across the board. So the multi core Geekbench score isn’t all that much faster. Some applications will make better use of more cores however to distribute the load.
Not with Nehalem and Westmere. Only post 2010 Intel flipped and stopped releasing at the high end first. This also decimated the HEDT market, the server folks were somewhat content to wait.


More cores are good for very specific workloads but the 20% single core improvement makes it a real conundrum for a lot of workloads, particularly Audio where single core speed matters a lot so you don’t get audio dropouts which can still happen with a ton of oversampling. Add in the GPU being so much weaker vs. M4 for the 3D workloads …it’s a strange release.

Intel never had their N+1 laptop chips beat the top of the line N desktop chips.
Pentium M (Dothan) did, although it wasn’t really N+1 per se since it had a different (and much better) architecture. It was the entire genesis for the ‘Core’ series and tick/tock cycle Intel used for the following 15 years.
 
What's the min RAM for that?
It is going to take the full 512 as the next option down is 256 and the model itself is 404 GB. In practice, the whole model needs to fit in memory with some headroom. It's just unbelievable that we're gonna be able to run these huge models, natively and securely at really high settings on a stock home computer. I don't think that has sunk in for a lot of people yet. This entirely changes the optimal hardware narrative for smaller users from Nvidia to Apple.
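
A quick back-of-the-envelope check of that sizing in Python. The 404 GB model size and the 256/512 GB options come from the post above; the 10% headroom figure is purely my assumption for illustration, since real headroom depends on context length, the OS, and whatever else is running:

```python
MODEL_SIZE_GB = 404            # model size quoted in the post
RAM_OPTIONS_GB = (256, 512)    # the two memory options mentioned in the post
ASSUMED_HEADROOM = 0.10        # hypothetical: 10% extra on top of the weights


def fits_in_memory(ram_gb: float, model_gb: float,
                   headroom: float = ASSUMED_HEADROOM) -> bool:
    """True if the model plus the assumed headroom fits in the given RAM."""
    return ram_gb >= model_gb * (1.0 + headroom)


for ram_gb in RAM_OPTIONS_GB:
    verdict = "fits" if fits_in_memory(ram_gb, MODEL_SIZE_GB) else "does not fit"
    print(f"{ram_gb} GB: {verdict}")
```

With any plausible headroom value the 256 GB option falls short before headroom is even considered, so 512 GB is the only option that works.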
 
I see where you’re coming from, but that “stock home computer” still costs almost US$10000.

I guess that’s cheap for that one specific niche usage but not exactly common for a so-called “stock home computer”.
 
I see where you’re coming from, but that “stock home computer” still costs almost US$10000.
Which is peanuts for any company that needs privacy with their data. Even for freelancers it is not that expensive, and you could easily pay it off with one or two jobs.
 
That seems to max out at 128 GB for Apple Silicon. Needs to be updated I guess.


That is my main point. That is a workstation, not some typical "stock home computer".
Absolutely a game changer, solely for that one 512GB option.
 
Yeah I love Private LLM and have become so hungry for RAM. Would love to run the full DeepSeek R1. That spec makes it still too steep for 99% of people. Excited for the quality coming with newer more efficient Open models rolling out in the next months.
 
Seems like the only use for the Ultra is for GPU performance.

If only there was a way you could have one CPU and a different GPU... almost like there was a reason desktops have separate components.
 
Yes, but if you consider that an Nvidia 5090 (for example) has 32 GB of VRAM, you would need 16 of those $4000 cards to run the current breed of full-size large language models, and it cannot be done to any reasonable level in system memory. You can do it extremely well on a single M3 Ultra with 512 GB of integrated RAM. That is the play here and it is epic!
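
The card count is simple division, sketched here in Python. The 32 GB per card and 512 GB figures are the ones quoted in this post, and 404 GB is the model size mentioned earlier in the thread. This only counts raw capacity and ignores multi-GPU overhead, which is an assumption on my part:

```python
import math

VRAM_PER_CARD_GB = 32      # per-card VRAM figure quoted in the post
UNIFIED_MEMORY_GB = 512    # top M3 Ultra memory option discussed in the thread
MODEL_SIZE_GB = 404        # model size quoted earlier in the thread


def cards_needed(total_gb: float, vram_per_card_gb: float = VRAM_PER_CARD_GB) -> int:
    """Smallest number of cards whose combined VRAM reaches total_gb."""
    return math.ceil(total_gb / vram_per_card_gb)


print(f"To match {UNIFIED_MEMORY_GB} GB: {cards_needed(UNIFIED_MEMORY_GB)} cards")
print(f"To hold a {MODEL_SIZE_GB} GB model: {cards_needed(MODEL_SIZE_GB)} cards")
```

So 16 cards matches the full 512 GB, and even just holding the 404 GB model would take 13.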
 
Maybe not for that specific quote, but Apple often writes, say, "2X faster" when it means "1X faster".

You say they do this “often”, but I’ve never seen it.
Are you able to provide a link?
 
The odd thing is that it's nowhere near twice as fast as an M3 Max. Any explanation for that?

Possible explanations off the top of my head include:
- This particular benchmark doesn't cope well with that many cores/threads (e.g. too much thread contention).
- Potential overhead/bottlenecks in the benchmark and/or chip (e.g. limited by memory bandwidth? cpu cache? etc).
- Good ol' bug in the benchmark or somewhere else in the system, that could be fixed later.

That's why we shouldn't take this one result on one benchmark as fully representative.

A full test with other benchmarks and real-world apps (Blender, DaVinci Resolve, Cinebench, etc.) will make the picture clearer.
 
Intel never had their N+1 laptop chips beat the top of the line N desktop chips.

Assuming you’re correct (I’m not going to verify), and jokes about Intel’s ability to create better chips aside, beat them in what?
1. We can’t yet be sure of the speed of the M3 Ultra.
2. The M3 Ultra clearly beats the M4 in every single area but *maybe* single thread, based purely on the stats. For example, early tests show a 38% better GPU score compared to the M4 Max. A clearer picture is coming though.
3. You should never buy an Ultra chip if all you need is single thread, as you’ll be paying a gob of money for high powered features you don’t need.

The POINT of an Ultra is multi-thread (CPU and GPU), and access to more RAM.
El Capitan has relatively poor single-thread performance, but that doesn't mean it isn't a supercomputer ;-)
 


The first alleged benchmark result for Apple's new M3 Ultra chip has surfaced in the Geekbench 6 database tonight, allowing for more performance comparisons. The high-end chip is available in the new Mac Studio, introduced earlier this week.


Apple said the M3 Ultra chip is the "highest-performing chip it has ever created," and the unverified benchmark result seems to confirm that. In the single result, the 32-core M3 Ultra chip achieved a multi-core CPU score of 27,749, which makes it around 8% faster than the 16-core M4 Max chip that previously held the performance record. The result also reveals that the M3 Ultra chip is up to 30% faster than the 24-core M2 Ultra chip.

As expected, the M4 Max chip tops the M3 Ultra chip in terms of single-core CPU performance by nearly 20%, according to the result. This is due in part to the M4 Max chip being manufactured with TSMC's second-generation 3nm process, whereas the M3 Ultra is likely based on TSMC's first-generation 3nm process.

We now await additional M3 Ultra benchmark results to see if these scores are accurate, as they seem to be on the lower side compared to what was expected. For example, Apple advertised the M3 Ultra chip as being up to 1.5x faster than the M2 Ultra chip, so that 30% increase mentioned above should seemingly be closer to the 50% mark. Apple never said how the M3 Ultra chip's performance compares to the M4 Max chip, though.

As always, real-world performance may vary somewhat, but synthetic benchmark tools like Geekbench 6 provide a useful baseline for comparisons.

Watch this space, as we would not be surprised if additional Geekbench 6 results for the M3 Ultra chip end up having higher performance scores.

The benchmark was spotted by @jimmyjames_tech and shared by Vadim Yuryev.

Update: Three more M3 Ultra results have surfaced in the Geekbench 6 database, and the average multi-core CPU score has increased to 28,160. This means the M3 Ultra chip is around 10% faster than the M4 Max chip, up from the original 8% figure. Overall, it looks like the M3 Ultra chip is indeed not much faster than the M4 Max.
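
As a sanity check on the percentages in the article, here is a small Python sketch. Only the two M3 Ultra scores (27,749 and 28,160) and the rounded percentages come from the article. The baselines it prints are back-calculated from those rounded figures, so treat them as rough implied values rather than measured scores:

```python
M3_ULTRA_FIRST = 27_749      # first multi-core result reported in the article
M3_ULTRA_AVERAGE = 28_160    # average after three more results


def implied_baseline(score: float, percent_faster: float) -> float:
    """Back out the score that `score` is `percent_faster` percent faster than."""
    return score / (1.0 + percent_faster / 100.0)


def percent_faster(new: float, old: float) -> float:
    """How much faster `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Implied (not measured) baselines from the article's rounded percentages.
m4_max_implied = implied_baseline(M3_ULTRA_FIRST, 8)
m2_ultra_implied = implied_baseline(M3_ULTRA_FIRST, 30)

print(f"Implied M4 Max multi-core score: about {m4_max_implied:,.0f}")
print(f"Implied M2 Ultra multi-core score: about {m2_ultra_implied:,.0f}")
print(f"Score needed for 1.5x the M2 Ultra: about {1.5 * m2_ultra_implied:,.0f}")
print(f"Average vs. first result: {percent_faster(M3_ULTRA_AVERAGE, M3_ULTRA_FIRST):.1f}% higher")
```

The gap between the reported scores and the score a full 1.5x over the M2 Ultra would require is what the article means by the results being "on the lower side".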

Article Link: M3 Ultra Chip is Only 10% Faster Than M4 Max in Benchmark Results
The moral of the story. Save your money and just get the M4 Max!
 
Maybe Apple shouldn’t build “ultra” chips but should find ways to make the most of the Max versions?
 