Not just their first but probably their only computer system ever. For MOST people buying their first computing systems this year? An iPhone would be enough!
There are a LOT of apps that are not written to take advantage of Apple Silicon. And, as a result, they will run poorly, even when ported over, primarily because, being cross-platform, the developers don’t want to expend a lot of effort to make things work properly.

Question, since I don't know: are these apps written for the Mac, or Windows-based applications ported over to the Mac?
I would be curious about the audio and video production apps, as I haven't heard about those bringing the Mac to its knees. Interesting.
Those rumors were from folks that weren’t paying attention for almost three years now (or that valued social media attention highly). Apple’s been telegraphing what Apple Silicon would be this entire time: there would be a baseline processor, and every more performant tier would differ by number of cores. And, by the time of the Mac Studio, there were still those thinking that whatever the Mac Pro turned out to be was NOT going to follow a very clearly laid out pattern.
It’s not a knee-jerk move; it’s what it was going to be all along. It’s not supposed to be one level below a fully functioning server. It’s supposed to be the fastest Mac that someone who WANTS a Mac can buy, which also happens to offer PCIe slots as a feature.
Considering that the Mac Pro at its HIGHEST yearly unit sales likely never amounted to more than half of 1% of Apple’s yearly revenues, the sales of the Mac Pro, good OR bad, won’t have a material effect on Apple’s bottom line.
Yeah, I didn't miss the Mac Pro bit. Very interesting.
The other thing that grabbed my attention was the 2.5 TB/s (UltraFusion!) interconnect bandwidth (at 27:55). Johny then said that it's "more than 4 times the bandwidth of the leading multi-chip interconnect," which I'd think refers to AMD's Infinity Fabric 3.0 at 400 GB/s bidirectional. Oddly, though, he could have also said "more than 6 times the bandwidth," so I'm not sure what to make of that.
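For what it’s worth, the arithmetic behind that puzzle checks out directly (figures are the ones quoted above from the keynote and from AMD’s published Infinity Fabric 3.0 spec):

```python
# Comparing the quoted figures: UltraFusion's 2.5 TB/s die-to-die bandwidth
# vs Infinity Fabric 3.0's 400 GB/s bidirectional.
ultrafusion_gbps = 2500          # GB/s (2.5 TB/s)
infinity_fabric_gbps = 400       # GB/s
ratio = ultrafusion_gbps / infinity_fabric_gbps
print(ratio)  # 6.25 -> so "more than 6 times" would indeed also have been true
```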
Anyway, the M1 Ultra has 800 GB/s memory bandwidth, 400 GB/s per die. So, 2.5 TB/s die-to-die seems... excessive. I guess there's some kind of cache unification that requires enormous throughput. Or perhaps the interconnect is designed to handle more dies in a different configuration.
Edit: After ruminating a bit, I think that Johny’s “more than 4 times the bandwidth” statement may be a hint at the next-gen interconnect/interposer.
Grabbing a napkin… Since each die can do 2.5 TB/s, and four dies would require six direct interconnects to be fully connected (3 on each die), each interconnect would handle 2.5/3 TB/s = 833 GB/s. Infinity Fabric can do 400 GB/s switched, so 800 GB/s total in a quad setup, and thus an average of 800/4 GB/s = 200 GB/s overall. 4 * 200 < 833 < 5 * 200, QED… Yeah, that’s the best I could massage the numbers.
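The napkin math above, spelled out (all numbers are this post’s assumptions about a hypothetical fully connected 4-die package, not anything Apple has stated):

```python
# Each die exposes 2.5 TB/s of UltraFusion bandwidth, split across 3 direct
# links to the other dies in a fully connected 4-die package.
dies = 4
links_per_die = dies - 1                 # 3 links leaving each die
per_link_gbps = 2500 / links_per_die     # ~833 GB/s per link

# Infinity Fabric comparison from the post: 400 GB/s switched, so 800 GB/s
# total in a quad setup, averaged over the 4 processors.
avg_if_gbps = (400 * 2) / dies           # 200 GB/s

print(per_link_gbps / avg_if_gbps)       # ~4.17, i.e. "more than 4 times"
```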
Edit 2: I looked into Infinity Fabric some more and while it’s not switched, it’s unclear to me what the actual bandwidth would be in different configurations and my numbers above might be correct… ish. It seems like there are a lot of rules for different configurations.
Anyway, I saw a Twitter post on March 8 by the Japanese reverse-engineering firm TechanaLye indicating that they’d analyzed the UltraFusion region on the M1 Max. Nice die shots, but, sadly, the details would be in one of their paid reports.
Gotcha, so Gurman has no actual sources at TSMC nor on the Mac Pro R&D team.
Wrong, but it was never arranged as a 4-tile square; it actually looks like a 4-chip strip (I prefer to call it dominoes). Its UltraFusion is daisy-chainable north/south; the M2 Max requires memory connected at its sides, so a 4-tile square arrangement would block two memory channels on each SoC.
It's difficult without exposing the source, but it's not just that UF+ has a north/south path; its connection points are not at the M2 Max's edges but close to the die's center. The M2 Extreme/Ultra UltraFusion bridge is likely more like a carpet on which an M2 Max lies, with north and south interfaces for additional chips. It may even resemble Nvidia's 4-GPU NVLink arrangement.
A thing that intrigues the few engineers with access to the same sources: it's not just the M2 Max that has built-in UltraFusion provisions; the M2 Pro seems to as well. Maybe not to daisy-chain M2 Pros, but for other added capabilities on later devices, such as PCIe 5 buses or even a dGPU. It's hard to guess why the M2 Pro also includes what seems to be a lower-rank UltraFusion.
Edit: the M1 Extreme also briefly existed. It was based on quite a long bridge with two SoCs on each side connecting to each other; besides being expensive, it had memory-related issues which later doomed it.
My understanding of the current situation is that much of the software still isn’t coded to actually get data to the GPU fast enough, so the Ultra never gets to show its uplift.
There are workflows that do showcase phenomenal performance, but that seems to only be from vendors who have re-architected how their software works.
Apple ran an entire session at WWDC 2022 about optimizing and scaling GPU code in applications.
Scale compute workloads across Apple GPUs - WWDC22 - Videos - Apple Developer
Discover how you can create compute workloads that scale efficiently across Apple GPUs. Learn how to saturate the GPU by improving your… (developer.apple.com)
It was not just the "chip engineers" that expect app developers to do their jobs well; it's closer to: Apple expects developers to do their jobs well. Apple has rolled out more tooling to help with doing optimizations. With the tools and the tutorials, it should be more tractable for the less lazy to do something at this point (at least for the two-die "Ultra" class solution). Apple expects developers to optimize their apps.
Apple is not particularly likely to do power-bleeding, triple-backward hardware somersaults trying to make badly optimized code run faster. The hardware is there. If developers are using bubble sort where quick/merge sort would work better, then it isn't Apple's job to 'fix' that.
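A toy illustration of that algorithm-choice point, in Python (nothing Apple-specific here; just the O(n²) vs O(n log n) gap that no hardware papers over):

```python
import random
import timeit

def bubble_sort(a):
    """O(n^2) comparison sort -- the 'dubious code' stand-in."""
    a = list(a)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

data = [random.random() for _ in range(2000)]
t_bubble = timeit.timeit(lambda: bubble_sort(data), number=1)
t_merge = timeit.timeit(lambda: sorted(data), number=1)  # Timsort (merge-based)
print(t_bubble / t_merge)  # typically hundreds of times slower
```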
Is Apple going to get perfect linear scaling with zero code optimizations across 2 and then 4 dies? Probably not. AMD and Nvidia aren't with monolithic dies either.
Even if Apple made some improvements in "UltraFusion 2" to smooth out some very highly sensitive NUMA characteristics between 2 dies, it would likely pop back up at the 4-die coupling stage. So the "issue" isn't going to completely go away with hardware covering up dubious code assumptions.
CPU code really isn't a huge problem for well-crafted scaling algorithms. (E.g., the NASA Truss benchmarks on Apple's Studio marketing page scale; the Adobe stuff doesn't. That is not surprising at all, not even in the slightest. Adobe is relatively slow to optimize the bulk of their code base. That is not a hardware issue in the slightest.)
With M2, folks can use better Xcode tools and tutorials. Can't solve this issue solely with hardware; it's time at least as much as hardware.
The M1 Extreme likely would have had several other problematic issues besides GPU scaling. Economics (four largish dies, multiple interposer fusion chips, more expensive packaging, etc., and yet much, much lower volumes). Apple probably needs a die that isn't focused on being a MBP 16" chip. (E.g., 4 TB controllers per die in a 4-die package is extremely likely at least 8 more TB controllers than you need.)
Doing a 4-die package with TSMC N3 (or N4P) would make lots more sense to manage the overall package size. M2 isn't bringing magic sprinkles, but it should/could be done with far more appropriate tech that is independent of the microarchitectural issues. Bringing the Extreme back under the 300W zone would help the operational environment for the package.
Unless Apple had a major addition for PCIe 4 provisioning, the M1 Extreme was also likely weak in the area of PCIe provisioning for workstation-class jobs.
I don’t even expect it to double the power of the Mac Studio. The Mac Studio, after all, IS currently the fastest Mac Apple makes and faster than the Macs that came before it. Even if it’s only 20% faster, it’ll be the fastest Mac yet and, for those who want/need the fastest Mac, that’s what they’ll get.
In my mind, the differentiators will be related to RAM, storage, physical port options and other things above/beyond just CPU/GPU performance (maybe more ProRes encoders/decoders, stuff like that). I’m not thinking “how could this beat a Mac Studio”. I’m thinking more like, “Who, specifically, are the very few that need something that the Mac Studio doesn’t offer as options… and how many of that small group are not going to like what Apple presents?” I truly expect that some users that are waiting to see what it is (and have plans to buy it) will not like what they see because it drops some old “Mac Pro” expectation and they may drop macOS orrr… just use their Intel box until it dies. And I believe Apple’s factored in this loss of what can’t be more than a few thousand at this point.
For raw power the RTX 4090 is still the king, but don’t forget that Nvidia’s most powerful card, the RTX 4090, has only 24GB, and the RTX 8000 has 48GB. If you want to do AI with large models, having direct access to 192GB is huge. We know that some of the parallel performance on Apple Silicon is stunning, like fluid simulation: http://hrtapps.com/blogs/20220427/
And the IO when working with video is best in class when running a lot of video streams simultaneously. Also, Substance Painter can easily eat 24GB of video memory if you use many layers…
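The capacity argument above reduces to simple arithmetic; a sketch with illustrative numbers (the model size is an assumption for illustration, not a claim about any specific product):

```python
# Does the working set fit in device memory at all? fp16 weights assumed.
params_billion = 70
model_gb = params_billion * 2          # 2 bytes per fp16 parameter -> 140 GB

rtx_4090_gb = 24
m2_ultra_unified_gb = 192

print(model_gb <= rtx_4090_gb)         # False: won't load on a single 4090
print(model_gb <= m2_ultra_unified_gb) # True: fits in unified memory (if slower)
```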
They wouldn’t sell a ton of them, because the Mac part is secondary to the far more important NVidia part. And NVidia parts are ALWAYS going to be cheaper in a Windows box. They’re forced to Windows now because there’s no NVidia on Mac. They’d STILL be forced to Windows due to the Mac prices.

BUT... if Apple allowed NVidia RTX or Quadro cards to work in the new Mac Pro... they would sell a ton of them to the kinds of people who need CUDA and other things. Right now those customers are forced into Windows machines.
The Mac Pro only appeals to a small niche market... so why are they making it even smaller?
🤔
Just a matter of convenience built over the years. We would have to move all of the user directories to external disks. Certainly doable, but it will not be a simple Mac-to-Mac transfer, and it will not be possible to do it using the Mac software.

Why would you need to boot from an external enclosure? You should not have any files other than your OS on the system drive anyway, so even 256GB should be enough to run the system fast; definitely store files on TB drives. None of these are new problems either; all recent Macs aside from the last Mac Pro had these limitations.
I think you are correct; I was also unable to find any non-Apple GPUs that have access to 128 GB of RAM. So, for those folks whose work REQUIRES that much VRAM, Apple’s the only game in town. Having more VRAM absolutely doesn’t mean it’s faster. Having more VRAM just means it runs in the first place.

The bandwidth of the M2 Ultra is WAY slower than a workstation GPU, which makes it totally meaningless. Having more VRAM doesn't really mean it's faster, and there are so many factors to consider. Besides, the Apple GPU itself is way slower than the RTX 30 series, so more VRAM doesn't mean better or faster.
OH, actually, I just read the post at Investopedia which says that “The halo effect is a term for a consumer's favoritism toward a line of products due to positive experiences with other products by this maker.” In that case, it’s likely been the iPhone for quite a while. Guess I never knew what a “halo product” was!

Agree or disagree, Vision Pro is now Apple's halo product. That's their vision of the future.
Having more VRAM is meaningless when the bandwidth is much slower and GPU core performance is slow. It's like assuming more RAM will give more performance.

I think you are correct, I was also unable to find any non-Apple GPU’s that have access to 128 GB of RAM. So, for those folks that have work that REQUIRES that much VRAM, Apple’s the only game in town. Having more VRAM absolutely doesn’t mean it’s faster. Having more VRAM just means it runs in the first place.
Yeah, that post certainly hit the nail on the head!

The fastest Mac - it matches the M2 Ultra Mac Studio - it simply has user-accessible PCIe slots. That's the only differentiator. Although it'll be $40K cheaper on the top-end configuration than the outgoing Intel Xeon-based Mac Pro from 2019: https://www.theverge.com/2023/6/5/23750154/apple-m2-ultra-mac-pro-cheaper-intel-mac-pro
Nvidia has a single card with 128 gigs of RAM on it? I wasn’t able to find it; what’s the part number?

Having more VRAM is meaningless when bandwidth is much slower, GPU core performance is slow, and consume too less power. It's like having more RAM will give more performance.
Btw, Nvidia already has 80GB of VRAM per card and you can add as many cards as you want, which is WAY more than 128GB. Apple Silicon can't really do that.
I said 80GB of VRAM, which is the A100. You are ignoring that Apple Silicon's unified memory works differently. And like I said, PCs have way faster bandwidth, which already outperforms Apple Silicon, and they can just add whatever GPUs they want, which can go beyond 128GB of VRAM.

Nvidia has a single card with 128 Gigs of RAM on it? I wasn’t able to find it, what’s the part number?
Having more VRAM is meaningless if the use case doesn’t require it, certainly! If the use case can be worked with chunks of RAM smaller than 80 (and I’d imagine most Nvidia use cases are written to require far less contiguous RAM than that for obvious reasons), then it would make sense (financial and otherwise) for a user to leverage that solution.
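A back-of-envelope for that chunked-workload case (all sizes are illustrative assumptions):

```python
import math

# Streaming a large working set through a smaller VRAM pool in chunks.
data_gb = 120      # total working set
chunk_gb = 16      # chunk sized to fit comfortably in a 24 GB card
passes = math.ceil(data_gb / chunk_gb)
print(passes)      # 8 passes across PCIe instead of one resident working set
```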
Kinda sad the PCIe slots are only PCIe 4, not 5 or 6.
Gosh, I hope not; nevertheless I have to agree with you, it looks like they have hitched their wagon.

Agree or disagree, Vision Pro is now Apple's halo product. That's their vision of the future.
Since the Apple GPU itself has poor performance, I wouldn't expect too much from it.
Oh, no, I understand fairly well how the unified memory works. For example, the CPU can write a block to memory and the GPU can read the block, without the CPU having to queue a packet of data to shuffle across PCIe first. Following that, the GPU can update the block, the CPU can read the result, then write a new value in that block, have the GPU ready to read that, etc.

I said 80GB of VRAM which is A100. You are ignoring that that Apple Silicon's unified memory works differently. And like I said, PC has way faster bandwidth which already outperforms Apple Silicon and they can just add GPU whatever they want which can go beyond 128GB of VRAM.
No MPX slots means that they are probably not ever planning on supporting third party GPUs even as compute accelerators.
I would say NO
There is no way you can add more RAM, especially with an Apple Silicon chip.
Thanks, that's a bummer then.

I would expect not, and even if you could, it would be orders of magnitude slower than the on-package RAM.
It is possible that Apple looked into offering off-package RAM (via an additional memory controller) and found the performance to not be acceptable or it might have caused some type of issue that made it an undesirable path to follow.
Actually, that is why we run multiple GPUs in our systems.

I think you are correct, I was also unable to find any non-Apple GPU’s that have access to 128 GB of RAM. So, for those folks that have work that REQUIRES that much VRAM, Apple’s the only game in town. Having more VRAM absolutely doesn’t mean it’s faster. Having more VRAM just means it runs in the first place.
With NVLink, spanning memory load was somewhat possible (e.g., two linked 24GB VRAM GPU cards could manage 48GB of data). However, it was never widely adopted — presumably why SLI and NVLink are now EOL. Multi-GPU setups (i.e., multiple cards) are still plenty beneficial nowadays, but it’s more about working in parallel (e.g., each renders a different frame or each processes a different simulation).

Actually, that is why we run multiple GPUs in our systems.
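That per-frame pattern is plain data parallelism; a minimal sketch using threads as stand-ins for GPUs (`render_frame` is a hypothetical placeholder, not a real rendering API):

```python
from concurrent.futures import ThreadPoolExecutor

def render_frame(i):
    # stand-in for a self-contained per-frame GPU render job
    return (i, i * i)

# Each worker ("GPU") takes whole frames; no memory is pooled across workers,
# which is exactly why this scales without NVLink-style memory spanning.
with ThreadPoolExecutor(max_workers=4) as pool:
    frames = list(pool.map(render_frame, range(8)))
print(len(frames))
```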
Even bottom-of-the-product-stack 3D software (Poser 13) will use as many GPUs as you can stuff in the case. I'm looking at getting another RTX 3060 to go in mine.
No, probably not, but I also don’t believe that the new M2 Ultra is as fast as, say, my Mac Pro would be if I added the latest MPX modules available.

Do you think your 2019 system’s GPU outperforms the one Apple just released?
In the world of digital content creation, there are dozens, if not hundreds, of native Apple apps and plugins that push the hardware to its limits. Anything that simulates inter-particle forces, like fluid dynamics, will quickly show you how powerful you think your CPU/GPU is. I’m not personally into audio, but I’ve seen examples where people have tons of separate audio tracks layered up with complex effects, all playing concurrently in real time. It doesn’t take much effort to see how you could easily stress the most powerful of systems. Likewise with video: throw a dozen video layers into something like After Effects and add visual effects to them, and it very quickly stops being real-time. The very idea that ‘pro’ users can’t completely overwhelm the Mac Studio is beyond ludicrous.

Question, since I don't know: are these apps written for the Mac, or Windows-based applications ported over to the Mac?