It’s not laziness. It’s intelligent product management: improving the experience for 80% of users by sacrificing the 20%. The integrated-chip solution that rules out discrete GPU support is exactly what enables significantly better performance for the users who don’t need it.


The 80/20 split is likely an order of magnitude off. This is more about software than hardware. Apple has eliminated dGPUs from every product model from the iMac through the upper-end MBP. By volume shipped, Intel was by far the dominant Mac GPU vendor. If you sit in these MacRumors forums you'd think there was a vast AMD vs. Nvidia war going on. That really wasn't the 'world war' at all if you take two steps back and look at the whole 'forest' rather than just one 'tree'.

If you loop in the iPad (**) with the plain M-series, the 'experience' split is more like 99.96% vs. 0.04%, even if Apple had looped in only the Mac Pro (plus the odd eGPUs that were left). Round that to one digit after the decimal point and it's 100.0% to 0.0%.

I think Apple's primary objective was to wipe out dGPUs entirely from the laptop lineup. They did an extremely good job of that. The collateral damage, though, turned out to be wiping out dGPUs on the iMac performance class of Macs as well. Three years ago, if you had told folks that Apple was going to match a 4070 on some benchmarks with an iGPU, you would have gotten lots of "what are you smoking" looks.



** Pointing at the iPad and saying those are iPadOS apps while the Mac has macOS apps completely misses the point here. If you drop down to just the graphics-programming level of those apps, and put aside the relatively narrow UIKit, file-storage, and a few other orthogonal library differences, it is mainly the same set of optimization changes needed to move away from the legacy (ten-years-ago) graphics structures, where you had to copy data to and from the GPU, and several other baroque issues that Apple GPUs don't have. That GPU programming subset is where Apple is trying to put "all the wood behind one arrow." Apps that are extremely optimized in that specific area should work better all the way up and down the hardware stack. That shared 'experience' is what they are managing toward.
It is mainly software (and the assumptions built into it that hinder or help performance) that is the primary issue.
 
Interviewer: Will there be external GPUs for the Mac Pro?

Apple's man: Yes, we asked Nvidia if they would design a set of four boards for the new Mac Pro, but they replied that we don't allow them to write Mac drivers anymore, so it's not entirely clear to me how we could do it.
 
The 80/20 split is likely an order of magnitude off. [...]
I’m not talking about the split between the Mac Pro and other Macs. 80/20 is my estimate (mostly based on other people’s guesses, no actual knowledge) of the split between Mac Pro users who don’t need PCIe graphics cards and Mac Pro users who do.
 
I keep wondering why it took such a long time to release what is essentially a Mac Studio inside a box with PCIe expansion slots that can't even support graphics cards.

I'm very disappointed with the Mac Pro.
It’s quite likely that they simply put it on hold because they needed the people who should have worked on it to limit the damage of all sorts of delays caused by the supply crisis. In my company it has caused problems in all sorts of weird places, and screwed up the roadmap in all sorts of ways, for years to come. The fact that Apple’s product launches have been as little impacted as they have is remarkable.
 
There’s something oddly comforting about this discussion about Macs. It could be 1994 or 2004 or 2014.

Macs have pros and cons relative to other personal computers and that hasn’t changed for decades.

iPhone, on the other hand, has very few cons. It’s amazing what market share across all types of industries and so many markets will do.
 
Unified memory access is preferable, though it can be done with external GPUs, as AMD did with EPYC Trento and the MI250X for supercomputers; AMD simply modified the IO die to include dGPUs in the unified memory space via Infinity Fabric. Apple simply isn’t interested. AMD’s next MI300 is more akin to a very large GPU with a CPU on-package and a ton of HBM3. It should also be noted that Apple still refuses to do business with Nvidia.

AMD really didn't do "Unified, Uniform" memory with the MI250X. The two dies on the 250X present to the end-user app as two GPUs, not one.

It is 'less painful' to do copying on the MI250X, but the memory pools are not really unified. The application has to explicitly finish off the 'unification' in its own code.
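As a rough sketch of what that explicit code looks like in practice: with a framework like PyTorch (ROCm builds reuse the torch.cuda API), the two dies enumerate as two separate devices, and the application has to shard the data and gather the results itself. This is only an illustration; the model and sizes are hypothetical, and it assumes a machine where at least two devices are visible.

```python
import torch
import torch.nn as nn

# On a dual-die MI250X the two dies show up as two devices; nothing is pooled for you.
assert torch.cuda.device_count() >= 2, "expected the two dies to enumerate as two devices"

model0 = nn.Linear(4096, 4096).to("cuda:0")   # replica pinned to die 0
model1 = nn.Linear(4096, 4096).to("cuda:1")   # replica pinned to die 1

batch = torch.randn(256, 4096)
half0, half1 = batch.chunk(2)                  # the app decides how to shard the batch

# Explicit copies into each die's memory pool, explicit gather afterwards.
out = torch.cat([model0(half0.to("cuda:0")).cpu(),
                 model1(half1.to("cuda:1")).cpu()])
```

On an Apple SoC the equivalent code targets a single device, and the "which pool does this memory live in" question never shows up in application code.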

Apple's solution is, for the most part, transparent. [There is a narrow set of cases where sloppy assignment of work to cores whose memory is far away can re-surface the distance issues that Apple tries very hard to automatically hide. But apps with bad assumptions and bad work allocation just run slower, with no explicit memory-'unifying' calls. On the MI250X that just doesn't scale (you only get one GPU out of the two).]

(Yes, there are tools like MPI layered on top to paper over NUMA memory access on supercomputers/clusters, but those are custom apps. Those calls are all explicitly present in the code, and how the data is segmented has to be very carefully thought out in advance.)

Same thing with NUMA Infinity Fabric connections. If you get multi-node-hop memory references, at some distances that has to be explicitly mitigated in some apps.

I think Apple's 'war' on anything Khronos (OpenCL/Vulkan/OpenGL), or on anything other than Metal, is a bit limiting. Short to intermediate term it probably won't hurt them, as long as the iPhone, the iPad, and the lower half of the Mac market keep growing. But long term, if better, more widely adopted portable standards appear, Apple will likely get left in the dust trailing to catch up.

If the MI300 presents as just one GPU, then they would have turned the same 'corner' that Apple has. I'm kind of skeptical that they have done that. The info presented so far is that they are going from two GPU compute tiles to six. If they had trouble doing two (admittedly bigger ones), how is six going to get easier? Some recent rumors have stated there are four base memory/cache tiles in the package. Each compute tile is coupled to one base tile, and then presumably the base tiles are cobbled together. Is a compute tile on base tile A that needs to reach memory on a base tile three positions away going to take as little time as pulling memory from the base tile it sits on? Probably not.

We'll see tomorrow (6/13). But there's a good chance the MI300 does more to shrink a whole two-MI250X-plus-AMD-CPU logic board for a supercomputer node into one very large, expensive package than it does to unify the GPUs into one larger, uniformly presented GPU. It is going to take apps that are built to scale across multiple components in a supercomputer node and run them faster, with huge decreases in power wasted on keeping the parts overly discrete. Those apps already have the explicit remote-NUMA-access juggling assumptions built into them.


Anyway, tl;dr: dedicated external GPUs are possible within a unified memory space, but Apple wants you to use its silicon instead of wasting all of the die space it dedicated to the M2's iGPU.

Again, that kind of ignores the impact on the software: it isn't going to be exactly the same code for both paths. NUMA memory is 'unified', but it is also NUMA (not uniform). When Apple says they have "Unified Memory", I think that is technically "Unified, Uniform Memory", because they have a 'completely transparent to the software' requirement built into their definition.
 
It seems to this non-expert that this was the trade-off all along. You can make huge performance gains on lots of specific, common workflows by putting everything on the chip together. But this has the trade-off of giving up expandable RAM and GPUs. Apple first did this with portable devices, where it made sense because no one expected to expand those things.

Apple made a purposeful decision to forgo expandability for the benefits the M-series chips have brought, and that has worked for a large portion (likely a vast majority) of their customers. To me, the question isn’t “why won’t they do expandability”; it was “should we hold back huge advancements for the majority for the sake of expandability?” (and the answer was to move forward with the M-series chips and launch this Mac Pro with its disadvantages).

I wonder (can’t wait for the tell-all book) just how long ago Apple saw this trade-off and made this decision. Many things, especially the saga of the trash can and their unwillingness to update it, might make more sense if Apple knew 10 years ago that it was going to need this bold a trade-off now.
 
 
Has Apple not heard of this magical thing called AI (/s)? Right now, the available ML libraries rely heavily on GPU power and NVIDIA Tensor Cores. The libraries that do make use of Apple technologies do so in sub-optimal ways, either because Apple has not provided the right hooks (there are GitHub issues pointing to lack of support in Apple frameworks) or because the industry is heavily invested in Tensor Cores. Sure, Apple's chipsets may be very powerful, but they are limited by what Apple provides at the framework level.

TensorFlow and PyTorch both run on Apple silicon via the Metal framework. It's fast. But just as important is that my lowly M2 Pro Mac mini can run fairly large models. The Apple GPU has access to more VRAM than my Nvidia card has, and the entire M2 Pro machine costs less than a high-end Nvidia GPU.

Apple silicon actually is very cost-effective for machine-learning tasks when you look at performance in ML tasks vs. cost. The Nvidia A100 is the usual GPU card, but it costs more than a Mac Studio.

What does an Nvidia GPU with 64 GB of VRAM cost? What does a Mac Studio with 128 GB of RAM cost?

Try it yourself. There is no need to make theoretical arguments. When I actually do training, the batch size can be higher on the Mac than on a cost-comparable Nvidia card
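For anyone who wants to try it, here is a minimal sketch of how a PyTorch script picks up the Metal (MPS) backend on Apple silicon; the layer sizes are arbitrary and just for illustration.

```python
import torch
import torch.nn as nn

# Prefer the Metal (MPS) backend when available, otherwise fall back to CUDA or CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# Toy model; on Apple silicon the weights live in the same unified memory pool the CPU
# uses, so the usable "VRAM" is roughly whatever system RAM the machine has free.
model = nn.Sequential(nn.Linear(8192, 8192), nn.GELU(), nn.Linear(8192, 8192)).to(device)
x = torch.randn(64, 8192, device=device)

with torch.no_grad():
    y = model(x)
print(device, y.shape)
```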
 
Has Apple not heard of this magical thing called AI (/s)? [...]
They will probably buy some AI company at some point, with all its Linux/Windows computers, so they will have plenty of dedicated GPUs in the future XD

That would be very funny: "Siri A.I." powered by Nvidia...
 
TensorFlow and PyTorch both run on Apple silicon via the Metal framework. [...]
There are some points I don't get from your argument, could you elaborate further please?

I thought the Mac Studio's only advantage vs. a comparably priced PC was performance per watt; you can get a better GPU in a PC for the same price, though admittedly not more VRAM for the GPU... so VRAM is the most important feature in a GPU for training??

I don't get how this could be; then NVIDIA should be working on a 128 GB GPU... or is that so expensive it isn't possible?
 
Try it yourself. There is no need to make theoretical arguments. When I actually do training, the batch size can be higher on the Mac than on a cost-comparable Nvidia card
I have used it, and compared it to using NVIDIA GPUs... and am not terribly impressed. But perhaps I'm not doing the same things you are.
 
How do you explain Apple's announcement regarding making it easier to port DX12 games to Mac at WWDC23?

To me, it seems that Apple's strategy has changed and it now wants to be a major player in AAA gaming again.
Yes, it adds a bit of extra juice to Mac gaming. But it’s still true that historically they make enough money from casual gaming not to be that bothered. And GPU performance isn’t great for AAA gaming, unless you convert everything to Metal, which obviously existing game devs aren’t interested in doing. Especially with Vision, I think Apple’s gaming will come up from the casual-gaming end (disruption). Adding the DX12 porting support is one small enabler.

I’d love to see full-on AAA gaming on the Mac, but it’s clearly not there any time soon.
 
TensorFlow and PyTorch both run on Apple silicon via the Metal framework. [...]
And my principal frustration isn't that Apple hardware isn't capable, but rather that PyTorch cannot implement certain routines because there are no hooks; I can dig up some GitHub issues if you are interested. One in particular was related to the SparseMPS backend, but that is mostly on the PyTorch folks. There had been a few held up on an OS X update, and I never followed them to see whether they were resolved or not. However, it would be nice to see Apple collaborate with some of those projects to improve their integration.

I'm mostly doing inference, not training. But I have over 400 Mac minis I'd like to run larger models on, and cannot, because they take too long to run. We are talking about a difference of 5 minutes (on Linux) vs. an hour (on a plain M2, not Pro) for the same task. It's possible batching isn't being used effectively, or cannot be for what I'm doing.
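For context on what missing hooks look like in practice: when an operator has no MPS implementation, PyTorch raises an error unless you opt into the (slow) CPU fallback via the PYTORCH_ENABLE_MPS_FALLBACK environment variable. A small illustrative sketch, with sparse matrix multiply standing in for the kind of op whose MPS coverage has lagged; the sizes here are arbitrary.

```python
import os
# Must be set before torch is imported; lets unsupported ops fall back to the CPU.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

dense = torch.randn(1024, 1024, device=device)
sparse = torch.eye(1024).to_sparse()   # sparse tensors are an area where MPS support has lagged

try:
    out = torch.sparse.mm(sparse.to(device), dense)
    print(out.shape)
except (NotImplementedError, RuntimeError) as err:
    # Without the fallback flag, this is roughly the failure mode those GitHub issues describe.
    print("op not supported on MPS:", err)
```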
 
The architecture matters more than the specs.

A car with 10 wheels will not drive faster than a car with 4 wheels.

More RAM doesn't make a system perform better in and of itself.
Why would you assume the baseline is a car with 4 wheels? A car with 2 wheels drives faster than a car with 1 wheel, 3 wheels is a revolution, and a car with 4 wheels corners the heck out of the 3-wheeler. Put 18 wheels on the car and you can haul massive loads in one go. This is exactly what happens when you add RAM.
 
There are some points I don't get from your argument... so VRAM is the most important feature in a GPU for training?? [...]

Yes. The model has to fit in the VRAM. In the ML universe, the A100 is the most common GPU you will come across. The new one has either 40 or 80 GB of VRAM. This is Apple's competition; the gamer GPUs are for gamers.

To get better performance they place multiple A100 cards in the same computer, so that way you really can have an effective 128+ GB.

You can buy the GPU here on Amazon.

How many of the people complaining about the fact that you cannot place a high-end GPU card in a Mac Pro would actually do so if they could? It would be silly. Most of us let Amazon or Google buy these cards and place them in data-center servers, and then we rent time on the servers. Why would anyone buy their own? Google rents time on the servers for $0.87 per hour per A100 card.

But the Mac Studio lets you have all this VRAM and a very fast GPU for half the price of one A100 card.
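To put a rough number on the rent-vs-buy point: the $0.87/hour figure is from the post above, while the Mac Studio price used here is just an assumed example, not a quote.

```python
# Back-of-the-envelope break-even between renting A100 time and buying a machine outright.
A100_RENT_PER_HOUR = 0.87      # $/hour per card (figure quoted above)
MAC_STUDIO_PRICE = 4_800.00    # assumed example price for a 128 GB configuration

break_even_hours = MAC_STUDIO_PRICE / A100_RENT_PER_HOUR
print(f"~{break_even_hours:,.0f} rented A100-hours "
      f"(~{break_even_hours / 24:,.0f} days of continuous use) to match the purchase price")
```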
 
Top 5 Companies, Worldwide PC Workstation Shipments, Market Share, and Year-Over-Year Growth, 2022 (shipments in thousands of units)

Company | 2022 Shipments | 2022 Market Share | 2021 Shipments | 2021 Market Share | 2022/2021 Growth
1. Dell Technologies | 3,171.2 | 41.4% | 2,979.6 | 39.8% | +6.4%
2. HP Inc. | 2,580.4 | 33.7% | 2,549.3 | 34.0% | +1.2%
3. Lenovo | 1,860.0 | 24.3% | 1,920.9 | 25.6% | -3.2%
4. ASUS | 24.5 | 0.3% | 19.7 | 0.3% | +24.3%
5. NEC | 20.1 | 0.3% | 26.1 | 0.3% | -22.7%
Total | 7,656.2 | 100.0% | 7,495.6 | 100.0% | +2.1%

Source: https://www.idc.com/getdoc.jsp?containerId=prUS50454823

So, if the Mac Pro is a workstation, it sold fewer than 20,000 units in 2022.
 
I am just running a project studio/small indie label, so my point of view will be slightly different than his.

I’ve had Performas, G3 Tower and G4 MDD Tower. Then I’ve had Mac minis, after Apple switched to Intel. Never had the need for PCI cards myself. I’ve always had either FireWire, USB or TB audio interfaces.

As for RAM, I think more than 128-192GB is only required for large orchestral templates.

So for the majority of professional musicians the new Mac Pro is fine if they need to put Pro Tools or UAD cards into it. Otherwise, everyone will go with the Mac Studio.

How much RAM do you usually use? I know some studios whose workstations go above 300 GB.
 
How much RAM do you usually use? I know some studios whose workstations go above 300 GB.
Electronic music production, sample-based and virtual instruments. My present Intel mini has 64 GB; that’s just about enough for now, though I sometimes hit the ceiling and it goes to the swap file.

If I were doing orchestral mock-ups and film scores, I would want at least 256-512 GB.
 
Funny that, they must have a different definition of optimized, because a 4090 in a random assortment of hardware sure as hell feels optimized.

These dedicated graphics cards are not optimised together with the computer's CPU and internal memory. They're just optimised by themselves.

How would you make an NVIDIA 4090 use the M2's RAM?
 