There is no right answer, because GPUs are a hierarchy of "compute units" clustered together to share progressively more resources. At the lowest level you might have a set of "compute units" that share a scheduler (i.e., the logic that decides which instructions to send to those units) and a register file; at a higher level that "core" might share an L2 cache with other cores; below that, the core might be split into four "sub-cores", each with a separate L1 cache. Or other permutations!
These details matter because they determine things like how fast one set of threads can share data with, or synchronize with, another set of threads. Beyond that is the issue of "how fast for WHAT?" Graphics performance depends not just on how many FLOPS the hardware can deliver, but also on things like how rapidly textures can be read, or on specialized hardware like geometry shaders; whereas AI performance depends on things like which number formats (FP16? BF16? 4-bit integer?) are supported.
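To make that concrete: a fair amount of this hierarchy is visible straight from the Metal API. Here is a small Swift sketch that just queries the default GPU; the exact values it prints vary from chip to chip, and the point is only which limits the API exposes, not any particular numbers.

    import Metal

    // A minimal sketch: the hierarchy described above shows up in what the
    // Metal API reports about the default GPU. Printed values differ per chip.
    if let device = MTLCreateSystemDefaultDevice() {
        print(device.name)                         // e.g. "Apple M1 Max"
        print(device.maxThreadsPerThreadgroup)     // how many threads can belong to one threadgroup
        print(device.maxThreadgroupMemoryLength)   // bytes of on-chip memory those threads can share
        print(device.supportsFamily(.apple7))      // which feature-set family the GPU belongs to
        print(device.recommendedMaxWorkingSetSize) // rough memory budget (unified on Apple Silicon)
    }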
You can get some idea of how the various designs compare by looking at:
https://github.com/philipturner/metal-benchmarks
which (at the very top of the page, click on the arrow!) gives some numbers for Apple vs. recent AMD and Nvidia designs. It's clear that there has been some variation in how large "cores" are and how they are balanced over the past 8 years or so, but all three vendors have now converged on something very similar.
WOW! Thanks for the link. I’m still reading it and re-reading it!
One thing that really bothers me is that many of the subtopics on the GitHub repository are by programmers expressing a fervent desire to do more GPGPU processing on Mac/macOS!
GPGPU has always fascinated me.
And I wish Apple had a specific software engineering team devoted solely to combing through macOS and finding as many GPGPU-suitable “compute” refinements to the macOS codebase as possible, including low-level code, all APIs, the colossal “Core” APIs and Kits, Frameworks, etc. (I don’t know that Apple doesn’t, but…I doubt it…or it’s within the larger macOS team and not a designated, dedicated group with one job!)
It would be nice to see existing Apple Silicon Macs (and devices running iOS, for that matter) showing faster performance the more suitable instructions are found that can be handed off to the GPU(s) instead of the CPU.
(What would the reaction be if Apple released a milestone macOS update that was shown to run everything 10% faster? I want another “Snow Leopard”!)
If done right, all existing software — without the need to even recompile — would inherit every performance refinement.
IDK, maybe a GPGPU-refined macOS could run 10% faster (depending on the task) or 20% faster (but I’d take 5%).
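(Just to make “handing work off to the GPU” concrete, here’s a rough Swift sketch of a single Metal compute dispatch that doubles an array of floats on the GPU instead of the CPU. The kernel name and the workload are placeholders I made up for illustration, not anything Apple ships.)

    import Metal

    // A rough sketch (not Apple's code): move a trivial CPU loop onto the GPU.
    // The kernel name "double_values" and the task are made up for illustration.
    let kernelSource = """
    #include <metal_stdlib>
    using namespace metal;
    kernel void double_values(device float *data [[buffer(0)]],
                              uint id [[thread_position_in_grid]]) {
        data[id] = data[id] * 2.0f;
    }
    """

    let device   = MTLCreateSystemDefaultDevice()!
    let library  = try! device.makeLibrary(source: kernelSource, options: nil)
    let pipeline = try! device.makeComputePipelineState(
        function: library.makeFunction(name: "double_values")!)

    // Shared storage: on Apple Silicon the CPU and GPU see the same buffer.
    var values: [Float] = (0..<4096).map(Float.init)
    let buffer = device.makeBuffer(bytes: &values,
                                   length: values.count * MemoryLayout<Float>.stride,
                                   options: .storageModeShared)!

    let queue    = device.makeCommandQueue()!
    let commands = queue.makeCommandBuffer()!
    let encoder  = commands.makeComputeCommandEncoder()!
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(buffer, offset: 0, index: 0)
    // One GPU thread per element; the driver packs them into SIMD-groups/threadgroups.
    encoder.dispatchThreads(MTLSize(width: values.count, height: 1, depth: 1),
                            threadsPerThreadgroup: MTLSize(width: pipeline.threadExecutionWidth,
                                                           height: 1, depth: 1))
    encoder.endEncoding()
    commands.commit()
    commands.waitUntilCompleted()

    let result = buffer.contents().bindMemory(to: Float.self, capacity: values.count)
    print(result[0], result[1], result[4095])   // 0.0 2.0 8190.0

Even in this toy case, the scaffolding around the three-line kernel is most of the code, which is exactly why automatic hand-off is such an appealing (and hard) problem.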
I agree with deprecating OpenGL — the thing is ANCIENT! And the insistence it be backwards compatible with every crusty legacy version still in its DNA is weighing it down worse than a ball and chain. (OpenGL seems like the “Flash” of graphics libraries.)
I wish other companies would take on the risk of “ripping off the Band-Aid” all at once — like Apple has done often — instead of nursing old technologies ad infinitum because they’re too risk-averse — the risk being that someone, somewhere, can’t run their app.
I understand Apple’s conundrum: writing code that translates OpenGL, OpenCL, OpenML, CUDA (the parallel programming API, not the hardware), or Vulkan — or at least supporting MoltenVK and MoltenCL — does nothing to wean programmers off these old or non-Mac-optimized APIs. It only gives them an excuse to use them forever and never give them up. And then the exact same app will run worse on Macs than on all other platforms.
AND, as is obvious, realtime translation/interpretation exacts a performance overhead — pretty much defeating the whole purpose of Metal and any other accelerative Mac APIs that would employ translation/interpretation.
Lastly, there’s Machine Learning, Deep Learning, and A.I.
Using any of these software technologies, can macOS “learn” how a user uses a software app and optimize it over time?
Say a Photoshop user uses it a certain way and never uses certain features of the app — or Final Cut Pro or DaVinci Resolve or Blender, etc.?
Can the OS keep track of which parts of an app’s code it always seems to load and which parts it never does? Can it “cache” or prefetch more optimally, based on what it “learns” over time about how a user uses a software app?
If this were “a thing,” you could do a standard Final Cut benchmark test, and then, 3 months later, that same benchmark would improve! And continue to improve!
Can A.I. or ML find CPU code that’s suitable for handing off to the GPU instead of human coders wracking their brains?!
I remember writing Assembly code back in the day, but somehow the same app written in FORTH ran faster than the assembled Assembly version! (I never understood how or why.) But FORTH was smarter than me — that much I knew.
Software technology is amazing, and Apple, more than anyone else, has (far more than once) made its customers feel like they’ve bought brand new hardware once some milestone OS updates are installed.