How does your M1 compare to the 3070 for video editing? The benchmarks I saw using Premiere put them very close.

The results are mixed. I find the M1 great for video editing. But when it comes to motion graphics (DaVinci Resolve's Fusion), it is OK for simple graphics, but it slows to a crawl with very complex ones. Hence I was looking forward to a more powerful GPU.
 
Last edited:
I’m trying to figure out the relative performance of the Apple M1 chip versus my 2017 27-inch iMac with a Radeon Pro 580.

Going solely by the Metal benchmark score, the eight-core GPU in the M1 has about 50% of the performance of my current machine. That means I should break even with the 16-core model, given that GPUs tend to scale roughly linearly, so I would have to pay for the 32-core model to see an improvement in performance.
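For what it's worth, the back-of-the-envelope math looks something like the sketch below. It assumes Metal scores scale linearly with GPU core count, which is optimistic, and the numbers are only illustrative:

Code (Swift):
// Rough estimate only: assumes GPU performance scales linearly with
// core count, which real workloads rarely do perfectly.
let radeonPro580 = 2.0   // my iMac's Metal score, normalized
let m1EightCore = 1.0    // roughly 50% of the 580

let perCore = m1EightCore / 8.0
for cores in [8, 16, 32] {
    let relative = perCore * Double(cores) / radeonPro580
    print("\(cores)-core GPU: ~\(Int(relative * 100))% of the Radeon Pro 580")
}
// Prints ~50%, ~100%, ~200% - break even at 16 cores, a real gain only at 32.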
 
  • Like
Reactions: killawat
Yes, but I still think the bigger iMac next year will have an even better GPU option - something better than the 5700 XT.
But if you want a laptop, then yes, go for the 32-GPU-core option. It's crazy that a high-end dGPU from about five years ago, which drew around 120 W if I remember correctly, will now be outperformed by a GPU drawing around 40-50 W inside a slim, compact laptop.
 
  • Like
Reactions: Traverse
I’m not entirely sure that the performance of the M1 GPU implementation and a dGPU such as the AMD 5700 XT can be compared directly.
Comparing M1 teraflops to AMD 5700/5800 XT teraflops doesn't really (IMHO) provide an apples-to-apples picture of where the rubber meets the road - frame rates in games, CAD model complexity, 3D rendering and so on. One GPU uses a traditional IMR (immediate mode rendering) approach built around a modern, intelligent z-buffer; Apple Silicon uses a different but arguably much more efficient (per flop) tile-based deferred approach.

Furthermore, the Apple GPU is intended to work as part of an integrated accelerator and co-processor ecosystem where dedicated accelerators target specific workload types - e.g. ISP, video encode, video decode and so on.

When comparing with a traditional PC (CPU + dGPU), we need to consider that Apple can farm off some of the "traditional" compute workloads associated with a dGPU to dedicated accelerators à la the neural engine. Much of a modern graphics card's theoretical compute capability is not reserved solely for graphics rendering in games/3D modelling/CAD work but also covers physics calculations and the like. In this sense, Apple's chip design makes a 1:1 comparison difficult.

It's also why I feel that synthetic benchmarks that do not use Apple's native APIs - so that the code base is well optimized for the underlying ecosystem - make traditional 1:1 comparisons difficult.

I get the sense, personally, that "1 teraflop of GPU performance" from Apple Silicon goes a lot further than, say, 1 teraflop on a traditional dGPU, simply because the workload the GPU on Apple Silicon gets is more focused (neural engine for machine learning, physics acceleration, intelligent upsampling, etc.). And that's before we even scratch the performance-per-watt conversation.

I’m sure somebody like @cmaier will have a more informed / intelligent take on this than I do.

Certainly open to being told there is something that I’ve missed here :)

Hope y’all are staying well, happy, healthy and safe.
 
  • Like
Reactions: jabbr
Most modern GPUs do have some type of tile rendering already but they are obviously not as optimized as the M1 will be in this case. I think the biggest game changer is the unified memory and the elimination of moving data from main memory over the PCI Express lanes to GPU memory and back (AMD and Nvidia do have Resizable BAR support, but it's still constrained to PCIe bandwidth, which on a laptop is often still x8 PCIe 3.0 - only about 8 GB/s versus typical GPU memory at 120 to 500 GB/s). To me this represents the biggest leap toward heterogeneous computing in a desktop/laptop form factor. In a few years I think you'll see a lot of very differentiated performance as developers learn to better leverage this fact.
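To put rough numbers on that gap (just a sketch built from the ballpark figures above; the ~68 GB/s unified-memory figure for the M1 is approximate):

Code (Swift):
// Illustrative only: ballpark bandwidth figures, not measurements.
let pcieGBps = 8.0        // ~x8 PCIe 3.0 link to a laptop dGPU
let vramGBps = 400.0      // typical dedicated-GPU memory bandwidth
let unifiedGBps = 68.25   // M1 unified memory (LPDDR4X-4266), approx.

let workingSetGB = 2.0    // hypothetical 2 GB of textures/geometry
let uploadMs = workingSetGB / pcieGBps * 1000
print("Uploading \(workingSetGB) GB over PCIe: ~\(Int(uploadMs)) ms before the GPU sees it")
print("Dedicated VRAM then reads it at ~\(Int(vramGBps)) GB/s")
print("On the M1 the same data is already GPU-visible (zero copy), shared at ~\(unifiedGBps) GB/s")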
 
  • Like
Reactions: killawat
When comparing with a traditional PC (CPU + dGPU), we need to consider that Apple can farm off some of the "traditional" compute workloads associated with a dGPU to dedicated accelerators à la the neural engine. Much of a modern graphics card's theoretical compute capability is not reserved solely for graphics rendering in games/3D modelling/CAD work but also covers physics calculations and the like. In this sense, Apple's chip design makes a 1:1 comparison difficult.

It's also why I feel that synthetic benchmarks that do not use Apple's native APIs - so that the code base is well optimized for the underlying ecosystem - make traditional 1:1 comparisons difficult.

I get the sense, personally, that "1 teraflop of GPU performance" from Apple Silicon goes a lot further than, say, 1 teraflop on a traditional dGPU, simply because the workload the GPU on Apple Silicon gets is more focused (neural engine for machine learning, physics acceleration, intelligent upsampling, etc.). And that's before we even scratch the performance-per-watt conversation.

You are making valid points, but it is still slightly more complicated.

For rasterization, yes, Apple GPUs are much more efficient than the traditional model, but TBDR is more about efficient use of memory bandwidth than about the efficient use of computational resources (although that too, in some cases).

If some work can be offloaded onto a different coprocessor (say the NPU or the AMX unit), the GPU is indeed free to do something else, but such a split is not always possible or practical. If your game uses the GPU to do its physics simulation, you probably won't rewrite it just for Apple Silicon (not to mention that I doubt the NPU can even be used for that purpose). So I am not sure I agree that a TFLOP on Apple gives you more than a TFLOP on Nvidia (and don't forget that Nvidia has its own NPU integrated into the GPU).

To make it even more complicated though, TFLOPS just represent the maximum throughput of certain arithmetic operations you can achieve on the GPU. This number is a bit silly and does not represent any real work. Given the proprietary nature of most GPUs and the lack of low-level profiling information, we don't really know how efficient different GPUs are at utilizing their resources. Personally, I believe (and please note it's just my guess, not backed by data of any kind) that Apple GPUs will be more efficient on complex, hybrid workloads (since they have larger caches and what seems to be very flexible scheduling), but on straightforward number crunching they will perform similarly to traditional GPUs with comparable max TFLOPS metrics. For some applications, where the amount of work per data point is limited, traditional GPUs will have the advantage due to their higher memory bandwidth. Apple is aware of that and is including new features in its GPUs that allow more efficient memory transfers (like matrix instructions, tile store instructions, or the new shuffle-and-fill, which essentially expose the GPUs as the long SIMD processors they in fact are).
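Just to illustrate how little the headline figure tells you: peak FP32 TFLOPS is essentially ALU count × 2 FLOPs per fused multiply-add × clock speed, nothing more. The clocks in the sketch below are approximate, not official specs:

Code (Swift):
// Peak FP32 throughput = ALUs * 2 FLOPs (one FMA) per clock * clock rate.
// It says nothing about how well a real workload keeps those ALUs fed.
func peakTeraflops(alus: Int, clockGHz: Double) -> Double {
    Double(alus) * 2.0 * clockGHz / 1000.0
}

print(peakTeraflops(alus: 1024, clockGHz: 1.278)) // M1 8-core: ~2.6 TFLOPS
print(peakTeraflops(alus: 2560, clockGHz: 1.905)) // RX 5700 XT: ~9.75 TFLOPS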

The bottom line, I think, is that we will be getting some very fast GPUs in our Macs without sacrificing form factor, portability or battery life. And for some tasks they will be very fast (UMA, NPU, hybrid algorithms). But don't expect them to outperform GPUs with a comparable amount of computational resources. A 16-core Apple GPU (2,048 ALUs) won't magically be faster than an Nvidia GPU with 5,000+ ALUs. What you can expect, though, is massively lower power consumption for the same performance as Nvidia's 2,048-ALU GPU :)
 
Most modern GPUs do have some type of tile rendering already but they are obviously not as optimized as the M1 will be in this case.

Tile based immediate rendering (that modern traditional GPUs use) and tile based deferred rendering are very different paradigms and should not be confused.
 
Last edited:
Time based immediate rendering (that modern traditional GPUs use) and tile based deferred rendering are very different paradigms and should not be confused.
I'd love to learn a bit more about this if you have any references! I've always been interested in graphics programming but have spent most of my time on the CPU side of things.
 
7-core GPU = 896 ALUs (M1)
8-core GPU = 1,024 ALUs (M1)
16-core GPU = 2,048 ALUs (M1 Pro)
32-core GPU = 4,096 ALUs (M1 Max)
64-core GPU = 8,192 ALUs (Jade 2C / two M1 Max)
128-core GPU = 16,384 ALUs (Jade 4C / four M1 Max)
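The pattern behind that list is simply 128 ALUs per GPU core; a quick sketch, taking the rumored Jade 2C/4C configurations at face value:

Code (Swift):
// Each Apple GPU core contains 128 FP32 ALUs (4 x 32-wide SIMDs),
// so the ALU count is just cores * 128.
let alusPerCore = 128
for cores in [7, 8, 16, 32, 64, 128] {
    print("\(cores)-core GPU = \(cores * alusPerCore) ALUs")
}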
 
Time based immediate rendering (that modern traditional GPUs use) and tile based deferred rendering are very different paradigms and should not be confused.

Nvidia's GPUs since Maxwell do use tile-based rendering. As of now, pretty much every architecture out there seems to be using tile-based rasterization. ARM Mali, Qualcomm Adreno, AMD, Intel, and Nvidia all use a tile-based rasterizer lol.
 
Last edited:
Most modern GPUs do have some type of tile rendering already but they are obviously not as optimized as the M1 will be in this case.

Time based immediate rendering (that modern traditional GPUs use) and tile based deferred rendering are very different paradigms and should not be confused.

Nvidia's GPUs since Maxwell do use tile-based rendering. As of now, pretty much every architecture out there seems to be using tile-based rasterization. ARM Mali, Qualcomm Adreno, AMD, Intel, and Nvidia all use a tile-based rasterizer lol.
Pretty sure @leman meant Tile, not Time; the important differentiator between the two is "immediate" versus "deferred"...?
 
  • Like
Reactions: Stratus Fear
For rasterization, yes, Apple GPUs are much more efficient than the traditional model, but TBDR is more about efficient use of memory bandwidth than about the efficient use of computational resources (although that too, in some cases).
I'd say it's really about both. A lot of the memory bandwidth savings occur because TBDR never shades fully occluded pixels, meaning it doesn't have to fetch texture data for that pixel. But... it also doesn't have to run the shader program against that pixel, so it's saving both at the same time.

Of course, the flip side is that TBDR is weak when rendering transparency effects, especially if application devs don't take care to structure their rendering pipeline to handle transparency well on TBDR GPUs.
 
Pretty sure @leman meant Tile, not Time; the important differentiator between the two is "immediate" versus "deferred"...?
Yes, that's the key. NVidia and AMD are doing things with tiles to some extent, but they are not deferred.

Immediate renderers simply draw everything in the order submitted by the application. If your app draws one triangle first, and later on it is partially or fully occluded by other triangles, the work done to shade the occluded pixels of that first triangle is just wasted.

Deferred renderers wait until the application is done submitting draw commands, then use multiple phases of rendering to both pipeline the work and reduce the total amount of it. Roughly speaking, a TBDR GPU will do a first pass where only depth buffer testing is used to determine whether each pixel (or 'fragment') of a triangle is visible on screen. Once it's done with that, it actually runs the appropriate shader program against each visible fragment.

This means TBDR has overdraw (meaning: extra work done on invisible things) on the depth buffer side, same as an immediate mode rasterizer, but none (except in cases of transparency, where it's required) for fragment shading and texture sampling.
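Here's a toy, CPU-side sketch of that difference. It only counts fragment-shader invocations for fragments landing on the same pixels; real GPUs do all of this in fixed-function hardware, per tile, and juggle far more state than a depth value:

Code (Swift):
// Toy model: how many times would the fragment shader run?
struct Fragment { let x: Int, y: Int, depth: Float }

func immediateMode(_ fragments: [Fragment]) -> Int {
    var depthBuffer = [String: Float]() // pixel -> nearest depth seen so far
    var shaderRuns = 0
    for f in fragments {                // shade in submission order
        let key = "\(f.x),\(f.y)"
        if f.depth < depthBuffer[key, default: .infinity] {
            depthBuffer[key] = f.depth
            shaderRuns += 1             // this work may later be overdrawn
        }
    }
    return shaderRuns
}

func tileBasedDeferred(_ fragments: [Fragment]) -> Int {
    var nearest = [String: Float]()     // pass 1: depth testing only, no shading
    for f in fragments {
        let key = "\(f.x),\(f.y)"
        nearest[key] = min(nearest[key, default: .infinity], f.depth)
    }
    return nearest.count                // pass 2: shade one fragment per visible pixel
}

// Two opaque triangles covering the same pixel, the far one submitted first:
let frags = [Fragment(x: 0, y: 0, depth: 0.9), Fragment(x: 0, y: 0, depth: 0.1)]
print(immediateMode(frags))      // 2 shader runs (one wasted on the occluded fragment)
print(tileBasedDeferred(frags))  // 1 shader run

This is also why the transparency caveat matters: translucent fragments can't be rejected by the depth-only pass, so the deferred approach loses much of its advantage there.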

disclaimer: I am not a real graphics expert.
 
  • Like
Reactions: EntropyQ3
Hi, can you guys assure me that my Mac Pro with two W6800X Duos is safe from the new M1X Mac GPUs? Please 😅
 
  • There is no Face ID; the notch houses a 1080p webcam, an ambient light sensor, and an indicator light.
  • There will be a mini-LED display.
  • The MacBook Pro is "very thick, thick and heavy."
  • The maximum configuration of "32+4T" will be its biggest selling point.
  • Apple will add two large fans to the new chip.
  • "Don't expect too much on narrow borders. It is indeed narrow, but it is not much narrow."
  • Touch Bar is gone.
  • MacBook Pro models will feature MagSafe, HDMI Port, and SD Card slot.
  • No "MacBook Pro" logo on the bottom bezel.
  • Bezel width remains at the current size, with the bezels on three sides basically the same width and the bottom bezel thicker.
  • The MacBook Pro "has various curves" to give people an "intuitive feeling that it is a large rectangle."
 
that’s GOT to be relative to the MacBook Airs or something.

I simply can’t fathom Apple releasing something that is particularly “thick and heavy”
The side profile image from the April REvil leak looks to be about twice as thick, and slightly curved. Though, it doesn't show the bottom panel, which might be flat as opposed to the dished bottom that all of the unibody MacBook Pros have.

Edit: One weird thing about that image, btw, is that it depicts what looks like a multi-piece chassis. It could just be dimensional reference lines, though.

[Attached image: side profile of the chassis from the leak]


Edit: 9to5Mac has an article with a rendering based on this leak (though it only sort of includes the curve in the last pic).
 
Last edited:
  • Like
Reactions: turbineseaplane
that’s GOT to be relative to the MacBook Airs or something.

I simply can’t fathom Apple releasing something that is particularly “thick and heavy”
I still think that "last minute leak" could be wrong. I mean, a very thick and heavier MBP?
What can you do in the 16" to make it even heavier when the display size is the same? Battery capacity is already at its legal limit. I bet the M1X with 32 GPU cores should not be heavier than the Intel CPU + dGPU. What else - bigger vents and a bigger heat pipe?
 
  • Like
Reactions: turbineseaplane
The side profile image from the April REvil leak looks to be about twice as thick,
Yes, you are right, if that leak comes true. I mean, look at that profile compared to the USB-C port.
Is Apple going to deliver a monster 16" MBP, at 5 pounds and twice as thick, like before the Retina-display era?!
But what for? The M1X will not run as hot as the Intel CPU + dGPU, and the 16" already has the biggest battery capacity allowed, so what's the point?!
 
When new chips come out, I like to say they're now the 'crappiest tech' you'll ever use moving forward.

Meaning, no matter how good these new chips may be, they'll soon be surpassed in terms of speed, efficiency, transistor count, cost(?), etc. by the next generation of chips (by Apple or others), which are already in the pipeline.

That next generation of technology will soon relegate this tech to the bargain bin of craigslist/swappa/marketplace.
 