AFAIK (I could be wrong), the M1 Pro and above have it; the base M1 does not.
So Wikipedia is wrong about the M1 having a ProRes decoder?
IMHO, it’s the technical threads that add value to this forum far more than the rumour/opinion threads (and yes, that itself is an opinion).
Call me crazy, but we have a thing called Wikipedia that has this sort of stuff already ¯\_(ツ)_/¯
There's also technikales.com, GitHub and, for the technically inclined, arXiv. I'm sure there are others as well.
Ah, unfortunately cpu-monkey is a little unreliable. This is not the first time I've found them making mistakes. The M2 and M3 TFLOPS are right, but the M4 and M5 are wrong. I can't confirm their clock speed for the M5, but even if it is right, the TFLOPS they calculated from it are still wrong.
I've edited the TFLOPS for M2, M3 and M4 according to these sources; M5 already matched the source:
M2: Apple M2 (10 Core) Benchmarks & Specs, www.cpu-monkey.com
M3: Apple M3 (10 Core) Benchmarks & Specs, www.cpu-monkey.com
M4: Apple M4 (10 Core) Benchmarks & Specs, www.cpu-monkey.com
M5: https://www.cpu-monkey.com/en/igpu-apple_m5_10_core
No, it's just a compilation of specs rarely found all together in one place.
A little digging shows that https://www.apple.com/final-cut-pro/docs/Apple_ProRes.pdf only mentions the M1 Pro, Max and Ultra.
AFAIK (I could be wrong), the M1 Pro and above have it; the base M1 does not.
I THINK (patents and some benchmarks I've seen both suggest this) that M5 FP16 throughput is 3x FP32, but ONLY for matrix multiply.
Ah, unfortunately cpu-monkey is a little unreliable. This is not the first time I've found them making mistakes. The M2 and M3 TFLOPS are right, but the M4 and M5 are wrong. I can't confirm their clock speed for the M5, but even if it is right, the TFLOPS they calculated from it are still wrong.
M4: 1280*1.58*2/1000 = 4.0448 TFLOPS
M5: 1280*1.9*2/1000 = 4.864 TFLOPS (again, that's if they have the new GPU clock speed right)
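The M4 and M5 figures above follow the usual peak-throughput formula: FP32 ALU count × GPU clock in GHz × 2 FLOPs per cycle (one fused multiply-add counts as a multiply plus an add), divided by 1000 to go from GFLOPS to TFLOPS. A minimal Python sketch, assuming each ALU issues one FMA per cycle; the function name is hypothetical, and the M5 clock is the unconfirmed cpu-monkey figure:

```python
def peak_fp32_tflops(fp32_alus: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes each FP32 ALU retires one fused multiply-add (FMA) per
    cycle, and counts an FMA as 2 floating-point operations.
    """
    flops_per_alu_per_cycle = 2  # one FMA = one multiply + one add
    gflops = fp32_alus * clock_ghz * flops_per_alu_per_cycle
    return gflops / 1000  # GFLOPS -> TFLOPS


# Clock speeds as quoted in the thread; the M5 clock is not confirmed.
for chip, ghz in [("M4", 1.58), ("M5", 1.9)]:
    print(chip, round(peak_fp32_tflops(1280, ghz), 4))
```

This is a theoretical ceiling, so it only checks whether a spec site did the multiplication correctly; it says nothing about real-world performance.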
Another problem is that cpu-monkey lists double FP16 throughput relative to FP32 for all of these chips, but Apple didn't gain that ability until the M5. And finally, I'm pretty sure the number of execution units is 4 per core, so it should be 40 units, not 160 units*. Apple has a SIMD width of 32, for a total of 40*32 = 1,280 FP32 units. cpu-monkey does get that right; we don't know yet how the M5 is structured or how Apple doubled FP16 throughput, but cpu-monkey doesn't list that anyway. Hopefully I haven't led you astray, but if so, @name99 or @leman can correct me.
*EDIT: I was so tired I managed to confuse myself; it should be right now. I think what confused cpu-monkey (and momentarily me) is that Apple used to allow 16*32 = 512 threads per core (I believe Apple has since bumped that up to 32*32 = 1024 threads per core), but that's not the same as the execution unit or core count.
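The distinction in the post is between physical FP32 lanes (cores × execution units per core × SIMD width) and how many threads a core can keep resident at once (SIMD groups in flight × SIMD width). A minimal Python sketch using the numbers given in the thread, which are the poster's understanding rather than official Apple specs; the names are hypothetical. The last lines show one plausible reading of where "160 units" could come from, if the 16 resident SIMD groups per core were mistaken for execution units:

```python
# Figures as described in the thread (assumptions, not official Apple specs).
GPU_CORES = 10
EXEC_UNITS_PER_CORE = 4  # SIMD execution units per GPU core
SIMD_WIDTH = 32          # lanes per SIMD unit, one FP32 ALU per lane


def fp32_alu_count(cores: int, units_per_core: int, simd_width: int) -> int:
    """Physical FP32 lanes: the count peak-TFLOPS math should use."""
    return cores * units_per_core * simd_width


def resident_threads_per_core(simdgroups_per_core: int, simd_width: int) -> int:
    """Threads a core can keep in flight: an occupancy limit, not ALUs."""
    return simdgroups_per_core * simd_width


alus = fp32_alu_count(GPU_CORES, EXEC_UNITS_PER_CORE, SIMD_WIDTH)  # 1280
older = resident_threads_per_core(16, SIMD_WIDTH)  # 512
newer = resident_threads_per_core(32, SIMD_WIDTH)  # 1024 (poster's belief)

# Treating 16 resident SIMD groups per core as execution units
# would give 16 * 10 = 160 "units" instead of 4 * 10 = 40.
mistaken_units = 16 * GPU_CORES
actual_units = EXEC_UNITS_PER_CORE * GPU_CORES
print(alus, older, newer, mistaken_units, actual_units)
```

Only `alus` feeds into the TFLOPS formula; the resident-thread numbers describe latency hiding, not arithmetic throughput.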
Metal compute shaders threadgroup & threadExecutionWidth (stackoverflow.com): "Can someone explain in simple terms what threadgroup conceptually is in Metal compute shaders and other terms such as SIMD group, threadExecutionWidth (wavefront)? I read the docs but am more confu..."
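The linked Stack Overflow question asks how a threadgroup relates to SIMD groups and threadExecutionWidth. Conceptually, the threads of a threadgroup are split in order into SIMD groups of threadExecutionWidth threads that execute in lockstep. A minimal Python model of that split, assuming a 1D threadgroup and a width of 32 as discussed in this thread; `split_threadgroup` is a hypothetical helper, not a Metal API:

```python
import math


def split_threadgroup(threadgroup_size: int, execution_width: int = 32):
    """Partition a 1D threadgroup into SIMD groups.

    Returns one list per SIMD group, each holding
    (thread index in threadgroup, lane index in SIMD group) pairs.
    """
    num_simdgroups = math.ceil(threadgroup_size / execution_width)
    groups = [[] for _ in range(num_simdgroups)]
    for tid in range(threadgroup_size):
        # Consecutive thread indices share a SIMD group; the remainder is the lane.
        simdgroup, lane = divmod(tid, execution_width)
        groups[simdgroup].append((tid, lane))
    return groups


groups = split_threadgroup(100)
print(len(groups))               # 4 SIMD groups
print([len(g) for g in groups])  # [32, 32, 32, 4]: last group partially filled
```

The partially filled last group still takes up a whole SIMD unit, which is why threadgroup sizes are usually chosen as multiples of threadExecutionWidth.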
Do you remember that "I know more about American Girl than you do, genius!" Vine? That's the vibe here.
Why y’all so aggy in here?