
trwalp (macrumors newbie, original poster):
I am referring to features or capabilities that do not exist in the M1. Please cite specific programs and, if possible, how they benefit. The M2 has speed improvements and a few more cores, but in this case I'm interested in whether macOS or any applications will be able to do things that aren't possible without an M2: the kind of "not available with M1" exception footnote that might appear sometime in the future.
 
Great question! I cannot comment on existing software, but I can try to list some of these features:

- bfloat acceleration on the CPU: bfloat is a number representation optimized for the needs of machine learning. The M2 can operate natively on these types, improving ML performance and accuracy. It is possible that applications using CoreML already benefit from this.

- SIMD shift and fill on the GPU: M2 supports a new shuffle pattern that lets a kernel exchange data between multiple SIMD lanes. This can be useful for image processing apps that have to load pixel data into GPU registers. I do not know if any apps actually do that, but I wouldn't be surprised if Affinity or Pixelmator did; it's fairly easy to implement (see the sketch below).
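
To make that second point concrete, here is a rough sketch in Metal Shading Language. The kernel and buffer names are made up, and I'm going from the MSL spec's simd_shuffle_and_fill_down here, so treat it as an illustration rather than production code:

Code:
#include <metal_stdlib>
using namespace metal;

// Toy 3-tap horizontal filter: each SIMD lane holds one pixel, and
// simd_shuffle_and_fill_down slides the window across lane boundaries,
// with `next` supplying the pixels that would otherwise fall off the
// end of the simdgroup. Bounds handling at the buffer tail is omitted.
kernel void box3(device const float *src [[buffer(0)]],
                 device float *dst       [[buffer(1)]],
                 uint gid                [[thread_position_in_grid]],
                 ushort width            [[threads_per_simdgroup]])
{
    float center = src[gid];
    float next   = src[gid + width];  // feeds the vacated lanes
    float plus1  = simd_shuffle_and_fill_down(center, next, 1);
    float plus2  = simd_shuffle_and_fill_down(center, next, 2);
    dst[gid] = (center + plus1 + plus2) / 3.0f;
}

Without shuffle-and-fill you would have to round-trip the boundary pixels through threadgroup memory.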
 
leman has pointed out the most important differences. There are also:

- sparse depth and stencil textures: basically the GPU lets you "pretend" to allocate a massive texture while only backing the parts of it you actually use with memory.

- Hardware lossy texture compression. You can save a decent amount of memory bandwidth by using compressed textures with algorithms that will throw away unimportant visual detail.

- 64-bit atomics. This is a contentious point because who knows if these actually work in practice, but Apple lists them in the Metal feature set tables as "varies" on devices in both the Mac2 and Apple8 families (aka M2 Macs). The most famous example of software requiring 64-bit atomics is Unreal Engine's Nanite (see the sketch after this list).
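
In case anyone is curious, here's roughly what the Nanite-style use of those 64-bit atomics looks like in MSL. All the names are illustrative, and it leans on the atomic_max_explicit overload for device atomic_ulong whose availability is exactly the "varies" entry in the tables:

Code:
#include <metal_stdlib>
using namespace metal;

// Visibility-buffer splat: pack depth into the high 32 bits and a
// primitive ID into the low 32 bits, then atomic-max so that, under
// reverse-Z (larger = closer), the nearest fragment's ID wins.
// Positive IEEE floats compare like uints, which is why this works.
kernel void splat(device atomic_ulong *visbuf [[buffer(0)]],
                  constant uint &width        [[buffer(1)]],
                  uint2 gid                   [[thread_position_in_grid]])
{
    float depth  = 0.5f;   // stand-in for the rasterized depth
    uint  primID = 42u;    // stand-in for the primitive being drawn
    ulong packed = ((ulong)as_type<uint>(depth) << 32) | (ulong)primID;
    atomic_max_explicit(&visbuf[gid.y * width + gid.x],
                        packed, memory_order_relaxed);
}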

I have no idea if any of these things are actually implemented in practice though.
 
bfloat acceleration on the CPU
The M2 supports 16-bit FP numbers in hardware? I completely missed this. Is it an ARM standard or an Apple extension? Is the bfloat16 support in the scalar FPU, the NEON SIMD engine, or both? I should RTFM if I can find one.
 
The M2 supports 16-bit FP numbers in hardware? I completely missed this. Is it an ARM standard or an Apple extension? Is the bfloat16 support in the scalar FPU, the NEON SIMD engine, or both? I should RTFM if I can find one.
NEON does not support bfloat16 but SVE does. It would seem to me that the basic cost of implementing SVE at 128-bit width would be pretty low, as much of its functionality is similar to NEON's. Several parties have dug into the M1 architecture, and I have not heard any mention of SVE being present in the CPU. But it is a SoC, so bfloat16 support could just be a Neural Engine and/or GPU feature.
 
NEON does support bfloat16, introduced in armv8.6-a if I understand it correctly. This is supported on M2 as a standard ARM CPU feature, but they also support it in the AMX coprocessor and the NPU from what I understand.

Sonoma also adds bfloat16 to the GPU, so maybe it's also supported there? No idea about this last bit.
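
For anyone who wants to poke at it, this is what the NEON side looks like with the ACLE intrinsics. It only compiles when the target enables FEAT_BF16 (e.g. -march=armv8.6-a+bf16; I can't test this on my M1):

Code:
#include <arm_neon.h>

#if defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
/* Dot product of two 8-element bfloat16 vectors, accumulated in fp32.
   vbfdotq_f32 maps to the BFDOT instruction underneath. */
float bf16_dot8(const bfloat16_t *a, const bfloat16_t *b)
{
    bfloat16x8_t va  = vld1q_bf16(a);
    bfloat16x8_t vb  = vld1q_bf16(b);
    float32x4_t  acc = vbfdotq_f32(vdupq_n_f32(0.0f), va, vb);
    return vaddvq_f32(acc);   /* horizontal sum of the 4 partial sums */
}
#endif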
 
While the responses from @leman, @jmho, @Sydde, and @Basic75 leave me feeling awfully humble as a mere "IT guy", thus far it remains unclear whether macOS or any applications take advantage of these new M2 capabilities. Or, to go to the core of my interest, I have yet to see a reason to think that an M1 Max will be obsolete much sooner than an M2 Max (currently, buying an M1 Max MBP can save ~$800).

Thank you to all who have contributed!
 
Well, that's a different question :) What is your primary use of the computer?

M2 so far is significantly faster in rendering workloads (almost 2x in Blender), and will probably see a big improvement in various ML tasks (e.g. Stable Diffusion and friends, although if you care about that you should probably get an Nvidia GPU). But for general use, photo/video editing, audio, software development etc., M1 will still be more than adequate for a long while.
 
NEON does not support bfloat16 but SVE does.
The next revision of the Armv8-A architecture will introduce Neon and SVE vector instructions designed to accelerate certain computations using the BFloat16 (BF16) floating-point number format.

NEON does support bfloat16, introduced in armv8.6-a if I understand it correctly. This is supported on M2 as a standard ARM CPU feature
If M1 and M2 are compatible with Armv8.5-a, how can they have bfloat16 support?
 
If M1 and M2 are compatible with Armv8.5-a, how can they have bfloat16 support?

Apple has an architecture license for the ISA, meaning they can add their own instructions to the base set. In fact, some of Apple's additions have later been adopted into the official ARM instruction set. Consequently, Apple can add features which are not present in the current ARM ISA.
 
@Xiao_Xi From what I understand bfloat16 is an optional feature in earlier ARMv8 revisions. These feature sets are already confusing enough. The main point is that M2 supports it.
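
A quick way to settle it on any given machine: macOS exposes the CPU feature flags through sysctl, and hw.optional.arm.FEAT_BF16 should read 1 on M2 and 0 (or be missing) on M1:

Code:
#include <stdio.h>
#include <sys/sysctl.h>

int main(void)
{
    int val = 0;
    size_t len = sizeof(val);
    if (sysctlbyname("hw.optional.arm.FEAT_BF16", &val, &len, NULL, 0) == 0)
        printf("FEAT_BF16 = %d\n", val);
    else
        printf("hw.optional.arm.FEAT_BF16 not present\n");
    return 0;
}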
 
Apple has a license for the ISA, meaning they can add their own instructions to the base set.
Have GCC or LLVM documented any of these new instructions? If not, how can programs take advantage of Apple's instructions if the compilers are not aware of them?
 
Have GCC or LLVM documented any of these new instructions? If not, how can programs take advantage of Apple's instructions if the compilers are not aware of them?


Bfloat support in Apple clang is in the headers arm_bf16.h and arm_neon.h. The feature flag is not available on my M1 machine, so I can't play around with it. But anyone with an M2 could try seeing whether __bf16 is available in clang and whether __ARM_FEATURE_BF16 is enabled.
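Something like this would do as a probe (the -mcpu=apple-m2 spelling is a guess on my part; -march=armv8.6-a+bf16 is the generic one):

Code:
#include <stdio.h>
#if defined(__ARM_FEATURE_BF16)
#include <arm_bf16.h>   /* scalar bfloat16 type and conversions */
#endif

int main(void)
{
#if defined(__ARM_FEATURE_BF16)
    bfloat16_t h = vcvth_bf16_f32(1.5f);          /* float -> bf16 */
    printf("__ARM_FEATURE_BF16 is set; 1.5f round-trips to %f\n",
           (double)vcvtah_f32_bf16(h));           /* bf16 -> float */
#else
    printf("__ARM_FEATURE_BF16 is not set for this target\n");
#endif
    return 0;
}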

@trwalp So sorry for derailing your honest question, and thank you for your patience :) I don't think you are going to get any straight answers on software support to be honest, as there is just no information.
 
Bfloat support on Apple clang is in header arm_bf16.h and arm_neon.h.
That patch shows that Apple can adopt ARM instructions from a later architecture version than the one the SoC nominally supports, but it doesn't show that Apple can add its own instructions.
 
That patch shows that Apple can adopt ARM instructions from a later architecture version than the one the SoC nominally supports, but it doesn't show that Apple can add its own instructions.

Ah, sorry, I misread your question. There are without any doubt Apple-specific instructions on their processors (just look at AMX), but I don't think they are exposed in the compiler.
 
That patch shows that Apple can adopt ARM instructions from a later architecture version than the one the SoC nominally supports
By the way, ARM allows this:
An Armv8.x-A processor can implement any features from the next .x extension. However, it cannot implement features from any later .x extension.

There are without any doubt Apple-specific instructions on their processors (just look at AMX), but I don't think they are exposed in the compiler.
This is an undocumented arm64 ISA extension present on the Apple M1. These instructions have been reverse-engineered from Accelerate (vImage, libBLAS, libBNNS, libvDSP and libLAPACK all use them) and by experimenting with their behaviour on the M1. Apple has not published a compiler, assembler, or disassembler, but by calling into the public Accelerate framework APIs you can get the performance benefits (fast multiplication of big matrices).

If neither LLVM nor GCC have such instructions documented, does this mean that Apple uses a modified version of LLVM to compile libraries that make use of AMX?

If those instructions are executed on a coprocessor and not on the CPU, why are they considered ARM instructions? Couldn't Apple do the same with a coprocessor with custom RISC-V instructions?
 
If neither LLVM nor GCC have such instructions documented, does this mean that Apple uses a modified version of LLVM to compile libraries that make use of AMX?

Your guess is as good as anybody else's. Maybe they use assembly directly.

If those instructions are executed on a coprocessor and not on the CPU, why are they considered ARM instructions? Couldn't Apple do the same with a coprocessor with custom RISC-V instructions?

Because they are ARM instructions which are decoded and issued by the CPU, like any other instruction. It's just that the execution happens on the shared AMX block rather than inside the CPU core. There is a lot of historical precedent for this style of coprocessor usage, such as x87. The point is that the coprocessor does not run a separate program (as would be the case with a GPU), but is entirely controlled by a program interpreted by the main CPU.
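
For illustration, this is what "using AMX" looks like from the outside today: you call Accelerate and the library decides whether the coprocessor gets involved. Whether a toy matrix like this one actually dispatches to AMX is an internal detail; big matrices are where it pays off:

Code:
#include <Accelerate/Accelerate.h>
#include <stdio.h>

/* C = A * B for 2x2 row-major single-precision matrices.
   Build with: clang gemm.c -framework Accelerate */
int main(void)
{
    float A[4] = {1, 2, 3, 4};
    float B[4] = {5, 6, 7, 8};
    float C[4] = {0};
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,       /* M, N, K */
                1.0f, A, 2,    /* alpha, A, lda */
                B, 2,          /* B, ldb */
                0.0f, C, 2);   /* beta, C, ldc */
    printf("%g %g / %g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}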
 
While the responses from @leman, @jmho, @Sydde, and @Basic75 leave me feeling awfully humble as a mere "IT guy", thus far it remains unclear whether macOS or any applications take advantage of these new M2 capabilities. Or, to go to the core of my interest, I have yet to see a reason to think that an M1 Max will be obsolete much sooner than an M2 Max (currently, buying an M1 Max MBP can save ~$800).

Thank you to all who have contributed!
Apple generally cuts off support for systems (whether with Intel processors or with Apple SoCs) based on individual component features that they deem necessary. In the case of Intel-based Macs, this can be either because Apple wants to require processor features that older Intel CPUs lack (as was the case with Ventura dropping support for anything older than Kaby Lake), or because Apple can't get an updated driver for a given hardware component, the original manufacturer having dropped support for it altogether.

For Apple SoCs in particular, we can speculate based on what's known about these SoCs, but for all we know Apple could drop support for an SoC over a feature it has not publicly disclosed. They have sometimes dropped support for consecutive SoCs at the same time and sometimes at different times, which is to say that Apple may drop support for the M1 Max and M2 Max together or one or two years apart, and there's not really much between them that would make either possibility more or less likely than the other.

If I were a betting man, I'd say that M1 will probably be dropped before M1 Pro/Max/Ultra. But for all I know, I'm wrong.
 