
trwalp (macrumors newbie, original poster):
I am referring to features or capabilities that do not exist in the M1. Please cite specific programs and, if possible, how they benefit. The M2 has speed improvements and a few more cores, but in this case I'm interested in whether macOS or any applications will be able to do things that aren't possible without an M2: the kind of "not available with M1" exception footnote that might appear sometime in the future.
 
Great question! I cannot comment on existing software, but I can try to list some of these features:

- bfloat acceleration on the CPU: bfloat is a number representation optimized for the needs of machine learning. The M2 can operate natively on these types, improving ML performance and accuracy. It is possible that applications using CoreML already benefit from this.

- SIMD shift and fill on the GPU: M2 supports a new shuffle pattern that lets a kernel exchange data between multiple SIMD lanes. This can be useful for image processing apps that have to load pixel data into GPU registers. I do not know if any apps actually do that, but I wouldn't be surprised if Affinity or Pixelmator did; it's fairly easy to implement (see the sketch below).
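
To make that second point concrete, here is a rough sketch in Metal Shading Language. The kernel and buffer names are made up, and I'm going from the MSL spec's simd_shuffle_and_fill_down here, so treat it as an illustration rather than production code:

Code:
#include <metal_stdlib>
using namespace metal;

// Toy 3-tap horizontal filter: each SIMD lane holds one pixel, and
// simd_shuffle_and_fill_down slides the window across lane boundaries,
// with `next` supplying the pixels that would otherwise fall off the
// end of the simdgroup. Bounds handling at the buffer tail is omitted.
kernel void box3(device const float *src [[buffer(0)]],
                 device float *dst       [[buffer(1)]],
                 uint gid                [[thread_position_in_grid]],
                 ushort width            [[threads_per_simdgroup]])
{
    float center = src[gid];
    float next   = src[gid + width];  // feeds the vacated lanes
    float plus1  = simd_shuffle_and_fill_down(center, next, 1);
    float plus2  = simd_shuffle_and_fill_down(center, next, 2);
    dst[gid] = (center + plus1 + plus2) / 3.0f;
}

Without shuffle-and-fill you would have to round-trip the boundary pixels through threadgroup memory.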
 
leman has pointed out the most important differences. There are also:

- sparse depth and stencil textures: basically the GPU lets you "pretend" to allocate a massive texture while only backing the parts of it you actually use with memory.

- Hardware lossy texture compression. You can save a decent amount of memory bandwidth by using compressed textures with algorithms that will throw away unimportant visual detail.

- 64-bit atomics. This is a contentious point because who knows if these actually work in practice, but Apple lists them in the Metal feature set tables as "varies" on devices in both the Mac2 and Apple8 families (aka M2 Macs). The most famous example of software requiring 64-bit atomics is Unreal Engine's Nanite (see the sketch after this list).
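
In case anyone is curious, here's roughly what the Nanite-style use of those 64-bit atomics looks like in MSL. All the names are illustrative, and it leans on the atomic_max_explicit overload for device atomic_ulong whose availability is exactly the "varies" entry in the tables:

Code:
#include <metal_stdlib>
using namespace metal;

// Visibility-buffer splat: pack depth into the high 32 bits and a
// primitive ID into the low 32 bits, then atomic-max so that, under
// reverse-Z (larger = closer), the nearest fragment's ID wins.
// Positive IEEE floats compare like uints, which is why this works.
kernel void splat(device atomic_ulong *visbuf [[buffer(0)]],
                  constant uint &width        [[buffer(1)]],
                  uint2 gid                   [[thread_position_in_grid]])
{
    float depth  = 0.5f;   // stand-in for the rasterized depth
    uint  primID = 42u;    // stand-in for the primitive being drawn
    ulong packed = ((ulong)as_type<uint>(depth) << 32) | (ulong)primID;
    atomic_max_explicit(&visbuf[gid.y * width + gid.x],
                        packed, memory_order_relaxed);
}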

I have no idea if any of these things are actually implemented in practice though.
 
bfloat acceleration on the CPU
The M2 supports 16-bit FP numbers in hardware? I completely missed this. Is it an ARM standard or an Apple extension? Is the bfloat16 support in the scalar FPU, the NEON SIMD engine, or both? I should RTFM if I can find one.
 
The M2 supports 16-bit FP numbers in hardware? I completely missed this. Is it an ARM standard or an Apple extension? Is the bfloat16 support in the scalar FPU, the NEON SIMD engine, or both? I should RTFM if I can find one.
NEON does not support bfloat16 but SVE does. It would seem to me that the basic cost of implementing SVE at 128-bit width would be pretty low, as much of its functionality is similar to NEON's. Several parties have dug into the M1 architecture, and I have not heard any mention of SVE being present in the CPU. But it is a SoC, so bfloat16 support could just be a Neural Engine and/or GPU feature.
 
NEON does support bfloat16, introduced in armv8.6-a if I understand it correctly. This is supported on M2 as a standard ARM CPU feature, but they also support it in the AMX coprocessor and the NPU from what I understand.

Sonoma also adds bfloat16 to the GPU, so maybe it's also supported there? No idea about this last bit.
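
For anyone who wants to poke at it, this is what the NEON side looks like with the ACLE intrinsics. It only compiles when the target enables FEAT_BF16 (e.g. -march=armv8.6-a+bf16; I can't test this on my M1):

Code:
#include <arm_neon.h>

#if defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
/* Dot product of two 8-element bfloat16 vectors, accumulated in fp32.
   vbfdotq_f32 maps to the BFDOT instruction underneath. */
float bf16_dot8(const bfloat16_t *a, const bfloat16_t *b)
{
    bfloat16x8_t va  = vld1q_bf16(a);
    bfloat16x8_t vb  = vld1q_bf16(b);
    float32x4_t  acc = vbfdotq_f32(vdupq_n_f32(0.0f), va, vb);
    return vaddvq_f32(acc);   /* horizontal sum of the 4 partial sums */
}
#endif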
 
While the responses from @leman, @jmho, @Sydde, and @Basic75 leave me feeling awfully humble as a mere "IT guy", thus far it remains unclear whether macOS or any applications take advantage of these new M2 capabilities. Or, to go to the core of my interest, I have yet to see a reason to think that an M1 Max will be obsolete much sooner than an M2 Max (currently, buying an M1 Max MBP can save ~$800).

Thank you to all who have contributed!
 
Well, that's a different question :) What is your primary use of the computer?

M2 so far is significantly faster in rendering workloads (almost 2x in Blender), and will probably see a big improvement in various ML tasks (e.g. Stable Diffusion and friends, although if you care about that you should probably get an Nvidia GPU). But for general use, photo/video editing, audio, software development etc., M1 will still be more than adequate for a long while.
 
NEON does not support bfloat16 but SVE does.
The next revision of the Armv8-A architecture will introduce Neon and SVE vector instructions designed to accelerate certain computations using the BFloat16 (BF16) floating-point number format.

NEON does support bfloat16, introduced in armv8.6-a if I understand it correctly. This is supported on M2 as a standard ARM CPU feature
If M1 and M2 are compatible with Armv8.5-a, how can they have bfloat16 support?
 
If M1 and M2 are compatible with Armv8.5-a, how can they have bfloat16 support?

Apple has an architecture license for the ISA, meaning they can add their own instructions to the base set. In fact, some of Apple's additions have later been adopted into the official ARM instruction set. Consequently, Apple can add features which are not present in the current ARM ISA.
 
@Xiao_Xi From what I understand bfloat16 is an optional feature in earlier ARMv8 revisions. These feature sets are already confusing enough. The main point is that M2 supports it.
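
A quick way to settle it on any given machine: macOS exposes the CPU feature flags through sysctl, and hw.optional.arm.FEAT_BF16 should read 1 on M2 and 0 (or be missing) on M1:

Code:
#include <stdio.h>
#include <sys/sysctl.h>

int main(void)
{
    int val = 0;
    size_t len = sizeof(val);
    if (sysctlbyname("hw.optional.arm.FEAT_BF16", &val, &len, NULL, 0) == 0)
        printf("FEAT_BF16 = %d\n", val);
    else
        printf("hw.optional.arm.FEAT_BF16 not present\n");
    return 0;
}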
 
Apple has a license for the ISA, meaning they can add their own instructions to the base set.
Have GCC or LLVM documented any of these new instructions? If not, how can programs take advantage of Apple's instructions if the compilers are not aware of them?
 
Have GCC or LLVM documented any of these new instructions? If not, how can programs take advantage of Apple's instructions if the compilers are not aware of them?


Bfloat support in Apple clang is in the headers arm_bf16.h and arm_neon.h. The feature flag is not available on my M1 machine, so I can't play around with it. But anyone with an M2 could try seeing whether __bf16 is available in clang and whether __ARM_FEATURE_BF16 is enabled.
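Something like this would do as a probe (the -mcpu=apple-m2 spelling is a guess on my part; -march=armv8.6-a+bf16 is the generic one):

Code:
#include <stdio.h>
#if defined(__ARM_FEATURE_BF16)
#include <arm_bf16.h>   /* scalar bfloat16 type and conversions */
#endif

int main(void)
{
#if defined(__ARM_FEATURE_BF16)
    bfloat16_t h = vcvth_bf16_f32(1.5f);          /* float -> bf16 */
    printf("__ARM_FEATURE_BF16 is set; 1.5f round-trips to %f\n",
           (double)vcvtah_f32_bf16(h));           /* bf16 -> float */
#else
    printf("__ARM_FEATURE_BF16 is not set for this target\n");
#endif
    return 0;
}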

@trwalp So sorry for derailing your honest question, and thank you for your patience :) I don't think you are going to get any straight answers on software support to be honest, as there is just no information.
 
Bfloat support on Apple clang is in header arm_bf16.h and arm_neon.h.
That patch shows that Apple can adopt ARM instructions from a later architecture version than the one the SoC nominally supports, but it doesn't show that Apple can add its own instructions.
 
That patch shows that Apple can adopt ARM instructions from a later architecture version than the one the SoC nominally supports, but it doesn't show that Apple can add its own instructions.

Ah, sorry, I misread your question. There are without any doubt Apple-specific instructions on their processors (just look at AMX), but I don't think they are exposed in the compiler.
 
That patch shows that Apple can adopt ARM instructions from a later architecture version than the one the SoC nominally supports
By the way, ARM allows this:
An Armv8.x-A processor can implement any features from the next .x extension. However, it cannot implement features from any later .x extension.

There are without any doubt Apple-specific instructions on their processors (just look at AMX), but I don't think they are exposed in the compiler.
This is an undocumented arm64 ISA extension present on the Apple M1. These instructions have been reverse-engineered from Accelerate (vImage, libBLAS, libBNNS, libvDSP and libLAPACK all use them) and by experimenting with their behaviour on the M1. Apple has not published a compiler, assembler, or disassembler, but by calling into the public Accelerate framework APIs you can get the performance benefits (fast multiplication of big matrices).

If neither LLVM nor GCC have such instructions documented, does this mean that Apple uses a modified version of LLVM to compile libraries that make use of AMX?

If those instructions are executed on a coprocessor and not on the CPU, why are they considered ARM instructions? Couldn't Apple do the same with a coprocessor with custom RISC-V instructions?
 
If neither LLVM nor GCC have such instructions documented, does this mean that Apple uses a modified version of LLVM to compile libraries that make use of AMX?

Your guess is as good as anybody else's. Maybe they use assembly directly.

If those instructions are executed on a coprocessor and not on the CPU, why are they considered ARM instructions? Couldn't Apple do the same with a coprocessor with custom RISC-V instructions?

Because they are ARM instructions which are decoded and issued by the CPU, like any other instruction. It's just that the execution happens on the shared AMX block rather than inside the CPU core. There is a lot of historical precedent for this style of coprocessor usage, such as x87. The point is that the coprocessor does not run a separate program (as would be the case with a GPU), but is entirely controlled by a program interpreted by the main CPU.
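
For illustration, this is what "using AMX" looks like from the outside today: you call Accelerate and the library decides whether the coprocessor gets involved. Whether a toy matrix like this one actually dispatches to AMX is an internal detail; big matrices are where it pays off:

Code:
#include <Accelerate/Accelerate.h>
#include <stdio.h>

/* C = A * B for 2x2 row-major single-precision matrices.
   Build with: clang gemm.c -framework Accelerate */
int main(void)
{
    float A[4] = {1, 2, 3, 4};
    float B[4] = {5, 6, 7, 8};
    float C[4] = {0};
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,       /* M, N, K */
                1.0f, A, 2,    /* alpha, A, lda */
                B, 2,          /* B, ldb */
                0.0f, C, 2);   /* beta, C, ldc */
    printf("%g %g / %g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}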
 
While the responses from @leman, @jmho, @Sydde, and @Basic75 leave me feeling awfully humble as a mere "IT guy", thus far it remains unclear whether macOS or any applications take advantage of these new M2 capabilities. Or, to go to the core of my interest, I have yet to see a reason to think that an M1 Max will be obsolete much sooner than an M2 Max (currently, buying an M1 Max MBP can save ~$800).

Thank you to all who have contributed!
Apple generally cuts off support for systems (whether with Intel processors or with Apple SoCs) based on individual component features that they deem necessary. In the case of Intel-based Macs, this can be either because Apple wants to require processor features that older Intel CPUs lack (as was the case with Ventura dropping support for anything older than Kaby Lake), or because Apple can't get an updated driver for a given hardware component, the original manufacturer having dropped support for it altogether.

For Apple SoCs in particular, we can speculate based on what's known about these SoCs, but for all we know Apple could drop support for an SoC over a feature it has not publicly disclosed. They have sometimes dropped support for consecutive SoCs at the same time and sometimes at different times, which is to say that Apple may drop support for the M1 Max and M2 Max together or one or two years apart, and there's not really much between them that would make either possibility more or less likely than the other.

If I were a betting man, I'd say that M1 will probably be dropped before M1 Pro/Max/Ultra. But for all I know, I'm wrong.
 