I'm hoping Apple makes some GPU/NPU improvements to make the Max chips more competitive with Nvidia for AI workloads. The main benefit of Apple Silicon right now is (potentially) just having tons of memory to throw at the problem -- up to 128 GB of unified memory vs. a max of 24 GB on consumer GPUs. However, I believe the memory bandwidth isn't really competitive with discrete GPUs, and Apple has made some strange hardware and driver choices (no support for fp64 or fp8, and bf16 is slow). I've been enjoying generating images with Flux.1-dev on my M3 Max, but it's very slow (about 8 minutes for a 1 MP generation using a specialized ODE solver).
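
For reference, here's roughly what that workflow looks like -- a minimal sketch using the Hugging Face diffusers FluxPipeline on the MPS backend. It uses the pipeline's stock flow-matching scheduler rather than the specialized solver I mentioned, and assumes a recent diffusers (>= 0.30) plus enough unified memory to hold the full bf16 weights:

    # Minimal Flux.1-dev generation on Apple Silicon via PyTorch MPS.
    # Assumes diffusers >= 0.30 and enough unified memory for the full model.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        torch_dtype=torch.bfloat16,  # model's native dtype; bf16 being slow on Apple GPUs hurts here
    )
    pipe = pipe.to("mps")  # run on the Apple GPU through Metal

    image = pipe(
        "a photo of a lighthouse at dusk",  # example prompt
        height=1024,                # ~1 MP output
        width=1024,
        num_inference_steps=28,     # default step count for the dev model
        guidance_scale=3.5,
    ).images[0]
    image.save("flux_out.png")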