There are architectural differences. The Lunchbox Ultra is 2 lbs heavier because the materials used for cooling are different from those in the Lunchbox Max.
I think anything said about the noise or thermal throttling is pure conjecture at this point.
Why do you think it’s conjecture?
That’s 2 lbs of extra fans and heatsinks. It’s clearly going to be more heat. That’s based on what I’ve seen any time a chip company has crammed more cores and more transistors onto a die/substrate. It doesn’t get easier to remove heat. Watch how right I am. Ultra will be louder.
It doesn’t matter how much bandwidth there is or how low the latency is between one half of the cores and RAM and the other half. It’s very much akin to the old 2-socket AMD Opterons and NUMA in a way. You’re looking at two CPUs glued together with a silicon interposer. There will be some performance hit when a left-side CPU core needs to access memory on the right-side controller.
Apple has never figured out how to properly implement NUMA, which is why the old 8/12-core 2009-2012 Mac Pros had such horrible and inconsistent memory bandwidth. They took the easy hack-job route and made them use UMA: all RAM on both CPUs’ controllers was handed over to the OS as though it were one big contiguous block, all the same. Wrong. It’s not. As a result, half the time you accessed main memory, it would be on the local controller at about 18GB/s; the other half, it would have to go through QPI and pull at 6-8GB/s while competing with other bits of PCIe and GPU data on the QPI bus. Sound familiar? It should. It’s the same as the M1 Ultra. The difference may be slight, but there will be some performance penalty when accessing half the RAM on these systems from the opposite half’s GPU and CPU cores.
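To put rough numbers on that local/remote split, here is a back-of-the-envelope sketch. The 18GB/s and 6-8GB/s figures are the ones quoted above; the even 50/50 split of bytes and the assumption that local and remote transfers never overlap are simplifications of mine, and `effective_bandwidth` is just a name made up for this example. Under those assumptions the average is time-weighted (a harmonic mean), so the slow half drags the result down more than a plain average would suggest.

```python
def effective_bandwidth(local_gbps, remote_gbps, remote_fraction=0.5):
    """Time-weighted bandwidth (GB/s) when a fraction of the bytes
    comes from the remote controller.

    Assumes transfers are serialized (no overlap between local and
    remote accesses), which is a simplification.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    # Seconds needed to move 1 GB, split between the two paths.
    seconds_per_gb = ((1.0 - remote_fraction) / local_gbps
                      + remote_fraction / remote_gbps)
    return 1.0 / seconds_per_gb


if __name__ == "__main__":
    local = 18.0  # GB/s, local controller figure quoted in the post
    for remote in (6.0, 8.0):  # GB/s, remote range quoted in the post
        bw = effective_bandwidth(local, remote)
        print(f"remote {remote:.0f} GB/s -> effective {bw:.1f} GB/s")
```

With a 50/50 split this works out to roughly 9 to 11GB/s, well under the 18GB/s local figure. Any overlap between local and remote traffic, or smarter page placement by the OS, would change the result.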
I’ve been using Apple hardware since the early 90s. The top-tier stuff is always drool-worthy until you get it in hand and actually try to put it into production use. One step down from the top, or the second hardware version of a specific model, is always so much better.
Max will be faster in lightly threaded tasks, unless Apple has created a zero-latency interconnect with infinite bandwidth using transistors that switch at infinite frequency. The only way it isn’t is if Apple has finally figured out NUMA, and they proudly state they haven’t when they call it Unified Memory. That 800GB/s will never all be available to a single core at once.