I have a question for people who know this better than me.
Here we discuss the relative performance of the M4M, M3U, M2U, etc., mostly based on various existing benchmark tests. Is it true that the multi-core test results depend partly on how well the test tasks can take advantage of multiple cores and threads? For example, it's known that Photoshop does a poor job of utilizing multiple cores. So if the benchmark tests include tasks like this, the M chips that have more cores will be penalized, at least partially, because of it.
On the other hand, on a workstation, we may throw multiple tasks at it at once, regardless of whether each task is optimized for multiple cores. Even if the individual tasks are poorly optimized for multiple cores, can the sheer number of cores in the Ultra make it outperform the Max by more than the benchmark tests would imply? For example, if the multi-core score of the M3U is only 1.2 times that of the M4M (which sounds like very poor value) because some of the test tasks are not well multithreaded, then when we simply throw many different tasks at an M3U at the same time, can it be more than 1.3 times faster than the M4M?
The short answer is that you are correct.
If your workflow is designed to take advantage of every CPU (or GPU) core available, then the M3 Ultra will be faster than the M4 Max every time, even though individual cores on the M4 Max are a bit faster. The slight loss in single-threaded speed on the M3 Ultra is made up for by the presence of twice as many cores... if your workflow makes use of them.
A grocery store with 32 slightly slower cashiers can service more customers than a grocery store with 16 slightly faster cashiers, even if each individual customer takes a little longer per transaction.
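To put rough numbers on the cashier analogy, here is a quick Python sketch using Amdahl's law. The 16 and 32 core counts are the ones from the analogy; the per-core speeds (1.00 and 0.85) are values I made up for illustration, and the model treats every core as identical, which real chips with performance and efficiency cores are not.

```python
# Rough Amdahl's-law comparison. Per-core speeds are made-up illustrative
# values, not measurements, and all cores are treated as identical.

def relative_speed(cores, per_core_speed, parallel_fraction):
    """Speed relative to one core of speed 1.0, per Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return per_core_speed / (serial_fraction + parallel_fraction / cores)

M4_MAX = {"cores": 16, "per_core_speed": 1.00}    # assumed baseline
M3_ULTRA = {"cores": 32, "per_core_speed": 0.85}  # assumed ~15% slower per core

for p in (0.50, 0.90, 0.99, 1.00):
    max_speed = relative_speed(parallel_fraction=p, **M4_MAX)
    ultra_speed = relative_speed(parallel_fraction=p, **M3_ULTRA)
    print(f"parallel {p:.0%}: Ultra/Max = {ultra_speed / max_speed:.2f}")
```

With these assumed speeds, the break-even lands a little under 90% parallel: below that the faster cores win, above it the extra cores win. Change the 0.85 and the break-even moves, so treat it as a way to think about your own workload, not a prediction.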
If your workflow is not designed to take advantage of every CPU core available (for example, a Windows 11 virtual machine on Parallels limited to using 4 cores on Apple Silicon, regardless of how many your CPU has; or if you need the fastest HTML rendering, which, I believe, is mostly a single-threaded task), then you will likely find the M4 Max to be faster.
However, if you're running multiple simultaneous instances of this unoptimized workflow, as a multiuser server might (as in the above example, running multiple virtual machines all at the same time, with each one busy doing something and each one individually using only 4 cores), then the M3 Ultra will likely be faster with them all going full-out, since each VM doesn't need to share cores with other VMs.
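The multiple-VM case can be sketched with a toy throughput model: each job can keep at most 4 cores busy, and the machine runs out of headroom once all of its cores are taken. As before, the per-core speeds are assumptions of mine for illustration, and the model ignores memory, I/O, and scheduler overhead.

```python
# Toy throughput model for several jobs that are each capped at a few threads.
# Per-core speeds are made-up illustrative values, not measurements.

def total_throughput(cores, per_core_speed, jobs, threads_per_job):
    """Aggregate work rate: number of busy cores times per-core speed."""
    busy_cores = min(cores, jobs * threads_per_job)
    return busy_cores * per_core_speed

MACHINES = {
    "M4 Max": (16, 1.00),    # (cores, assumed per-core speed)
    "M3 Ultra": (32, 0.85),  # assumed ~15% slower per core
}

for jobs in (1, 4, 8):
    for name, (cores, speed) in MACHINES.items():
        rate = total_throughput(cores, speed, jobs, threads_per_job=4)
        print(f"{jobs} VM(s) x 4 cores on {name}: {rate:.1f}")
```

Under these assumptions the 16-core machine still leads with one to four such VMs, because it has a core for every thread and each core is faster. The 32-core machine only pulls ahead from the fifth VM on, when the smaller machine has no idle cores left to hand out.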
You have to pick the machine that's right for your work. If your work is an even mix of single-threaded and multi-threaded tasks, then compromises will have to be made: you have to pick either faster single-threaded performance or faster multi-threaded performance. And that increase in multi-threaded performance needs to be able to pay for the extra cost.
It's not uncommon for high core-count CPUs to run at lower clock speeds than CPUs with a lower core count, even when they're both the same generation, due to on-die heat constraints. Back in the Mac Pro trashcan era, you could get Xeon CPUs with 4 cores at 3.7GHz, 6 cores at 3.5GHz, or 12 cores at 2.7GHz. They were all the same generation of chip, but the speeds had to be dialed down with more cores to stay within the thermal envelope. Which model you bought would depend on your main kind of workflow.
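As a back-of-the-envelope check on those Xeon options, the sketch below multiplies cores by clock speed to get an aggregate figure. It uses only the three configurations quoted above and ignores turbo boost, memory bandwidth, and scaling losses, so read the all-core number as a ceiling rather than a benchmark result.

```python
# Aggregate clock (cores x GHz) for the three Xeon options quoted above.
# This is an upper bound on multi-threaded gain, not a measured result.

XEON_OPTIONS = [(4, 3.7), (6, 3.5), (12, 2.7)]  # (cores, GHz) from the post

def aggregate_ghz(cores, ghz):
    """Total clock across all cores, assuming perfect scaling."""
    return cores * ghz

base_cores, base_ghz = XEON_OPTIONS[0]
base_aggregate = aggregate_ghz(base_cores, base_ghz)

for cores, ghz in XEON_OPTIONS:
    print(
        f"{cores:>2} cores @ {ghz} GHz: "
        f"single-thread x{ghz / base_ghz:.2f}, "
        f"all-core x{aggregate_ghz(cores, ghz) / base_aggregate:.2f}"
    )
```

The 12-core part gives up roughly a quarter of its single-thread clock in exchange for, at best, a bit over twice the aggregate, which is the same trade being made between the two Studio chips.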
The M3 Ultra versus M4 Max is really no different.
I personally chose the M4 Max because I wanted the "snappiest" Mac available, and my work is mostly not really multicore optimized: the majority of it is text-based, office-app type of work. But I want general system responsiveness, large complex PDFs to render fast, fast performance for my (one) Windows VM, etc., and top performance out of the two (less frequently used) apps that do use as much CPU and GPU power as is available (Osirix MD and Falcon MD). While a Mac mini M4 Pro would have suited most of my needs, it would lag behind the M4 Max in Osirix and Falcon (mostly due to the 2x faster GPU in the Max over the Pro), and I also wanted the larger array of ports and display support offered by the Studio. I very strongly considered the M3 Ultra, but decided that it wouldn't have performed any better for me 95% of the time. And the 5% use case where it would perform (significantly) better wasn't worth the 2x increase in price.