Benchmarks, however flawed they might be, are by far THE most scientific view of a machine's performance across different metrics, and they can be compared relatively objectively. Whether one user takes benchmarks seriously matters little to other users' decisions, but no one should dismiss the importance of benchmarks in recording the progression of our hardware design and evolution.
I haven't and won't watch the video, so I'm not defending Ritchie's point here, just my own point of view.
Benchmarks, when performed carefully and with skill, give you a very precise measure of exactly one thing.
The problem isn't with the benchmark. The problem is with how benchmarks are being generated, marketed, and misapplied.
So, as I mentioned above, benchmarks should be performed carefully and with skill. Most of the YouTube yokels I see peddling results are neither careful nor skillful, and they aren't incentivized to be. What they want to be is fast and splashy. The first to post something that looks like data on a new product gets paid the most, whether or not that data is accurate or meaningful. That doesn't lend itself to careful, repetitive testing or dozens of experiments to isolate variables. What we generally get is "look, I did something and now I've proved something". That's garbage.
Likewise, there's an incentive to market to an audience by feeding an existing narrative, which is the antithesis of any sort of scientific analysis. YouTube markets through divisive headlines, and creators love to play like they're uncovering secrets that big tech is trying to hide. I guarantee MaxTech, for example, has generated a ton of views through threads here on MacRumors using them as an argument that "the M2 Air is a dud" or "Apple is robbing us". Will the MaxTech guys say they're playing to their base? Of course not. But "if you tell me where a man's corn pone comes from, I'll tell you what his politics are."
And then a benchmark, which, again, precisely measures exactly one thing, starts to get extrapolated. We're told what that narrow benchmark means in the bigger picture, but there is no bigger picture. There's a narrow benchmark, and you need to be really careful drawing broader conclusions about what it means for performance, and even more careful drawing conclusions about what the manufacturer's intent was in their design decisions.
The SSD performance controversy is a classic example of benchmarks gone wrong. What does a throughput test actually tell us? It tells us that if you create 5GB of data procedurally and write it directly to disk, probably on a fresh drive, then read that same data back but don't use it, there is a notable difference in performance between the 256GB and the 512GB SSDs. That is the only thing that benchmark means. It doesn't say anything about file copies, it doesn't say anything about media import or export, and it certainly doesn't say anything about what happens when you're swapping 16kB pages from compressed virtual memory.
But that result is being used to support the argument that the smaller SSD will hurt performance for casual users who don't quit unused applications, which is utter nonsense and not in any way supported by a throughput test.
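To make the gap concrete, here's a rough sketch, in Python, of what a sequential throughput test measures, next to the kind of small random reads that page swapping actually generates. It's purely illustrative: the file path, sizes, and loop counts are placeholders I picked for the sketch, not what Blackmagic Disk Speed Test or any other real tool runs.

```python
# Purely illustrative sketch -- not the code of any real benchmark tool.
# Contrasts a "write a big file sequentially, read it back" throughput test
# with the small random reads that swapping 16kB pages actually produces.
import os
import random
import time

SCRATCH = "throughput_scratch.bin"   # placeholder path on the drive under test
CHUNK = 8 * 1024 * 1024              # large 8 MiB sequential writes
TOTAL = 1024 * 1024 * 1024           # 1 GiB here to keep the sketch cheap
PAGE = 16 * 1024                     # a 16kB page, roughly what swap moves

def sequential_test():
    data = os.urandom(CHUNK)         # procedurally generated, not a real file
    t0 = time.perf_counter()
    with open(SCRATCH, "wb") as f:
        for _ in range(TOTAL // CHUNK):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())         # make sure it actually reached the SSD
    write_rate = TOTAL / (time.perf_counter() - t0)

    t0 = time.perf_counter()
    with open(SCRATCH, "rb") as f:
        while f.read(CHUNK):         # read it back and throw it away
            pass
    read_rate = TOTAL / (time.perf_counter() - t0)
    # Caveat: without bypassing the page cache, the read number is partly
    # measuring RAM -- exactly the kind of detail a careless test misses.
    return write_rate, read_rate

def random_page_test(n_pages=2000):
    t0 = time.perf_counter()
    with open(SCRATCH, "rb") as f:
        for _ in range(n_pages):
            f.seek(random.randrange(0, TOTAL - PAGE))
            f.read(PAGE)             # one page-sized read at a random offset
    return (n_pages * PAGE) / (time.perf_counter() - t0)

if __name__ == "__main__":
    w, r = sequential_test()
    p = random_page_test()
    os.remove(SCRATCH)
    print(f"sequential write {w/1e6:.0f} MB/s, sequential read {r/1e6:.0f} MB/s, "
          f"random 16kB reads {p/1e6:.0f} MB/s")
```

The sequential numbers and the random page number come from completely different I/O patterns, which is exactly why the first doesn't predict the second.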
Before that, it was the arguments about CPU temperatures, where someone measured something using a tool that, it turned out, wasn't even tuned to the hardware they were running it on, and engineers with expertise in silicon design were being dismissed with YouTube soundbites.
So, if you're going to make the argument that benchmarks are the only scientific way to analyze a system, then you also have to take the view that they're only meaningful to scientists and that a lay audience isn't equipped to interpret them. Benchmarks are incredibly useful to people who understand them and, just as importantly, understand their limits, but they're being used (often unwittingly) to give a false sense of scientific credibility to bad conclusions.