I'm not so sure of that. First, we're talking about disabling out-of-order execution, but that doesn't necessarily limit instruction prefetching or disable macro-fusion of compare-and-branch instruction pairs.
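To make the macro-fusion point concrete, here's a minimal sketch (my own illustration, not from anyone's post above). A hot loop like this typically compiles down to a cmp + jne (or dec + jnz) pair that Core-family decoders fuse into a single µop, and that front-end behavior is independent of the buffer-flushing mitigations being discussed:

```c
#include <stddef.h>

/* The loop's exit test typically compiles to a compare-and-branch pair
 * (cmp + jne) that macro-fuses into one µop on Core-family CPUs. */
long sum(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i != n; i++)
        s += a[i];
    return s;
}
```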
A lot of the gains over the past decade have come on particular types of workloads, particularly anything that can be reduced to dense arithmetic sequences. We gained FMA3, issuing independently on ports that previously issued standalone multiply or add instructions. We also saw 256-bit operations added with AVX, up from 128-bit (AVX-512 comes in too many variants and is still too immature). Newer CPUs (Haswell and later) can also issue independent loads on up to two ports in a given cycle, and they allow folding loads that are not guaranteed to be aligned into arithmetic instructions.
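As a rough illustration of the kind of kernel that benefits from all of that (my own sketch, assuming AVX2 + FMA; compile with something like gcc -O2 -mavx2 -mfma):

```c
#include <immintrin.h>
#include <stddef.h>

/* y[i] += a * x[i], 8 floats per iteration. The unaligned loads can be
 * folded into the FMA's memory operand, and the multiply-add issues as
 * a single FMA3 instruction instead of separate mul + add. */
void saxpy(float a, const float *x, float *y, size_t n)
{
    __m256 va = _mm256_set1_ps(a);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);   /* unaligned 256-bit load */
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_fmadd_ps(va, vx, vy);     /* FMA3: va*vx + vy */
        _mm256_storeu_ps(y + i, vy);
    }
    for (; i < n; i++)                        /* scalar tail */
        y[i] += a * x[i];
}
```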
There are a few VTEST instruction variants, which overlap with the ones I mentioned, but I'm not sure Intel really supports speculative execution based on them. If a more recent compiler is targeting anything SIMD-based, it's likely to apply partial unrolling anyway.
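For anyone unfamiliar with the VTEST family, here's a hedged sketch of its typical use (again my own example, not from the posts): VPTEST sets the flags from a 256-bit AND, so you can branch on "any lane set" without first extracting the mask to a scalar register. How well the CPU predicts or speculates past that branch is a separate question.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if any 32-bit lane of v equals key (needs AVX2). */
int any_match(const int32_t *v, size_t n, int32_t key)
{
    __m256i vkey = _mm256_set1_epi32(key);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(v + i));
        __m256i eq = _mm256_cmpeq_epi32(chunk, vkey);
        /* VPTEST underneath: ZF=1 iff (eq & eq) == 0, i.e. no lane matched. */
        if (!_mm256_testz_si256(eq, eq))
            return 1;
    }
    for (; i < n; i++)                        /* scalar tail */
        if (v[i] == key)
            return 1;
    return 0;
}
```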
I could see low-level code taking more of a dive, since you have more branchiness. I would still expect it to be highly dependent on your workload and how much time is spent in the kernel. I wouldn't expect the need to put up a few memory barriers to wipe out a decade of performance gains in compute-bound, performance-critical cases. The scenarios that look like they hurt, based on your link, are those with a lot of memory mapping and unmapping, or possibly driver code?
You make good points; I didn't take into consideration the amount of kernel/driver time versus the total application time.
If the performance hit mostly affects kernel and driver code, it's not so horrible when you average over everything: for example, if a workload spends 10% of its time in the kernel and kernel paths get 30% slower, the overall slowdown is only about 3%. Storage benchmarks depend a lot on kernel and driver code, so that's probably why the performance hit is bigger there.
After rethinking this, I'll probably wait for real-world tests outside of Intel's lab conditions to reassess my expectations and see which real-world scenarios take the biggest performance hit, and on which processor models. Maybe it's not so bad after all…
Forgive my ignorance, but is this a simple fix that Apple and/or Intel is simply refusing to address? Is there a way to protect 2009–2012 Mac Pros *without* disabling Hyper-Threading that just isn't available yet?
How can Apple or Intel possibly stand by the position of "not fixing" a gaping security hole in one of their products - especially products aimed at professionals? If enough of us make noise, maybe we can change their mind... But again, I have no real knowledge on this subject and defer to the wisdom of Alex and others.
One thing Intel is doing with this announcement of no longer supporting microcode corrections for Nehalem and Westmere is indirectly forcing the replacement of Nehalem and Westmere processors still in use in enterprise environments. Mac Pros are not Intel's focus, so Intel will probably absorb the flak from this commercial decision easily.
Updated microcode is needed to implement buffer clearing between rings directly on the processor, rather than depending on kernel mitigations that carry a bigger performance penalty. Some processors need both the microcode updates and the kernel code, so no single solution works for all processor generations. Without the microcode updates, the only full mitigation for the MP5,1 is to disable Hyper-Threading.
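For context, this is roughly what the microcode update enables: it repurposes the legacy VERW instruction so that executing it also overwrites the CPU-internal buffers (store buffers, fill buffers, load ports), and the kernel then issues it on the return path to ring 3. The sketch below mirrors the publicly documented Linux approach; the function name and the 0x18 selector value are illustrative, not actual macOS or Linux kernel code.

```c
/* Hedged sketch: on CPUs with the MDS microcode update, VERW's side
 * effect is to overwrite the microarchitectural buffers. The kernel
 * runs something like this just before returning to user space.
 * The selector is illustrative; it must reference a valid writable
 * data segment in the GDT. */
static inline void clear_cpu_buffers(void)
{
    static const unsigned short ds = 0x18;
    __asm__ __volatile__("verw %[ds]" : : [ds] "m"(ds) : "cc");
}
```

Without the updated microcode, VERW keeps only its legacy segment-verification behavior and clears nothing, which is why unpatched Nehalem/Westmere machines are left with disabling Hyper-Threading as the fallback.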