Qualcomm producing anything in the same performance and energy use ballpark as the M1 is an existential threat to Intel and AMD. It's not speculation that ARM is more efficient; common benchmarks show it using one third the power at the same performance levels. It's also clear that process size is not a significant part of the difference: each process step is 10-15% more efficient, not 70%.
That "one third power" is not primarily because of the instruction set. There is a major silicon design component to that. AMD and Intel don't build entirely "laptop first" CPU core designs. AMD is relatively clearly server first with "tickle down" to laptops. ( same chiplet used in desktop and server design. Just clock setting and different I/O chips. ) Intel is a bit more laptop focused but there too the designs are leaning toward "maximum turbo overclocking" so as to win tech press 'porn' benchmarks and have product to sell to the "tricked out super cooler" enthusiast crowd.
"... Usually a wider decode consumes a lot more power, but Intel says that its micro-op cache (now 4K) and front-end are improved enough that the decode engine spends 80% of its time power gated. ... "
https://www.anandtech.com/show/1704...hybrid-performance-brings-hybrid-complexity/5
If the "x86 decoder" is idled (or turned off) 80% of the time how is it going to consume 30% more energy than Arm decoder. It isn't even 'on'
Neither AMD's/Intel's x86 nor Apple's M1 series directly executes the instruction sets being hyped here. They all run micro-ops, and there, there isn't some humongous gap. Behind the front-end decoder is where most of the power is consumed under "high performance" load. If you are actually adding/subtracting/multiplying/dividing at the most intense pace, that is going to consume lots of power.
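To make the power-gating point concrete, here is a toy sketch (my own illustration, not any real microarchitecture) of why a decoded micro-op cache lets the legacy x86 decoder sit gated most of the time on hot code: on a hit, cached micro-ops issue directly and the decoder never wakes up; only misses pay the decode cost.

```c
/* Toy model of a front end with a decoded micro-op (uop) cache.
 * Purely illustrative -- real front ends are far more complex. The point:
 * on a uop-cache hit the legacy x86 decoder contributes ~zero switching
 * activity for that fetch, so it can stay power gated. */
#include <stdio.h>
#include <stdbool.h>

#define UOP_CACHE_LINES 4096          /* "now 4K" per the quoted article */

typedef struct { bool valid; unsigned tag; } uop_line;

static uop_line uop_cache[UOP_CACHE_LINES];
static long decoder_wakeups = 0, fetches = 0;

static void fetch_block(unsigned addr)
{
    unsigned idx = addr % UOP_CACHE_LINES;
    fetches++;
    if (uop_cache[idx].valid && uop_cache[idx].tag == addr)
        return;                        /* hit: issue cached uops, decoder stays gated */
    decoder_wakeups++;                 /* miss: wake the decoder, decode, fill the cache */
    uop_cache[idx].valid = true;
    uop_cache[idx].tag = addr;
}

int main(void)
{
    /* A hot loop re-fetches the same few instruction blocks over and over. */
    for (int iter = 0; iter < 100000; iter++)
        for (unsigned addr = 0; addr < 16; addr++)
            fetch_block(addr);

    printf("decoder active on %.4f%% of fetches\n",
           100.0 * decoder_wakeups / fetches);
    return 0;
}
```

On code with good locality the decoder ends up active on a vanishingly small fraction of fetches, which is the shape of the 80% claim above.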
Secondly, it isn't really an "apples to apples" comparison when AMD, Intel, and Apple are all on different process nodes, and also in where they sit inside a node (do they choose to stretch the clocks past the node's design baseline to appeal to the overclocker crowd?).
Node differences also contribute heavily to differences in the sizes of the L1/L2/L3 caches, the main internal bus design, etc.
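For a sense of scale on the node piece, a back-of-envelope sketch, taking the "10-15% more efficient per process step" figure from the quoted post at face value and simply compounding it (a simplification):

```c
/* Back-of-envelope: how much power reduction do N process node steps buy
 * if each step is worth 10-15% at the same performance? (Per-step figure
 * taken from the post above; compounding multiplicatively is a simplification.) */
#include <stdio.h>
#include <math.h>

int main(void)
{
    for (int steps = 1; steps <= 3; steps++) {
        double low  = 1.0 - pow(1.0 - 0.10, steps);   /* 10% per step */
        double high = 1.0 - pow(1.0 - 0.15, steps);   /* 15% per step */
        printf("%d step(s): %2.0f%% - %2.0f%% less power\n",
               steps, low * 100.0, high * 100.0);
    }
    /* Even 3 steps lands around 27-39% -- a real chunk, but nowhere near the
     * ~67% reduction implied by "one third the power at the same performance". */
    return 0;
}
```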
Thirdly, Apple throws some functionality out the window to conserve power. Modular memory? Toast (AMD/Intel systems do DDR4/DDR5 just fine). Modular dGPUs, so far in the M1 line? Toast (x16 PCIe bandwidth provisioning chucked out the window).
Finally, x86 is long overdue for retiring some of the "tangent forays" of the 80's and 90's from the instruction set, either from "high performance execution speed" (deprecated into a "slow speed mode") or altogether (does an instruction set really need 3-4 SIMD opcode sets? Probably not. MMX could die and it wouldn't really be an issue for modern 64-bit optimized code). A contributing factor to the decoder power consumption problem is wasting power on decoding instructions that statistically almost never come up in normal user usage. A contributing reason Apple went to 64-bit Arm faster than everyone else was not so much to use 64-bit pointers or memory addressing; it was to "throw away" 90's Arm instruction-option overhead they saw little point in dragging along.
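One way to see how dead MMX already is for modern user code (a quick experiment you can run yourself; exact compiler output will vary): compile a trivial float loop for x86-64 and look at the assembly. SSE2 is baseline for the 64-bit ABI, so the compiler vectorizes with xmm/ymm registers, and as far as I know neither GCC nor Clang will auto-generate the old mm0-mm7 MMX registers for it. The silicon still has to carry those opcodes regardless.

```c
/* saxpy.c -- compile with:  gcc -O2 -S saxpy.c   (or clang) and read the
 * generated .s file. On x86-64 you should see xmm/ymm (SSE/AVX)
 * instructions; the 90's-era MMX registers (mm0-mm7) don't appear,
 * because SSE2 is the 64-bit baseline and compilers don't auto-emit MMX. */
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```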
Yes, the variable-length x86 opcode prediction 'tree' has more overhead than a more regularly structured opcode design. But you also don't have to make that prediction 'tree' gratuitously constipated. Legacy x86 is a "Constipated" Instruction Set more than it is a "Complex" Instruction Set. It isn't the variability that is the top problem at this point; it is the 'hoarding' of stuff nobody is particularly using. Yes, there would still be instruction decode overhead for a still quite large set, but at least it would be for instructions with real "value add" for most users.
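A toy illustration of the variable-length point (my sketch of the dependency structure, not a real decoder): with a fixed 4-byte encoding, the boundary of instruction N is known immediately and N decoders can start in parallel; with a variable-length encoding, each boundary depends on having worked out the lengths of everything before it, which is exactly where the extra length-prediction/speculation machinery comes in.

```c
/* Toy contrast of instruction-boundary finding. Not a real ISA decoder,
 * just the structural point: fixed-width boundaries are independent
 * (computable in parallel), variable-width boundaries form a serial
 * dependency chain unless you add predictors / speculative length logic. */
#include <stdio.h>
#include <stddef.h>

/* Fixed 4-byte encoding (Arm-like): boundary of instruction i is just i*4. */
static size_t fixed_boundary(size_t i) { return i * 4; }

/* Variable-length encoding (x86-like): the length comes out of the first
 * byte(s), so boundary i+1 is unknown until instruction i has been
 * (at least partially) decoded. Toy rule stands in for prefixes/ModRM/etc. */
static size_t insn_length(const unsigned char *code, size_t off)
{
    return 1 + (code[off] & 0x07);     /* 1..8 bytes */
}

int main(void)
{
    unsigned char code[64];
    for (size_t i = 0; i < sizeof code; i++)
        code[i] = (unsigned char)(i * 37);   /* arbitrary stand-in bytes */

    printf("fixed:    insn 7 starts at byte %zu (no earlier decode needed)\n",
           fixed_boundary(7));

    size_t off = 0;
    for (int i = 0; i < 7; i++)              /* must walk insns 0..6 first */
        off += insn_length(code, off);
    printf("variable: insn 7 starts at byte %zu (after decoding 7 lengths)\n",
           off);
    return 0;
}
```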
[ Both AMD and Intel have a relatively large number of CPU SKUs. There are some sub-populations running 90's-design embedded OSes/apps for equipment and other things, or 'stuck' on some early-2000's Windows variant for a proprietary app. There could be legacy plain-x86 processors for that code, and something slimmed down for Windows 11 (or later) and post-2010 apps that have modern foundations. ]
And Intel/AMD also can't continue to let Graviton and other ARM servers grow over 100% a year in the hosting market for long either. I'm a value investor; Intel is cheap, but even if the odds of this risk are only one in ten, that's too much risk.
If the market is growing in units enough that Intel and AMD aren't really dropping much in their unit sales, then it is nowhere near as dire as you suggest. Arm servers have triple-digit growth because their base is small. Going from 50,000 to 100,000 units year over year is 100% growth, but does that really do major damage in a 50M-unit/year market? Not really.
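Spelled out (illustrative numbers only):

```c
/* Illustrative numbers only: doubling a small base is 100% growth but still
 * a rounding error against a ~50M unit/year server market. */
#include <stdio.h>

int main(void)
{
    double last_year = 50000.0, this_year = 100000.0, market = 50e6;

    printf("growth:       %.0f%%\n",
           100.0 * (this_year - last_year) / last_year);
    printf("market share: %.1f%% of a %.0fM-unit market\n",
           100.0 * this_year / market, market / 1e6);
    return 0;
}
```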
There are similar issues here too, though. Graviton (and the other cores built on Arm's Neoverse baseline designs) are not single-threaded, single-user (and non-virtualized) "drag racing" speed demons. Are 90's-era MMX opcodes really making 64-bit Apache/Nginx workloads run much faster?
AMD's upcoming Bergamo isn't completely dropping the x86 instruction set to target the cloud. There are probably more internal design-choice changes than there is dropping/replacing of x86 opcodes versus the mainstream server parts. (It wouldn't be surprising to see AVX-512 skipped, or implemented without top speed-to-completion as a priority, but it also wouldn't be surprising to see SMT dropped. No "super duper turbo" mode either. Those latter two have nothing to do with instruction sets. Both of those design choices will help get the core implementation size down, but the core will be less well rounded across possible general-market workloads. That is OK, because AMD won't really be selling this core into those markets.)
It is not so much that AMD/Intel are "letting" the Arm server offerings into the market as that the market is somewhat balkanizing. Folks just aren't throwing generic, "does everything", 1U pizza-box modules at the whole data center anymore.
The era of having one server chip that does "everything for everybody" is increasingly detached from market realities (Intel already offers Xeon SP, Xeon D, and Atom C processor packages aimed at different server markets). It is going to get harder for Intel to try to make "everything for everybody" all 100% in-house.
Graviton is a bigger threat to AMD/Intel not so much because it is an Arm instruction set, but because Amazon doesn't care about profit margins on Graviton directly. The service needs to make a profit, not Graviton in and of itself. That is a double-edged sword, though: it also means Graviton won't have much of an impact outside the service it is embedded in.