No, they couldn't; Intel relies on Windows, which they don't control.

If MS removed 32-bit from Win12 (after making sure, years in advance, that only 64-bit apps can use Win11 features), they could start thinking about it AFTER the Win12 adoption rate is >50%.

Remember, Apple removed support for 32-bit CPUs long before they removed support for 32-bit apps, long before they removed 32-bit support in HW. Both for macOS and iOS.


Windows 11 moved past 32-bit at the boot layer. (Yes, Win32 is still dangling around, but that isn't the major issue blocking some cleanup from happening.)

Windows 10 will get desupported in 2025. That is primarily what needs to die off (as opposed to the adoption rate of something after Win11). At that point Intel (and AMD) don't really need to support BIOS and a bunch of '80s and '90s vintage baggage constructs. Saying that "nothing" can happen until after Windows 12 ships is way too conservative. By 2025 all the 'new' Windows PC systems sold will be Win11-and-up based.
There has already been a shot across the bow to the "I need real mode and/or BIOS" folks.

The issue for Windows is that it won't be all that competitive either if it keeps dragging around first-class support for 32-bit mode. Windows on ARM won't be doing first-class 32-bit mode in 2025 either. (32-bit mode is being phased out of the smartphone-and-up class of ARM instruction sets also.)

That is why Windows 11 has already begun to chuck some things that used to have "first class" support.

"...
All 16-bit programs run by default in a single virtual DOS machine with shared memory space. However, they can be configured to run in their own separate memory space, in which case each 16-bit process has its own dedicated virtual machine. .....

This subsystem is available in 32-bit editions of Windows NT only. The 64-bit editions (including Windows Server 2008 R2 and later which only have 64-bit editions) cannot run 16-bit software without third-party emulation software (e.g. DOSBox). With Windows 11 dropping support for 32-bit IA-32 processors, development of this subsystem has been discontinued.
.... "

Moving chunks of the 32-bit (and older) cruft into emulated virtual machines doesn't need to wait for Windows 12.
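As a toy sketch of the routing that implies (and only that; this is not how Windows actually dispatches binaries), a launcher could peek at a PE executable's Machine field and decide whether it can run natively on a 64-bit-only core or has to be handed off to an emulated/virtualized environment:

```python
# Toy sketch: inspect a Windows PE executable's Machine field to decide
# whether it could run natively on an x86-64-only core or would need to be
# routed to an emulated/virtualized 32-bit environment. Illustration only;
# this is not how Windows actually dispatches binaries.
import struct
import sys

MACHINE_NAMES = {
    0x014C: "i386 (32-bit) -> route to emulated VM",
    0x8664: "x86-64        -> run natively",
    0xAA64: "ARM64         -> run natively (on ARM) or translate",
}

def pe_machine(path):
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":                       # DOS header magic
            raise ValueError("not an MZ/PE executable")
        f.seek(0x3C)                                 # e_lfanew: offset of PE header
        pe_offset = struct.unpack("<I", f.read(4))[0]
        f.seek(pe_offset)
        if f.read(4) != b"PE\x00\x00":               # PE signature
            raise ValueError("missing PE signature")
        machine = struct.unpack("<H", f.read(2))[0]  # COFF Machine field
    return MACHINE_NAMES.get(machine, hex(machine))

if __name__ == "__main__":
    print(pe_machine(sys.argv[1]))
```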


So while Intel doesn't "own" Windows, Windows isn't restricted to x86 either. That's just a backwards way of looking at things. If Intel/AMD doesn't give Windows a choice but to go to ARM, then it will probably leave over time, well before any "Windows 12" shows up. It is a little more complicated approach in that it needs coordinated "xx years until everyone is out of the ancient stuff pool" announcements.


So the countdown clock has already started. Either Intel (and AMD) start doing something, or they just lose bigger in 3-5 years.

[ They could keep some embedded and industrial-control-targeted CPU/SoC variants around where they just shrink the processor onto a very mature 4, 3, 2 "nm" node to get more dies per wafer for "old" OS targets. More focus on running old stuff than bleeding-edge performance. ]


Can they flush 100% of the 32-bit opcodes? Perhaps not. But there is a decent amount of now-redundant (4-5 SIMD instruction sets aren't necessary) and just plain unused-by-the-vast-majority-of-mainstream-users instructions (Windows 11 isn't going to touch any 16-bit programs). There are some basic 32-bit math ops that would help assist a faster emulator/virtualization solution. [ 32-bit opcodes get translated to micro-ops anyway in modern x86 implementations. If pumping 32-bit stuff through a 'pre execution' emulation compiler anyway, it could have some "pre-chewed cud" that is a cleaner 32-bit subset. (With modern multiple-GB system sizes, some instruction code expansion isn't necessarily a bad thing. Not looking for max performance out of 18-20+ year old code in 2025.) ]
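To make the "pre-chewed cud" idea concrete, here is a minimal sketch, with made-up mnemonics and a made-up expansion table, of a pre-execution rewriter that trades a bit of code expansion for a cleaner instruction subset:

```python
# Toy "pre-chewed cud" rewriter: expand legacy/redundant ops into a cleaner
# subset ahead of execution, accepting some code expansion in return.
# The mnemonics and the expansion table are invented for illustration; a real
# binary translator works on machine code, not text.

EXPANSIONS = {
    "legacy_paddb_mmx": ["load_vec", "add_vec", "store_vec"],   # hypothetical
    "legacy_add_mem32": ["load32", "add64", "store32"],         # hypothetical
}

def pre_translate(legacy_code):
    """Rewrite a list of legacy mnemonics into the reduced subset once, so the
    hot path never has to decode the old forms again."""
    out = []
    for op in legacy_code:
        out.extend(EXPANSIONS.get(op, [op]))   # untouched ops pass through
    return out

old = ["legacy_add_mem32", "mov64", "legacy_paddb_mmx"]
new = pre_translate(old)
print(new)
print(f"code expansion: {len(new) / len(old):.2f}x")   # bigger, but RAM is cheap
```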


Windows 11 cut off much of the x86 CPU lineup that is older than 2017 or so (so, stuff designed/specified after around 2015 or so makes the cut). That was all well into the established and shipping x86_64 era. Again, the 'deprecation' warning has already dropped, so by 2025 the warnings will have been clearly outlined.

The gap from Windows 10 -> 11 was about 7 years. Something like a 12 is probably going to be longer. It would be a huge mistake for Intel/AMD to wait around for that to pump some ex-lax/fiber through the instruction set and poop out some of the constipation.
 
Not really. Intel can add ARM instructions to an SoC on top of x86.

They could put an ARM processor into an "x86" package. AMD does this now, with the ARM processor relegated to "security" work. They have done this with the Zen lineup. Latest Ryzen 6000 series block diagram:

[Image: AMD Rembrandt (Ryzen 6000) block diagram]


https://videocardz.com/newz/amd-ryzen-9-6980hx-next-gen-6nm-rembrandt-mobile-processor-pictured


As opposed to Intel's approach of having additional instructions and a "security processor mode" on the x86.

However, some user-level operating system making a coherent whole out of both x86 and ARM cores probably isn't worth the hassle, given the quirks between the default memory semantics. The security processor shouldn't be sharing common resources (it is more secure to be 'air gapped').


There was some talk at some point that AMD was going to try to make either a "shared" or "common" micro-operations 'backend' for a processor, which would be implemented with either an x86 or ARM instruction decoder in front. That was probably a reach and wouldn't necessarily have allowed both decoders to feed the same backend at the same time.

Intel already puts non-x86 processors on many of their consumer chips with GPGPU cores. If Intel adds something in the future, it is far more likely going to be in the nature of what Apple's NPU (AI/ML) or image/video processing 'cores' do than ARM. (There aren't necessarily going to be application-specific binaries for these cores.)

The other likely addition by Gen 13 or 14 of Intel is Pluton.









Qualcomm's SoCs for Windows just came out with Pluton built in. AMD just launched their first. If Intel doesn't follow in a decent-sized chunk of their mainstream consumer SoC lineup, then they will be behind the curve. Is Pluton technically running ARM core(s)? Probably. However, it is not likely running Windows apps directly either.




P.S. If Intel wants to dogmatically hold onto a discrete security processor, then the T2 experience would have even deeper traction. Although I suspect that the move to "tile"/chiplets will allow Intel to compromise here into a single delivered "package" solution relatively straightforwardly, even if it technically isn't the same exact die.

[ And Microsoft Pluton 2 could be RISC-V if that makes weaving it in easier over the long term. ]
 
  • Intel and Apple don't use the same fab, so I'm not sure what your point is.
  • Apple is on TSMC 5nm. AMD is on TSMC 7nm. Intel is on Intel. Samsung and Qualcomm are on Samsung. Not an even comparison.


Intel has been a TSMC customer longer than Apple has. The whole Intel Arc product line is built on TSMC N6.
So yeah, they use mostly the same fab for an intersection of products.


One day Intel may pull most of their GPU product line out of TSMC, but for the next 3-5 years it will be there along with a variety of other Intel products.

Apple does more quantity, but Intel has used TSMC for a long while for less highly visible products.




  • Fabrication is the hardest part.

Fabs are the part where you are far more likely to run into an unpredictable "show stopper". There are more 'external inputs' that aren't directly under the company's control.

e.g.



Intel backported their Gen 11 core designs back to 14nm with 'Rocket Lake' desktop. (It wasn't a particularly good outcome, but it was doable.) Going 'forward' with the fab tech doesn't have as many possible workarounds.

It is a bigger dance with a broader set of entities involved. Logistically it is harder. Which is somewhat a different dimension of hard.

The 'shape' of the difficulty is different here, so this thread's back-and-forth drifts into apples versus oranges on some unidimensional metric of "hard". Trying to measure a two-dimensional space with a one-dimensional tool doesn't work so well.
 
They could put an ARM processor into an "x86" package. AMD does this now, with the ARM processor relegated to "security" work. They have done this with the Zen lineup.

Thing is (not referring to AMD above, but Intel supposedly adding ARM chipset(s))... why though? The big problem with x86 derivatives is complexity; bolting on an ARM core is just going to add to the baggage, not clean things up.

Apple has proved that "decent" (well beyond decent actually) x86 performance can be obtained via translation so the opportunity is now there for some real innovation in terms of performance. Also, most modern applications are written for the browser and the underlying platform is mostly irrelevant (so long as it runs Javascript fast). Most non-modern business apps run fast enough on hardware from 10 years ago (yes, yes games and niche apps are exceptions).

I'm not saying intel should go ARM necessarily, but I'm sure they have engineers with good ideas that aren't practical to implement on x86 as it is due to the baggage.

Dump the baggage! Go ARM, roll your own, Risc-V, whatever!

ARM does probably make sense due to the existing software library, but if they provide a decent translator for the transition, Intel is big enough to move the PC industry.


edit:
As far as Alder Lake being competitive with or outperforming M1 in single core (at god knows what power budget - will believe it when I see it) goes... I should hope so. It's shipping in volume 18-24 months after M1 did.
 
Thing is (not referring to AMD above, but Intel supposedly adding ARM chipset(s))... why though? The big problem with x86 derivatives is complexity; bolting on an ARM core is just going to add to the baggage, not clean things up.

Not necessarily. If running two different OSes on split RAM allocations, it is more akin to putting two computers on one chip. One gets the dominant OS cores, and the other would run as a very fast sandboxed world.

For example.

https://blogs.windows.com/windows-i...droid-apps-on-windows-11-to-windows-insiders/

People run remote desktops over cloud infrastructure. It is substantially easier to run a virtual network connection (with low latency) to the other side of the SoC.
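As a rough sense of scale (assuming plain loopback TCP on one machine as a stand-in; a real on-package transport such as virtio would be tighter still), a round trip that stays on the box is measured in microseconds, versus the milliseconds of a cloud remote desktop:

```python
# Rough latency sketch: loopback TCP on one machine as a stand-in for the kind
# of on-package "virtual network" link described above. Assumed setup; the
# numbers will vary by machine.
import socket
import threading
import time

def echo_server(listener):
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

cli = socket.create_connection(("127.0.0.1", srv.getsockname()[1]))
cli.sendall(b"warmup"); cli.recv(64)              # warm the path first

rounds = 1000
t0 = time.perf_counter()
for _ in range(rounds):
    cli.sendall(b"ping")
    cli.recv(64)
elapsed = time.perf_counter() - t0
print(f"loopback round trip: ~{elapsed / rounds * 1e6:.1f} us per hop")
# Compare with the many-millisecond round trips of a remote desktop over cloud.
```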

It isn't that it is complex so much as it would be taking up lots of space if trying to cater to folks on some core-count performance benchmark.

x86's biggest problem is constipation (hoarding of relatively ancient opcodes) far more than being "complex". Backward compatibility is useful up until a certain point. A 10-year window going back has different utility than going back 30 years. The more baggage dragged around, at some point it turns into a double-edged sword: keeping some "anti change" customers around, but probably also losing leading-edge ones.

It would be one thing to have one "solve polynomial" opcode, but to have four versions of it as you try to get to a better one over a couple of decades is quite a different problem.


Apple has proved that "decent" (well beyond decent actually) x86 performance can be obtained via translation so the opportunity is now there for some real innovation in terms of performance.

Not really much of a 'huge proof' there. For the last 10+ years that is pretty much how most high-performance x86 implementations have worked: x86 is translated into micro-ops, and the micro-ops are what actually get dispatched to the internal functional units.
Apple's approach is primarily a static translation (i.e., write another binary and store it to disk too) that consumes more nominal storage space, but there is tons of overlap.

[ Even ARM instructions can be 'lowered' into micro-ops. ]
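For flavor only (actual micro-ops are implementation specific and not architecturally visible), here is roughly what that lowering looks like for a read-modify-write x86 instruction:

```python
# Flavor-only illustration of micro-op "cracking"; real micro-ops are
# implementation specific and not architecturally visible.
def crack(insn):
    if insn == "add dword [rbx], eax":            # read-modify-write form
        return [
            "uop.load   t0 <- mem[rbx]",          # memory read
            "uop.add    t0 <- t0 + eax",          # ALU op on a renamed temp
            "uop.store  mem[rbx] <- t0",          # memory write
        ]
    return [f"uop.{insn}"]                        # simple ops map roughly 1:1

for u in crack("add dword [rbx], eax"):
    print(u)
```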

Apple has some memory-semantics gap handling and a few other mismatches. But the write-once, use-many approach is a performance tradeoff for increased storage. If it were all dynamically translated, then you wouldn't see the same performance. (It is largely paid for before the app runs.)
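A minimal sketch of that write-once, use-many tradeoff, with an invented translate() stand-in and an on-disk cache keyed by a hash of the input binary (not anyone's real toolchain):

```python
# Minimal sketch of the static-translation tradeoff: translate once, persist
# the result, and pay (almost) nothing on later runs. The cache directory and
# translate() body are invented stand-ins for illustration.
import hashlib
import pathlib

CACHE = pathlib.Path("aot_cache")
CACHE.mkdir(exist_ok=True)

def translate(x86_bytes: bytes) -> bytes:
    # stand-in for an expensive x86 -> native translation pass
    return b"TRANSLATED:" + x86_bytes

def load_translated(x86_bytes: bytes) -> bytes:
    key = hashlib.sha256(x86_bytes).hexdigest()
    cached = CACHE / key
    if cached.exists():                 # later runs: read it straight back
        return cached.read_bytes()
    native = translate(x86_bytes)       # first run: pay the cost once
    cached.write_bytes(native)          # extra storage, faster launches after
    return native

print(load_translated(b"\x48\x01\xd8"))  # bytes for 'add rax, rbx'
```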


Also, most modern applications are written for the browser and the underlying platform is mostly irrelevant (so long as it runs Javascript fast). Most non-modern business apps run fast enough on hardware from 10 years ago (yes, yes games and niche apps are exceptions).

Those "Electron" apps do consume more resources on modern machines. Kind of chuckle when I run one of those at the old time Unix-forums-wars complaints about how Emacs was such a "hog".

Lots of modern applications are pushing computation work off to non-CPU cores also, so that limits the impact the decoder is really having anyway. Work pushed to a matrix unit, AI/ML unit, or GPGPU isn't going to be largely capped by a decoder.


I'm not saying intel should go ARM necessarily, but I'm sure they have engineers with good ideas that aren't practical to implement on x86 as it is due to the baggage.

The notion that the initial decoder into the micro-op cache/registers is the only place where some x86 "good ideas" could come up is rather limited. The notion that the decoder all by itself is holding everything back is a bit overblown in the forums.

If you task a modern x86_64 compiler with constructing a new binary targeted at a cleaned-up subset of x86, it isn't going to emit a bunch of weird, esoteric opcodes in the vast majority of circumstances. You aren't losing all that much. Toss out the self-modifying-code antics, and then a relatively large micro-op cache will even out the playing field also.



Dump the baggage! Go ARM, roll your own, Risc-V, whatever!

ARM does probably make sense due to the existing software library, but if they provide a decent translator for the transition, Intel is big enough to move the PC industry.

Therein lies the rub. The x86_64 library in the PC space should be at least as substantial as (if not much more than) the ARM library code. That is why it doesn't make much sense for Intel and AMD to dump it; it has substantive inertia. The core issue is when leaning on inertia turns into laziness: hoarding largely abandoned opcodes because 3.2% of the user base uses them more than a couple of times a day, and that number has shrunk 0.3% every 2-3 years for the last 10 years.

x86_64 isn't better because it is 64-bit versus 32-bit. It is substantively better because it has a decent number of explicit registers to use, along with other "non complex" additions.



edit:
As far as Alder Lake being competitive with or outperforming M1 in single core (at god knows what power budget - will believe it when I see it) goes... I should hope so. It's shipping in volume 18-24 months after M1 did.

That somewhat cuts both ways. Apple is on N5 and Intel is on Intel 7. Two substantively different cache sizes, and the decoder is the major difference?

Alder Lake is about as heavily optimized for servers as it is for desktop. The AVX-512 subsystem is there, just turned off in the P cores. It isn't a primarily laptop-optimized implementation. At high-SIMD HPC jobs the M1 isn't much to write home about either. It isn't primarily about instruction set; it is about focus.
 
Not really much of a 'huge proof' there. For the last 10+ years that is pretty much how most high-performance x86 implementations have worked: x86 is translated into micro-ops, and the micro-ops are what actually get dispatched to the internal functional units.
Apple's approach is primarily a static translation (i.e., write another binary and store it to disk too) that consumes more nominal storage space, but there is tons of overlap.

Yeah but that instruction decoder to micro-ops is not free or without its drawbacks.
 
Yeah but that instruction decoder to micro-ops is not free or without its drawbacks.

It is a trade-off, not a strict drawback. All of the systems that do extensive register renaming and extensive instruction reordering translate the opcodes into something new. Even Apple is doing some rewriting.


One of the core issues is soaking up space and transistor budget for an opcode that only comes by once in a 'blue moon'. That is budget and space that could have been assigned to doing something else. If most apps have moved past multiple-decade-old MMX, then you can get rid of that and assign those resources to "paying for" the trade-off being made. Are you going to get to exact parity with Apple's specific ARM opcode design? No. Neither are other ARM implementations.

You can get something back by using more transistors to solve the problem.

"... Notably last year AMD’s Mike Clarke had noted while it’s not a fundamental limitation, going for decoders larger than 4 instructions can create practical drawbacks, as the added complexity, and most importantly, added pipeline stages. For Golden Cove, Intel has decided to push forward with these changes, and a compromise that had to be made is that the design now adds an additional stage to the mispredict penalty of the microarchitecture, so the best-case would go up from 16 cycles to 17 cycles. ...
....
... Intel states that the decoder is clock-gated 80% of the time, instead relying on the µOP cache. "

80% of the time, the tight loop that the application code is focused on at that moment is held inside the L(0.5) [below the L1] instruction cache that sits past the decoder. Yeah, you're paying a "price" for the large micro-op cache, but you're getting increased performance back by having it.
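Back-of-the-envelope version of that trade, using only the 80% figure and the 16 -> 17 cycle penalty from the quote above; the mispredict rate and baseline CPI are assumed numbers:

```python
# Back-of-the-envelope cost of the extra mispredict stage vs. the micro-op
# cache win. The 80% delivery figure and the 16 -> 17 cycle penalty come from
# the quote above; the mispredict rate and baseline CPI are assumed.
mispredicts_per_insn = 1 / 200      # assume ~1 mispredict per 200 instructions
base_cpi = 0.35                     # assumed baseline cycles per instruction

extra_cpi = mispredicts_per_insn * (17 - 16)   # one extra cycle per mispredict
print(f"added CPI from the longer pipeline: {extra_cpi:.4f} "
      f"(~{extra_cpi / base_cpi:.1%} of the assumed baseline)")

uop_cache_hit = 0.80                # fraction of uops served past the decoder
print(f"fetches that never touch the legacy x86 decoder: {uop_cache_hit:.0%}")
```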

If you go from a 180 mm² die to a 200 mm² die but beat the competition on performance, is that really the "end of the world"? In many cases, no. For a smartphone, in not as many cases, but Intel isn't trying to make a smartphone SoC.

Apple's unusually large reorder buffer "skins the cat" a slightly different way, but it is a similar "suck the loop inside, past the decoder" approach. It isn't "free" either.
 
He is just sealing his legacy. He was at Intel when it was already in a downward trend and jumped to Apple; now that Apple has delivered, he goes back to the downward trend for more $.

Assumptions, assumptions.
 