I don't see any theoretical or practical reason why this should not be doable. You take the x86 binary, translate it to LLVM IR, then let LLVM optimise it and output an ARM binary. This challenge is equivalent to "classical" compiling and optimising (sans, of course, the type meta-information you have in a high-level language). AFAIK, there are projects that attempt to do just that (I don't know how successful they are, though). As to vector instructions, they might be slightly different between the architectures, but NEON and SSE/AVX are quite similar in spirit: as far as I know (it has been some time since I last did any hand-coded vector assembly), they have comparable types of instructions, and I see no fundamental problem in mapping one to the other. Now, stuff like AVX-512 with its mask registers etc. is another beast altogether, and emulating it would probably be very slow, but that's not a problem one has to deal with within the next few years, I guess.
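For illustration, here is a minimal sketch (in C, using compiler intrinsics) of how directly a typical SSE operation maps onto NEON; a binary translator would perform essentially this mapping instruction by instruction:

```c
/* The same 4-wide float addition, written once with SSE intrinsics (x86)
   and once with NEON intrinsics (ARM). The two are nearly 1:1. */
#if defined(__SSE__)
#include <xmmintrin.h>
__m128 add4(__m128 a, __m128 b) {
    return _mm_add_ps(a, b);   /* ADDPS: four packed single-precision adds */
}
#elif defined(__ARM_NEON)
#include <arm_neon.h>
float32x4_t add4(float32x4_t a, float32x4_t b) {
    return vaddq_f32(a, b);    /* FADD .4S: four packed single-precision adds */
}
#endif
```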
To sum it up, I am quite optimistic about binary recompilation of this sort. Again, this is purely theoretical. I am not advocating that Apple move to ARM. I am merely interested in discussing if/how this could be accomplished and what the practical implications of such a move might be.
There is no x86 binary to LLVM IR; compilers are a one-way operation. You don't go from machine-specific code back to any intermediary that can be recompiled without some form of emulation.
Not to mention you would need translations for ALL the libraries referenced by a particular program. Think WINE for Linux, and that's for binaries on the same architecture.
And like I already said, ARM doesn't even have equivalents of the super-scalar instructions used by many professional tools (video editing, computational).
Could you explain what you mean by "super-scalar instructions"? I am not familiar with the term. I know what a superscalar CPU is, but I have never heard the term applied to instructions or instruction sets.
Ever heard of disassemblers? Why would you even say that going from x86 to LLVM IR is impossible? It's just another compiler, and a fairly straightforward one (it's just the details that are tricky to get right). You have one well-defined language and you transform it into another well-defined language.
And anyway, here are just the first two Google search results on the topic:
https://github.com/trailofbits/mcsema
http://llvm.org/devmtg/2013-04/bougacha-slides.pdf
Edit: and an x86 to C/C++ decompiler:
http://derevenets.com/examples.html
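To make the "it's just another compiler" point concrete, here is a toy sketch in C (an assumed illustration, not mcsema's actual representation) of what lifting a single x86 instruction means: the machine-specific operation becomes portable code over an explicit register file, which can then be optimised and recompiled for any target.

```c
#include <stdint.h>

/* Hypothetical model of the lifted guest register file. */
typedef struct {
    uint32_t eax, ebx;
    uint8_t  zf;   /* zero flag */
} GuestState;

/* Semantics of the x86 instruction "add eax, ebx" as portable code.
   A real lifter would also model the rest of EFLAGS. */
void sem_add_eax_ebx(GuestState *s) {
    s->eax += s->ebx;
    s->zf = (s->eax == 0);
}
```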
Third-party libraries could be compiled the same way I described above. System libraries are simply provided by the OS. The compilation to a different architecture would be completely transparent to the application.
Could you explain what you mean by "super-scalar instructions"? I am not familiar with the term. I know what a superscalar CPU is, but I have never heard the term applied to instructions or instruction sets.
Superscalar instructions are superscalar CPU architectures with additional instructions that support superscalar operations such as Intel's VT-d, VT-x, SSE.
Disassemblers go from machine code to the ASM of that architecture, i.e. you will get x86-64 ASM, not cross-compatible C/C++.
That alone does not let you convert the binary 1:1 to a new architecture, because you can't guarantee the same memory access patterns on the new host. Suppose the host code had N bytes reserved for instruction memory and Y bytes reserved for data. You would have to analyze that; simply allocating the same amounts in LLVM IR would break, because the code size for IR is not 1:1 and the memory usage is not 1:1.
Super-scalar instructions utilize multiple hardware datapaths in parallel. For example, in video encoding, since many things from frame to frame can be done in parallel, you can do things like super-scalar addition operations: instead of a single register + register = register add, they operate on a full vector of values. This is a very basic case, but instructions like that, which are the cornerstone of most high-performance applications, just won't exist under an ARM ISA.
Intel Atom supports out-of-order execution since Silvermont.
1. Sigh... Superscalar is a term applied to CPUs which reorder/reoptimise the instruction stream and are thus able to execute multiple instructions in parallel. This is also often referred to as instruction-level parallelism. Both modern Intel x86 and Apple ARM CPUs are superscalar (exception: Intel Atom, which is not superscalar).
2. SSE/AVX are SIMD/vector instruction set extensions. They have nothing to do with being superscalar. ARM also has SIMD instructions which are very similar to SSE/AVX. It's called NEON.
3. In my previous post I linked a number of tools that decompile x86 into different representations, such as LLVM IR and C/C++. I find it mildly amusing that you still deny the possibility of such a tool even after it has been presented to you.
4. I don't see why it matters. Of course one would recompute the offsets appropriately. That's trivial. LLVM has very flexible support for data types and can match all native data types used by x86 and ARM CPUs. The only tricky part is the alignment of data, and code that makes assumptions about data sizes. Which, luckily, is not an issue in this particular case, because x86-64 and A64 have the exact same data size and alignment specs.
5. Again, this is called SIMD and has nothing to do with superscalar execution. Please don't invent new terminology in order to make a point. Intel Atom is not superscalar, but it supports SSE.
6. ARM has had vector instructions for years: https://www.arm.com/products/processors/technologies/neon.php
7. All Apple ARM chips fully support the advanced ARM vector instructions, and Apple even offers devs a number of well-implemented numeric libraries that take full advantage of such instructions, on both the Intel and the ARM side.
------------
Final note: I don't think I want to continue this discussion any longer. You seem to have some experience with programming and you seem to have read some tech articles here and there, for all the good it did you. But it's also clear that you are very adamant about ignoring your lack of basic education, like the fact that you don't know what superscalar means or that ARM has SIMD instructions. I already have to explain some of this stuff in my day job as a university lecturer and programmer, so you'll have to excuse me if I get bored quickly when I also have to do it on an internet forum.
You are confusing out-of-order execution and superscalar. I.e., if you have a multiply op and then an add op that reference completely separate registers, but the multiply is waiting for a previous instruction, you can do the add instruction first and later retire the ops in order.
SIMD, and all vector operations, are superscalar execution... SIMD (Single Instruction Multiple Data) can't process multiple data in a single pipeline stage without being superscalar.
The ARM superscalar arch is not for high performance; it's for better efficiency within a particular power envelope.
x86-64 doesn't have the same instruction data size... x86-64 has variable-size instructions, ARM does not.
64-bit only refers to memory ALIGNMENT.
What about platform-specific ABIs? Backend-specific behaviors?
Differences in consistency models?
Again, the common definition of superscalar is the ability to execute multiple instructions at once, which is why most relevant superscalar CPUs are out of order. These two things go hand in hand in modern CPU design.
You don't have to be superscalar to have wide ALUs (e.g. older GPUs, I am not sure if modern ones are superscalar designs). SIMD = single instruction — a CPU can have SIMD without the ability to execute multiple instructions in parallel.
Which can be said about any superscalar CPU. Superscalar and out-of-order are there to maximise the utilisation of execution units.
I fail to see why this is relevant. The instruction mapping won't be one-to-one in any case.
It first and foremost refers to the basic pointer and register size. Anyway, as I mentioned before, the sizes and alignments of basic data types (chars, ints, longs, void* etc.) are the same for A64 and x86-64, which means that all data structures are binary-compatible.
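If that claim is right (both platforms use the LP64 data model, which is the case for macOS and Linux on x86-64 and A64), a quick check like this compiles cleanly on either architecture:

```c
/* LP64 on both x86-64 and A64: identical sizes and alignments, hence
   binary-compatible data structures. static_assert is C11. */
#include <assert.h>

static_assert(sizeof(char)   == 1, "char");
static_assert(sizeof(short)  == 2, "short");
static_assert(sizeof(int)    == 4, "int");
static_assert(sizeof(long)   == 8, "long");
static_assert(sizeof(void *) == 8, "pointer");
static_assert(_Alignof(long) == 8, "long alignment");
```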
Superscalar is completely independent of out-of-order, e.g. Intel's Xeon Phi. A completely different optimization is at play.
Please provide an example of any general-purpose processor that has wide functional units without being superscalar. How would you utilize them in a load/store machine?
Superscalar instructions are used where ILP can replace single-issue instructions. For example, take two arrays being added together with the result put in a third array: instead of incrementing the pointer by one data value, doing the add, and writing to memory, the normal instructions would be replaced by a single SIMD instruction that loads N bytes, does N adds in parallel, then writes the N results to memory, where N is the width of that particular instruction.
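The pattern being described, sketched in C with SSE intrinsics (NEON has direct analogues, e.g. vld1q_f32/vaddq_f32/vst1q_f32); for brevity, n is assumed to be a multiple of 4:

```c
#include <xmmintrin.h>   /* SSE intrinsics */

/* Scalar version: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SIMD version: four additions per instruction. */
void add_simd(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* load 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* 4 adds, 1 store */
    }
}
```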
Instruction mapping won't work, period. If the original program uses some memory-mapped feature of one platform, the new platform wouldn't know how to reimplement it unless it was manually patched. It would be treated like a regular memory access and the program would fail. Example: run the x86-64 version of perf, convert it to an arm64 binary -> kernel panic when it tries to access memory locations that map to counter data that doesn't exist.
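A sketch of why this breaks, with a made-up MMIO address standing in for whatever platform-specific region a tool like perf might touch:

```c
#include <stdint.h>

/* 0xFED00000 is a hypothetical platform-specific MMIO address. On the
   original SoC this read returns hardware counter data; after a naive
   binary translation to another SoC, the same address is just unmapped
   memory, so the access faults or returns garbage. */
uint32_t read_counter(void) {
    volatile uint32_t *mmio = (volatile uint32_t *)0xFED00000u;
    return *mmio;
}
```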
No, chars, ints, longs and void*s don't exist to the hardware.
Also, pointers don't point to 64-bit data types; both x86-64 and aarch64 are byte-addressable. Not to mention only the lower 48 bits are used for addresses.
Why are you so worried and concerned? Take a chill pill and enjoy the sun.

I am a bit concerned that the only true leak for the MacBook Pro was the picture of it which showed an empty hole for the magic function keys.
I am worried that the production and distribution chains are far from ready (so no leaks, because there is nothing to show) and we are going to have to wait a couple more months before seeing anything in the stores.
Am I the only one?
In comparison, a month before the iPhone 7 event, we had countless pictures, both of the final product and of several hardware components, plus the full final specs pages...
I did: Intel Atom before Silvermont.
This is not an optimisation that the CPU usually does by itself; at least, not something I am aware of. Auto-vectorization of the kind you are talking about is usually performed by the compiler, and it has nothing to do with ILP.
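For example, a plain loop like the following is what gets auto-vectorized: clang and gcc at -O2/-O3 emit SSE code for it on x86-64 and NEON code on arm64 from the very same source; the CPU itself never rewrites scalar instructions into SIMD ones.

```c
/* Compile with -O3 and inspect the assembly (-S): the compiler, not the
   CPU, turns this scalar loop into packed SIMD adds. */
long sum(const int *v, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}
```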
That's why one would fix the addresses while translating. It's not difficult to do.
They kind of do, because they are something the CPU can directly operate on. As long as the CPU has instructions that can specifically manipulate bytes, words (of different sizes) etc., these are basic data types that the CPU supports. The C language data types are a very low-level abstraction and they map more or less directly to what the CPU can do. So it makes a big difference whether a pointer on a platform is four, eight, or 16 bytes.
Where did I claim that pointers point to 64-bit data types? The relevant part is that the pointers themselves are 64-bit.
Intel Atom was an in-order 2-wide superscalar arch prior to Silvermont; out-of-order execution was added with Silvermont...
Apple, Qualcomm and NVIDIA do it. They do not implement the stock ARM microarchitecture; VLIW implementations of ARM combine multiple ARM instructions into a single instruction which is decomposed into micro-ops. Directly related to ILP.
You can't translate memory-mapped addresses from one architecture to another (or even within the same architecture, unless it's from the same family); memory-mapped addresses refer to an SoC peripheral, not a real memory address. You would have to recompile from source with code changes. See the perf example above.
There is no such thing as a basic data type.
Example x86-64:
83C0FF add eax,byte +0xff
You can't do add eax,byte +0xffffffffffffffff; you would have to load it into a register, or have it at a memory offset and use a different instruction mode.
Pointers themselves are 48 bit in most modern 64 bit architectures.
Sorry, my mistake. True, Atom is limited dual-issue. OK, then let's take ARM Cortex-A5 CPUs: single-issue, but with SIMD instructions.
Why are we talking about VLIW now? I thought we were talking about auto-vectorisation, as in your example. What CPU can auto-vectorize loops and turn non-SIMD instructions into SIMD instructions?
Is it something a user space application needs to do? Sounds more like driver-side stuff to me.
Most CPUs can't natively operate on 3-byte words. If they could, then the C int on that platform would most likely be 3 bytes. That's what I mean by "C types being a low-level abstraction".
I fail to see how this is relevant. It's just an implementation specific. The fact remains that the CPU can operate on bytes, 16-bit words, 32-bit words and 64-bit words. That's it. And SIMD data types, of course.
I don't even know how to reply to this. True enough, only 48 bits are used. But the pointers themselves are still 64-bit. Good luck storing a 48-bit pointer.
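A one-line check of that point; this prints 8 on any LP64 system (x86-64 or A64), regardless of how many address bits the MMU actually translates:

```c
#include <stdio.h>

int main(void) {
    int x = 42;
    int *p = &x;
    /* The hardware may translate only 48 address bits, but the pointer
       still occupies a full 64-bit register / memory word. */
    printf("sizeof p = %zu bytes\n", sizeof p);
    return 0;
}
```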