Re: Performance in 32-bit mode?
I wish I hadn't missed this one...
Originally posted by wdodd
I understand that the PPC970 will be able to run existing 32-bit PowerPC binaries by switching into a compatibility mode. I also understand that this is dynamic, in that you can alternately execute 64-bit code and 32-bit code.
Is there a performance penalty on the 970 for executing 32-bit instructions when it really wants to be fed 64-bit instructions? I'm thinking back to the days when we switched to PowerPC and had to deal with fat binaries, compatibility mode, context-switching performance penalties, etc. That really held back the advantage of PPC for a time. Will we see the same thing as we go to the 970 (where there is no improvement and actually a slight penalty until everything is 80%+ 64-bit native)?
1. The compatibility mode for running 32-bit PPC code on a 64-bit PPC processor is actually insanely simple. It disables all 64-bit integer instructions, and every architectural register that is 32- or 64-bit depending on the base architecture is presented to the program as 32-bit. When a 64-bit PPC implementation is running in 32-bit mode, all of these 32/64-bit registers have their top 32 bits cleared to zero (in hardware). The performance hit is infinitesimal, if it exists at all.
2. There is no concept of 'being fed 64-bit instructions', and this is a prime source of confusion. When we talk about 32-bit instructions on a PPC, we typically mean instruction size. When we talk about 64-bit instructions on a PPC, we typically mean instructions that are defined only on 64-bit implementations of the PowerPC processor. Both standard PowerPC instructions and 64-bit-implementation-only PowerPC instructions are 32 bits wide. In fact, simply by the law of averages, a 64-bit PowerPC implementation will likely spend more time running instructions that don't operate on 64-bit integer data types than instructions that do.
3. The context switch from 32-bit to 64-bit and back is mostly a matter of allowing the hardware to operate on 64-bit integer data types natively. Back when we first switched to the PowerPC processor, the context-switch performance hit came from having to switch into and run an emulator (of the 68K processor series). Since 32-bit PowerPC is not an emulated state, the cost should be near zero unless the switch happens very rapidly (which is not likely).
Originally posted by wdodd
I ran across this article which implies that Apple would have to rewrite portions of Mac OS X for it to work on the PPC970 and arbitrate for 32-bit apps so they can run unmodified on this hardware.
Makes me think that there might be significant performance problems trying to run 32-bit apps on this new chip.
1. The portion needing a rewrite would be minimal. The entire requirement is that a 32-bit OS switch the CPU into 32-bit mode at startup (since it starts in 64-bit mode) and deal with anything specific to 64-bit PowerPC implementations (of which nothing major comes to mind).
2. There are no performance issues in running 32-bit applications. Moving an application from 32-bit to 64-bit may or may not be trivial, depending on how the programmers wrote their code. In general it is not just a recompile, but neither is it a huge re-optimization effort (beyond optimizations for the processor itself). The main problem in moving a 32-bit application to a 64-bit architecture is what happens when a register unexpectedly holds more than 32 bits of useful information. If the register is holding a pointer, the program will crash; otherwise the program may start producing bad data (which may lead to a crash later).
Originally posted by wdodd
My question isn't so much, "can they do it?" as it is "can they do it in a reasonable time frame and achieve acceptable performance for 32-bit apps so it still feels like a leap forward for Mac users, particularly for professional users."
If it takes 64-bit native code to realize all the potential performance of the 970, then Apple has a much bigger problem on their hands.
64-bit native code is only required to make full use of the 64-bit integer registers and to access memory beyond 4GB. Context switching will likely be a rather minor issue, and I would expect Apple to place its cost on the shoulders of programs that want the 64-bit address space rather than on those that run 32-bit.
Originally posted by wdodd
Do you have a source for this? The articles that I've read are pretty ambiguous when it comes to how much work is actually required to retool the OS to support 32-bit apps.
Nothing specific has been said publicly about what needs to be done. However, every source states that the required changes are minimal and were designed to be so. Therefore I'm inclined to think that this is something IBM has thoroughly documented and expects an engineer to be able to finish in a week.
Originally posted by ddtlm
wdodd:
Well, IBM has always claimed "native" 32-bit compatibility, and the PPC instruction set has always been said to have been designed for seamless 32-bit and 64-bit versions. Since I've never programmed anything in PPC assembly I can't tell you if the instructions for 32-bit and 64-bit are in fact exactly the same such that there has to be a mode-switch instruction for going between them, but even if that is the case it won't be a big deal, since once a 32-bit app starts running it remains in 32-bit mode until it's done, not jumping in and out all the time (so don't worry about switching).
I've also never worked on an OS kennel, but you can trust me, things more difficult than Apple's task are done every day by volunteers working on Linux.
The PowerPC instruction set is (currently) logically divided into three sets: 1) the PowerPC core instruction set, 2) the PowerPC 64-bit instruction set, and 3) the AltiVec instruction set. (1) is implemented in every PowerPC implementation and includes everything for operating on integers up to 32 bits and on 32- and 64-bit floating-point types. (2) extends the integer unit and some related architectural registers to 64 bits and includes operations on 64-bit integers. (3) is of course the AltiVec unit, implemented on the MPC74xx series and now on the IBM PPC970.

And you are correct that a 32-bit application will likely run in 32-bit mode for all of its life. However, if there is even a single 64-bit application running on the system, then there will be context switches (since this is a preemptively tasked operating system). A context switch does not limit one to 64-bit instructions (and prevent one from using non-64-bit instructions); that would make the processor fairly useless. This is not MMX on an x86.
And I have never worked in an OS kennel either, but I can imagine that the dogs in there can get pretty loud.
Originally posted by wdodd
I've heard some similar things too. I just wonder about the implementation in the 970. Guess we'll have to wait to see.
The reason I'm on context switching is because this is what killed the first-generation PPC's performance. We're comparing emulation of another chip to 32-bit/64-bit modes, but context switching drove performance on PPCs down below what we were getting on older systems. It wasn't until more of the OS and more of the apps were native PPC that the context-switching penalty subsided. It was brutal in the early days.
And as you say, context switching killed us back then because we had to emulate another processor. This context switch will, for the most part, just lop off the top half of a few registers. Since the OS will be arbitrating the context switch, it shouldn't cause any problems in user code. OS code, however, will likely need to be aware of which environment it is running in and adapt. This will cost a few cycles here and there, but the advantage would be the ability to run 64-bit applications. How much of an advantage 64-bit applications will be in the near future depends entirely on Apple and its developers.
Originally posted by ddtlm
wdodd:
I can't imagine any sort of switching that could counterbalance the huge architectural improvements in a 970 vs a 7455. There could be penalties but I rest assured that they are going to be unimportant overall.
Surprisingly few people know the sort of thing you are seeking.
You are almost certainly right about this. The 970 is so much faster than the 7455 (going on SPEC scores alone) that context switches shouldn't slow it down noticeably. I wouldn't think that a 32/64 context switch would cost appreciably more than the standard task-switch cost that occurs when a preemptive tasking OS switches between tasks. I expect that IBM is aware of this issue and has taken steps to make sure the cost is small, and I expect that Apple will take steps to make sure their OS software runs at maximum performance.
And perhaps surprisingly few people who do know have been checking this message thread.