Originally posted by ddtlm
One: You see, while it's true that a 64-bit computer can easily work with 64-bit integers whereas a 32-bit computer can only easily work with 32-bit integers, each integer is still only a single number. You've simply spent twice as many bits storing it. Is that really better? Only if you are trying to store something that won't fit in 32 bits.
No; in fact, it is likely to be much worse if you blindly change your 32-bit numbers to 64-bit numbers when you don't need to, because then you're (1) using twice as much memory (and cache) storing your data and (2) using twice as much memory-to-CPU bandwidth popping those numbers in and out of memory. You would think that programmers wouldn't use 64-bit ints when 32-bit ints would do, especially not in processor-intensive code (or code that intermingles with processor-intensive code, hence bumping everything out of cache). However, experience with 32-bit processors shows that far too many programmers use long (32-bit) ints when a short (16-bit) or even a byte (8-bit) would have done quite well.
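To put rough numbers on point (1), here's a quick sketch (the record layout is hypothetical, and exact sizes depend on compiler padding):

    #include <stdio.h>
    #include <stdint.h>

    /* The same three-field record stored three ways. */
    struct rec64 { int64_t id; int64_t count; int64_t flags; }; /* 24 bytes */
    struct rec32 { int32_t id; int32_t count; int32_t flags; }; /* 12 bytes */
    struct rec16 { int16_t id; int16_t count; int16_t flags; }; /*  6 bytes */

    int main(void) {
        /* A million records: the 64-bit version drags twice the bytes
           of the 32-bit one through the cache and the memory bus. */
        printf("64-bit fields: %lu bytes\n", (unsigned long)sizeof(struct rec64) * 1000000UL);
        printf("32-bit fields: %lu bytes\n", (unsigned long)sizeof(struct rec32) * 1000000UL);
        printf("16-bit fields: %lu bytes\n", (unsigned long)sizeof(struct rec16) * 1000000UL);
        return 0;
    }

Every one of those extra bytes has to come across the memory bus before the CPU can touch it, which is exactly where point (2) bites.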
Two: Games are typically floating-point intensive, and both 32-bit and 64-bit computers support 64-bit floating-point (aka double-precision) math.
Well, generally speaking, many floating point activities could actually be done using 64-bit integers, and so a "good" programmer would intermix floating point operations with long long int operations to keep both pipelines full at all points. Granted, as above, *most* programmers don't pay enough attention to such details, but having a single-op 64-bit math processor there alongside your screaming FPU doubles your ability to streamline bottleneck code.
Of course, again, if the memory bandwidth isn't up to snuff (as is the case on the G4), then no matter how well you pipeline int/FP instructions on the chip, you're still constrained by the latency and throughput of pulling those bits from main memory.
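As an example of what I mean by doing "floating point" work in the integer unit, here's a fixed-point sketch; the 32.32 format and the names are my own invention for illustration, and on a 64-bit chip each operation below is a single-register integer op:

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical 32.32 fixed-point value in one 64-bit int: the high
       32 bits are the integer part, the low 32 bits the fraction. */
    typedef int64_t fix64;

    #define FIX_ONE ((fix64)1 << 32)

    static fix64 fix_from_int(int32_t n)   { return (fix64)n << 32; }
    static fix64 fix_add(fix64 a, fix64 b) { return a + b; }

    /* Multiply: pre-shift each operand 16 bits so the full product
       still fits in 64 bits (trades away some fractional precision). */
    static fix64 fix_mul(fix64 a, fix64 b) { return (a >> 16) * (b >> 16); }

    int main(void) {
        fix64 a = fix_from_int(3);            /* 3.0 */
        fix64 b = FIX_ONE / 2;                /* 0.5 */
        fix64 c = fix_add(a, fix_mul(a, b));  /* 3 + 3*0.5 = 4.5 */
        printf("%f\n", (double)c / (double)FIX_ONE);  /* 4.500000 */
        return 0;
    }

On a 32-bit chip every one of those ops would itself be multi-instruction, which is why the trick only pays off when you have a real 64-bit integer unit sitting next to the FPU.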
Three: If the idea is to process data in 64-bit chunks instead of 32-bit, as fussily as "process" is used, why not use AltiVec? It "processes" in 128-bit chunks.
Exactly. The AltiVec unit is rarely logjammed on current apps (and certainly not on current games), and is a great way to process 8 shorts or 16 bytes at a time (assuming you want to apply the same operation to all of them). Using a 64-bit int register to do this instead of a SIMD instruction set is much less efficient, because you have to handle overflow conditions between the packed values: say you have 8 bytes that you are incrementing, and one of those bytes holds an unsigned 255; incrementing it will push it to 0 and the carry will double-increment the byte next to it. Handling that robs you of the efficiency you were trying to get by operating on a multi-byte int in the first place. Unless, of course, you "know" that you can never have overflow conditions, in which case the 64-bit int register can "stand in for" a half-sized AltiVec register if your bottleneck is actually in the AltiVec unit. The same memory arguments apply as before: >90% of the time on a Mac the bottleneck is to memory, NOT in the CPU or its pipelining! The 970 helps with this via a much wider CPU-memory bus, but you also have to remember that you will likely have much more data being shoved through that bus, simply because 64-bit ints will often be used where 32-bit ints would have done.
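Here is that exact scenario in a quick sketch (the masked version is the standard mask-and-add trick for packed bytes; the constants just isolate the low 7 bits and the top bit of each byte):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* Eight unsigned bytes packed into one 64-bit int; the lowest
           byte holds 255 (0xFF). */
        uint64_t packed = 0x01010101010101FFULL;

        /* Naive "increment every byte": the 0xFF byte wraps to 0x00
           and its carry bleeds into the neighbor (0x01 becomes 0x03). */
        uint64_t naive = packed + 0x0101010101010101ULL;
        printf("naive:  %016llx\n", (unsigned long long)naive);

        /* Carry-safe version: mask off the top bit of each byte, add,
           then patch the top bits back in. That's extra work a SIMD
           unit like AltiVec does for free, since its per-byte adds
           never carry between elements. */
        uint64_t lo7  = (packed & 0x7f7f7f7f7f7f7f7fULL) + 0x0101010101010101ULL;
        uint64_t safe = lo7 ^ (packed & 0x8080808080808080ULL);
        printf("masked: %016llx\n", (unsigned long long)safe);
        return 0;
    }

The naive add prints 0202020202020300 (note the 03 where the carry leaked), while the masked add prints 0202020202020200, with each byte correctly wrapped on its own.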
A 64-bit processor offers greatly expanded memory capacity, and the ability to do integer math in many cases where before only floating point (double-precision at that) would work. It also allows more efficient use of 64-bit ints where they are genuinely required, as in, for example, database operations. All current processors can do 64-bit math, but it's not terribly efficient: gcc and CodeWarrior offer the "long long" data type for this purpose, while MS VC++ offers "__int64". On a 32-bit processor, incrementing a 64-bit int is a three-cycle process instead of the single cycle it would take on a 64-bit processor; right-shifting a 64-bit int is something like 5 cycles instead of one; and left-shifting, which is incredibly inefficient on Intel chips as it is, is even more so when you are dealing with a 64-bit int.

On the other hand, 64-bit integers will not help at all on data which naturally comes in 32-bit, 16-bit, or 8-bit chunks (for instance, strings, which even in Unicode are in 16-bit chunks).
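To make those cycle counts concrete, here is roughly what the compiler has to generate for 64-bit arithmetic on a 32-bit chip, written out by hand as two 32-bit halves (the names are just for illustration):

    #include <stdio.h>
    #include <stdint.h>

    typedef struct { uint32_t lo; uint32_t hi; } u64_pair;

    /* Increment: an add of the low word, then a carry-propagating add
       of the high word. */
    static void inc64(u64_pair *x) {
        x->lo += 1;
        if (x->lo == 0)   /* low word wrapped around: carry out */
            x->hi += 1;
    }

    /* Right shift by n (0 < n < 32): both halves get shifted, and bits
       are funneled from the high word down into the low word. */
    static void shr64(u64_pair *x, unsigned n) {
        x->lo = (x->lo >> n) | (x->hi << (32 - n));
        x->hi = x->hi >> n;
    }

    int main(void) {
        u64_pair x = { 0xFFFFFFFFu, 0 };  /* 4294967295 */
        inc64(&x);                        /* carry ripples: hi=1, lo=0 */
        shr64(&x, 1);                     /* halve it: lo=0x80000000 */
        printf("hi=%08x lo=%08x\n", (unsigned)x.hi, (unsigned)x.lo);
        return 0;
    }

On a 64-bit processor each of those operations is one instruction on one register, which is the whole point.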