Re: 64 bit, hmmmmmm...

Originally posted by copperpipe
But, I just don't understand why Intel, AMD, and IBM are all investing billions of dollars in the race for 64 bit?
Remember how many years ago Bill Gates said "640k memory should be enough for anyone", or something like that? Well, we are quickly reaching the point where 4GB of memory will not be enough for anyone. When we reach that point, the big chip manufacturers like Intel, AMD, and IBM don't want to be stuck for a couple of years trying to engineer their 64-bit products. They want to be ready ahead of time, because whichever vendor has the most solid 64-bit implementation will win major market share.

Another consideration is large enterprise databases. These suckers live on 64-bit, take for example a Sun Fire 15000 which supports up to 576GB :eek: of RAM! That's over half a terabyte of memory... The Intels and AMDs of the world would love to be able to capture some of that market share of the servers that sell for a few cool million a piece...
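To put rough numbers on the address-space point, here is a minimal Python sketch. It only does arithmetic on the two figures quoted above (4GB and 576GB). It assumes "GB" means 2^30 bytes, and the helper name `address_bits_needed` is made up for illustration.

```python
GIB = 2 ** 30  # assumption: the post's "GB" is read as 2**30 bytes

def address_bits_needed(total_bytes):
    """Smallest number of address bits that can index every byte."""
    return (total_bytes - 1).bit_length()

flat_32bit_limit = 2 ** 32   # bytes reachable with a 32-bit pointer
sun_fire_ram = 576 * GIB     # the 576GB figure quoted above

print(flat_32bit_limit // GIB)            # size of a 32-bit address space in GiB
print(address_bits_needed(sun_fire_ram))  # pointer width needed for 576GB
```

Under these assumptions a 32-bit pointer tops out at 4GB, and a machine with 576GB of RAM needs pointers well beyond 32 bits just to reach all of it.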

64-bit really doesn't matter too much for the desktop for the next year or so, but I can see by about 2005 it will be a differentiating factor.
 
Originally posted by ffakr
NO!

You still aren't getting it. This has been explained many, many times.
If you require operations with 32bit precision math, it doesn't matter whether you use a 32bit, 64bit, or 256bit processor, and it doesn't matter whether you use a 32bit, 64bit, or 256bit OS... it will still only fetch 32bit words!!!
Why is this so difficult to understand?

Yes, this should be pointed out. 64-bit registers speed up computation only if you need to work on 64-bit data, and quite often (most often, indeed) CPUs work only on 8-, 16- or 32-bit data.
E.g. you're doing a census and want to identify each person with a number: unless your survey covers the entire population of the Earth, you do not need 64-bit numbers. It's the same with most common tasks: you'll never have more than 4 billion songs in your playlist, music samples are 16 bits wide, pixels are 24- or 32-bit elements, and you can handle several of them at once using a SIMD extension (the G4 is actually a 32/128-bit processor).
Only a few tasks require 64-bit figures, mostly in science and accounting... and if your program needs to handle a 64-bit value from time to time, a 32-bit CPU can do the job; it will only require more time (we are talking about tenths of a microsecond here), and this is the only time you would actually see a benefit from having a 64-bit CPU.
Computing the size of a file on a 64-bit file system is not the same job as crunching through a 1000 by 1000 matrix in Maple.
64-bit CPUs are interesting if you make heavy use of such data, or if you need to address more than 2 GiB of memory (as would the database of all the parts of a jet plane, or perhaps keeping an entire movie clip in memory, not your personal stamp collection, nor an MP3 file); in many, many other cases a SIMD engine can bring you much more benefit.

128-bit SIMD = handle 4 CD quality music samples at the same time.
64-bit CPU = handle a single sample that is I cannot figure how many times more precise, and that not even a Vulcan needs ;)
 
Originally posted by mim
The 970 does not have a single 64 bit data bus (each direction) feeding it, but 2x32bit ones. It also has not one, but 2 load/store units, 2 fixed point units, 2 floating point units, and 1 simd unit - that just happens to have 2 sub-units.

Now, I know next to nothing about proc design so these are just my thoughts.

What I do know is that every other chip I've seen data on has 1X data bus (64 bit procs like the AMD have 1x64bit data buses, even the power 4 does too), they all have 1 floating point unit, and I've never heard of a simd unit that has 2 sub units.

Is it possible that the reason that IBM chose 2x32bit busses rather than 1x64bit (which must be less efficient for 64bit data read/writes?) is that it >can< call and send 2x32bit words each cycle?

Apple is using two uni-directional 32 bit busses because that is what Hypertransport is... Hypertransport.org. Hypertransport (as used in the 970) sends 32bit blocks in packets... kind of in the way that a serial ethernet cable delivers data in packets. This is why you hear about 6.4GB of effective bandwidth... 2buses*32bit/8bits_per_byte*900MHz=7.2GB/sec... but once you subtract off the overhead associated with routing (and packetizing) the Hypertransport data, you get around 6.4GB/sec. :)
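The bandwidth arithmetic above can be checked with a few lines of Python. This is just a sketch using the figures quoted in the post (2 buses, 32 bits each, 900MHz, roughly 6.4GB/sec effective); the function name is hypothetical and 1 GB is taken as 1000 MB.

```python
BUSES = 2          # two unidirectional links, one per direction
WIDTH_BITS = 32    # width of each link
RATE_MHZ = 900     # effective transfer rate of each link

def peak_bandwidth_gb_s(buses, width_bits, rate_mhz):
    """Peak bandwidth in GB/s, taking 1 GB as 1000 MB."""
    bytes_per_transfer = width_bits // 8
    return buses * bytes_per_transfer * rate_mhz / 1000

peak = peak_bandwidth_gb_s(BUSES, WIDTH_BITS, RATE_MHZ)
effective = 6.4    # figure quoted in the post, after packet overhead
overhead_fraction = 1 - effective / peak

print(peak)                          # raw peak in GB/s
print(round(overhead_fraction, 3))   # share of the peak lost to overhead
```

With those inputs the raw peak comes out at 7.2GB/sec, so the quoted 6.4GB/sec implies that about a ninth of the raw bandwidth goes to overhead.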

As for the altivec sub-units... the current G4s have no less than FOUR subunits that make up Altivec. :)
Probably the coolest feature of the 970 that most people don't mention is that it can issue up to 8 instructions per clock cycle (though 3 of those are load/store). The 970 has so many discrete units because it can use so many of them at once!
 
Originally posted by mim
The 970 does not have a single 64 bit data bus (each direction) feeding it, but 2x32bit ones. It also has not one, but 2 load/store units, 2 fixed point units, 2 floating point units, and 1 simd unit - that just happens to have 2 sub-units.

I'm pretty certain you are correct about that (not looking at the spec sheet, and I don't quite remember the functional unit counts...). Nonetheless, functional unit count doesn't have anything to do with 64-bitness, but it does with performance.

And it does have 2 32-bit busses, but they are each unidirectional - one is used to read(load) and the other to write(store).

What I do know is that every other chip I've seen data on has 1X data bus (64 bit procs like the AMD have 1x64bit data buses, even the power 4 does too), they all have 1 floating point unit, and I've never heard of a simd unit that has 2 sub units.

Actually, the Power4 does use dual busses that are very similar to the 970 - but they are 128 bit instead of 32. And the 745x G4s also have the same (or similar) division of the Altivec unit.

Is it possible that the reason that IBM chose 2x32bit busses rather than 1x64bit (which must be less efficient for 64bit data read/writes?) is that it >can< call and send 2x32bit words each cycle?

No, each bus can only be used to read or write, not both. This is the same reason why you won't see much of a benefit from Dual Channel DDR RAM on a single proc PowerPC 970 system. Unless you are communicating with different RAM chips you can't use the full read & write bandwidth at the same time (on a dual proc system you could be reading or writing to the same RAM chips with both CPUs and thus gain some benefit).
 
Originally posted by Rincewind42

No, each bus can only be used to read or write, not both. This is the same reason why you won't see much of a benefit from Dual Channel DDR RAM on a single proc PowerPC 970 system. Unless you are communicating with different RAM chips you can't use the full read & write bandwidth at the same time (on a dual proc system you could be reading or writing to the same RAM chips with both CPUs and thus gain some benefit).

Sorry, you're right (I just checked the data sheet which I should have done in the first place...<sigh>). I had thought that it had 4 busses (2 read and 2 write), but it does only have 2 and they are unidirectional.

Thanks for the clarifications, and the details on hypertransport from ffakr. :)
 
I was more interested in this quote from PowerJack:

"I also have a pile of Apple documentation here describing some mind-bending capabilities lurking inside the new iPods."
 
Originally posted by chicagdan
I was more interested in this quote from PowerJack:

"I also have a pile of Apple documentation here describing some mind-bending capabilities lurking inside the new iPods."

Everyone said that about the original pods too (a whole unused chip?) - but nothing ever came of it.

Not saying that these capabilities don't exist - but Apple have to let someone develop them (if they don't themselves).

a.
 
A question from the ignorant

Since some of you here are so well informed I'm going to phrase this as a question, since I do not know the answer to it.

Why wouldn't Altivec benefit largely from a 64 bit processor? If Altivec is designed to facilitate the crunching of multiple operations at once instead of one at a time (in select cases) would a 64-bit processor allow Altivec to fit twice as many of these instructions into a clock cycle?

I really don't know, because I am Altivec stupid, so someone who does please enlighten me.

And try to refrain from flaming people who think a 64 bit processor will be twice as fast. Just tell them it won't. Although -- from what I understand it will be not only twice but several times as fast in 64 bit operations as a 32 bit processor ends up having to cycle these things not twice but a few times. I could be wrong...it's only what I've heard.
 
Originally posted by mim
Everyone said that about the original pods too (a whole unused chip?) - but nothing ever came of it.

Not saying that these capabilities don't exist - but Apple have to let someone develop them (if they don't themselves).

a.

There is at least one hidden ability here, recording. You might have to rig something to get it to work, but with the curtain pulled back, Apple will probably end up releasing a new attachment and SW update with this generation....eventually...
 
Re: A question from the ignorant

Originally posted by BaghdadBob
Why wouldn't Altivec benefit largely from a 64 bit processor? If Altivec is designed to facilitate the crunching of multiple operations at once instead of one at a time (in select cases) would a 64-bit processor allow Altivec to fit twice as many of these instructions into a clock cycle?

Altivec is designed to work with 128 bit quantities as 16 8-bit, 8 16-bit, or 4 32-bit values. That's it, nothing more or less (although that something is pretty sweet in the right hands! :D).
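For anyone who wants to see what "4 32-bit values in one 128-bit quantity" means in practice, here is a rough Python model. It is only a sketch: it imitates a lane-wise wrap-around add with plain integers, it is not Altivec code, and all the function names are invented for the example.

```python
VECTOR_BITS = 128
MASK32 = 0xFFFFFFFF

def lanes(element_bits):
    """How many elements of a given width fit in one 128-bit vector."""
    return VECTOR_BITS // element_bits

def pack32(values):
    """Pack four 32-bit values into one 128-bit integer, lane 0 lowest."""
    assert len(values) == 4
    packed = 0
    for i, v in enumerate(values):
        packed |= (v & MASK32) << (32 * i)
    return packed

def unpack32(packed):
    """Split a 128-bit integer back into its four 32-bit lanes."""
    return [(packed >> (32 * i)) & MASK32 for i in range(4)]

def vector_add32(a, b):
    """Lane-wise add: each 32-bit lane wraps on its own, no carry crosses lanes."""
    return pack32([(x + y) & MASK32 for x, y in zip(unpack32(a), unpack32(b))])

a = pack32([1, 2, 3, 0xFFFFFFFF])
b = pack32([10, 20, 30, 1])
print(lanes(8), lanes(16), lanes(32))
print(unpack32(vector_add32(a, b)))
```

The point of the model is that the four lanes never interact, so the width of the integer unit has no say in how many elements one vector operation covers.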

A 64-bit processor is one that is defined (typically) as manipulating 64-bit pointers. Since pointers are almost always manipulated in the integer unit of the CPU, this means that a 64-bit CPU can manipulate 64-bit integers.

The Altivec unit doesn't interface with the integer unit in this manner. The only effect that a 64-bit integer unit has on the Altivec unit is that vectors can be loaded from beyond the 4GB mark, just like any other data type.

Now, since all instructions on the PowerPC platform are 32-bits you may think that you could load 2 instructions at once and execute them both. This isn't how it works. In reality some larger number of instructions (that depends on the actual CPU architecture) is loaded and dispatched at once. This is completely independent of the integer unit - it is completely uninvolved in instruction dispatch & execution (aside from actually executing instructions dispatched to it).

And try to refrain from flaming people who think a 64 bit processor will be twice as fast. Just tell them it won't. Although -- from what I understand it will be not only twice but several times as fast in 64 bit operations as a 32 bit processor ends up having to cycle these things not twice but a few times. I could be wrong...it's only what I've heard.

Well, I don't think that any of us are deliberately trying to flame anyway - we just don't want others to be confused about what is going on, since inevitably this leads to "Why isn't X twice as fast on this chip? 64-bit is supposed to be twice as fast as 32-bit!". It's a preemptive strike on ignorance :D.

And yes, 64-bit number crunching will be more than twice as fast on a 64-bit CPU vs a 32-bit CPU :cool: . But this is probably not as big a deal as some people want to make it :D.
 
Originally posted by mim
The 970 does not have a single 64 bit data bus (each direction) feeding it, but 2x32bit ones. It also has not one, but 2 load/store units, 2 fixed point units, 2 floating point units, and 1 simd unit - that just happens to have 2 sub-units.

Now, I know next to nothing about proc design so these are just my thoughts.

What I do know is that every other chip I've seen data on has 1X data bus (64 bit procs like the AMD have 1x64bit data buses, even the power 4 does too), they all have 1 floating point unit, and I've never heard of a simd unit that has 2 sub units.

Is it possible that the reason that IBM chose 2x32bit busses rather than 1x64bit (which must be less efficient for 64bit data read/writes?) is that it >can< call and send 2x32bit words each cycle?

Just a thought, just a thought.

Flame proof suit is on! Go for it boys (and girls).
You have to know that putting a lot of wires on a mobo makes its layout difficult and prone to interference... thus IBM chose to reduce the bus width and dramatically increase its clock frequency.
The announced 900 MHz bit rate is in fact a 450 MHz bus with DDR (Double Data Rate). In comparison, the Athlon 3200+ has a 200 MHz DDR (400 MHz bit rate) FSB, and the Pentium 4c has a 200 MHz QDR (800 MHz bit rate) FSB. Both x86 CPUs have a 64-bit bus; the 970 has two 32-bit buses, nominally clocked more than twice as fast (the result is that it should be in the same ballpark as the P4).
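Here is the same comparison as a small Python sketch, using only the clock, data-rate and width figures given above. The function name and the "1 GB = 1000 MB" convention are my own assumptions, and these are theoretical peaks, not measured numbers.

```python
def peak_gb_s(base_clock_mhz, transfers_per_clock, width_bits, buses=1):
    """Theoretical peak bus bandwidth in GB/s (1 GB taken as 1000 MB)."""
    rate = base_clock_mhz * transfers_per_clock   # millions of transfers per second
    return buses * rate * (width_bits // 8) / 1000

athlon = peak_gb_s(200, 2, 64)                 # 200 MHz DDR, one 64-bit bus
pentium4 = peak_gb_s(200, 4, 64)               # 200 MHz QDR, one 64-bit bus
ppc970_total = peak_gb_s(450, 2, 32, buses=2)  # 450 MHz DDR, two 32-bit buses
ppc970_one_way = peak_gb_s(450, 2, 32)         # a single direction only

print(athlon, pentium4, ppc970_total, ppc970_one_way)
```

Note that the 970 total counts both directions together; in any one direction it only has a single 32-bit bus to work with, which is why "same ballpark as the P4" is a fair summary.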

The Athlon has 3 FP units ;)

On the bus, data does not travel alone; it moves with its entire neighbourhood in what is called a cache line. This is due to the cache logic: it would be damn hard to track bytes one by one, with flags telling whether they have been modified and so on. And chances are that if you requested a byte you'll also need the one next to it.
Cache lines are usually 32 bytes wide (I'm not sure about the PowerPC 970, it may be 64 bytes); this is wider than any bus, be it 64 or 128 bits, so a single byte read can produce many waves of data transmission on the bus.
You also have to know that the buses between the CPU core and its L1 and L2 caches are way wider than the CPU front side bus; for instance, the PowerPC 970 can read eight 32-bit instructions from its L1 instruction cache at once.
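To illustrate the "many waves" remark about cache lines, here is a tiny Python sketch that counts how many bus transfers one cache line needs. The line sizes are the ones mentioned above (32 bytes, possibly 64); the function name is hypothetical and the model ignores any protocol overhead.

```python
def bus_beats_per_line(line_bytes, bus_bits):
    """Number of bus transfers needed to move one full cache line."""
    bus_bytes = bus_bits // 8
    return -(-line_bytes // bus_bytes)  # ceiling division

for line_bytes in (32, 64):
    for bus_bits in (32, 64, 128):
        beats = bus_beats_per_line(line_bytes, bus_bits)
        print(f"{line_bytes}-byte line over {bus_bits}-bit bus: {beats} transfers")
```

So even a one-byte read that misses the cache costs several transfers on any of these buses, and a narrower bus simply needs more of them per line.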

The 970 is a fast and wide design, faster and wider than the G4.
 
Why are people knocking on LoopRumors so much???

They have a MUCH BETTER track record than MacWhispers

LoopRumors, after all, was the only site to get info on the Steve Jobs speech at INTEL.....with all due respect to TS and MacRumors (who are pros at this game), every other rumor site snoozed on that one

MacWhispers's rumors have not come to fruition yet (as far as I know)....as a matter of fact, wasn't MacWhispers going to shut their website off completely after so many, self-confessed, inaccuracies???

I love reading all rumors and I love checking all rumor sites.....but at this moment in time, I have more faith in LoopRumors than I do in MacWhispers
 
Originally posted by HasanDaddy

I love reading all rumors and I love checking all rumor sites.....but at this moment in time, I have more faith in LoopRumors than I do in MacWhispers
I have more faith in Steve Jobs showing up at my doorstep tomorrow with a batch of 970s and unreleased NeXT-pyro based machines than I do in either of the above sites. They have no credibility, none.

Also as to altivec being on the 970, doesn't it appear that the functionality was tacked on to the 970? (My limited understanding from following the huge thread at Ars).
 
Re: Re: A question from the ignorant

Originally posted by Rincewind42
Well, I don't think that any of us are deliberately trying to flame anyway - we just don't want others to be confused about what is going on, since inevitably this leads to "Why isn't X twice as fast on this chip? 64-bit is supposed to be twice as fast as 32-bit!". It's a preemptive strike on ignorance :D.

And yes, 64-bit number crunching will be more than twice as fast on a 64-bit CPU vs a 32-bit CPU :cool: . But this is probably not as big a deal as some people want to make it :D.
Well, some people are treating those who are not in the know a little harshly. If someone gleefully exclaims "Blahblahblah will be twice as fast on a 64 bit chip!" I'm quite sure it is out of ignorance, not blatantly misinforming the public, as some are treating such comments with their "WHY are you perpetuating this myth, dumbass?" remarks.

Anyway, as far as what will benefit from 64-bit processors, I am sure that once optimized code is widespread that there are a great many popular uses it will have, such as with rendering and graphic manipulation. And really, what else do you need from a processor once you get past a GHz or two?

Thanks for answering my Altivec question, here's another, and it's stupid too: am I wrong in assuming that before instructions get to the Altivec hardware they have to pass through the processor? If I am not, let's say you have a 64 bit processor, does that allow you twice the path through which to pass Altivec instructions, assuming you have twice as much Altivec hardware? Did I misunderstand that Altivec is hardware based? Does the pair of 32 bit unidirectional busses basically eliminate this possibility as a 32-bit processor would have access to these instructions to pass them along just as quickly?

Should I just read up on this since the technology's been around what...8 years or more? If so just give me a link and I'll shut up and figure it out for myself.
 
Re: Re: Re: A question from the ignorant

Originally posted by BaghdadBob
Anyway, as far as what will benefit from 64-bit processors, I am sure that once optimized code is widespread that there are a great many popular uses it will have, such as with rendering and graphic manipulation. And really, what else do you need from a processor once you get past a GHz or two?

Optimized code for the 970 will likely come from one of 3 sources: Updates for programs that can make wide usage of 64-bit integers (which would only run on 64-bit hardware), Upgrades that add Altivec code, or Updates to compilers that allow for optimized code generation for the 970 (which would run on other PowerPCs unless the program is also compiled for 64-bit, but would also possibly make this code slower on older processors). In general, the latter two optimizations would make for larger performance gains than the first.

Thanks for answering my Altivec question, here's another, and it's stupid too: am I wrong in assuming that before instructions get to the Altivec hardware they have to pass through the processor? If I am not, let's say you have a 64 bit processor, does that allow you twice the path through which to pass Altivec instructions, assuming you have twice as much Altivec hardware?

Altivec instructions are just like any other instructions, it's just that they are only available on certain CPUs. The Altivec unit is exactly like any other processing unit. Like the FPU & Integer units, it has its own set of registers that it shares with the Load/Store unit (for the sole purpose of moving register contents to and from main memory).

The integer width has nothing to do with how many instructions you process on a particular clock cycle, as instructions do not pass through the integer unit for dispatch. Basically, the hardware fetches instructions and from those instructions determines which unit they should be sent to - Integer, Floating Point, Altivec, Branch, or Load/Store. By the time an instruction enters an execution unit, more instructions have already been loaded and prepared to execute immediately afterward.

The PowerPC 970 can fetch up to 8 instructions per cycle (that's 32 bytes, or 256 bits) and dispatch/complete 5 of them per cycle. It would be possible to create a 32-bit (or 16-bit, or any other bit-ness) CPU that could fetch even more instructions per cycle. So in the end, the bitness and bus architecture of the CPU have nothing to do with how many instructions can be fetched for execution per cycle (although the latter can affect how fast this can happen).

Should I just read up on this since the technology's been around what...8 years or more? If so just give me a link and I'll shut up and figure it out for myself.

Altivec technology has been around for about 4 years now if I'm remembering correctly. You can find loads of information on it at http://www.simdtech.org/apps/group_public/documents.php

edit: edited for clarity
 
If, after all these rumors and high hopes, we get yet another G4 incarnation, I will laugh my guts out.

What a crazy sick laugh that will be.
 
Is it the kind of laugh that starts out hysterically and degenerates into sobbing, dropping your gun, and trying to remember what this whole crazy "Mac" thing was all about in the first place?

Yeah, that sounds about right...
 
Originally posted by HasanDaddy
Why are people knocking on LoopRumors so much???

They have a MUCH BETTER track record than MacWhispers

LoopRumors, after all, was the only site to get info on the Steve Jobs speech at INTEL.....with all due respect to TS and MacRumors (who are pros at this game), every other rumor site snoozed on that one

MacWhispers's rumors have not come to fruition yet (as far as I know)....as a matter of fact, wasn't MacWhispers going to shut their website off completely after so many, self-confessed, inaccuracies???

I love reading all rumors and I love checking all rumor sites.....but at this moment in time, I have more faith in LoopRumors than I do in MacWhispers

It makes no difference if the rumor sites are true or not. Machines come out when they come out.
 
Re: Re: must have missed it

Originally posted by maxvamp
what is your take on any speed impact when the processor has to manage the upper and lower set of bits of a given file.

Updated after reading the excellent Ars Technica article just posted http://arstechnica.com/cpu/03q1/ppc970/ppc970-0.html In particular, that article has the line

...but to summarize, Mac users should not expect any inherent performance benefits from the move to 64 bits. The 970's performance advantages will come from the many microarchitectural features that I'll cover in this article, and not from the fact that it's a 64-bit processor.


----

Well, the simple integer instructions execute in a single cycle. Adding the low 32 bits, then doing an "add with carry" on the upper 32 bits (the way synthetic 64-bit integer additions are done), takes two serial instructions.

That's 2 cycles, or 1 nanosecond on a 2GHz CPU.

But that's really the worst case - in a superscalar, o-o-o (out-of-order) CPU both adds could be started at about the same time, along with many other instructions (the 970 and P4 can have 1 or 2 hundred instructions in progress at once). With that much happening at once, the 1 nanosecond could be hidden behind other things that need to be waited on.
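For the curious, the "add, then add with carry" trick looks roughly like this when modelled in Python. It is only a sketch of the idea: real code would be two machine instructions, and the names `add32` and `add64_via_32` are invented for the example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def add32(a, b, carry_in=0):
    """One 32-bit add: returns (32-bit result, carry out)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32

def add64_via_32(x, y):
    """64-bit add built from two 32-bit adds, wrapping at 64 bits."""
    lo, carry = add32(x & MASK32, y & MASK32)   # step 1: low halves
    hi, _ = add32(x >> 32, y >> 32, carry)      # step 2: high halves plus carry
    return (hi << 32) | lo

x = 0x00000001FFFFFFFF
y = 0x0000000000000001
print(hex(add64_via_32(x, y)))
print(add64_via_32(x, y) == (x + y) & MASK64)
```

The second step needs the carry from the first, which is why the two adds are serial in the worst case.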


----

What if the data is not in cache? Latency on a cache miss is often from several dozen to a hundred or so cycles - suddenly the 2 cycles for the arithmetic become 50 or 100 cycles. At this point, 1 cycle for a 64-bit add vs. 2 cycles for a synthesized add is hardly relevant.

----

Suppose we're doing a 64KB read and the data is in the filesystem cache. If the system bus is 8GB/sec, then reading and writing 64KB will take about 16 usec, or 32,000 cycles.

----

Suppose the disk head has to move - that's about 8 msec - 8,000 usec - 8,000,000 nsec - or a whopping 16,000,000 cycles.

----

So, that's why I claim that an extra cycle to synthesize 64-bit integer arithmetic won't be measurable in actual I/O operations.
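Putting the figures from this post side by side makes the scale obvious. The sketch below just converts the quoted times to cycles at the 2GHz clock assumed above; the dictionary labels and the helper name are mine.

```python
CYCLES_PER_NS = 2  # 2GHz clock, as assumed in the post

def ns_to_cycles(nanoseconds):
    """Convert a duration in nanoseconds to clock cycles at 2GHz."""
    return nanoseconds * CYCLES_PER_NS

costs_in_cycles = {
    "synthesized 64-bit add": 2,
    "cache miss (upper figure)": 100,
    "64KB read and write (16 usec)": ns_to_cycles(16_000),
    "disk seek (8 msec)": ns_to_cycles(8_000_000),
}

# Share of a disk seek taken up by the one extra cycle of a synthesized add.
extra_cycle_share = 1 / costs_in_cycles["disk seek (8 msec)"]

for name, cycles in costs_in_cycles.items():
    print(f"{name}: {cycles:,} cycles")
print(extra_cycle_share)
```

Against a disk seek, the extra cycle is a few parts in a hundred million, which is the whole argument in one number.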
 
Re: Re: Re: A question from the ignorant

Originally posted by BaghdadBob
Well, some people are treating those who are not in the know a little harshly. If someone gleefully exclaims "Blahblahblah will be twice as fast on a 64 bit chip!" I'm quite sure it is out of ignorance, not blatantly misinforming the public, as some are treating such comments with their "WHY are you perpetuating this myth, dumbass?" remarks.

When multiple people repeat the same misinformation in the same thread, it is pretty tough to treat the second through nth person with kid gloves.

This gets especially hard when you are dealing with the fifth or sixth person in the thread to say it in such an asinine way.

Okay, I am not perfect or all knowing, but I at least expend the effort to read the posts in a thread before saying something stupid.
 
H, Hmmm

Folks, does anyone here know that Foxconn is a trade-name that a large Taiwanese manufacturer uses?

That manufacturer is Hon Hai.

Go look them up (http://www.foxconn.com, http://www.foxconn.com.tw , http://hr.honhai.com.tw ) They DO make boards for Apple.

While we're at it, Quanta in Taiwan makes the PowerBooks for Apple.

I think the reports are true to a certain degree. I think we're going to see 970 machines, soon.
 
Originally posted by macdong
You are way over your head, dude.
I don't want a steaming frying pan on my lap. :rolleyes:

Eh, it's not entirely unimaginable that there would be a dual 970 17" PowerBook. Just make sure it only runs 1 processor when running on battery! Wooha! That would drain the battery pretty fast otherwise.

With the amount of area that the 17" PB provides, and the negligible heat it puts out thanks to Apple's great designs, I highly doubt heat would be a problem. I am all about the dual 970 PB.
 
Went to Foxconn's site and found this

Went to Foxconn's site and found this:

http://www.foxconn.com/products/alphabet.asp?fmletter=M

What's an "Apple Monster Bus" (Customer Special)?

*feeds the fire* :D

leo


p.s.: At least Apple has been working with Foxconn in the past to some extent. Some of the connectors in my b/w G3 are labelled "Foxconn". I know, it's just connectors...
 