PPC 970 for Apple... Confirmed?

AidenShaw · May 22, 2003

Re: Re: sometimes the algorithms demand it...

Originally posted by Rincewind42
Caches are designed to take advantage of linear access patterns, not effectively random ones that you get with linked lists and trees (especially when you are allowed to delete and insert at any location within the data structure).

More to the point, caches are designed for access patterns where repeated accesses are made to the same data.

They also optimize for short stretches of linear access - for example 256 bits (32 bytes) is a common "cache line" size. When you get a cache "miss", the 32 bytes containing the missed item will be moved from memory into the cache.

If one is looking at other data nearby (locality), then the large read is good. If one is truly bouncing around, then memory bandwidth is wasted (and extra latency introduced) by these large transfers.

A bad case for a cache is a purely linear access pattern - when you are making a single pass sweeping through memory that's much larger than the cache. In this case each level of cache just adds latency without helping (kind of like a layer of middle management that just has to approve and sign paperwork).
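To make the locality argument concrete, here is a minimal C sketch (not a model of any real Mac cache) that simulates a hypothetical 2 KB direct-mapped cache with the 32-byte lines mentioned above and counts misses for a sequential sweep in 8-byte steps. The cache geometry and the function name `sweep_misses` are assumptions for illustration only.

```c
#include <stddef.h>
#include <string.h>

#define LINE_BYTES 32  /* cache line size from the post (256 bits) */
#define NUM_LINES  64  /* hypothetical tiny direct-mapped cache: 64 x 32 B = 2 KB */

/* Count cache-line misses when sweeping `bytes` bytes of memory in 8-byte
   steps, `passes` times, through a simulated direct-mapped cache. */
long sweep_misses(size_t bytes, int passes) {
    size_t tag[NUM_LINES];
    int valid[NUM_LINES];
    long misses = 0;
    memset(valid, 0, sizeof valid);

    for (int p = 0; p < passes; p++) {
        for (size_t addr = 0; addr < bytes; addr += 8) {
            size_t line = addr / LINE_BYTES;  /* which 32-byte line holds addr */
            size_t slot = line % NUM_LINES;   /* the only slot that line may occupy */
            if (!valid[slot] || tag[slot] != line) {
                /* Miss: the whole 32-byte line is brought in, evicting the old one. */
                valid[slot] = 1;
                tag[slot] = line;
                misses++;
            }
        }
    }
    return misses;
}
```

With these assumptions, sweeping a 1 KB block twice misses only on the first pass (32 misses out of 256 accesses), because the second pass reuses the cached lines. Sweeping an 8 KB block twice misses on every new line in both passes (512 out of 2048 accesses): the only hits come from the other three 8-byte values sharing each line, and nothing survives from one pass to the next.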

mathiasr · May 22, 2003

Re: PPC 970 for Apple... Confirmed?

Originally posted by AidenShaw
Using 64-bit pointers, however, does increase the runtime footprint of a program. For some programs, the pointers represent a fair proportion of the total data. Programs that keep their data in lists, double-linked lists, or trees often have that issue.

The larger data size lowers performance in two ways:

- more memory bus bandwidth is needed to move the pointers
- the pointers occupy more space in cache, thereby reducing the effective size of the caches

So, the original poster had more or less the right idea, but made a mistake by referring to the size of the binary image.
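The footprint effect can be sketched in C. The structs below are hypothetical list nodes whose links are modelled as fixed-width integers, so the 32-bit and 64-bit layouts can be compared on any host. The exact `sizeof` of the 64-bit node depends on the ABI's alignment rules (typically 24 bytes, 20 on some 32-bit ABIs).

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32  /* cache line size from the thread (256 bits) */

/* Hypothetical doubly-linked list nodes. The links are modelled as
   fixed-width integers standing in for 32-bit and 64-bit pointers. */
struct node32 { uint32_t prev, next; int32_t value; };  /* 12 bytes */
struct node64 { uint64_t prev, next; int32_t value; };  /* 20 or 24 bytes, ABI-dependent */

/* Whole nodes that fit in one cache line if packed back to back. */
size_t nodes_per_line(size_t node_size) {
    return LINE_BYTES / node_size;
}
```

With a 32-byte line, two of the 32-bit nodes fit in one line but only one of the 64-bit nodes does, so walking the same list needs noticeably more memory bandwidth and cache space.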

I agree, but there is another case where the code could actually shrink.
This could happen if your app does a lot of 64-bit computation.

A small example that:
- reads two 64-bit values from two different memory locations
- computes their sum
- stores it in another location
- prepares to do the same on the next values in memory.

Consider r6, r7 and r8 as pointers to the beginning of the memory locations (could be arrays or matrices), and r4 as an index inside those locations (in the PowerPC64 code, r4 is a byte offset, so it advances by 8 per 64-bit value).

In a high-level language (C-like):
r8[r4]=r6[r4]+r7[r4]
r4++
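The C-like loop above can be written as a complete C function. This is a sketch: the name `add_arrays64` is hypothetical, and unsigned 64-bit elements are used so that overflow wraps around like the hardware add instead of being undefined behaviour in C. What a compiler actually emits depends on the compiler, its version and its options, but for a 64-bit PowerPC target the loop body can map to an ldx/ldx/add/stdx/addi sequence, and for a 32-bit target to the lwz/addc/adde/stw sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] = a[i] + b[i] for n 64-bit elements (wrap-around on overflow). */
void add_arrays64(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```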

PowerPC64:

Code:

ldx r10,r4,r6      # loads the 64-bit value stored at memory location r6+r4 into register r10
ldx r11,r4,r7      # loads the 64-bit value stored at memory location r7+r4 into register r11
add r12,r10,r11    # adds both registers and puts the result in r12
stdx r12,r4,r8     # stores the sum at memory location r8+r4
addi r4,r4,8       # adds 8 (64 bits = 8 bytes) to the index to point to the next values

Total size of the code: 5 instructions (20 bytes)

This time, consider r6, r7 and r8 as already holding the sum rx+r4 (base plus index), so each pointer is advanced directly:

PowerPC32 (based on code produced by GCC 3.1):

Code:

lwz r10,0(r6)     # loads the high word of the 64-bit value at memory location r6 into r10
lwz r11,4(r6)     # loads the low word from memory location r6+4 into r11
addi r6,r6,8      # adds 8 to point to the next value
lwz r12,0(r7)     # loads the high word of the 64-bit value at memory location r7 into r12
lwz r13,4(r7)     # loads the low word from memory location r7+4 into r13
addi r7,r7,8      # adds 8 to point to the next value
addc r14,r11,r13  # adds the two low words into r14 and records the carry (CA) bit
adde r15,r10,r12  # adds the two high words plus the carry bit into r15
stw r15,0(r8)     # stores the high word of the sum at memory location r8
stw r14,4(r8)     # stores the low word of the sum at memory location r8+4
addi r8,r8,8      # adds 8 to point to the next value

Total size of the code: 11 instructions (44 bytes)

When you work with 64-bit data on a 32-bit CPU, you need two loads or two stores to move each value from/to memory, the values are split across 32-bit registers (which can lead to register starvation), and you need at least twice as many operations.
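The carry handling in the 32-bit version can be mirrored in portable C. This sketch (the function name `add64_split` is hypothetical) splits each operand into 32-bit halves, adds the low words and derives the carry the way addc sets the CA bit, then adds the high words plus that carry as adde does.

```c
#include <stdint.h>

/* Add two 64-bit values using only 32-bit halves. */
uint64_t add64_split(uint64_t x, uint64_t y) {
    uint32_t x_hi = (uint32_t)(x >> 32), x_lo = (uint32_t)x;
    uint32_t y_hi = (uint32_t)(y >> 32), y_lo = (uint32_t)y;

    uint32_t lo = x_lo + y_lo;          /* like addc: 32-bit sum of the low words... */
    uint32_t carry = lo < x_lo;         /* ...and the carry out of that sum */
    uint32_t hi = x_hi + y_hi + carry;  /* like adde: high words plus the carry */

    return ((uint64_t)hi << 32) | lo;
}
```

For example, adding 0xFFFFFFFF and 1 produces a zero low word with a carry, which propagates into the high word to give 0x100000000.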

mathiasr · May 22, 2003

Re: PPC 970 for Apple... Confirmed?

Originally posted by jettredmont
The only thing compiling specifically for the 970 would do would be allow you to use native 64-bit int registers (and memory addresses).

This gives you access to the PowerPC64 Instruction Set Architecture, and some new instructions like:
fcfid (Floating-point Convert from Integer Doubleword)
fctid(z) (Floating-point Convert to Integer Doubleword(with Round to Zero))
fsqrt (Floating-point Square Root)

The others are meant to support the wider registers:
cntlzd (Count Leading Zeros Doubleword)
divd(u) (Divide Doubleword (Unsigned))
extsw (Extend Sign Word)
ld(u)(x) (Load Doubleword (with Update)(Indexed))
ldarx (Load Doubleword and Reserve Indexed)
lwa(u)(x) (Load Word Algebraic(with Update)(Indexed))
mfasr (Move from Address Space Register) supersedes mfsr
mtasr (Move to Address Space Register) supersedes mtsr
mulhd(u) (Multiply High Doubleword (Unsigned))
mulld (Multiply Low Doubleword)
rld(i)cl (Rotate Left Doubleword (Immediate) then Clear Left)
rld(i)cr (Rotate Left Doubleword (Immediate) then Clear Right)
rldic (Rotate Left Doubleword Immediate then Clear)
rldimi (Rotate Left Doubleword Immediate then Mask Insert)
sld (Shift Left Doubleword)
sr(a)d(i) (Shift Right (Algebraic) Doubleword (Immediate))
std(u)(x) (Store Doubleword (with Update)(Indexed))
stdcx. (Store Doubleword Conditional Indexed)
td(i) (Trap Doubleword (Immediate))

[Edit removed a mnemonic]
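As an illustration of what one of these instructions saves, here is a sketch that computes the result of mulhdu (the high 64 bits of an unsigned 64x64-bit product) from 32x32-bit partial products, roughly the kind of work needed without 64-bit integer registers. The function name `mulhdu_soft` is hypothetical, and it uses 64-bit C types for brevity; a real 32-bit code path would split things further.

```c
#include <stdint.h>

/* High 64 bits of the unsigned 128-bit product x * y. */
uint64_t mulhdu_soft(uint64_t x, uint64_t y) {
    uint64_t x_lo = (uint32_t)x, x_hi = x >> 32;
    uint64_t y_lo = (uint32_t)y, y_hi = y >> 32;

    /* Four 32x32 -> 64-bit partial products. */
    uint64_t lo_lo = x_lo * y_lo;
    uint64_t hi_lo = x_hi * y_lo;
    uint64_t lo_hi = x_lo * y_hi;
    uint64_t hi_hi = x_hi * y_hi;

    /* Middle column: carry out of the low product plus both cross terms.
       The maximum possible value is 2^64 - 1, so this cannot overflow. */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;

    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```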

mathiasr · May 22, 2003

Re: Re: Re: sometimes the algorithms demand it...

Originally posted by AidenShaw
A bad case for a cache is a purely linear access pattern - when you are making a single pass sweeping through memory that's much larger than the cache. In this case each level of cache just adds latency without helping (kind of like a layer of middle management that just has to approve and sign paperwork).

Latency will not strictly add up: for instance, the L2 and L3 caches are polled simultaneously, and data coming from memory is forwarded to the Load/Store Units first.
And chances are that data belonging to the same cache line will be available in the L1 cache, provided you do not issue loads back-to-back but do some computation in between.

Never heard of data streams and cache hints?
You can request a cache line before you actually need it:
http://developer.apple.com/hardware/ve/performance_memory.html
http://developer.apple.com/hardware/ve/caches.html

Even if it will not help once the bus has reached peak bandwidth.
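The post refers to data streams and cache hints for requesting a cache line before it is needed. A portable way to express a similar touch-ahead hint in C is the GCC/Clang builtin `__builtin_prefetch`, which on PowerPC is typically compiled to a dcbt and may be ignored entirely. This is a sketch, not Apple's recommended code: the prefetch distance `PREFETCH_AHEAD` and the function name are assumptions, and, as the post says, prefetching cannot help once the bus is saturated.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_AHEAD 16  /* hypothetical prefetch distance in elements; tune per machine */

/* Sum an array while hinting the cache to fetch data a little ahead of use. */
int64_t sum_with_prefetch(const int64_t *a, size_t n) {
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses inside the array; the hint itself never faults. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0 /* read */, 0 /* no temporal locality */);
        total += a[i];
    }
    return total;
}
```

The hint changes only timing, never results, so the function returns the same sum with or without it.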

AidenShaw · May 22, 2003

Re: Re: Re: Re: sometimes the algorithms demand it...

Originally posted by mathiasr
Latency will not strictly add, for instance L2 and L3 caches are polled simultaneously; data coming from memory is forwarded to the Load/Store Units first.

If the multi-level cache logic introduces zero latency, you're right. I haven't seen any of these mythical beasts, however; real levels of electronic circuitry do need some time.

Never heard of data streams and cache hints?

Of course, who hasn't?

Even if it will not help once the bus has reached peak bandwidth.

BINGO! Give that guy the prize!

jeffosx · May 22, 2003

So what do 1000 ancient NT boxes have to do with modern PCs?

It (NT) creates modern PCs that suck...

The only option for us is Win2000, not XP, and no one is eager to go through an MS OS upgrade when the server upgrade problems are still going on almost a year after rollout, despite a huge team of contracted support staff. As for XP, no one here will even look at it due to the licence.

My point is that the CPU is only part of the speed issue, whereas the bulk of it, IMO, is the software and the OS. When arguing for getting Macs here, we "won" because of stability and software translating into speed/efficiency, not CPU-related SPEC scores.

Cheers

mim · May 22, 2003

Originally posted by jeffosx
My point is that the CPU is only part of the speed issue, whereas the bulk of it, IMO, is the software and the OS. When arguing for getting Macs here, we "won" because of stability and software translating into speed/efficiency, not CPU-related SPEC scores.

Cheers

Exactly. SPEC scores only tell part of the story. I sat here last week with a brand new, mint 1 GHz 12" PB and a mint Dell dual Xeon 2.4 GHz, and guess what? The PB felt like it was flying next to the Dell. Open a window, move things around, minimize a browser, open Photoshop: you name it, the PB did it faster. Now, actually doing some "hard" work in Photoshop, well, of course the Dell was faster. But redraws when zooming, panning, etc. were better on the PB!

My point (and yours): doing the things that most people spend their time on every day, the Mac is clearly superior. If you ever have the chance, just do it: put two machines side by side and try it.

But yes, the G4 is underpowered for intensive work. But the only time my computers sit at a full processor load, not dealing with multiple windows opening and closing, is when they are rendering. This is >not< when your average user needs speed!

I'm not trying to excuse the G4. I just think that some people who want to move to a Pentium system because of a perceived lack of speed in current Macs will be >sorely< disappointed. Bring on the 970s and OS X.

Snowy_River · May 22, 2003

Re: Re: Re: Re: AltiVec

Originally posted by jettredmont
The effects of any dimension above 3 is manifested as time....

I just have to say that this statement is absolutely absurd. There are a number of theories (more or less practical and applicable) that include up to, I think it is, 27 spatial dimensions, and they all only have one (1) time dimension. I've even done some research, when I was working for the high-energy group here, into detecting higher compactified spatial dimensions through very close gravity detection.

On the other hand, the only theories that I've ever heard of that have more than one time dimension have been pie-in-the-sky theories that lack applicability.

Okay, let's drop the physics now except as it applies to the 970 processor ...

Alright. Back to the 970. That is, the PPC970 processor, not the year 970, as we all know that it's impossible to travel back in time...

80...

Snowy_River · May 22, 2003

Originally posted by mim
...I sat here last week with a brand new mint 1Ghz 12"PB...

I think you meant an 867 MHz 12"PB. Or do you have a line on a new upgrade to the PB line that we don't know about?

As to everything else you said about OS X, I couldn't agree more. I've compared it to both Windows and Linux, and I have consistently enjoyed the smoother feel of the system under OS X. That is the real reason that I stick with Macintosh computers...

79...

jettredmont · May 23, 2003

Re: Re: Re: Re: Re: AltiVec

Originally posted by Snowy_River

Originally posted by jettredmont
The effects of any dimension above 3 is manifested as time....


I just have to say that this statement is absolutely absurd. There are a number of theories (more or less practical and applicable) that include up to, I think it is, 27 spatial dimensions, and they all only have one (1) time dimension. I've even done some research, when I was working for the high-energy group here, into detecting higher compactified spatial dimensions through very close gravity detection.

On the other hand, the only theories that I've ever heard of that have more than one time dimension have been pie-in-the-sky theories that lack applicability.

Okay, sorry to bring dimensions back up, but just to clarify: I don't mean that each of the infinite number of conceptual dimensions is "time" in its own right, but that, from the 3-D perspective, we observe the effects of all such dimensions on our own as the passage of time. (Imagine a 1-D world lying on a piece of string: as that string is whipped around our 3-D universe, its state changes dramatically, but from within, it could attribute all such changes to just the passage of time. One second it has "gravity" flowing in the positive direction, the next in the negative direction; that is all it can directly observe.) This pretty much goes without saying when the only dimensions we can observe directly are the 3 spatial dimensions and the passage of time (i.e., the change in state within those three dimensions).

But, yes, we can conceptualize multiple, countless other dimensions to explain phenomena in these 3+1, in the same way our "string world" might theorize that these strange shifts in "gravity" are, at their root, the consequence of movement through a 3-D universe where gravity is (fairly) constant.

Which is why, later, I said that there are only two correct answers to how many dimensions exist: 3+time (the number of directly observable dimensions) or infinite (the number of conceptually observable dimensions). "4+time" is not correct as an overall rule, although a particular theory might well use only 4 dimensions + time to explain its phenomena.

OTOH, I have to say I haven't come across the theory using 27 dimensions yet!

pdickins · May 27, 2003

PPC 970 PowerMacs boxed and ready to go

Just looking on the MacBidouille site: they have a new rumor of PPC 970 Power Macs apparently boxed and ready to ship after WWDC. Apologies for the Google translation:

Before getting to the heart of the matter, remember that what follows is a rumour without tangible proof. However, if we chose to publish it, it is because it has a very good chance of being true.

The first PPC 970 computers have left the production lines and are even already packed on pallets. The pallets are wrapped in opaque, sealed film. On top there are stickers with the following information:

"tamper proof seal, confidential property inside, prosecution may result if opened by unauthorized personnel"

They will soon start being delivered to trusted distributors, with strict orders not to unwrap them before June 23. Apple already used this method for the launch of the Cube and, more recently, the iMac G4.

mim · May 27, 2003

Originally posted by Snowy_River
I think you meant an 867 MHz 12"PB. Or do you have a line on a new upgrade to the PB line that we don't know about?

As to everything else you said about OS X, I couldn't agree more. I've compared it to both Windows and Linux, and I have consistently enjoyed the smoother feel of the system under OS X. That is the real reason that I stick with Macintosh computers...

79...

Whoops, of course you're right. That just makes it even more impressive.

Also, the translation above is interesting. MacB are going to be either rumor gods or scum of the earth. And the best thing is that we'll know for sure in under a month now.

As much as I was impressed by the PB, I am holding off buying one until I see what goes down at the end of June.

This count-down is really starting to get intense Snowy. Couldn't you have started at 10 or something?!
