Originally posted by mathiasr
It should be entirely transparent... unless Apple shifts to a NUMA architecture, where apps would have to avoid storing too much data in remote memory, since each CPU would have local (fast) and remote (somewhat slower; remote memory is just local to another CPU) memory.
In an SMP design, memory always has the same access time (as long as the bus is not saturated).
CC-NUMA is programmer-transparent, too. A NUMA system typically has a scheduler that knows how to migrate pages from node to node and that understands processor affinity. So the program doesn't have to worry about memory allocation at the hardware level at all, except in the most extreme cases.

In other words, if you start a process on processor 12, that process is generally going to stay running on processor 12, even after dropping out of the run queue for things like blocking I/O. That's processor affinity.

When a process running on processor 12 allocates a block of memory, the system allocates that memory in the closest bank of RAM, topologically speaking. If that's not on the local node, then the system will page-migrate the memory to the local node as local RAM becomes available. This happens completely invisibly from the perspective of the application.

Now, NUMA systems are designed in multi-dimensional cube topologies, which means that even in a thousand-processor system a processor is never more than seven router hops away from the furthest bank of memory. In a 1,024-processor SGI Origin 3000 system like NASA's Chapman, for example, the average memory latency is 480 ns, the worst case is only 640 ns, and the best case is 170 ns. So worst case is only about 3.8 times best case, and that's across a thousand processors, with standard memory modules, measuring CPU-to-main-memory.

In the real world, effective latencies are much lower thanks to fat caches and predictive fetching. Particularly in sequential-read applications like video processing (read a byte, write a byte; or read a vector, write a vector), the vast majority of the time the cache line the processor reaches for next has already been loaded into cache predictively, so sequential applications are very cache-friendly.

For comparison, the round-trip time to main memory in a Power Mac G4 is about 95 ns. Throw a NUMA memory controller in there, and you would see memory latencies pretty much in line with the fastest computers in the world, give or take a few percent. In other words, you would NOT have to worry about local versus remote memory.

Now, in HPC/TC applications, programmers can directly manipulate the memory topology of a NUMA system to suit their needs. They can allocate given blocks of memory on given nodes if they so choose. That kind of application tuning is saved for the sort of long-running jobs where tweaking the memory allocations might save you a week or two of computer time over the course of the run. You won't see that kind of optimization on anything less than a supercomputer for a long, long time, because it just won't be worth it. Hand-coding the memory handling in a NUMA version of After Effects might save you three minutes over the course of a year. Hardly worth the effort.

Now, none of this means anything, except this: if Apple were to decide to implement small-scale NUMA (< 16 processors) using chips that are instruction-set compatible with the PowerPC G4 family, it would not be necessary for application vendors to go back and change their code to run on the new systems.
 
Originally posted by cb911
i think that HT is a relatively new technology. having a quick look around the HT.org site, it looks like the NVIDIA nForce3 Pro and the AMD Opteron are the only ones to use this technology in computing. not sure about this, but it looks like IBM (and then Apple) might be the first to use it.

i guess that the first 970's could have HT... if Smeagol is released, will apps like Photoshop 7 etc. be able to run on an HT 970? if the OS needs a slight change to operate properly, won't that mean that all apps will also need an update to run on a HyperTransport 970?

HyperTransport is used in the XBox between the graphics processor and the I/O controller.

It is also used in the nForce, nForce2 and nForce3 chipsets for various AMD processors.

AFAIK HT doesn't need drivers, it works transparently to the operating system. Any changes being made to 32-bit MacOS will be adding drivers, etc, for other system components that are new on the new platform.
 
Hypertransport as I understand it

Hypertransport as I understand it does not necessarily need to be used throughout the system. I can see where they'd use it to connect the two processors in a dual chip computer but let the front-side bus be something different.

Though it is interesting that they picked the name "Smeagol" for the OS revision that allows the 970 to be compatible, because the whole idea behind HT is to allow all the chips to speak the same language so nothing has to be translated from chip to chip. "One bus to bind them" perhaps?
 
Re: Hypertransport as I understand it

Originally posted by COS
Though it is interesting that they picked the name "Smeagol" for the OS revision that allows the 970 to be compatible, because the whole idea behind HT is to allow all the chips to speak the same language so nothing has to be translated from chip to chip. "One bus to bind them" perhaps?

Can we all drop the Smeagol speculation? They named it Smeagol because they are, surprise surprise, fans of LOTR. End of Story. The people at Apple don't have the time to waste thinking of clever code names for everything.
 
ZDNet also

Along with CNET reporting the use of HT in Apple's new hardware, ZDNet is also reporting that Apple will be adopting it: http://zdnet.com.com/2100-1103_2-1016770.html

"Apple Computer plans to discuss how it will incorporate HyperTransport, a rapid chip-to-chip communications technology, into future computers later this month at its developer conference.

The Cupertino, Calif.-based company will use HyperTransport as a high-speed link between the two processors that make up the chipset in new desktop Macintoshes, sources said. A chipset is a group of chips that manages the internal functions of a computer."
 
Re: ZDNet also

Originally posted by macphisto
Along with CNET reporting the use of HT in Apple's new hardware, ZDNet is also reporting that Apple will be adopting it: http://zdnet.com.com/2100-1103_2-1016770.html

"Apple Computer plans to discuss how it will incorporate HyperTransport, a rapid chip-to-chip communications technology, into future computers later this month at its developer conference.

The Cupertino, Calif.-based company will use HyperTransport as a high-speed link between the two processors that make up the chipset in new desktop Macintoshes, sources said. A chipset is a group of chips that manages the internal functions of a computer."
That is the SAME article...

It's getting reprinted everywhere in all the parent company's publications.

---

Confirmation from several sources using the same article.
 
Re: Re: Hypertransport as I understand it

Originally posted by SpamJunkie
Can we all drop the Smeagol speculation? They named it Smeagol because they are, surprise surprise, fans of LOTR. End of Story. The people at Apple don't have the time to waste thinking of clever code names for everything.
(Like normal, I'm veering way off topic here! Sorry Arn!)

These things usually do have some kind of meaning to someone. Whether it's a personal idea of one person or a marketing idea that sounds, well, marketable (Jaguar).

I worked at one place that named a release SkyLine - it was a reference, so I'm told, to a "line in the sky," a sky-high dream of how great the product would be (yawn). The precursor to it was BARPH. It looked like an acronym, but it was actually named that so that marketing would keep their mitts off of it and not talk about it with customers!

Another company named their pre-releases of a new version of their software after various bridges in Portland, OR because: A.) that's where we worked and B.) they were "bridges" to the new platform.

My favorite one (to hate, that is) was when I worked on a system for a large airline here in DFW. Our company (and the airline's), at the time, had a bunch of cool-sounding code names like ASP, T-REX, VIPER, Raptor, etc. Almost all were acronyms of some sort, so then the airline comes up with this new reservations platform and the code name is: AACoRN. I forget what the C and N stood for (the R was reservations - you can guess what airline AA was for ;) ) What a wussy name! The funniest part was around the same time, the TV Dilbert show was airing and they had an episode where Dilbert was trying to come up with a code name so bad that the project he was on would get cancelled. What did he come up with? Acorn! We all had a big laugh about that!

So, yet again, what were we talking about? Oh yeah, I'll bet that by the time 10.3 is in the stores, we'll all 'get' what Smeagol meant and it'll probably be rather geek-cool.
:cool:
 
Jeff Harrell:

For comparison, the round-trip time to main memory in a Power Mac G4 is about 95 ns.
I'm very interested in knowing what version of PMac this is for, and what your source is.

COS:

Hypertransport as I understand it does not necessarily need to be used throughout the system. I can see where they'd use it to connect the two processors in a dual chip computer but let the front-side bus be something different.
The processors have to have pins+hardware supporting the HT link of course, which the PPC970 apparently does not have. Perhaps the PPC980 will use HT for interprocessor communication.
 
Originally posted by ddtlm
Jeff Harrell: I'm very interested in knowing what version of PMac this is for, and what your source is.
This could be measured: trash all the caches, read the lower part of the time base, do a load followed by an isync, then read the lower part of the time base again.
 
mathiasr:

I'm not a fan of making measurements like that myself. Seems like there's always something overlooked.
 
Originally posted by ddtlm
Jeff Harrell: I'm very interested in knowing what version of PMac this is for, and what your source is.
Got it from the Power Mac Technology Overview, January 2003, page 9. It's certainly available on the Apple site somewhere, but I'm looking at a PDF right now.

The table in question compares latency to main memory in the G4 to an unspecified 3 GHz P4 system with 20K of L1 cache and 512K of L2 cache and no L3 cache. The G4 reference system was a dual 1.42 with 64K L1, 256K L2, and 2 MB L3. It's divided up into L1 cache miss latency, L2 cache miss latency, L3 cache miss latency, and total time to main memory.

L1:
G4: 5 ns
P4: 10.1 ns

L2:
G4: 23.3 ns
P4: 135.8 ns

L3:
G4: 64.1 ns
P4: n/a

MM:
G4: 94.5 ns
P4: 146.6 ns

Thanks for calling me on it. I should have cited my sources. (The sources for the SGI numbers all come from techpubs.sgi.com, in various places. FYI.)
 
Jeff Harrell:

I wasn't calling you on it so much as looking for more information. I have found the Developer Note from March 14, 2003 at Apple's site but not the document you have. I'll keep looking around.

In other news, I just noticed in the developer's note that they do state the L3 cache speeds: 250 MHz for the 1.0 GHz and 1.25 GHz machines, 236 MHz for the 1.42 GHz Mac. I figured that they were doing that but hadn't ever seen it printed anywhere.
 
Originally posted by Jeff Harrell
Got it from the Power Mac Technology Overview, January 2003, page 9. It's certainly available on the Apple site somewhere, but I'm looking at a PDF right now.

The table in question compares latency to main memory in the G4 to an unspecified 3 GHz P4 system with 20K of L1 cache and 512K of L2 cache and no L3 cache. The G4 reference system was a dual 1.42 with 64K L1, 256K L2, and 2 MB L3. It's divided up into L1 cache miss latency, L2 cache miss latency, L3 cache miss latency, and total time to main memory.

L1:
G4: 5 ns
P4: 10.1 ns

L2:
G4: 23.3 ns
P4: 135.8 ns

L3:
G4: 64.1 ns
P4: n/a

MM:
G4: 94.5 ns
P4: 146.6 ns
It's available here:
http://www.apple.com/powermac/pdf/PowerMac_TO_012003.pdf

I'm trying to figure out what these figures actually mean... Why the hell do they provide miss latency and not hit latency?
Why don't they show:
L1 hit (3 or 4 cycles, 3 is for GPR, 4 is for FPR and VR)
L1 miss, L2 hit (9 cycles)
L1-L2 miss, L3 hit (more than 33 cycles, depending on L3 config; 33 is for a 4:1 frequency ratio, DDR L3 bus, and 5/0 sample points, which is not the case on a 1.42 GHz machine)
L1-L3 miss, main memory access

[edit]
I've found the "Calibrator" tool they used to produce the figures:
http://homepages.cwi.nl/~manegold/Calibrator/calibrator.shtml
 
Originally posted by Cappy
Folks, I'm interpreting a lot of misinformation here on what HT really is. Read up on it if you're interested. I'm certainly not enough of an expert on it to describe it and give it justice, but much of what I've seen here hasn't been too accurate. Remember there are many buses in a computer. The PCI bus is not all there is to a computer, and HT certainly is not replacing it.

If you're referring to my post, I didn't say that HT would replace PCI--of course any Power Mac would have to have some form of PCI to be practical. I said that HT would *partially* replace PCI, which is true.

HTH
WM
 
Originally posted by ddtlm
In other news, I just noticed in the developer's note that they do state the L3 cache speeds: 250mhz for 1.0ghz and 1.25ghz machines, 236mhz for the 1.42ghz Mac. I figured that they were doing that but hadn't ever seen it printed anywhere.
I suppose that SRAM chips running at more than 250 MHz would have been too expensive.
The L3 memory frequency is linked to the CPU frequency; it is divided by an integer factor.
1.00 GHz divided by 4 gives 250 MHz
1.25 GHz divided by 5 gives 250 MHz
1.42 GHz divided by 6 gives 236 MHz
 
Originally posted by WM.
If you're referring to my post, I didn't say that HT would replace PCI--of course any Power Mac would have to have some form of PCI to be practical. I said that HT would *partially* replace PCI, which is true.
The PCI bus wipeout is a reality. AMD has kept PCI slots behind the HT I/O hub (which offers a PCI bus) and PCI-X tunnels, but all the main chips on the motherboard are connected by HyperTransport. The PCI bus is history (it has become too slow); it's no longer at the center of the motherboard. HT is logically compatible with PCI (the wires are gone but the software stays the same).

http://www.amdboard.com/opteron_chipsets_amd.html

"HyperTransport technology is the primary bus used in the AMD Athlon 64 and AMD Opteron processors and inside its supporting devices including the AMD-8111 HyperTransport I/O hub, the AMD-8131 HyperTransport PCI-X tunnel and the AMD-8151 HyperTransport AGP 3.0 graphics tunnel."

http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_5707_5733~62571,00.html
 
Originally posted by mathiasr
The PCI bus wipeout is a reality. AMD has kept PCI slots behind the HT I/O hub (which offers a PCI bus) and PCI-X tunnels, but all the main chips on the motherboard are connected by HyperTransport. The PCI bus is history (it has become too slow); it's no longer at the center of the motherboard. HT is logically compatible with PCI (the wires are gone but the software stays the same).
Right. I think we're on the same page. :)

WM
 
You can do it without HT, and it's being done already

Originally posted by mathiasr
The PCI bus is history (it has become too slow); it's no longer at the center of the motherboard. HT is logically compatible with PCI (the wires are gone but the software stays the same).


But, isn't this a bit obvious?

Although the PCI bus is "history", what I/O cards are you going to use? PCI, of course.

Systems haven't used the PCI bus as the "main bus" for a long time - it's a peripheral bus. The Intel 7505 chipset for Xeon DP has up to six PCI-X (133MHz) busses coming out of the north bridge. (Not six slots, six 64-bit 133MHz PCI busses.)

HT is good - it's a cheaper way of moving lots of data around (and even better, it's a cheaper way for moving modest amounts of data around). But, it doesn't bring anything revolutionary to the finished product.

You can have several 1GB/sec PCI-X busses on a 7505. You can do the same with HT. You may be able to come in faster and cheaper with HT (since you don't need 64-bit wide traces on the mobo), but it's not revolutionary.

Before you wet yourself over what Apple may be doing, check out the Intel mobos to see what's already available at a reasonable price. If you want to see what's possible (without worrying too much about the price), look at the IBM Summit chipset.

I'm not saying that HT and the 970 rumours are bunk - but even the most extravagant stories don't do much more than put Apple "back in the ballgame". On the Intel side many of these innovations are already in the stores, and on people's desks.
 
Originally posted by rice_web
I do remember C|Net jumping the gun on a few rumors in the past, so, as always, understand that this may not be entirely true.

I'll try to dig up C|Net's previous foul-ups on Apple rumors.

So far, CNET hasn't been the only source. I think I read something similar on sites such as The Register, ZDNet, The Inquirer, and some others I don't remember now.
 