Hyperthreading in next gen PowerPCs

Discussion in 'Macintosh Computers' started by topicolo, Apr 5, 2005.

  1. topicolo macrumors 68000

    topicolo

    Joined:
    Jun 4, 2002
    Location:
    Ottawa, ON
    #1
    A recent blurb about the triple-core PowerPC chip that will be used in the Xbox 2 claims that each of these cores will have a technology similar to Intel's hyperthreading.
    This means that there would effectively be 6 logical cores (3 cores x 2 threads each) to do the processing!
    Could this be a sign of what is to come on the Mac in a next-gen PowerPC?

    For those of you who don't know what I'm talking about, hyperthreading is a technology Intel introduced that allows a chip to use more of its execution units at once. Previously, a program thread that called on the chip's integer units would use only those units, while the floating-point units sat idle.
    With hyperthreading, two threads are scheduled on the chip at once, so that while the integer units are chomping away on one thread, floating-point work from another thread keeps the floating-point units busy. This means that more of the chip is in use at any given time, and Intel claims performance gains of up to 30% with this technology.
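
    A toy model of the idea described above, offered as a rough sketch and not as a description of how Intel's hardware actually schedules work. It assumes a core with exactly one integer unit ("I") and one floating-point unit ("F"), and threads that issue at most one operation per cycle in program order. The function names and the sample threads are made up for illustration.

```python
def cycles_one_at_a_time(threads):
    """Cycles needed when the core runs one thread at a time, one op per cycle."""
    return sum(len(ops) for ops in threads)


def cycles_interleaved(threads):
    """Cycles needed when every thread may issue its next op each cycle,
    provided the unit it needs ("I" or "F") is still free in that cycle."""
    position = [0] * len(threads)
    cycles = 0
    while any(p < len(ops) for p, ops in zip(position, threads)):
        busy_units = set()
        for i, ops in enumerate(threads):
            if position[i] < len(ops) and ops[position[i]] not in busy_units:
                busy_units.add(ops[position[i]])
                position[i] += 1
        cycles += 1
    return cycles


int_heavy = "IIII"   # hypothetical thread: four integer ops
fp_heavy = "FFFF"    # hypothetical thread: four floating-point ops
mixed = "IFIF"       # hypothetical thread alternating between the two

print(cycles_one_at_a_time([int_heavy, fp_heavy]))  # 8
print(cycles_interleaved([int_heavy, fp_heavy]))    # 4
print(cycles_interleaved([mixed, mixed]))           # 5
print(cycles_interleaved([int_heavy, int_heavy]))   # 8
```

    In this model the gain depends entirely on how often the two threads want different units in the same cycle; two threads competing for the same unit gain nothing.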
     
  2. Sun Baked macrumors G5

    Sun Baked

    Joined:
    May 19, 2002
    #3
    There are also threads on SMT and Power5-UL/GR-UL, and a PDF floating around discussing this that I know I've linked to several times.

    On the current Power4-UL (aka PPC 970xx/GP-UL) most likely not.

    But SMT is only one of the nifty features of Power5; there are also the dynamic thermal/power balancing features built into the CPU.

    Makes PowerTune look crude.
     
  3. Logik macrumors 6502a

    Joined:
    Apr 24, 2004
    #4
    I recall reading somewhere that the reason hyperthreading works for the Pentium 4 is its pipeline length. The pipeline is long, so if an instruction is "canceled" or is waiting on information, it can take a long time to spit the answer out. Remember, we're talking about nanoseconds here, but in the end it adds up. Hyperthreading allows a thread switch to take place in the middle of a particular process when something is "stalled" for the reasons listed above, among others.

    The PPC doesn't necessarily need hyperthreading because its pipeline is still short enough that a stall doesn't hurt performance much. In other words, since the pipeline isn't as long as the P4's, switching threads would take about as long as just waiting, so there's no benefit.

    What you're thinking of is multiple CORES: essentially the main component of a processor, duplicated so there are 2 in each chip, effectively making it a 2-processor processor. That's totally different from hyperthreading.
     
  4. daveL macrumors 68020

    daveL

    Joined:
    Jun 18, 2003
    Location:
    Montana
    #5
    No, actually, he's thinking of the Power5 SMT capability. As mentioned above, there are PDFs from IBM floating around. In case you aren't aware, the Power5 processor is already being deployed in production server environments; it's not exactly new.

    Dual core has nothing to do with Hyperthreading or SMT.

    I'm not sure where you got the impression that the G5 has a short pipeline. It's shorter than the P4, for sure, but it's a lot longer than the G4.
     
  5. Logik macrumors 6502a

    Joined:
    Apr 24, 2004
    #6
    Again, the pipeline is short enough.

    You switch threads and it takes x amount of time.

    You wait for the current execution to finish and it takes x amount of time +/- ...

    In the end the trade-off isn't worth it. There's no real need, because by the time you had switched threads the data would be there, and the current thread would finish without stalling. Get the point?

    You have 2 options: 1, wait for the data to be available, or 2, switch threads. In the end they take roughly the same amount of time, so hyperthreading provides no real advantage. I'm not really familiar with the Power5, and it may have a longer pipeline where hyperthreading would be beneficial. But the only reason it works well on the P4 is the pipeline length. It is simply so long that if something stalls, it is faster to switch threads and let the processor do something instead of nothing. That's why the P4 is so much better in a multithreaded environment: hyperthreading makes up for its shortcomings in the long-pipeline department.
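
    One back-of-the-envelope way to put numbers on this argument, under loud assumptions: each thread alternates `work` busy cycles with `stall` idle cycles, another thread's work can fill a stall at no cost, and nothing else limits throughput. The function `utilization` and its parameters are illustrative only; they are not measurements of a P4 or a 970.

```python
def utilization(num_threads, work, stall):
    """Fraction of cycles the core is busy in this toy model.

    Each thread is busy for `work` cycles and then stalled for `stall`
    cycles; extra threads fill stall cycles until the core saturates at 1.0.
    """
    return min(1.0, num_threads * work / (work + stall))


# Short stalls: a second thread has only a little idle time to fill.
print(utilization(1, work=8, stall=2), utilization(2, work=8, stall=2))
# Long stalls: a second thread doubles the useful work done.
print(utilization(1, work=2, stall=8), utilization(2, work=2, stall=8))
```

    Under these assumptions the payoff from a second thread shrinks as stalls get shorter relative to useful work, which is the shape of the trade-off being argued here; real gains also depend on workload and implementation.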
     
  6. mkrishnan Moderator emeritus

    mkrishnan

    Joined:
    Jan 9, 2004
    Location:
    Grand Rapids, MI, USA
    #7
    How does programming for hyperthreading work in the Intel world? Is it enough to program threads, and if they happen to use FP or INT calcs, then they activate hyperthreading? It seems like, in the real world, and outside of video games and maybe graphics software, that well-defined threads that have almost all FP calcs or almost all INT calcs would be rare...so I'm thinking I misunderstand something. :rolleyes:
     
  7. Logik macrumors 6502a

    Joined:
    Apr 24, 2004
    #8
    In the real world the processor just does it. There's no programming required for hyperthreading beyond just threading your application. That is typically why a hyperthreaded P4 outperforms a dual-processor machine in XP: most applications simply don't take advantage of SMP, but no special code is required for hyperthreading. At least, this is my understanding of it. I am, however, not a processor guru.
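
    A minimal sketch of what "just threading your application" can look like, written in Python purely for illustration. The names `worker` and `run_threads` are hypothetical. Nothing in the code mentions hyperthreading; the program only creates ordinary threads and leaves scheduling to the OS and the hardware. One caveat: standard CPython's global interpreter lock keeps CPU-bound threads from running in parallel, so this shows the programming model, not a speedup.

```python
import os
import threading


def worker(name, results):
    # Ordinary work; nothing here knows or cares about hyperthreading.
    results[name] = sum(i * i for i in range(10_000))


def run_threads(count):
    """Start `count` plain threads and collect one result per thread."""
    results = {}
    threads = [
        threading.Thread(target=worker, args=(f"t{i}", results))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Number of logical CPUs the OS reports (may be None if it cannot be determined).
logical_cpus = os.cpu_count()
print("logical CPUs reported by the OS:", logical_cpus)
print("threads finished:", len(run_threads(4)))
```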
     
  8. mkrishnan Moderator emeritus

    mkrishnan

    Joined:
    Jan 9, 2004
    Location:
    Grand Rapids, MI, USA
    #9
    Hmmm...interesting.... But if a program were not specifically written for HT, then I'd think your earlier comment about the wasted time in switching hyperthreads would be a dominant factor, negating a lot of the gains.

    WRT multiple processors, I thought that with most compilers, threaded applications are automatically able to take at least some advantage of the multi-processor setup?
     
  9. Logik macrumors 6502a

    Joined:
    Apr 24, 2004
    #10
    I can't really comment on this as it's way beyond my knowledge here.

    The big issue with the P4's 20-stage pipeline is that, to speed things along (and this is true of nearly every processor now), it tries to predict some of what you are going to do, so calculations are done before they're needed. This is where AMD's Athlons kick serious butt: they have a better predictive algorithm, though the gap may have narrowed in the newer P4s (I heard Intel was supposed to fix it with the newest core; not the recently announced one, but the newest one out in full-scale quantities). The hyperthreading end of things is related to this prediction algorithm. If the data is not available and the instruction is far along in the execution cycle, hyperthreading can switch threads and allow another thread to execute instead. Since the pipeline is so long (only 4 stages longer than the PPC970's, which has 16), switching threads ends up being much faster than actually waiting for the data. Notice this happens in nanoseconds; it's very fast. But the way the PPC970 is designed alleviates some of the pains you get in the P4, and those design decisions make hyperthreading pretty much unneeded at this time. I forget exactly what it is that the 970 does that counteracts this... I'm reading some stuff about it right now, and if I find the answer I'll respond.

    Think of it this way: applications don't have to know that 2 processors exist with hyperthreading, but with a dual-processor system they do. With hyperthreading the data stays in the same cache, while a dual-processor system doesn't share the data in its caches. As such, the processor can actively switch between threads.
     
  10. ddtlm macrumors 65816

    Joined:
    Aug 20, 2001
    #11
    So far these rumored Xbox2 cores sound very similar to the single normal core in a Cell. Both are high-clocking PPC, 2 instructions per cycle, and support SMT. Hmmmmm.

    Logik:

    Performance with SMT depends on a lot of things... pipeline length, type of workload, number of execution units, the way that SMT is implemented, etc. Note that IBM did use SMT on the Power5, and Sun is going crazy with SMT on their upcoming low-clocked "Niagara" processor (8 cores, 4 threads each == 32 threads).

    SMT of every sort is just a cheap version of SMP; in both cases the only way to get more performance is to run more than one thread (or more than one program).

    Incorrect, for example a Hyperthreaded P4 shows up as two processors to the OS and all programs.

    Some SMP designs share cache, some don't. For example, the Power4 and Power5 share an L2 between two cores and share L3 between as many as 8. But that is more difficult than just giving each core its own L2, and it adds access latency, so most companies have thus far gone with dedicated L2's, such as in the UltraSparc4, dual-core P4's, and dual-core Opterons.

    daveL:

    Yeah, I think it's longer than the Opteron's too, but I don't know.

    mkrishnan:

    As far as I know there is no auto-threading compiler out there. Threading can be very tricky. It's not hard to do; it's just easy to do wrong. Threading errors can be very subtle, and getting good speedup from threads can be very tricky (depending on what you are doing).
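
    To make "easy to do wrong" concrete, here is a small sketch in Python of one classic pitfall: several threads updating a shared counter. The `Counter` class and `count_with_threads` function are hypothetical names for this example. An increment is a read-modify-write, so without the lock two threads could interleave and lose updates; the lock makes each increment atomic with respect to the others.

```python
import threading


class Counter:
    """A counter that is safe to increment from several threads."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Hold the lock for the whole read-modify-write so that no other
        # thread can slip in between the read and the write.
        with self._lock:
            self.value += 1


def count_with_threads(num_threads, increments_each):
    """Have several threads increment one shared counter; return the total."""
    counter = Counter()

    def work():
        for _ in range(increments_each):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(count_with_threads(4, 1000))  # 4000
```

    The subtle part is that a version without the lock may still print the right answer on most runs, which is exactly why this kind of bug is hard to catch.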
     
  11. Logik macrumors 6502a

    Joined:
    Apr 24, 2004
    #12
    Sorry I was comparing P4 - 970.. if you want to start getting into Power and Sparc stuff that's way beyond my realm of knowledge but ya.

    There are a lot of factors in whether HT will work well on a processor. I was trying to dumb it down enough to get the general idea of how it works across. If you want to understand processors in more than a "consumer" type of way, go read the articles by Hannibal on arstechnica.com; he explains a lot of this stuff very well.

    It's actually really interesting. Check it out if you're at all interested.
     
  12. mkrishnan Moderator emeritus

    mkrishnan

    Joined:
    Jan 9, 2004
    Location:
    Grand Rapids, MI, USA
    #13
    I think I've heard of such a thing for Fortran or possibly C on massively parallel machines, but I'm not sure. At any rate, I was thinking more of apps that are written in threads already. Although I guess many apps out there still are not. And you're right, of course, getting the thread setup to work right for multiple processors can be tricky. The only time I've ever played with threading was in Java, and it wasn't really for speed. There, threading seemed the most straightforward way to get what I wanted....
     
  13. Logik macrumors 6502a

    Joined:
    Apr 24, 2004
    #14
    Java makes it pretty easy until you get into asynchronous-vs-synchronous threading... then ramp up the complexity of the application: sound/graphics/control/network in a game, for example.
     
  14. mkrishnan Moderator emeritus

    mkrishnan

    Joined:
    Jan 9, 2004
    Location:
    Grand Rapids, MI, USA
    #15
    Yeah, I really doubt Java is the best environment for that. ;)
     
  15. Logik macrumors 6502a

    Joined:
    Apr 24, 2004
    #16
    I'll second that. I think C# has relatively similar methods for threading. However, once again, it's not exactly the greatest environment for games. But you get the idea: threading can be a complicated thing (which you stated before I said anything, so I'm not lecturing, just making it clear to other readers).

    There are many other needs for threading, typically when you don't want 2 processes blocking each other. Particularly in operating-system-level stuff, FreeBSD and Linux struggle to get proper locking of threads. It's not an easy thing.
     
