Any Truth To This Multi-CPU Theory?

Discussion in 'Mac Pro' started by iPadPublisher, Jun 19, 2013.

  1. iPadPublisher macrumors 6502

    Joined:
    Apr 14, 2010
    #1
    A good friend of mine is a researcher for a financial company. He's one of six people on a team that researches tech companies for the investment reports that come out. He's mostly just a rumor collector, who then tries to prove or debunk as much as he can. His information gets passed up the chain, where more senior folks decide what parts of the data set they want to run with in their official reports, blah blah blah.

    Anyhow, before its unveiling, he had strongly suggested the new Mac Pro would be modular and "very expandable," but had very little to offer otherwise. The theory he picked up on, and ran with, is this: because Thunderbolt is a direct path to the CPU and is seen as a single pipeline to the OS, it could be possible to see CPU expansion boxes on the Thunderbolt chain that basically give you add-on CPU power, where somehow the box just sees it as a second proc and you're off and running.

    This seems sloppy to me on a number of levels, but it might also explain why Apple went with a single-CPU design, if they knew that even CPU expandability was somewhere down the road...

    What say you?
     
  2. mcnallym macrumors 6502a

    Joined:
    Oct 28, 2008
    #2
    Only slight issue with that theory is that the Thunderbolt chip attaches to the CPU via the PCI-Express bus. PCI-Express support is directly on the CPU, however, so perhaps that is why he is saying it has a direct connection to the CPU, in that it doesn't go via the C600 chipset.

    CPU-to-CPU connectivity on current Intel Xeons goes via a separate bus, QPI (QuickPath Interconnect), which has nothing to do with the PCI-Express bus (or chipset).

    Nothing indicates that the Ivy Bridge-E/EP series Xeons, which are what is going into the Mac Pro, are changing from QPI for the CPU-to-CPU connection.

    The 1600 series has 1 QPI link, the 2x00 series has 2 QPI links for dual-CPU support, etc., hence why you can't run two 1600-series CPUs in a single system.

    Not saying that couldn't be done further down the processor roadmap, however I suspect it wouldn't happen until the Thunderbolt logic is moved into the CPU the same way the memory controller and PCI-Express logic have been. Ivy Bridge, from what is available so far, doesn't appear to have done this.

    It would certainly be interesting, however, if the Thunderbolt logic went straight into the CPU, as that could expand Thunderbolt capacity further than just using the PCI-Express bus.
     
  3. Tesselator macrumors 601

    Tesselator

    Joined:
    Jan 9, 2008
    Location:
    Japan
    #3
    Technically it's possible to have an "external CPU" connected over TB/TB2 or PCIe card-edge connectors. Such "co-processor" cards already exist, in fact. It's also possible to backplane multiple single-board computers (SBCs) over PCIe/PICMG, and these also already exist.

    The question of Apple marketing something like this is an interesting one. No one will have any honest notions about such a move until Apple announces it. I think there is some probability, yes, but I guess not this year for sure - if indeed ever. <shrug>
     
  4. iPadPublisher thread starter macrumors 6502

    Joined:
    Apr 14, 2010
    #4
    All good info, thanks!
     
  5. deconstruct60 macrumors 604

    Joined:
    Mar 10, 2009
    #5
    It is NOT. In the vast majority of Thunderbolt host devices sold to date, the Thunderbolt controller is hooked to the much lower-bandwidth IOHub/Southbridge chipset, not the CPU.

    The Mac Pro may have one of its three TB controllers hooked directly to the CPU, but that is really only because they ran out of PCI-e lanes to connect it to the normal place you'd hook it up.

    As a CPU interconnect Thunderbolt sucks. Anyone proposing this as a CPU interconnect who is a tech analyst should be deeply and utterly embarrassed.
     
  6. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #6
    Not all problems are IO bound. In grid computing scenarios, Ethernet or InfiniBand is commonly used.
     
  7. Tesselator macrumors 601

    Tesselator

    Joined:
    Jan 9, 2008
    Location:
    Japan
    #7
    Yup, such a system configuration could be extremely effective for more than a few applications. Not at all embarrassing!
     
  8. thekev macrumors 604

    thekev

    Joined:
    Aug 5, 2010
    #8
    It's no more direct than the PCI slots on the current Mac Pro. Windows vendors sometimes have CTO options for things like Tesla cards in those. Nothing shown so far really supports your friend's theory if we're talking about standard x86. As others have mentioned, various co-processor solutions are nothing new.

    I'm surprised no one has mentioned that you would essentially have to build half a computer even if such a thing worked with standard x86 chips. You would need the CPU, heatsink, power supply, chassis, and a logic board with a Thunderbolt chip. You're looking at a full computer without a GPU or hard drive.
     
  9. deconstruct60 macrumors 604

    Joined:
    Mar 10, 2009
    #9
    CPU interconnect is IO bound.


    Thunderbolt is not InfiniBand. It is also not Ethernet in terms of topology and extensibility/expandability. Grid computing is largely taking OS instances that are independent and having them work together through protocols. That is not what a CPU interconnect is.


    Thunderbolt has diddly squat to do with why Apple went with a single-CPU design.

    The fact that Intel is not putting a "good enough" number of cores into a single CPU package is why the Mac Pro has just one. The number of individuals who need more than 12 cores (and likely 14 and 16 over the next two years) is smaller than the number of folks who need 4. In case the clueless analyst hasn't noticed, the number of mainstream desktop/laptop cores has stalled at 4 with Intel products. AMD is pissing around with throwing core count at the desktop more as a marketing gimmick, because their design is behind the curve.
     
  10. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #10
    The problem you have to solve may not be IO bound, that's my point. Consider that you have a computation that takes a week to finish: if you can solve it in one hour on a grid, how much do a few hypothetical extra seconds of latency matter?
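
    To make that concrete, here is a back-of-envelope sketch in Python with made-up numbers (a week of compute on one box, an hour on a grid, a few seconds of added network latency); the point is only the ratio, not the specific figures:

    ```python
    # Back-of-envelope sketch with hypothetical numbers, not measurements.
    single_machine_s = 7 * 24 * 3600   # one week of compute on a single box
    grid_compute_s = 1 * 3600          # the same job spread across a grid
    extra_latency_s = 5                # assumed extra interconnect latency, total

    grid_total_s = grid_compute_s + extra_latency_s
    speedup = single_machine_s / grid_total_s

    print(f"grid time: {grid_total_s} s, speedup: {speedup:.1f}x")
    # ~167.8x instead of a clean 168x -- the added seconds are noise,
    # as long as the problem itself isn't IO/latency bound.
    ```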

    I know; I brought it up as a comparison of bandwidth, and of what is possible and currently actually in use.
     
  11. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #11
    I don't know where this rumour comes from, but "single CPU" is very, very unlikely, considering that Apple promised 12 cores and Intel doesn't have a 12-core CPU. And you can't use Thunderbolt to add CPUs. You surely can use Thunderbolt to create some fast connection between CPUs, but you would still have independent computers.
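
    As a rough illustration of "fast connection, but still independent computers": whatever the link is (Ethernet, or a hypothetical Thunderbolt bridge), software on each side still sees a separate machine and cooperates only by passing messages. A minimal sketch, with hypothetical addresses:

    ```python
    # Two *independent* machines cooperating over a network link.
    # Nothing here shares CPUs or memory; it is plain message passing,
    # which is all a host-to-host link gives the OS.
    import socket

    HOST, PORT = "192.168.2.1", 5000   # hypothetical address of the other box

    def serve():
        """Run on machine A: accept one request and answer it."""
        with socket.create_server(("", PORT)) as srv:
            conn, _ = srv.accept()
            with conn:
                work = conn.recv(1024).decode()
                conn.sendall(f"result of {work}".encode())

    def submit(work: str) -> str:
        """Run on machine B: ship a unit of work to machine A."""
        with socket.create_connection((HOST, PORT)) as s:
            s.sendall(work.encode())
            return s.recv(1024).decode()
    ```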
     
  12. deconstruct60 macrumors 604

    Joined:
    Mar 10, 2009
    #12
    Which is completely tangential to CPU interconnect and CPU expansion.

    Go ahead, have a relatively high latency of seconds (relative to CPU clock times, which are more in the nanosecond range) and try to do locking spread across a grid.


    It is only comparable bandwidth if you look at the bottom end of all three. At the top end they don't compare at all.
     
  13. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #13
  14. MeFromHere macrumors 6502

    Joined:
    Oct 11, 2012
    #14
    You _could_ interconnect multiple computers via Thunderbolt. The result wouldn't be as tightly coupled as a multi-CPU system using QPI. (Once it's configured by the firmware, QPI doesn't need software assistance to operate.) But with Thunderbolt you could make a "cluster" with much better interconnect performance than clusters connected via Ethernet. This has been done before, using older technology than Thunderbolt. A couple of examples:

    "Memory Channel" was an interconnect developed by Digital. It's a PCI-based (not PCI-Express) point-to-point connection between computers, or between a computer and a hub. In the most common configuration, two computers had a region of shared memory for communication. This was just normal memory to the host computer, and the other computer accessed the memory via the Memory Channel link. DMA reads and writes to the Memory Channel PCI adapter in one computer turned into memory reads and writes in the other computer. High bandwidth (limited by the PCI speeds of that time) and very low latency compared to ethernet. Note that there was a fair amount of OS kernel and device driver software needed to make it work. For example, see http://h18002.www1.hp.com/alphaserver/download/es45_ts.pdf (page 12).

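    As a local-only analogy of that shared-memory window (a sketch, not how Memory Channel itself was programmed): two processes mapping the same file see each other's writes, much as Memory Channel let a write on one host appear in the other host's RAM. The file path and size here are arbitrary:

    ```python
    # Local analogy of a shared memory window: both "sides" map the same
    # backing file, so a write by one is immediately visible to the other.
    import mmap
    import os

    PATH, SIZE = "/tmp/shared_window", 4096   # arbitrary backing file

    def open_window() -> mmap.mmap:
        fd = os.open(PATH, os.O_CREAT | os.O_RDWR)
        os.ftruncate(fd, SIZE)
        return mmap.mmap(fd, SIZE)

    # Writer process:  win = open_window(); win[0:5] = b"hello"
    # Reader process:  win = open_window(); print(bytes(win[0:5]))
    ```
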
    "InfiniBand" is another high-speed interconnect that can be used similarly. It's still an active product line. http://en.wikipedia.org/wiki/InfiniBand
    An interesting thing I learned from this wikipedia page: Intel (maker of thunderbolt) bought Qlogic's InfiniBand business last year. Hmm.

    Thunderbolt could provide similar functionality to Memory Channel or InfiniBand, probably at much lower prices. But I haven't heard of anyone bringing this kind of product to market. It sounds far outside of Apple's area of expertise; they've never shown any interest in clustering their computers. (Most of the top-notch clustering intellectual property is languishing at companies like HP. If Apple wanted it, they could probably license or purchase it for pocket change.)
     
  15. Moonjumper macrumors 68000

    Joined:
    Jun 20, 2009
    Location:
    Lincoln, UK
    #15
    But Intel will have the 12 core Xeon E5-2695 v2 and Xeon E5-2697 v2 by the time the Mac Pro launches.
     
  16. deconstruct60 macrumors 604

    Joined:
    Mar 10, 2009
    #16
    If Apple had said they were going to ship with an existing Intel offering, you might remotely have a point. They did not, though. They overtly refer to the next-generation CPU, so your FUD doesn't really hold a lot of water.


    way back in October....
    " ... Chipzilla is said to also have a 12-core processor in the pipeline as well. ... "
    http://www.engadget.com/2012/10/17/intel-roadmap-reveals-10-core-xeon-e5-2600-v2-cpu/

    And a steady drumbeat of confirming leaks since then.

    "...eon E5-2600 v2 series is coming a quarter earlier than the Xeon E7 v2. The CPUs will have up to 12 cores (24 threads), and they are going to support Intel Secure Key and OS Guard features. ..."
    http://www.cpu-world.com/news_2013/...upcoming_Intel_Xeon_E5_v2_and_E7_v2_CPUs.html

    ----------

    You think grabbing a lock isn't I/O bound? Or doing a store/load from memory? Please.
     
  17. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #17
    For locking you would use an appropriate framework: OpenCL, OpenMP, etc. Memory operations will only make the problem IO bound if the amount of memory accessed is higher than the available bandwidth. Computing digits of Pi is CPU bound, for example; it isn't determined or improved by IO speed.
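
    A rough way to see that distinction, as a sketch with assumed numbers (1 GB of data touched, 10^12 floating-point operations, a ~1 GB/s link, a ~50 GFLOP/s core): compare the time spent moving data against the time spent computing.

    ```python
    # "Is it IO bound?" back-of-envelope check; all figures are assumptions.
    bytes_moved = 1e9      # data the job actually has to move
    flops_needed = 1e12    # arithmetic the job has to do

    link_bw_Bps = 1e9      # hypothetical interconnect: ~1 GB/s
    cpu_flops = 50e9       # hypothetical CPU: ~50 GFLOP/s sustained

    io_time_s = bytes_moved / link_bw_Bps       # ~1 s moving data
    compute_time_s = flops_needed / cpu_flops   # ~20 s computing

    print("compute bound" if compute_time_s > io_time_s else "IO bound")
    # With these numbers the job is compute bound, so a slower link barely
    # matters; flip the ratio (lots of data, little math) and it becomes IO bound.
    ```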

    What you are proposing would make grid computing impossible; it isn't. Not all problems can be solved this way, though, which is why the pretext is problems that are not IO bound, which is what I said when I entered the discussion.
     
  18. MeFromHere macrumors 6502

    Joined:
    Oct 11, 2012
    #18
    Locking, and memory sharing in general, are difficult problems to solve when the number of competing nodes is large. "Competing" means they are trying to access the same thing at the same time. Low-budget grid computing is feasible when the nodes rarely compete; they mostly work independently of each other.

    Don't pay attention to frameworks like OpenCL. Look at what those frameworks have to DO at the hardware level. Taking a lock on a shared resource requires a MINIMUM of one round-trip communication to the shared resource. That determines the MAXIMUM number of lock operations you can do in a unit of time. If there's competition for the lock, it takes much more than one round-trip. For many workloads, lock contention is a serious bottleneck even in SMP systems with 16-32 CPUs with fast (QPI-style) interconnect. PCIe and its descendants fare much worse than QPI.
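
    To put numbers on the round-trip argument, here is a sketch with rough, assumed latencies (not measurements); the only point is that one uncontended lock acquire costs at least one round trip, so 1/RTT caps lock throughput per node:

    ```python
    # Round-trip bound on lock throughput; latency figures are rough assumptions.
    round_trip_latency_s = {
        "QPI (socket to socket)": 100e-9,   # ~hundreds of nanoseconds
        "PCIe/Thunderbolt hop": 5e-6,       # ~microseconds
        "Ethernet (LAN)": 100e-6,           # ~hundreds of microseconds
    }

    for link, rtt in round_trip_latency_s.items():
        # One uncontended acquire needs at least one round trip to wherever
        # the lock lives, so 1/rtt is a hard ceiling on lock ops per second.
        print(f"{link:24s} <= {1 / rtt:,.0f} lock ops/sec per node")
    # Contention only makes it worse: every retry or handoff is another trip.
    ```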
     
  19. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #19
    I don't believe I said it was easy or without problems, and as you say it's a problem shared even on many-core systems. Apple had Xgrid implemented over Ethernet; my only point is that a hypothetical use of Thunderbolt for the same purpose is not going to be limited by bandwidth for the same basic use.
     
  20. deconstruct60 macrumors 604

    Joined:
    Mar 10, 2009
    #20
    You could rewrite OS X 100% in assembler. It isn't going to happen.


    and CPU and memory clock speeds were what back then?





    With enough kludges piled on top almost anything can work. Whether it is effective and economical is another issue.


    Not really, if you look at the overall strategy. Intel also bought Cray's CPU interconnect.

    http://www.theregister.co.uk/2012/04/25/intel_cray_interconnect_followup/

    Intel was already a sizable player in Ethernet. >10Gb/s Ethernet and InfiniBand largely share many of the same issues. Like the Cray interconnect and the Xeon Phi line, Intel is setting things up so that a systems vendor who wants to build large grids can just come to them for the major subcomponents. Buy the CPU, GPGPU, CPU interconnect, storage network (Ethernet or InfiniBand), etc. (i.e., all the high-margin components) all from Intel. You get your much more commodity drives and memory from someone else.

    Thunderbolt really doesn't play in that HPC and/or grid context at all.


    With respect to InfiniBand, the prices are lower because the performance is much lower. Real host-to-host throughput of Thunderbolt is roughly x4 PCI-e v2.0.
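
    For scale, a rough comparison behind that claim, using ballpark figures (assumed, not benchmarked): x4 PCI-e v2.0 works out to about 2 GB/s, while a contemporary FDR 4x InfiniBand link carries several times that.

    ```python
    # Ballpark bandwidth comparison; all figures are rough assumptions.
    pcie_v2_lane_GBps = 0.5                     # ~500 MB/s usable per PCIe 2.0 lane
    tb_effective_GBps = 4 * pcie_v2_lane_GBps   # "roughly x4 PCI-e v2.0" -> ~2 GB/s

    ib_fdr_4x_Gbps = 54                         # FDR 4x InfiniBand, ~54 Gb/s of data
    ib_fdr_4x_GBps = ib_fdr_4x_Gbps / 8         # ~6.8 GB/s

    print(f"Thunderbolt as PCIe: ~{tb_effective_GBps:.1f} GB/s")
    print(f"InfiniBand FDR 4x:   ~{ib_fdr_4x_GBps:.1f} GB/s")
    # Several times the throughput (plus far lower latency) is what the
    # InfiniBand price premium buys.
    ```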

    The other problem is that Thunderbolt compliance standards mean dragging GPUs around with all of these TB connections. TB does two things (PCI-e and DisplayPort). If you really only need PCI-e, it is actually higher cost than relatively short-range external PCI-e would be.


    Probably because it needs the same sort of specialized, nonstandard drivers that Memory Channel needed. PCI and PCI-e are naturally much more hierarchical than a peer-to-peer network. Thunderbolt, from a narrow PCI-e viewpoint, is in essence a switch, not a peer-to-peer network. The host and peripherals have different compliance standards to meet.





    Xgrid (at NeXT Zilla). Xgrid was dropped last year (and was always mainly focused on homogeneous Mac clusters/grids).

    A sustained primary interest, no. There was a time in the 2nd Jobs era at Apple, when desktops were declining and overall Mac PC market share was steadily imploding, in which Apple threw multiple initiatives out in significantly different directions to see what would stick. That is the era when Macs powered a Va Tech supercomputer and Apple was rolling out things like the Xserve RAID. None of that really caught on in a sustainable fashion, and the market shifted much more toward whatever no-value-add box you could cobble together into a makeshift grid/cluster.

    Apple got out of that market. It is extremely doubtful they will want to jump back in now, especially with the design choices in the 2013 Mac Pro.
    InfiniBand (at least where most folks are going with InfiniBand) won't work at all. (It would need PCI-e x8 v3.0 slots at a minimum, and more desirably better, to work. The Mac Pro has none free, either as slots or embedded.)



    Intel didn't get Cray's interconnect for pocket change. It would not make any financial sense at all for Apple to absorb that kind of tech for a single minor part of the Mac product lineup.

    Intel's server group is bigger than Apple's whole desktop product line, and probably a decent portion of the laptop product line too. Intel can get an economic return on investment for these InfiniBand, Cray interconnect, etc. acquisitions because they are still going to sell that stuff directly to many vendors.
     
