
View Full Version : NVIDIA Working on GPGPUs for Macs?




MacRumors
Jan 24, 2008, 06:19 PM

Appleinsider claims (http://www.appleinsider.com/articles/08/01/24/nvidia_working_on_first_gpgpus_for_apple_macs.html) that NVIDIA is working on bringing "general-purpose computing on graphics processing units" (GPGPUs) to the Mac.

GPGPUs are described as a new type of graphics processors that can perform complex computations typically reserved for the system's primary CPU.
The technology -- in Nvidia's case -- leverages a proprietary architecture called CUDA, which is short for Compute Unified Device Architecture. It's currently compatible with the company's new GeForce 8 Series of graphics cards, allowing developers to use the C programming language to write algorithms for execution on the GPU.
According to Appleinsider, the GPGPUs can be beneficial in a number of applications with complex mathematical requirements, such as raytracing, scientific applications, cryptography, and audio and image processing.

NVIDIA's $1500 Tesla card (http://en.wikipedia.org/wiki/NVIDIA_Tesla) is the first example of this class of graphics card. When launched for Mac, these GPGPUs will likely be a high-end build-to-order option for Mac Pros.

Article Link (http://www.macrumors.com/2008/01/24/nvidia-working-on-gpgpus-for-macs/)



herr_neumann
Jan 24, 2008, 06:25 PM
Maybe we will finally get SLI now....

diamond.g
Jan 24, 2008, 06:27 PM
Maybe we will finally get SLI now....
Sadly the only card that can fit in the Mac Pro is the 1 GPU card. The other GPU configurations are separate boxes.

Umbongo
Jan 24, 2008, 06:45 PM
Don't get too excited about new cards, the Quadro FX 5600 and 8800GT already support CUDA.

exabytes18
Jan 24, 2008, 06:57 PM
The CUDA SDK for Windows is awesome. It includes some small demos of CUDA implementations such as a real-time Mandelbrot generator, particle simulation, stable fluids model, and a bunch of other command-line based tests and such. I ran the demos on an 8800GTS which was sustaining several hundred GFLOPS during the demos and tested to about 60 GB/s of internal bandwidth.

There is some extreme power in GPUs that's just waiting to be unlocked.

Edit: While it's not SLI, CUDA does support multiple GPUs per configuration.

twoodcc
Jan 24, 2008, 07:05 PM
sounds good to me. we'll see if they really get into the mac pro anytime soon

ricosuave
Jan 24, 2008, 07:07 PM
I knew this day would come! I have no idea of what you guys are talking about. :eek:

haunebu
Jan 24, 2008, 07:12 PM
In the olden days, we greybeards called this AltiVec. ;)

Abstract
Jan 24, 2008, 07:23 PM
This GP GPU sounds like another CPU. I mean, mathematical tasks such as image and sound processing? If they're really interested in another CPU, just add another CPU. It doesn't have to be from NVIDIA. It can be from Intel, no?

I don't really get it. Or is this just like what AltiVec used to be?

exabytes18
Jan 24, 2008, 07:35 PM
GPGPU is more-or-less for highly parallel operations. It can achieve an order of magnitude higher performance than a normal CPU.

I suppose it's like altivec in the sense that it's not x86... GPGPU is more like a single processor with hundreds of more specialized cores.

MrCrowbar
Jan 24, 2008, 07:38 PM
Well, GPUs are good at dumb parallel processing, which is what rendering 3D images is all about. Your CPU can't do it that fast because it has to be able to do all kinds of different stuff.

bigwig
Jan 24, 2008, 07:50 PM
Specialized coprocessors were all the rage in the 80s and into the very early 90s. Rapidly advancing speeds of general purpose CPUs rendered them moot and that line of development was dropped. Now we're back to repeating history. No different than high-speed interconnects, really. They started out serial, went parallel for the next round of speed bumps, and now we're back to serial.

Kingsly
Jan 24, 2008, 08:02 PM
It's currently compatible with the company's new GeForce 8 Series of graphics cards, allowing developers to use the C programming language to write algorithms for execution on the GPU.

So wouldn't my 8800GT technically be capable of running as a GPGPU in my Mac Pro, assuming Apple issues the correct SW update?

cgc
Jan 24, 2008, 08:08 PM
My old Amiga 3000 had a CPU and a Math Coprocessor (http://en.wikipedia.org/wiki/Motorola_68881)...seems like the same thing to me...

asphalt-proof
Jan 24, 2008, 08:46 PM
Yes, yes yes, but what kind of framerates am i going to get on Counterstrike?;) Had to be said. :D

EagerDragon
Jan 24, 2008, 08:48 PM
Sorry Intel, sounds like the days of x86 instruction set CPU(s) are coming to an end soon. In a few years these babies may be providing most of the horsepower for general purpose computing.

bête noire
Jan 24, 2008, 09:23 PM
Sounds like just the thing for BOINC tasks.

When launched for Mac, these GPGPUs will likely be a high-end build-to-order option for Mac Pros.

Yeah, that'd be right. If it's any good, Apple will overprice it. :mad:

Nichod
Jan 24, 2008, 09:24 PM
This is one advantage of the Apple marketshare growing. Increased support from the hardware community.

SirOmega
Jan 25, 2008, 12:14 AM
The only issue I have is that I think the current GF8 series of cards are only FP16 - 16 bit floating point aka single precision. This may not be enough for most apps. It would be great for running test runs, but I know for scientific calcs FP16 isn't sufficient, they need FP32.

enygma
Jan 25, 2008, 12:23 AM
Maybe we will finally get SLI now....
CUDA doesn't need SLI to support multi GPU. The application developer addresses the threading to individual GPUs in the application itself; this way, they can access all memory on all GPUs, whereas SLI just uses the additional GPUs and approximates the splitting of frames in DirectX and OpenGL.
Don't get too excited about new cards, the Quadro FX 5600 and 8800GT already support CUDA.
Yah, but the 8800 GTX only has half the memory and the QuadroFX 5600 is twice as expensive.
This GP GPU sounds like another CPU. I mean, mathematical tasks such as image and sound processing? If they're really interested in another CPU, just add another CPU. It doesn't have to be from NVIDIA. It can be from Intel, no?
That is essentially what it is. Treating the GPU like a CPU in terms of compute intensive tasks. The problem with just adding another CPU is that the performance of Tesla for certain applications can exceed that of many CPUs. From experience, the 8800 GTS in VMD (molecular dynamics) benched about 200 GFLOPS, while a 2.6GHz Intel processor (only one core) was 5 GFLOPS. The application supported multi GPU, so even if you decided you should add another processor, might as well add another GPU while you're at it. Heck, I built an entire workstation around this concept, and have 6 GPUs successfully running in a box, and got just over 1 TFLOP with 6 8800 GTS GPUs.

http://www.ocia.net/articles/tycrid/page1.shtml
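To put the figures quoted above side by side, here is a rough back-of-the-envelope sketch in plain Python. The inputs are just the numbers from this post (about 200 GFLOPS per 8800 GTS, about 5 GFLOPS for one CPU core, six GPUs); treating "just over 1 TFLOP" as exactly 1000 GFLOPS is an assumption made only for the arithmetic, and the function names are made up for illustration.

```python
def speedup(gpu_gflops, cpu_gflops):
    """How many times faster one GPU is than one CPU core on this workload."""
    return gpu_gflops / cpu_gflops


def scaling_efficiency(measured_total, per_gpu, num_gpus):
    """Fraction of the ideal (perfectly linear) multi-GPU throughput achieved."""
    return measured_total / (per_gpu * num_gpus)


GPU_GFLOPS = 200.0       # one 8800 GTS in VMD, as quoted in the post
CPU_CORE_GFLOPS = 5.0    # one 2.6GHz core, as quoted in the post
NUM_GPUS = 6
MEASURED_TOTAL = 1000.0  # assumption: "just over 1 TFLOP" rounded down

print("GPU vs one CPU core:", speedup(GPU_GFLOPS, CPU_CORE_GFLOPS))
print("ideal 6-GPU total  :", GPU_GFLOPS * NUM_GPUS, "GFLOPS")
print("scaling efficiency :",
      round(scaling_efficiency(MEASURED_TOTAL, GPU_GFLOPS, NUM_GPUS), 3))
```

So one GPU is worth roughly 40 of those cores on this particular benchmark, and the six-GPU box lands at a bit over 80% of perfect linear scaling if the per-card figure holds.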

ungraphic
Jan 25, 2008, 12:29 AM
That's great, but will these cards be compatible with 'older' Mac Pros or is this gonna be another disappointing flop?

AidenShaw
Jan 25, 2008, 12:50 AM
The only issue I have is that I think the current GF8 series of cards are only FP16 - 16 bit floating point aka single precision. This may not be enough for most apps. It would be great for running test runs, but I know for scientific calcs FP16 isn't sufficient, they need FP32.

I don't understand your post at all.

Standard "single" floating precision is 32-bit floating, "double" precision is 64-bit floating. 16-bit floating is an oddity that has very little traction (it is not supported by the x86 architecture, for example).

See http://en.wikipedia.org/wiki/Floating_point

The "half-floats" in CUDA probably aren't interesting at all. http://forums.nvidia.com/index.php?showtopic=36286

In fact, since modern CPUs run 64-bit floating at almost the same performance as 32-bit single precision - the use of 32-bit float is dropping. Using "half-precision" float would not be interesting for very many applications.
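For anyone who wants to see the difference between the three widths, here is a minimal sketch using only Python's standard `struct` module (nothing CUDA-specific). It stores 0.1 as a 16-bit half, a 32-bit single and a 64-bit double and reads it back; the helper name `round_trip` is made up for illustration.

```python
import struct


def round_trip(value, fmt):
    """Store value in the given IEEE 754 format and read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# struct format codes: 'e' = 16-bit half, 'f' = 32-bit single, 'd' = 64-bit double
for name, fmt in (("half", "e"), ("single", "f"), ("double", "d")):
    stored = round_trip(0.1, fmt)
    bits = struct.calcsize(fmt) * 8
    print(f"{name:>6} ({bits:2d} bits): {stored!r}  error = {abs(stored - 0.1):.3e}")
```

The error shrinks as the format gets wider, which is why half precision is fine for pixels but not for most scientific work.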

qtx43
Jan 25, 2008, 02:02 AM
I don't understand your post at all.

Standard "single" floating precision is 32-bit floating, "double" precision is 64-bit floating. 16-bit floating is an oddity that has very little traction (it is not supported by the x86 architecture, for example).

See http://en.wikipedia.org/wiki/Floating_point

The "half-floats" in CUDA probably aren't interesting at all. http://forums.nvidia.com/index.php?showtopic=36286

In fact, since modern CPUs run 64-bit floating at almost the same performance as 32-bit single precision - the use of 32-bit float is dropping. Using "half-precision" float would not be interesting for very many applications.
I disagree. Granted, as time goes on, these video cards will get more memory, but like graphics, sometimes you don't care a whole lot about the precision. For example, imagine doing real time image processing on a high resolution video, using FFT or other transforms. On a bargain basement PC. It's coming.

Johnny Mosrite
Jan 25, 2008, 02:28 AM
CUDA is fundamentally awesome, it really does deliver 10x-100x performance of your Intel CPU on floating-point heavy operations, if those operations are amenable to parallelization. Which of course pretty much anything is if it burns enough CPU cycles.
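A caveat on "amenable to parallelization": the usual way to estimate it is Amdahl's law (my addition, not something from the CUDA docs), which says the serial part of a program caps the overall speedup. A minimal sketch, assuming a fraction `p` of the runtime can be offloaded and that part runs `s` times faster on the GPU:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is accelerated s times."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical workloads: same 100x kernel speedup, different parallel fractions.
for p in (0.5, 0.9, 0.99):
    print(f"parallel fraction {p:.2f}: overall speedup {amdahl_speedup(p, 100.0):.1f}x")
```

Even with a 100x faster kernel, a program that is only 90% parallel stays below 10x overall, so the 10x-100x figures only show up when nearly all of the work can be moved to the GPU.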

This gets potentially very exciting for users of Logic Pro, Garageband etc - if the heavy duty compute kernels in these apps were ported to CUDA it would introduce a big step function in performance for music production, and be a big incentive for people to upgrade to an NVIDIA-powered Mac from whatever they currently have. Looking forward to the first proof of concept AU effect or instrument that uses CUDA.

winterspan
Jan 25, 2008, 03:38 AM
The CUDA SDK for Windows is awesome. It includes some small demos of CUDA implementations such as a real-time Mandelbrot generator, particle simulation, stable fluids model, and a bunch of other command-line based tests and such. I ran the demos on an 8800GTS which was sustaining several hundred GFLOPS during the demos and tested to about 60 GB/s of internal bandwidth.
There is some extreme power in GPUs that's just waiting to be unlocked.

Damn you! I need one NOW! This may push me over the edge for a new Mac Pro with the 8800GT! :)


This GP GPU sounds like another CPU. I mean, mathematical tasks such as image and sound processing? If they're really interested in another CPU, just add another CPU. It doesn't have to be from NVIDIA. It can be from Intel, no?
I don't really get it. Or is this just like what AltiVec used to be?

Think coprocessor. "GPGPU" is just a concept. It refers to a method of programming the shaders (sort of like simple "cores") of a video card to carry out "general-purpose" computationally intensive calculations, instead of 3D graphics calculations. Due to the highly parallel nature of 3D graphics processing, the hardware built to calculate it is also very efficient when used on other types of highly parallel calculations generally seen in the high-performance computing arena (supercomputers). Think of uses such as digital signal processing, digital imaging, ray-tracing, digital audio and video processing, scientific simulations such as molecular dynamics, computational chemistry, weather modeling, neural networks, etc.

The main article is sort of misleading by making it appear as if a "GPGPU" is only a discrete item separate from existing graphics cards. Granted, nVidia is now making separate "GPGPU" cards that are basically an 8800GTX without a DVI port and some other tweaks. My point is that "GPGPU" is just a concept, and can be done on existing high-end Nvidia (and ATI) graphics cards, namely the 8800 series. Originally, people were trying to adapt the shaders in GPUs to process general data using the GPU shading language, which was incredibly difficult. Now both nVidia (with CUDA) and ATI (Close to Metal) offer SDKs for simpler programming of the GPUs in a C-like language.

However, GPGPU won't be replacing your Core 2 Duo anytime soon, as it is not capable of the general tasks your processor does now. It will probably be used as a type of coprocessor on to which specialized applications will off-load their data processing.
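To make the "many simple cores, each doing a small piece" idea concrete, here is a toy emulation in plain Python. It is not CUDA and does not touch a GPU; `saxpy_kernel` and `launch` are made-up names, and the loop runs sequentially where real hardware would run the iterations concurrently.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one 'thread': compute a single output element."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Emulate a GPU launch by running the kernel once per element index.

    On a GPU these n invocations would be spread across the shader units;
    here they simply run one after another.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

The point is that no iteration depends on any other, which is exactly the kind of work that maps well onto a GPU and exactly what general CPU-style code usually is not.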

So wouldn't my 8800GT technically be capable of running as a GPGPU in my Mac Pro, assuming Apple issues the correct SW update?
In theory Nvidia would just have to release their CUDA SDK for OSX.

Sorry Intel, sounds like the days of x86 instruction set CPU(s) are coming to an end soon. In a few years these babies may be providing most of the horsepower for general purpose computing.
I would definitely not go that far. GPUs can't do anything other than extremely parallel calculations. You'll still need an x86 for all the general processing tasks.

The only issue I have is that I think the current GF8 series of cards are only FP16 - 16 bit floating point aka single precision. This may not be enough for most apps. It would be great for running test runs, but I know for scientific calcs FP16 isn't sufficient, they need FP32.
Single prec is 32-bit, double is 64. I believe only the new "dedicated" GPGPU cards from Nvidia support double precision CUDA.

Cromulent
Jan 25, 2008, 05:12 AM
In the olden days, we greybeards called this AltiVec. ;)

This is nothing like AltiVec.

Edit : AltiVec was a way to pass 128bit instructions through the CPU in 1 cycle rather than having to split them up into separate instructions. This is a method of passing instructions to the graphics card which execute them faster and in parallel.

takao
Jan 25, 2008, 08:07 AM
this is nothing like altivec .... looks like the greybeards don't know what they are talking about ;)

in fact similar stuff has already been possible before using custom C libraries etc. but it was mostly too difficult to do and too tied to platforms (thus nobody invested serious money into it)
speedup for scientific apps can be huge if the amount of parallelism can be really used

though actually i would think the great breakthrough will come once the GPU can be directly integrated into the operating system as another "pseudo-CPU", which would make it much easier to get the performance out of it... or at least try to get MPI/OpenMP to work with it

edit: to those saying "why not add a second CPU"
well the GPU has to be there anyway on a desktop machine and most of the time will be sitting around idling when no 3D apps are running so why not use it for something ...
also GPUs are more parallel than another CPU and much, much better at floating point operations

Evangelion
Jan 25, 2008, 08:15 AM
In the olden days, we greybeards called this AltiVec. ;)

Except Altivec was a few orders of magnitude slower... Altivec is for all intents and purposes identical to SSE. Sure, it was better at some things while being worse at some others. But the basic idea was/is identical. So if you can compare Altivec to this, you can compare SSE as well. And I don't think you can do that, since there are significant differences between this and SSE/Altivec.

Azurael
Jan 25, 2008, 08:29 AM
Single prec is 32-bit, double is 64. I believe only the new "dedicated" GPGPU cards from Nvidia support double precision CUDA.

Although if that's the case, it's obviously an intentional software limitation as these cards use exactly the same GPU as their video output-equipped counterparts.

blackcrayon
Jan 25, 2008, 09:00 AM
After the intel switch, I thought Apple might be thinking of putting something like this inside every mac, to sort of "differentiate" the hardware from the more generic PCs... Sortof like the old Quadra AVs with their DSP chips on board. Of course I wouldn't want it to increase the price of every mac by another $999, but it would be cool to be able to "expect" a processor like that on board... But I guess it would be just as easy to "expect" it on the graphics card. Apple's advantage is being able to much more quickly leverage stuff like that at the OS level, since they can get OS updates out so much more quickly than the main competition...

avkills
Jan 25, 2008, 09:06 AM
This is nothing like AltiVec.

Edit : AltiVec was a way to pass 128bit instructions through the CPU in 1 cycle rather than having to split them up into separate instructions. This is a method of passing instructions to the graphics card which execute them faster and in parallel.

This guy needs more grey!

AltiVec was 128bit wide, but it was usually used to process 4 32bit vector calls per cycle. So yes this is like AltiVec but with a lot more parallelism.
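A small sketch of what "128 bits wide, four 32-bit values at a time" means, using Python's standard `struct` module as a stand-in for a vector register. This only models the data layout; it is not AltiVec or SSE code, and `pack_vector` / `vector_add` are made-up names.

```python
import struct

LANES = 4           # four 32-bit floats fit in one 128-bit vector register
VECTOR_FMT = "<4f"  # little-endian, 4 single-precision floats = 16 bytes


def pack_vector(values):
    """Pack exactly four floats into a 16-byte (128-bit) vector."""
    if len(values) != LANES:
        raise ValueError("a 128-bit vector holds exactly four 32-bit floats")
    return struct.pack(VECTOR_FMT, *values)


def vector_add(a, b):
    """Lane-wise add of two packed vectors (conceptually a single SIMD instruction)."""
    xs = struct.unpack(VECTOR_FMT, a)
    ys = struct.unpack(VECTOR_FMT, b)
    return struct.pack(VECTOR_FMT, *(x + y for x, y in zip(xs, ys)))


a = pack_vector([1.0, 2.0, 3.0, 4.0])
b = pack_vector([10.0, 20.0, 30.0, 40.0])
print(len(a) * 8, "bits per vector")
print(struct.unpack(VECTOR_FMT, vector_add(a, b)))
```

A SIMD unit does the four adds in one instruction; a GPU applies the same idea across far more lanes at once, which is the "a lot more parallelism" part.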

-mark

headfuzz
Jan 25, 2008, 09:19 AM
So am I right in thinking this is along the lines of the x87 maths coprocessor architecture from the 80s? :confused:

diamond.g
Jan 25, 2008, 09:26 AM
So am I right in thinking this is along the lines of the x87 maths coprocessor architecture from the 80s? :confused:

It is probably closer to CELL than to a coprocessor. Think of many coprocessors on one die running in parallel real fast.

kps
Jan 25, 2008, 10:24 AM
And the Wheel of Reincarnation (http://www.catb.org/jargon/html/W/wheel-of-reincarnation.html) turns another ten degrees....

(Note the date on the citation)

tapaul
Jan 25, 2008, 11:19 AM
CUDA is, unfortunately, currently of no relevance to either software users or developers. It's been "out" for a while now for PC, and there is pretty much zero uptake of and/or interest in it, or its SDK.
It is a great idea, but for it to happen, there has to be a range of killer mainstream applications, demonstrating why, beyond all doubt, the end user must have CUDA.
Nvidia recently purchased Mental Images, which might just be where the CUDA puzzle starts to come together. Perhaps this is NV's way of saying "well if you won't do it then we will do it for you". However, I suspect they will have to make a few more key acquisitions before people start to take notice.

The key problem is that CUDA is not, and will never be, compatible with any ATI (or other vendors') offerings. This is, imho, the kiss of death for this technology. For this type of streaming process to succeed, there needs to be a common API (much like OpenGL for graphics), which any hardware vendor can produce a driver for. Then the software developer can be sure that if they use the API, it won't be a waste of time and effort.

cheers

iSee
Jan 25, 2008, 12:59 PM
This GP GPU sounds like another CPU. I mean, mathematical tasks such as image and sound processing? If they're really interested in another CPU, just add another CPU. It doesn't have to be from NVIDIA. It can be from Intel, no?

I don't really get it. Or is this just like what AltiVec used to be?

This is really an alternative to things like Altivec. It is a really good development to my mind because it might be better. If the GPU guys apply all they know about highly parallelized computation, maximizing throughput, etc. to more general computation, they might make some incredible things possible.

It's also interesting because it decouples the coprocessing unit from the CPU, which could allow for more flexible, customized computing environments.

There are some interesting problems just dying for coprocessors and this might be a way to address that. For example, there was talk at one point about "Physics" coprocessors. There's been some work in that area, but nothing mainstream really caught on. Some other possibilities: AI processors. Even sound processing seems to be moving back to the CPU. Individually, these things don't seem to create a viable market for a coprocessor. But a more general purpose coprocessor might help some of this stuff to take off.

On the other hand, it seems likely that computers will be getting more and more CPU cores, which can be utilized for the same purposes.

So we've got a competition between the CPU guys and the GPU guys to provide us with the fastest, most useful, cheapest processing power. And that's got to be good for us users.

iSee
Jan 25, 2008, 01:12 PM
And the Wheel of Reincarnation (http://www.catb.org/jargon/html/W/wheel-of-reincarnation.html) turns another ten degrees....

(Note the date on the citation)

That's funny.

But possibly not true anymore. Typical systems have had both a CPU and GPU for a long time now. GPUs are now making a bid to take on more of the general purpose computing tasks, so really the wheel might be rolling backwards.

However, tapaul is right (look back a few posts). If there isn't a common API that developers can feel confident in, they simply won't spend their development resources to support a coprocessor. It will be a tiny and expensive niche product that will never turn in to anything cheaper and more widely available.

gkarris
Jan 25, 2008, 01:34 PM
I knew this day would come! I have no idea of what you guys are talking about. :eek:

Whatever it is they're talking about, I hope we can get it in the Mini... :eek:

exabytes18
Jan 25, 2008, 04:51 PM
Tesla is expensive, but CUDA's not limited to just Teslas. You can execute it on any 8-series graphics cards. I don't think it was ever intended to be a mainstream technology, at least that's the sense I get from the marketing of it. Your average Joe is never going to use 300 GFLOPS.

It's there for those who need such compute power. Just because it's not supported by major software vendors doesn't mean that there's not any developers using it.

Cloudane
Jan 25, 2008, 05:30 PM
Here was me hoping it was a Game Player's GPU :p

(Actually for my uses I don't mind the one in the iMac too much but still)

MacFly123
Jan 25, 2008, 06:48 PM
I ran some pretty hardcore diagnostics on these suckers and I couldn't believe how snappy Safari was :) hehe!

hugodrax
Jan 25, 2008, 11:08 PM
The only issue I have is that I think the current GF8 series of cards are only FP16 - 16 bit floating point aka single precision. This may not be enough for most apps. It would be great for running test runs, but I know for scientific calcs FP16 isn't sufficient, they need FP32.

You really need 64bit precision. I thought the GF8 did 32bit precision for its GPGPU work.

hugodrax
Jan 25, 2008, 11:14 PM
This is nothing like AltiVec.

Edit : AltiVec was a way to pass 128bit instructions through the CPU in 1 cycle rather than having to split them up into separate instructions. This is a method of passing instructions to the graphics card which execute them faster and in parallel.

Wow, being clueless there. Altivec and SSE is the same crap, Single Instruction Multiple Data. Each implementation have their own strengths and weakness but overall they provide the capability to do high performance Vector calculation. Stuff that required multimillion dollar Crays.

hugodrax
Jan 25, 2008, 11:16 PM
CUDA is, unfortunately, currently of no relevance to either software users or developers. It's been "out" for a while now for PC, and there is pretty much zero uptake of and/or interest in it, or its SDK.
It is a great idea, but for it to happen, there has to be a range of killer mainstream applications, demonstrating why, beyond all doubt, the end user must have CUDA.
Nvidia recently purchased Mental Images, which might just be where the CUDA puzzle starts to come together. Perhaps this is NV's way of saying "well if you won't do it then we will do it for you". However, I suspect they will have to make a few more key acquisitions before people start to take notice.

The key problem is that CUDA is not, and will never be, compatible with any ATI (or other vendors') offerings. This is, imho, the kiss of death for this technology. For this type of streaming process to succeed, there needs to be a common API (much like OpenGL for graphics), which any hardware vendor can produce a driver for. Then the software developer can be sure that if they use the API, it won't be a waste of time and effort.

cheers

Eventually it will happen. But the beauty of OS X is you are already doing GPGPU stuff on your Mac, like using Core Image, Aperture, etc.

diamond.g
Jan 26, 2008, 08:39 AM
Eventually it will happen. But the beauty of OS X is you are already doing GPGPU stuff on your Mac, like using Core Image, Aperture, etc.

That still counts as drawing something (in both cases), not really general purpose as a GPU is designed to draw stuff. GP is more like Folding or the SETI project.

SPUY767
Jan 26, 2008, 09:24 AM
This GP GPU sounds like another CPU. I mean, mathematical tasks such as image and sound processing? If they're really interested in another CPU, just add another CPU. It doesn't have to be from NVIDIA. It can be from Intel, no?

I don't really get it. Or is this just like what AltiVec used to be?

GPU's are already capable of vastly more processing power than today's CPU's. The problem is that the GPU's are very specific in what they can do. An 8800GT has 112 pixel pipelines; you can view that as 112 separate thread execution paths. It can run highly specialized processes like a 112 core processor. GPU's only run in the 6-700 MHz range, but with parallel processing capability like they have, who needs clock speed. With AMD snatching up ATi, I see Intel potentially forming a quasi-partnership with nVidia, with the combination of a processor like the Yorkfield, and a GPU that is GP capable, bringing the next wave of computer graphics: on-demand ray-tracing.

skunk
Jan 26, 2008, 09:35 AM
but with parallel processing capability like they have, who needs cock speed.
Are we in the right thread? :confused:

SPUY767
Jan 26, 2008, 02:05 PM
Are we in the right thread? :confused:

The "L" key on my keyboard is damnable. It only works about half the time.

Analog Kid
Jan 27, 2008, 12:39 AM
This is essentially an acknowledgement by nVidia that there's no room for growth in their core business so they're hoping their name will give them an opportunity to sidestep into a new business. The problem is they're going it alone. As long as CUDA is a proprietary API, nobody is going to adopt it. Neither Dell nor Apple want to tie themselves and their high end applications to another single source.

If a standard emerged, I could see this going somewhere in the short term-- until Intel woke up and built their own Cell. In the GPU space, the video cards are largely abstracted through either OpenGL or DirectX. If nVidia or ATI disappeared as so many have before them, the OS and application vendors wouldn't blink. They keep writing to the same interface and new hardware picks up the load.

Here, Apple (or whoever) would need to mark in their system requirements: "Requires an nVidia CUDA processor". So now you need a specific Intel CPU and a specific nVidia GPU. They won't do it.

Frankly, it's silly to have all that silicon sitting out on a separate card doing nothing unless you're running the absolute latest games with updated drivers. nVidia is realizing that. AMD figured it out a while ago and bought ATI when they had the chance.

The only issue I have is that I think the current GF8 series of cards are only FP16 - 16 bit floating point aka single precision. This may not be enough for most apps. It would be great for running test runs, but I know for scientific calcs FP16 isn't sufficient, they need FP32.
Are you familiar with char, short, long, long long, float, double, long double? All of those types exist on all processors-- but may be decomposed into smaller sub-operations if the processor doesn't support the width natively. Same would go here, I presume. From nVidia's standpoint, there's no need for anything wider than 16bit calculations. They could put 64bit data paths in but then all those extra bits would be wasting silicon and power for any calculation with less precision. Make the little pieces run fast and the people who need more precision will still see the benefit.
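The "decomposed into smaller sub-operations" idea is easiest to show with integers, so here is a hedged sketch of a 32-bit add built from 16-bit pieces in plain Python. Emulating wider floating point from narrower floats is considerably more involved than this, and the function name is made up for illustration.

```python
MASK16 = 0xFFFF


def add32_via_16(a, b):
    """Add two 32-bit unsigned integers using only 16-bit wide additions.

    The low halves are added first; any carry out of bit 15 is fed into
    the addition of the high halves. The result wraps modulo 2**32.
    """
    lo = (a & MASK16) + (b & MASK16)
    carry = lo >> 16                      # 1 if the low half overflowed
    hi = (a >> 16) + (b >> 16) + carry
    return ((hi & MASK16) << 16) | (lo & MASK16)


print(hex(add32_via_16(0x0001FFFF, 0x00000001)))  # 0x20000
```

Note the cost: one wide operation became two narrow ones plus carry handling, which is the trade-off being described.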
This is nothing like AltiVec.

This is exactly like AltiVec. It's bigger and it's off chip, but it's the same idea: fast, dedicated, vector processing.
It is probably closer to CELL than to a coprocessor. Think of many coprocessors on one die running in parallel real fast.
Careful, I've brought this up in other threads, and people get their panties in a bunch about it... Because Apple went Intel rather than Cell they think Cell is a dead concept. Cell is exactly where we're all heading. Because of inertia, we may see this wasteful division of labor for a while before the cost of doing it this way is too prohibitive, but eventually it'll all get brought to the motherboard and then into the chipset, then into the main processor.

Mac Pros have 8 cores on them now-- and each core is wasting a huge amount of logic. Each CPU is handling a single purpose thread, but carrying all the logic necessary for any type of thread that might be thrown at it. I've got a whole core handling a stream of integers coming off the network, but I've got all the floating point and SSE logic sitting there idle. Meanwhile I'm compressing a video stream and only have one SSE unit available to that thread.

And through all this my high powered GPU is being taxed with nothing more than a progress bar...

imbored
Jan 27, 2008, 12:44 AM
I wonder if macs will support the 9000 series when it comes out

diamond.g
Jan 27, 2008, 08:10 AM
If a standard emerged, I could see this going somewhere in the short term-- until Intel woke up and built their own Cell. In the GPU space, the video cards are largely abstracted through either OpenGL or DirectX. If nVidia or ATI disappeared as so many have before them, the OS and application vendors wouldn't blink. They keep writing to the same interface and new hardware picks up the load.
If Nvidia or AMD(ATI) disappeared, we would be in for a world of hurt. Namely, there hasn't been a true third player in the GPU business for a long while. Any other company that makes GPU's nowadays is gunning for the low end sector (the same sector Intel dominates). Very few places can do what Nvidia and AMD(ATI) do best.

This is exactly like AltiVec. It's bigger and it's off chip, but it's the same idea: fast, dedicated, vector processing.
SSE is closer to AltiVec than GPU's are. How many AltiVec units are there? I thought there was only 1. In a GPU you have waaaay more than one (otherwise it would be slow at drawing pixels).

Careful, I've brought this up in other threads, and people get their panties in a bunch about it... Because Apple went Intel rather than Cell they think Cell is a dead concept. Cell is exactly where we're all heading. Because of inertia, we may see this wasteful division of labor for a while before the cost of doing it this way is too prohibitive, but eventually it'll all get brought to the motherboard and then into the chipset, then into the main processor.

Mac Pros have 8 cores on them now-- and each core is wasting a huge amount of logic. Each CPU is handling a single purpose thread, but carrying all the logic necessary for any type of thread that might be thrown at it. I've got a whole core handling a stream of integers coming off the network, but I've got all the floating point and SSE logic sitting there idle. Meanwhile I'm compressing a video stream and only have one SSE unit available to that thread.

And through all this my high powered GPU is being taxed with nothing more than a progress bar...
Cell is faster than people give it credit for. The problem is having coders change how they code. Larrabee isn't something Intel cooked up because they thought Cell architecture was going away.
I wonder if macs will support the 9000 series when it comes out
They will support the new GPU's as fast as they supported the 3x00 and 8x00 series.

Cromulent
Jan 27, 2008, 01:24 PM
Wow, being clueless there. Altivec and SSE is the same crap, Single Instruction Multiple Data. Each implementation have their own strengths and weakness but overall they provide the capability to do high performance Vector calculation. Stuff that required multimillion dollar Crays.

Where did I say AltiVec was nothing like SSE? I never mentioned SSE in my post. I said GPGPUs were nothing like AltiVec.

Analog Kid
Jan 27, 2008, 08:51 PM
If Nvidia or AMD(ATI) disappeared, we would be in for a world of hurt. Namely, there hasn't been a true third player in the GPU business for a long while. Any other company that makes GPUs nowadays is gunning for the low-end sector (the same sector Intel dominates). Very few places can do what Nvidia and AMD(ATI) do best.
Oh I seem to remember names like 3Dfx and S3 getting a fair amount of attention not too long ago... Companies come and go, especially in markets like this. No one would be in a world of hurt if they went out of business because someone else would have had to unseat them. I think we're reaching a time in the market where a disruptive technology could come in out of nowhere and unseat the biggies. That was my lead-in point above: Nvidia knows their existing business model is under pressure and this is how they're looking to adapt.

My point was merely that relying on a single vendor to make your stuff work isn't a good idea-- and that concern will play a role in slowing the adoption of a technology like this. Apple could hide the CUDA interface by rolling it into something like their Accelerate framework, but unless they could get similar performance out of other GPU vendors, then they would be forced to rely on nVidia (Nvidia? I hate company names in all caps...) as their sole supplier of video cards.
SSE is closer to AltiVec than GPUs are. How many AltiVec units are there? I thought there was only 1. In a GPU you have waaaay more than one (otherwise it would be slow at drawing pixels).
I don't understand your reasoning here... When someone makes lasagna with chicken, then someone else makes lasagna with beef, and a third person says "that's kind of the same concept", I don't follow the "turkey is more like chicken" argument...

Yes, SSE is a vector processor too, and in the grand continuum of vector processors it is probably closer to AltiVec than a GPU is. I wasn't talking about SSE; I was talking about this being an extension of the same concept-- offload vector computations to specialized hardware.

As far as unit counts, I think you can look at AltiVec as 4 units rather than one: it handles four 32-bit operands at a time. It all depends on where you draw your "unit" boundaries...
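
To make the "4 units" view concrete, here is a rough sketch in plain C. It is not real AltiVec or SSE intrinsics; the type `vec4f` and the function `vec4f_add` are made-up names for illustration only. It just shows what one vector add amounts to: four independent 32-bit additions that the hardware would issue as a single operation.

```c
#include <stddef.h>

/* Hypothetical 4-lane vector type: four 32-bit floats, 128 bits in total,
   the same width as an AltiVec or SSE register. */
typedef struct {
    float lane[4];
} vec4f;

/* One "vector add". On SIMD hardware this is conceptually a single
   instruction; here it is written as a scalar loop so it runs anywhere. */
static vec4f vec4f_add(vec4f a, vec4f b)
{
    vec4f out;
    for (size_t i = 0; i < 4; ++i) {
        out.lane[i] = a.lane[i] + b.lane[i];
    }
    return out;
}
```

Whether you call that one unit or four is exactly the boundary question: one instruction stream, four lanes of arithmetic.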
Cell is faster than people give it credit for. The problem is getting coders to change how they code. Larrabee isn't something Intel cooked up because they thought Cell architecture was going away.

Yeah, I think that's why we'll see incremental movement towards non-symmetric multiprocessing. Have to ease into it. AltiVec/SSE is kind of a first step in that direction, but eventually the scheduler is going to have to figure out how to hand out threads based on the resources of individual execution units. Or they'll give up on that and let the compiler handle the complexity of it.

That's the first I've seen of Larrabee, but you're right-- it seems to be designed a lot like Cell, but with slimmed down x86 cores rather than PPC. For now though, it seems to be targeted primarily as a GPU, not a CPU. It'll be interesting to see if I'm right in guessing where we go from there-- onto the motherboard, then the chipset, then the CPU. Intel seems to have designed this architecture to be able to do that quite easily.

Thanks for pointing me at it-- I've got a new place to pin my hopes for the future.

diamond.g
Jan 28, 2008, 12:52 PM
Oh I seem to remember names like 3Dfx and S3 getting a fair amount of attention not too long ago... Companies come and go, especially in markets like this. No one would be in a world of hurt if they went out of business because someone else would have had to unseat them. I think we're reaching a time in the market where a disruptive technology could come in out of nowhere and unseat the biggies. That was my lead-in point above: Nvidia knows their existing business model is under pressure and this is how they're looking to adapt.
Ah, okay. I seem to remember 3Dfx being bought by Nvidia in like 2000 or so. S3 is still around, but as far as performance is concerned, they are a non-player. But I can see where you were going.

My point was merely that relying on a single vendor to make your stuff work isn't a good idea-- and that concern will play a role in slowing the adoption of a technology like this. Apple could hide the CUDA interface by rolling it into something like their Accelerate framework, but unless they could get similar performance out of other GPU vendors, then they would be forced to rely on nVidia (Nvidia? I hate company names in all caps...) as their sole supplier of video cards.
True, that is always the downside. Think of it this way: the cards from ATI and nV each have their own extensions in OpenGL. There are things that the ATI card can do that the nV can't, and vice versa. So Apple only uses ARB commands; otherwise they would have to write workarounds for both ATI and nV. So yeah, I understand the reluctance towards adding something like CUDA; Apple would have to hope ATI would adopt it. Of course, maybe Apple could get CUDA added to OpenGL (ARB-ized).
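
For anyone who hasn't dealt with vendor extensions, here is a minimal sketch in C of the kind of check an application ends up doing before it can touch a vendor-specific feature. It assumes the legacy OpenGL convention of a single space-separated extension string (what `glGetString(GL_EXTENSIONS)` returns); the helper name `has_extension` is made up for this example, and no GL context is needed to run it.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole space-separated token in
   `ext_list`. A bare strstr() is not enough, because one extension name
   can be a prefix of another. */
static bool has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    if (len == 0 || strchr(name, ' ') != NULL) {
        return false; /* not a valid single extension name */
    }

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) {
            return true;
        }
        p += len; /* partial match only; keep scanning */
    }
    return false;
}
```

Every vendor-only feature needs a check like this plus a fallback path for the other vendor's cards, which is the workaround cost of going beyond the ARB set.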

I don't understand your reasoning here... When someone makes lasagna with chicken, then someone else makes lasagna with beef, and a third person says "that's kind of the same concept", I don't follow the "turkey is more like chicken" argument...

Yes, SSE is a vector processor too, and in the grand continuum of vector processors it is probably closer to AltiVec than a GPU is. I wasn't talking about SSE; I was talking about this being an extension of the same concept-- offload vector computations to specialized hardware.

As far as unit counts, I think you can look at AltiVec as 4 units rather than one: it handles four 32-bit operands at a time. It all depends on where you draw your "unit" boundaries...

Okay, I may have misunderstood where you were going with your comparison. My bad. I agree with what you were saying though.

Yeah, I think that's why we'll see incremental movement towards non-symmetric multiprocessing. Have to ease into it. AltiVec/SSE is kind of a first step in that direction, but eventually the scheduler is going to have to figure out how to hand out threads based on the resources of individual execution units. Or they'll give up on that and let the compiler handle the complexity of it.
Yup. It would behoove coders to take the Cell approach to existing code. Split as much as possible. Make everything as small and lean as possible. It is tedious, but that kind of code would be way more portable than it looks.

TMay
Feb 2, 2008, 08:06 PM
So, I'm looking at a Mac Pro dual quad, and I want to run MCAD applications on the side. Currently, Apple only has the Quadro 5600 that would be "certified" for these applications, and that still might only be under Boot Camp.

So now that I have that card ($3K!), I can see that FC Studio will get some love, but what might I expect from Aperture? Is this the card that might bring Aperture the performance that makes it the killer app?

diamond.g
Feb 2, 2008, 09:29 PM
So, I'm looking at a Mac Pro dual quad, and I want to run MCAD applications on the side. Currently, Apple only has the Quadro 5600 that would be "certified" for these applications, and that still might only be under Boot Camp.

So now that I have that card ($3K!), I can see that FC Studio will get some love, but what might I expect from Aperture? Is this the card that might bring Aperture the performance that makes it the killer app?

The Quadro is overkill for Aperture.

michaelverdin
Feb 4, 2008, 06:27 AM
Why oh why can't Apple cut a deal with Nvidia like it did with Intel so I don't have to buy a Mac Pro? AMD/ATI for graphics.....pleeeeeese