I'm skeptical. The x264 guys looked at CUDA and found it wasn't worth it. The Badaboom app required a $600 video card to encode a movie as fast as a Core i7 940, and its output quality wasn't even close to x264's. And that test was back in November; there have been x264 speed improvements since then. Show me the benchmarks!

Well, while I agree with you, I think the claims that Nvidia and Nero made were from when they tried to encode video on, let's say for our purposes, a Core Duo or Core 2 Duo machine (or a computer with a popular processor not very good at parallel processing) without CUDA, and then compared the total time when the same exact machine used CUDA and its graphics card to perform the encoding. So while it may be true that OpenCL may not be as good as we make it out to be, only a fraction of computer owners currently own Core i7 processors, which were made to perform much better at parallel processing. The majority of computers using a Core Duo or Core 2 Duo are not very good at this, so chances are they would benefit greatly from this technology.

But I do have one question. When OpenCL, or more specifically CUDA, is used, does it switch the entire encoding operation over to the graphics card and free the processor, or do the processor and graphics card work through the process simultaneously? Because if it just switches all of the work over to the graphics card... well, that is pretty much pointless, and I agree with SirOmega that if you have a high-end or multi-core processor, you could barely benefit from such a technology, if at all. However, if it utilizes the processor WITH the graphics card to encode... well, then we have ourselves a recipe for success. Imagine having OpenCL/CUDA on a machine with a high-end graphics card and a processor capable of vast parallel processing. This could DRASTICALLY shorten the time needed to encode a video, along with a lot of other applications/processes.

Edit: I found the answer to my own question: "NVIDIA CUDA technology dramatically reduces the time it takes to transfer video files to portable devices, while freeing up the CPU to perform other tasks." It appears as though it will be used to free up the processor, not to work alongside it. From a practical point of view for desktops and notebooks, it would draw less power by using just the GPU or the CPU rather than both, and it would allow the user to keep using the computer as they normally would, since the CPU is free to do whatever the user wishes (just not anything too graphically demanding). But from an idealistic point of view, I think I would rather sacrifice power and/or a strain-free CPU to have my video encode much faster. Chances are that if I were using a notebook I wouldn't be encoding on the go and would have access to a power source; and if I were using a desktop, I wouldn't mind using extra power.
 
If they were doing the same operation on both the Core i7 CPU with a non-CUDA application, and a "lesser" CPU combined with an NVIDIA card and a CUDA enabled application, the resulting video would be the same, all that should differ would be the time it took to get there.

Maybe they can't be exactly the same operation. To use a highly parallel architecture, you have to divide the work into chunks, e.g. each row of the image.

In that case, the code processing chunk X would not have the end result of processing chunk X - 1 available, since both are being computed at the same time. On a serial architecture, that information might be used to make a better picture.
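Here's a minimal CUDA sketch of that constraint (the kernel, frame size, and per-row operation are all hypothetical, just to illustrate the point): each thread owns one row, and no row can read another row's finished result within the same launch.

```
#include <cuda_runtime.h>

// Each thread processes one row ("chunk") of the frame independently.
// Row y cannot read the *processed* result of row y - 1, because all
// rows are being computed at the same time, unlike a serial encoder,
// which could use the finished previous row to make better decisions.
__global__ void brightenRows(unsigned char* frame, int width, int height, int delta)
{
    int y = blockIdx.x * blockDim.x + threadIdx.x;
    if (y >= height) return;

    for (int x = 0; x < width; ++x) {
        int v = frame[y * width + x] + delta;
        frame[y * width + x] = (unsigned char)(v > 255 ? 255 : v);
    }
}

int main()
{
    const int width = 640, height = 480;
    unsigned char* d_frame;
    cudaMalloc(&d_frame, width * height);
    cudaMemset(d_frame, 100, width * height);   // dummy grey frame

    // One thread per row: 480 independent chunks of work.
    brightenRows<<<(height + 255) / 256, 256>>>(d_frame, width, height, 20);
    cudaDeviceSynchronize();

    cudaFree(d_frame);
    return 0;
}
```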
 
Hmmm ... Better than Handbrake? :)

This type of technology would likely find its way into Handbrake if/when the x264 dev team implements it in their encoder. As stated earlier, their initial experiments with CUDA did not return very promising results. At this point, we have no idea what settings nVidia/Nero used in their encode... this could be much like the Turbo.264 vs. Handbrake argument: marginally faster encode times at the expense of quality.
 
Well, while I agree with you, I think the claims that Nvidia and Nero made were from when they tried to encode video on, let's say for our purposes, a Core 2 machine (or a computer with a popular processor not very good at parallel processing) without CUDA, and then compared the total time when the same exact machine used CUDA and its graphics card to perform the encoding. So while it may be true that OpenCL may not be as good as we make it out to be, only a fraction of computer owners currently own Core i7 processors, which were made to perform much better at parallel processing. The majority of computers using a Core 2 or Core 2 Duo are not very good at this, so chances are they would benefit greatly from this technology.

But I do have one question. When OpenCL, or more specifically CUDA, is used, does it switch the entire encoding operation over to the graphics card and free the processor, or do the processor and graphics card work through the process simultaneously? Because if it just switches all of the work over to the graphics card... well, that is pretty much pointless, and I agree with SirOmega that if you have a high-end or multi-core processor, you could barely benefit from such a technology, if at all. However, if it utilizes the processor WITH the graphics card to encode... well, then we have ourselves a recipe for success. Imagine having OpenCL/CUDA on a machine with a high-end graphics card and a processor capable of vast parallel processing. This could DRASTICALLY shorten the time needed to encode a video, along with a lot of other applications/processes.

In short, the bulk of the work is handed off to the video card. This is a gross oversimplification, but video encoding is almost entirely matrix manipulation (linear algebra). GPU hardware is optimized for those types of operations, seeing as you can rotate a 3D object by performing one matrix multiplication. The CPU during this process just feeds the GPU with data, so a faster CPU can supply data more quickly, but it's not really the bottleneck. So yes, having a dual-core system doesn't really benefit you in this type of operation. I suppose having a dual core would mean that one core could be supplying the GPU with data to crunch while the other handles your other processes (like surfing the net, etc.), so you can do meaningful work while the encoding is going on.
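As a rough illustration of that point (hypothetical kernel and data; this is not how Nero's encoder actually works): rotating every vertex of a 3D object is one matrix multiplication per point, and the GPU does them all at once, one thread per vertex.

```
#include <cuda_runtime.h>

// One thread per vertex: every point of the object is multiplied by
// the same 3x3 rotation matrix simultaneously, exactly the kind of
// linear algebra GPUs are built for.
__global__ void rotatePoints(const float* m,   // 3x3 rotation matrix
                             float3* pts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 p = pts[i];
    pts[i] = make_float3(m[0]*p.x + m[1]*p.y + m[2]*p.z,
                         m[3]*p.x + m[4]*p.y + m[5]*p.z,
                         m[6]*p.x + m[7]*p.y + m[8]*p.z);
}

int main()
{
    const int n = 100000;                       // vertices in our "object"
    float h_m[9] = { 0,-1,0,  1,0,0,  0,0,1 };  // 90-degree rotation about Z

    float *d_m;  float3 *d_pts;
    cudaMalloc(&d_m, sizeof(h_m));
    cudaMalloc(&d_pts, n * sizeof(float3));
    cudaMemcpy(d_m, h_m, sizeof(h_m), cudaMemcpyHostToDevice);
    cudaMemset(d_pts, 0, n * sizeof(float3));   // placeholder geometry

    rotatePoints<<<(n + 255) / 256, 256>>>(d_m, d_pts, n);
    cudaDeviceSynchronize();

    cudaFree(d_m); cudaFree(d_pts);
    return 0;
}
```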
 
In short, the bulk of the work is handed off to the video card. This is a gross oversimplification, but video encoding is almost entirely matrix manipulation (linear algebra). GPU hardware is optimized for those types of operations, seeing as you can rotate a 3D object by performing one matrix multiplication. The CPU during this process just feeds the GPU with data, so a faster CPU can supply data more quickly, but it's not really the bottleneck. So yes, having a dual-core system doesn't really benefit you in this type of operation. I suppose having a dual core would mean that one core could be supplying the GPU with data to crunch while the other handles your other processes (like surfing the net, etc.), so you can do meaningful work while the encoding is going on.

Oh, OK. I didn't know how the CPU handled the data during the encoding process. I thought that maybe it handled all of the data itself and didn't realize that it simply feeds the data (or a majority of it) to the GPU to crunch. So if this is true, then why can a Core i7 perform this task much faster than an older CPU, or even a GPU doing it by itself? It would make sense that software that cuts the middleman (the CPU) out of the operation and goes straight to the genius (the GPU) would be faster and more efficient. But this does not seem to be the case...
 
That's what makes OpenCL great. If applied to a task it's particularly well suited to, such as video/audio encoding, it speeds the task up enormously. I can only dream about what this will enable in iTunes... imagine transcoding an iTunes library of about 8,000 ALAC songs to AAC. Today, this takes about 5 hours. With OpenCL, only 30 minutes will be required. :D

Forget iTunes... I just asked iMovie to analyze a 2.5hr wedding video for image stabilisation on my Core 2 Duo... it estimates it'll take 48hrs. And what about the performance hit we're getting with "Faces" in iPhoto... it'd be nice to have the GPU speed that up.

Of course... we do want it everywhere, right? :)
 
Try encoding an H.264 video under Snow Leopard on a unibody machine. It takes probably 10 minutes for something that normally took 2 1/2 hours.

Seriously.
 
Ok, I had to be the first to say...

OpenCL could easily be implemented for the last generation of PowerPC Macs.

But Apple says NOPE!

The complete abandonment of expensive PowerPC Macs by Apple and other developers like Google, completely out-of-line pricing during a bad economy, and the Microsoft anti-Mac ads are a perfect storm of bad Mac news this year, and all of it is going to come back to haunt Apple as bad decisions.

Yes, abandon people who bought expensive iMac G5s and $3000+ Power Mac G5s in this bad economy. Think they'll give back the love?

I think not.

But, as we know, Apple now makes most of its money from iPhones and iPods and barely cares about Mac sales, so maybe they have the perfect plan after all? :rolleyes:
 
That's what makes OpenCL great. If applied to a task it's particularly well suited to, such as video/audio encoding, it speeds the task up enormously. I can only dream about what this will enable in iTunes... imagine transcoding an iTunes library of about 8,000 ALAC songs to AAC. Today, this takes about 5 hours. With OpenCL, only 30 minutes will be required. :D

That is more than optimistic.

With video encoding, the most expensive operation is motion prediction. For motion prediction, you need to take any little part of the image and find a matching part somewhere in the previous image. It is very little code, but very, very time consuming, and very well suited to graphics hardware.
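A rough sketch of why motion prediction parallelizes so well (a hypothetical CUDA kernel; real encoders like x264 are far more sophisticated): for one 16x16 block of the current frame, each thread scores a different candidate offset in the previous frame by summing absolute differences, and the host then picks the offset with the smallest score.

```
#include <cuda_runtime.h>

#define BLOCK 16      // macroblock size
#define RANGE 16      // +/- search range in pixels

// Motion prediction in miniature: for ONE 16x16 block of the current
// frame, every thread evaluates a different candidate offset in the
// previous frame and computes the sum of absolute differences (SAD).
// Thousands of blocks times hundreds of candidates = massive parallelism.
__global__ void sadSearch(const unsigned char* cur, const unsigned char* prev,
                          int width, int bx, int by, unsigned int* sads)
{
    int dx = (int)threadIdx.x - RANGE;          // candidate offset x
    int dy = (int)blockIdx.x  - RANGE;          // candidate offset y

    unsigned int sad = 0;
    for (int y = 0; y < BLOCK; ++y)
        for (int x = 0; x < BLOCK; ++x) {
            int c = cur [(by + y)      * width + (bx + x)];
            int p = prev[(by + dy + y) * width + (bx + dx + x)];
            sad += abs(c - p);
        }
    sads[blockIdx.x * blockDim.x + threadIdx.x] = sad;
}

int main()
{
    const int width = 640, height = 480;
    const int candidates = 2 * RANGE + 1;       // offsets per axis
    unsigned char *d_cur, *d_prev; unsigned int *d_sads;

    cudaMalloc(&d_cur,  width * height);
    cudaMalloc(&d_prev, width * height);
    cudaMalloc(&d_sads, candidates * candidates * sizeof(unsigned int));
    cudaMemset(d_cur, 0, width * height);       // dummy frames
    cudaMemset(d_prev, 0, width * height);

    // Search around the block at (64, 64); the host keeps it away from
    // frame edges so the kernel never reads out of bounds.
    sadSearch<<<candidates, candidates>>>(d_cur, d_prev, width, 64, 64, d_sads);
    cudaDeviceSynchronize();

    cudaFree(d_cur); cudaFree(d_prev); cudaFree(d_sads);
    return 0;
}
```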

I don't think there is anything similar in the area of sound encoding. And anyway, 8000 songs took you many, many days to import from 800 CDs, so who cares about 5 hours to encode them? That's five hours _once_ for a large library.
 
In short, the bulk of the work is handed off to the video card. This is a gross oversimplification, but video encoding is almost entirely matrix manipulation (linear algebra). GPU hardware is optimized for those types of operations, seeing as you can rotate a 3D object by performing one matrix multiplication. The CPU during this process just feeds the GPU with data, so a faster CPU can supply data more quickly, but it's not really the bottleneck. So yes, having a dual-core system doesn't really benefit you in this type of operation. I suppose having a dual core would mean that one core could be supplying the GPU with data to crunch while the other handles your other processes (like surfing the net, etc.), so you can do meaningful work while the encoding is going on.

There is a little bit more to it. What really happens depends on exactly which GPU and CPU you have. When your software calls a function in OpenCL, the OpenCL runtime does it "the fastest way", so if you happen to have a slow graphics card and a very fast CPU, some functions may run faster on the CPU. Apple's Core Image does this too: it looks at your hardware. Under the hood there are implementations using various methods, and the software can decide at run time which to use. This way, if you upgrade the hardware, the software will adapt.
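The dispatch idea in miniature, as a hedged sketch (the scale functions are made up for illustration; a real OpenCL runtime makes this choice internally, and here CUDA's device query stands in for that decision): probe the hardware at run time and route the same operation to the GPU if one is usable, otherwise fall back to the CPU.

```
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scaleGPU(float* data, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

static void scaleCPU(float* data, int n, float s)
{
    for (int i = 0; i < n; ++i) data[i] *= s;   // serial fallback path
}

// Pick an implementation at run time, the way an OpenCL runtime (or
// Core Image) decides internally. Upgrade the hardware and the same
// binary starts taking the faster path.
void scale(float* host, int n, float s)
{
    int devices = 0;
    if (cudaGetDeviceCount(&devices) == cudaSuccess && devices > 0) {
        float* d;
        cudaMalloc(&d, n * sizeof(float));
        cudaMemcpy(d, host, n * sizeof(float), cudaMemcpyHostToDevice);
        scaleGPU<<<(n + 255) / 256, 256>>>(d, n, s);
        cudaMemcpy(host, d, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(d);
    } else {
        scaleCPU(host, n, s);                   // no usable GPU: stay on CPU
    }
}

int main()
{
    float v[4] = { 1, 2, 3, 4 };
    scale(v, 4, 2.0f);
    printf("%g %g %g %g\n", v[0], v[1], v[2], v[3]);
    return 0;
}
```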
 
Does that mean that Apple might be thinking about using more powerful video cards in their desktop and mobile products? Or are they just trying to take advantage of what is already there? Because unless you own a Mac Pro with a honkin' video card, there isn't much to take advantage of.

Rich :cool:
 
Yes, abandon people who bought expensive iMac G5s and $3000+ Power Mac G5s in this bad economy. Think they'll give back the love?

Few people have bought PPC Macs from Apple for years. And if they did buy them after the first Mac Pro was announced, then they either had a very good reason to do so, or are fools.

I would love to see HandBrake be able to take advantage of my GPU. I encode an awful lot of video, and doing it in hours instead of days would be great.
 
Because for this particular scenario they need a specific Windows driver (32- or 64-bit) and an Nvidia card that supports the CUDA language. This thing was built with Nvidia's help to show off the performance they could get out of a GPU. It's a highly customized and inflexible solution.

OpenCL takes it a couple of steps further. It abstracts away from the specifics of the hardware: you don't need to learn which functions each card may or may not support, or the underlying language. It's an open standard, so if and when people get around to it, there'll be an implementation of OpenCL on Linux and possibly even Windows.

In other words, it's like moving from assembly language to Java.

Does this mean that when a hackintosh version of Snow Leopard comes out, almost all graphics cards will support OpenCL?
 
Does that mean that Apple might be thinking about using more powerful video cards in their desktop and mobile products? Or are they just trying to take advantage of what is already there? Because unless you own a Mac Pro with a honkin' video card, there isn't much to take advantage of.

Rich :cool:

GPUs are pretty fast at specific jobs, and the GPU is just another chip inside your computer with some potential compute power that should be taken advantage of.
There seems to be a lot of optimism for OpenCL, and it also sounds like most of the support effort should be aimed away from Apple and towards the ffmpeg devs :)
 
Sounds great. So someone running Windows on a Mac that has an Nvidia card can run this software, correct?

Pretty much. Of course, this is assuming that the graphics card supports CUDA and the OpenCL technology. My guess is that the unibody MacBooks' graphics cards will most definitely support it, if not earlier graphics cards as well. Otherwise it would be a kick in the face to almost everyone...
 
HD on the iPod? Why?

I agree with you; I never really understood it either. However, I did notice that when I view HD videos on my iPhone versus SD videos, some videos took up the whole screen whereas others kept the two bars on the side or were smaller than the actual screen. I am not sure if this is because it was HD or SD, or just how big the video resolution was (which does correlate with the level of definition). But another big gain is when you use the iPod's video output to view videos on an external monitor. Just recently I began to use this feature on my iPod Video to watch videos in my car at work. (I know, I have a very easy and lax job.) But I prefer HD content over SD content on my car's monitor for video and audio reasons (I love the Bose studio on wheels).
 
Sounds great. So someone running Windows on a Mac that has an Nvidia card can run this software, correct?

"This Software" was a hand crafted demo that does just one thing. Likely it will never be released to the public. It was a lab experiment.

But something like it might be released. Apple will put support for this kind of stuff in Snow Leopard, which comes out late this year (maybe), and then developers will need to use it. It will take time for this to trickle down.
 
What I (and many others) are waiting to see is whether OpenCL/Snow Leopard will make GPU-accelerated video decoding possible in the popular media-centre applications (XBMC, Plex, Boxee, etc). They already have it under Linux with VDPAU, but currently on OS X the GPU does nothing at all for this: all H.264 decoding, for example, is done by the CPU. For 1080p material, the difference in CPU use when GPU acceleration is present is VERY dramatic.
 
OpenCL could easily be implemented for the last generation of PowerPC Macs.

Easily? Have you ever tried programming the underlying architecture of an operating system? By your naive post, I would think not. The truth of the matter is that getting this stuff to work is extremely difficult, and trying to have it work on two different chips would add enormously to the work.

P-Worm
 
I hate the "up to 5 times" marketing speech... Is that a 400$ computer with two 2000$ graphics cards?

What's interesting is how much cheaper this makes high performance. Everything so "spectacular" about this can also be achieved by simply buying more CPUs. So the price is really what makes this technology.

Since we have all already paid for our graphics cards, using them for encoding obviously gives us a free increase in bang per buck. But how much is that free increase, and how much do you have to pay for the 5-fold increase? That's the only benchmark that means something to me...
 
It doesn't make sense that the quality would be worse. It isn't as if the graphics card would multiply bits differently than a CPU. If they were doing the same operation on both the Core i7 CPU with a non-CUDA application, and a "lesser" CPU combined with an NVIDIA card and a CUDA enabled application, the resulting video would be the same, all that should differ would be the time it took to get there.

Makes perfect sense. CUDA code is fundamentally different in how you program (if you want to see a speedup). x264 can take advantage of speedups and quality enhancements that are prohibitively expensive on the GPU but cost little more than a few cycles on the CPU. Each camp is advancing the tech and I'm sure there are benefits to each, but for now the CPU is still king for quality and speed.

The other benefit is that CPUs are only getting faster, but the penalties of CUDA-style programming are not going away.
 