SWEET!

Must... save... for... multiple... g5s....

seriously though, this ROCKS! I really hope that this gives a significant speed increase... I would love to not spend half of my time waiting for renders while editing my movies.
 
toontra said:
Would it be possible to include pro-audio apps (eg Logic) in this? The idea of having 2+ comps clustered whilst mixing would be great - some of the new 3rd-party virtual instruments and convolution reverbs are mighty CPU-hungry!

Audio is a different story because it's mostly real-time: there isn't much rendering, so latency becomes a huge issue.

There are already dedicated hardware products that offload audio processing, like the TC PowerCore and UAD-1. Again, latency is the issue: effects processed outside the Mac introduce delay, and correcting for it becomes a pain.

As for rendering, Logic introduced a feature called "Freeze Tracks" where the effects are rendered to a temporary file so that you don't have to use processing power to play the track with effects. Freezing happens pretty quickly, so I don't think people are clamoring for distributed freezing. What audio types need is a 3.0GHz G5+ with a 1.5GHz bus and 8GB of RAM.
 
portable cluster

why spend the money on two PBs when you're only going to use the brains and not the keyboard, optical drive, screen, etc. in the second one? that would be a waste--a waste of space, money, and hardware. when is the portable cluster node composed of two dual G4s coming out? that's 4 x 1.5GHz. MSRP $4500.
 
thatwendigo said:
Hey, I can't help it if I was right about both the Centrino on desktop (which people dismissed) and xSan meaning we'd see even more clustering apps (which people didn't respond to). Now, to see if I can go on with my iMac predictions...

The appearance of the XServes (especially the cluster-node version), Xgrid, and things like distributed builds in Xcode might have also been obvious giveaways that Apple was heading in this direction. Centrino on the desktop has been discussed on PC forums almost since the day it was announced for notebooks. Congratulations on making some good picks, but these events weren't really 'bolts from the blue'. Also (and I'm not being sarcastic here), if you want people to read and remember the predictions and opinions you're writing, maybe you should set up your own web site to host them. Posting in discussion forums is a bit like sending in a doctoral thesis to be published in the classifieds section of the local newspaper. No one is going to notice it amongst the 10,000 other posts, no matter how brilliant (or full of crap) it may be.

Now for a *real* challenge, try and predict what Steve will be wearing for the keynote at the WWDC.
 
When the article says this is coming to pro video applications, I hope they really mean Final Cut Pro. Shake and Compressor both already utilize distributed rendering, and I'm assuming that Motion does as well (but haven't used it).

If I could edit on my PowerBook and then render hi-def output with a small Linux cluster, I'd be in wishful-thinking heaven. :)
 
The limitations of XGrid et al.

XGrid as it currently stands is a truly marvelous technology preview product. One can reasonably assume that any distributed computing technology made by Apple will somehow find its roots in the XGrid paradigm. However, XGrid suffers from a serious limitation - as do all distributed computing systems I have seen to date. It is not a flaw easily dismissed, or acknowledged en passant only to be brushed off later as something Apple's brilliant engineering team will somehow solve. The problem is data, and the bandwidth of the interconnecting network infrastructure.

Those of you who have tried the XGrid technology preview, or more generally are involved in distributed computing projects (such as the erstwhile RSA keysearch, SETI@Home, Folding@Home, etc.) have probably noticed that all of the tasks approached are situations where a relatively small amount of data requires massive amounts of calculations to be performed on it. Thus for a comparatively small "wait" while transferring data, one can distribute small packets of data to independent computers for processing in parallel. Furthermore, parallelising tasks over "slow networks" is only feasible if each packet of data can be processed independently from all others, because otherwise one will incur very serious delays when transacting over the network. What do I mean by "slow networks"? Sadly, anything slower than Infiniband is "slow" for the purposes of high-performance parallel processing. Thus even Gigabit Ethernet and FibreChannel are "slow" for the purposes of massively parallel processing.
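To make the "independent work units" point concrete, here is a rough Python sketch (purely illustrative - the function names are mine, and none of this resembles Xgrid's actual interfaces). It only pays off because no chunk of frames needs data from any other chunk:

# Toy sketch of "embarrassingly parallel" work distribution (hypothetical,
# not Xgrid's real API): each chunk of frames can be rendered with no
# knowledge of the others, so the transfer cost is paid only once per chunk.
from concurrent.futures import ProcessPoolExecutor

def render_chunk(frames):
    # Stand-in for an expensive, self-contained render job.
    return [f"frame_{n}_rendered" for n in frames]

def split(frames, n_workers):
    # Divide the frame list into roughly equal, independent chunks.
    size = (len(frames) + n_workers - 1) // n_workers
    return [frames[i:i + size] for i in range(0, len(frames), size)]

if __name__ == "__main__":
    frames = list(range(100))
    chunks = split(frames, 4)
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = pool.map(render_chunk, chunks)
    finished = [frame for chunk in results for frame in chunk]
    print(len(finished), "frames rendered")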

What is the point of "farming out" complex video transitions if each computer must wait for the previous one to finish and "hand off" the data? What is the point of bothering to transmit a couple hundred megabytes of HD video begin and end frames for a transition when you can probably compute the transition more quickly on your own box? Both entail waiting, but the latter could potentially entail less waiting than the former, and certainly entails less infrastructure. Anybody who doubts the difficulties of parallelisation need look no further than that supposedly optimised resource hog and bulwark of the Apple design bureau: Photoshop. How many filters are dual-processor aware? More to the point: how many are not dual-processor aware? And that is within the confines of a single machine, with basically zero latency issues. I rest my case.

I do not wish to rain on anybody's parade, but I do not see this particular rumour bringing much import to most professional Mac users' lives. Certainly not for audio, where latency is a serious issue. Certainly not for professional video editors using G5s on run-of-the-mill half-duplex 100baseT networks. Neither do I expect this technology to percolate down to consumer-level applications: though you may perceive Apple to be benevolent, remember that at heart Apple - nay, AAPL - is a corporation seeking a profit. They have clearly identified cluster computing as a significant target - witness the sale of the G5 XServe "processor blades" designed explicitly for distributed computing, and the recent introduction of the XSan network storage solution.

XSan and XGrid, running on RAID XServes and G5 processor-blade XServes respectively, clearly complement each other and form the two prongs of a concerted attack. This much is certain and evident. By comparison, the Big Mac massively parallel cluster was only the beginning: Apple now truly has all the ingredients necessary to become a serious player in the high-performance/high-reliability computing market. Enticing professional video editors is only one aspect of this policy - and, I expect, not a cornerstone. At the rate technology increases performance (yes, even at Apple's slow release rates), the time taken for a given render falls by half every year (if the increase from 2GHz to 3GHz "by summer" is to be believed), more-or-less in keeping with Moore's Law. Somebody remarked that now it will be possible to have tomorrow's Mac's rendering speed today. It also means that come tomorrow, you won't need that grid anymore. Apple wants you to buy PowerMacs, and it wants you to keep on buying them. Longevity of your investment means lost revenue to them. Simple as that.

I noticed somebody got excited about the prospects of this technology somehow making its way into iMovie. This is almost certainly not the case: iMovie will not support distributed rendering. Even in its fourth incarnation, iMovie is still an inefficient Carbon app that does not even include support for multiple threads running concurrently on dual processor machines. It could not possibly be "upgraded" to run in XGrid-distributed style without a major rework. This is not corporate oversight on Apple's part: it is a sound business strategy, since their analysts are very careful to avoid undercutting their own offerings. Final Cut Pro for the professional market, Final Cut Express for the prosumer market, and iMovie for the lowly consumer. Same goes for all the other Apple software offerings. They may share technology in some select places, but there is clearly an active effort to maintain the highly profitable market segmentation currently extant.

So, overall, an interesting development, but hardly one that "Changes Everything".
 
Earendil said:
This is going to be awesome! Already my 1.25GHz PB has to wait through iMovie rendering. Yet, within this house my Dad has an 867MHz PB, and my mom has an 867MHz G4 Tower (which she uses for email! :eek:). My little siblings also have a little rev B iMac running at 233MHz; I could plug that into the rendering network just to cheer the other computers on :D

G4 1.25GHz
G4 867MHz
G4 867MHz
-------------
G4 2.9GHz

You forgot the rev B iMac.

That will give you 3.2GHz of horsepower. You can tell your friends your little brother has an 'old' 3.2GHz iMac. :D

This technology depends on being able to break up the computation. Currently I'd think gaming is doubtful due to the real-time response demanded by the user. Sure you can break up the rendering of the frames, but they'd have to be returned extremely quickly for you not to feel the lag.
 
Videogame frames are typically rendered by the GPU, so videogame performance would certainly not be affected by this (potential) technology.

The inherent slowness of CPUs compared to GPUs for graphical calculations, coupled with the slow networks, makes the proposition simply laughable.

Running physics and AI engines in some kind of clustering environment may be possible, but would you really want to sacrifice ping times for free cycles?
 
qubex said:
What is the point of "farming out" complex video transitions if each computer must wait for the previous one to finish and "hand off" the data?

Surely, that depends on the effect in question. With some transitions, it might equally be possible to use a (say) four-node cluster by dividing the screen into quarters, each node rendering its portion independently of the others. Lengthy transitions especially would benefit greatly from clustering.
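As a purely illustrative sketch (toy Python, made-up data, nothing to do with FCP's internals), splitting a frame into quadrants and stitching the independently "rendered" tiles back together could look something like this - note it only works this cleanly for effects where each pixel depends on itself alone:

# Toy sketch: split a frame into four quadrants, "render" each one
# independently (here a simple per-pixel fade), then stitch them back.

def fade(tile, alpha):
    # Per-pixel fade: no pixel needs its neighbours, so tiles are independent.
    return [[int(v * alpha) for v in row] for row in tile]

def split_quadrants(frame):
    h, w = len(frame), len(frame[0])
    mh, mw = h // 2, w // 2
    return [
        [row[:mw] for row in frame[:mh]],  # top-left
        [row[mw:] for row in frame[:mh]],  # top-right
        [row[:mw] for row in frame[mh:]],  # bottom-left
        [row[mw:] for row in frame[mh:]],  # bottom-right
    ]

def stitch(tl, tr, bl, br):
    top = [a + b for a, b in zip(tl, tr)]
    bottom = [a + b for a, b in zip(bl, br)]
    return top + bottom

frame = [[255] * 8 for _ in range(8)]      # dummy 8x8 "frame"
tiles = split_quadrants(frame)
rendered = [fade(t, 0.5) for t in tiles]   # each tile could go to a different node
result = stitch(*rendered)
print(len(result), "x", len(result[0]))    # 8 x 8 again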
 
iGAV said:
It's about time... Apple really needs this for FCP, as it's totally nailed by Media 100's 844/X; the lack of Qmaster is holding FCP back purely because of the lack of rendering performance.


What is Qmaster?
 
A great thing if they support rendering on other architectures. In any given building there are likely to be more x86 boxen than Macs. Linux-based render farms on x86 boxes are far cheaper to implement and have a proven track record on a number of CG animated films.
 
whooleytoo said:
Surely, that depends on the effect in question. With some transitions, it might equally be possible to use a (say) four-node cluster by dividing the screen into quarters, each node rendering its portion independently of the others. Lengthy transitions especially would benefit greatly from clustering.
You are correct - subdividing the frame into sections is an obvious way of approaching the problem. This will work, for example, in a fade-out. However, consider something like a "diffused gaussian blur-out": there is "information leakage" at the interface between the sections as each pixel's value is averaged with those of its neighbours, resulting in the need to obtain information from other computers. This in turn requires network transactions, incurring latencies. You will find that this is by far the general case. The same goes for the oft-overlooked audio component of video edits: the rendering of echoes etc. will also require internode communication if the audio data is linearly subdivided.

In mathematical terms, internode communication (and the resulting slowdown) is the general case. Only a highly specific subset of circumstances does not require such communication.

X_Entity said:
A great thing if they support rendering on other architectures. In any given building there are likely to be more x86 boxen than Macs. Linux-based render farms on x86 boxes are far cheaper to implement and have a proven track record on a number of CG animated films
You can't be serious. How would Apple profit from such a move? They are, after all, a hardware company that wishes to sell you hardware. Allowing you to increase the speed of their precious software without having to purchase their hardware makes no commercial sense.
 
How exactly would something like this work? I do wedding videos in FCP HD and I have two G5 1.8s in front of me. One is a refurb I just picked up (which has a loud fan--it revs up way more than my other 1.8, which really is whisper quiet all the time--any suggestions on that?). Feedback on how the cluster would work would be great!
 
raven13mb said:
How exactly would something like this work? I do wedding videos in FCP HD and I have two G5 1.8s in front of me. One is a refurb I just picked up (which has a loud fan--it revs up way more than my other 1.8, which really is whisper quiet all the time--any suggestions on that?). Feedback on how the cluster would work would be great!
Basically, it would work something like this:

(1) You initiate a render on your "main" G5 machine. Your FCP-HD session looks at what it has been told to render, say a cross-fade dissolve, and figures it is amenable to parallelisation. It sends out a network query (using Rendezvous) for machines "willing and capable" to help. Your "other" G5 responds.
(2) Since it received one reply, it divides the frames in two, keeps half, and sends half of the beginning and ending frames to your other machine. They then both work on the video concurrently.
(3) At some point, your computer finishes rendering and stores away the result. Either before or after that, it receives the result from the other machine. (Asynchronicity problem.)
(4) When they have both finished, your main machine recomposes the two half-frame sequences and presents you with the result.

Of course this is a gross oversimplification. In particular, since we're talking of a cross-fade, it wouldn't be a single start- and end-frame, but rather a video clip. But you get the general idea. "Divide and Conquer."
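If it helps, here is a toy Python sketch of that scatter/gather flow (entirely hypothetical - the real Qmaster/Xgrid plumbing is far more involved, and a second local process merely stands in for your "other" G5):

# Rough sketch of the scatter/gather flow described above.
from concurrent.futures import ProcessPoolExecutor

def render_clip(job):
    # Stand-in for rendering one half of the cross-fade on one machine.
    label, frames = job
    return label, [f"{label}:frame{n}" for n in frames]

def distributed_render(frames, n_nodes=2):
    # Step 2: divide the clip among the nodes that answered the query.
    size = (len(frames) + n_nodes - 1) // n_nodes
    jobs = [(f"node{i}", frames[i * size:(i + 1) * size]) for i in range(n_nodes)]
    # Steps 2-3: the halves render concurrently; results may arrive in any order.
    with ProcessPoolExecutor(max_workers=n_nodes) as pool:
        results = dict(pool.map(render_clip, jobs))
    # Step 4: recompose the halves in their original order.
    return [frame for i in range(n_nodes) for frame in results[f"node{i}"]]

if __name__ == "__main__":
    print(len(distributed_render(list(range(50)))), "frames recomposed")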

"A hundred ants can steal your picnic but if you tie a hundred bees to a brick it won't fly."
 
qubex said:
You are correct - subdividing the frame into sections is an obvious way of approaching the problem. This will work, for example, in a fade-out. However, consider something like a "diffused gaussian blur-out": there is "information leakage" at the interface between the sections as each pixel's value is averaged with those of its neighbours, resulting in the need to obtain information from other computers. This in turn requires network transactions, incurring latencies.

Again, I think a clever algorithm could still work around most (but obviously not all) of this latency.

Each node could split its dataset in two. First, process the pixels that depend on other screen segments/nodes and send them to those other nodes. While awaiting a response, process the pixels that are independent of other nodes. When data is received back from the other nodes, merge the two parts by performing the blur along the borders of the two parts of the dataset.

I believe for the example mentioned, each node would only need a limited amount of data (a thin border of pixels) from the other nodes, so bandwidth wouldn't be a problem - though latency still would. I guess the more complex the transition, the less of an issue the latency overhead would be.
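A rough Python sketch of that overlap, purely illustrative (a thread and a sleep stand in for the neighbouring node and the network round-trip):

# Sketch of overlapping communication with computation: do the
# neighbour-dependent border first, ship it off, and compute the
# independent interior while the round-trip is in flight.
import threading, time

def exchange_borders(my_border, received):
    # Fake network round-trip with the neighbouring node.
    time.sleep(0.05)                                  # pretend latency
    received["halo"] = [v + 1 for v in my_border]     # "neighbour's" border

def render_tile(border, interior):
    received = {}
    t = threading.Thread(target=exchange_borders, args=(border, received))
    t.start()                                   # 1. border data is on the wire
    interior_done = [v * 2 for v in interior]   # 2. interior needs no one else
    t.join()                                    # 3. wait only for what's left
    border_done = [a + b for a, b in zip(border, received["halo"])]
    return border_done + interior_done          # 4. merge the two parts

print(render_tile([1, 2, 3], [10, 20, 30]))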
 
Mine was only a quick example. Of course algorithms do exist to correct the problem in specific instances. Say information "bleeds" by 2 pixels/frame and the transition is 25 frames (1 sec.): each node would get an extra border of 50 pixels around its allocated work unit to provide the necessary data, removing the need for network queries.

I was only trying to illustrate the broader problems. But you are perfectly right that in this particular instance, a shortcut exists.
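For what it's worth, the numbers behind that shortcut, as a tiny Python sketch (my own illustration, with a made-up tile size):

# If an effect "bleeds" information by bleed_px pixels per frame over the
# length of the transition, padding each node's tile by that halo up front
# removes the need for any mid-render network queries.
def halo_width(bleed_px_per_frame, n_frames):
    return bleed_px_per_frame * n_frames

def padded_tile_size(tile_w, tile_h, halo):
    # The tile each node actually receives, halo included on every side.
    return (tile_w + 2 * halo, tile_h + 2 * halo)

halo = halo_width(2, 25)                 # 2 px/frame over a 25-frame (1 s) transition
print(halo)                              # 50 pixels
print(padded_tile_size(360, 288, halo))  # a quarter of a 720x576 frame -> (460, 388)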
 
qubex said:
Videogame frames are typically rendered by the GPU, so videogame performance would certainly not be affected by this (potential) technology.

The inherent slowness of CPUs compared to GPUs for graphical calculations, coupled with the slow networks, makes the proposition simply laughable.

Running physics and AI engines in some kind of clustering environment may be possible, but would you really want to sacrifice ping times for free cycles?

Check this story out. It was only a matter of time... http://arstechnica.com/news/posts/1084398037.html
 
Impressive and scary, I know. But while linking two graphics accelerators on a single high-speed symmetric bus works (notice it uses PCI-Express and not AGP 8X, which is asymmetric), doing the same over a relatively slow network, even one running at 1Gbit/sec, is not feasible.
 
iGAV said:
It's about time... Apple really needs this for FCP, as it's totally nailed by Media 100's 844/X; the lack of Qmaster is holding FCP back purely because of the lack of rendering performance.

What I'd like to see is Apple offer some kind of built-in hardware acceleration, or an Apple-designed Magma-style expansion chassis capable of running multiple processor acceleration cards, without the need to buy several PowerMac G5s or Xserves for dedicated editors and motion graphics people.

Then we'd see FCP becoming more popular, and certainly challenging the high-end Avid and Media 100 systems.

It's pretty damn popular already and helped kill Media 100.

I don't think a hardware accelerator is going to be cost-effective for Apple. Expensive R&D, very limited market, catered to by a number of third parties. A scalable render architecture based purely on your Mac boxes will appeal a lot more to post houses I think -- because the benefit of a clustered render farm extends to multiple editors and not just whoever is 'senior' enough to get the accelerated machine.
 
network rendering is the ****

Mr. Anderson said:
I'm on Lightwave and there is a distributed rendering system that comes with it - but it's a pain to set up.

How does Qmaster work?

D


Lightwave has a program called "ScreamerNet" built into it, and I set it up to work with 20 computers in a lab. There are tutorials everywhere online; give it a Google search sometime. The render times are drastically cut, especially when you render scenes with 1000 frames or more in which there are particle emitters and lots of surfaces. Good luck with getting it to work.
 