janstett said:
What you're saying isn't entirely true and may give some people the wrong idea.
First, a multicore system is helpful when running multiple CPU-intensive single-threaded applications on a proper multitasking operating system. For example, right now I'm ripping CDs on iTunes. One processor gets used a lot and the other three are idle. I could be using this CPU power for another app.
Yes, all true, but I was trying to answer the question directly without creating an even longer post than what I wrote. While it's true that the task scheduler can load-balance various processes or apps across multiple CPUs, it still tries to jam almost everything onto the primary CPU first, then cascade or overflow to the next, then the next and so on... Multiple-application distribution also doesn't address the other poster's question... Who really cares if I can have Mail running on CPU 2 while ripping a CD via CPU 1 while I'm running DVDSP on CPU 0 and it's pegged to the max trying to encode an MPEG2 DVD object? The fact is, the software isn't multithreaded, so it's not taking advantage of the system resources at hand.
I'm going to assume there are utilities out there to assign CPU affinity to processes or applications under OS X. Under Windows, I'm sure you're aware that this can be done from the Task Manager. I wrote a small utility to do this automatically when loading an application - took me all of 10 minutes. I suppose I could write one for the Mac as well, but my 3D rendering software does a pretty good job of using most of the available CPU power in my system anyway.
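For the curious, the Windows version really is about a ten-minute job. Here's a minimal sketch of the idea (illustrative code, not my actual utility): start the target suspended, set its affinity mask, then let it run.

```cpp
// affinity_launch.cpp - minimal sketch of an affinity launcher (Win32).
// Starts the target suspended, pins it to the CPUs in the mask, then
// lets it run, so it never gets scheduled on the wrong core.
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    if (argc < 3) {
        printf("usage: affinity_launch <hex-mask> <program.exe>\n");
        return 1;
    }

    DWORD_PTR mask = (DWORD_PTR)strtoul(argv[1], NULL, 16); // e.g. 0x2 = CPU 1 only

    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi = { 0 };

    // Launch suspended so we can set affinity before the first instruction runs.
    if (!CreateProcessA(NULL, argv[2], NULL, NULL, FALSE,
                        CREATE_SUSPENDED, NULL, NULL, &si, &pi)) {
        printf("CreateProcess failed: %lu\n", GetLastError());
        return 1;
    }

    SetProcessAffinityMask(pi.hProcess, mask); // pin to the requested CPUs
    ResumeThread(pi.hThread);                  // now let it run

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}
```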
The reality is that to take advantage of multiple cores, you have to take advantage of threads. Now, I was doing this in my programs with OS/2 back in 1992. I've been writing multithreaded apps my entire career. But writing a threaded application requires thought and work, so naturally many programmers are lazy and avoid threads. Plus, it is harder to debug and synchronize a multithreaded application. Windows and Linux people have been doing this since the stone age, and Windows/Linux have had usable multiprocessor systems for more than a decade (it didn't start with Hyperthreading).
Was I really that unclear in my post? I never said that it started with Hyperthreading... I said that Hyperthreading was Intel's method of getting a jump start on the dual-core (as in two CPUs on a single die) paradigm. Of course there have been plenty of multiprocessor PCs over the years. I've owned dozens of them, also starting with dual and quad 486 machines. I have done some multithreaded software development and have a fair understanding of how it works... I know that most multithreaded software, or at least mainstream software, leaves a lot to be desired.

I use Lightwave 3D almost every day and it's been multithreaded since about '89, back when support was added for multiprocessor '040 Amiga boxes and NewTek's own Toaster Screamer, which was a quad 68040 render station. But as I was saying, Lightwave's renderer has been designed to support a user-selectable number of threads from 1 to 16, but the thread scheduler is almost completely brain-dead and it does a piss-poor job of managing threads efficiently. On top of that, most third-party plug-ins, as well as Lightwave's own interface to post-process plug-ins, are still single-threaded. In many situations, faster results can be achieved by running individual render nodes as single-threaded, with each node instance assigned affinity for a single CPU/core. ...Hence why I wrote that little utility I mentioned. But this isn't atypical; it's more common than one would imagine if they're not familiar with the state of multithreaded applications.
Trying to justify multiple CPUs or cores by saying, "you can use it to run more applications simultaneously..." is somewhat short-sighted. While essentially true, it doesn't address the problem of using all that power for one application where it's needed. Some apps are multithreaded - Toast uses two or four threads concurrently, I can't remember which. Even if it's four threads, what does that do for someone looking to use an 8-core system? Sure, they can run two instances of Toast and encode two separate files simultaneously... But what if they just need to do one file at a time at regular intervals? Half the CPUs in the system will sit mostly idle.
So it goes back to getting developers to write threaded applications.
Precisely.
Now that we're getting to 4 and 8 core systems, it also presents a problem.
How so? It requires a different mentality and a different design model, for sure... But as we start ramping up to 8, or even 16 and eventually 32 or more cores in a system, it's hardly a new problem. New to most developers, but some have done it before with some very high-level applications. Think SGI/Cray: 16 to 32 R14K CPUs per Origin server, 4 CPUs with dedicated RAM per NUMA card, 8 Origins connected via SGI's NUMAlink for high-speed clustering... That's 256 CPUs in a clustered, multiprocessor implementation. That exact config was purchased by the US Army and NCAR (National Center for Atmospheric Research) in Boulder, CO... Got to see that one in action a couple of years ago.
The classic reason to create a thread is to prevent the GUI from locking up while processing. Let's say I write a GUI program that has a calculation that takes 20 seconds. If I do it the lazy way, the GUI will lock up for 20 seconds because it can't process window messages during that time. If I spawn a thread instead, the calculation can take place there, leaving the GUI thread able to process messages and keep the application alive, and the worker can signal the GUI thread when it's done.
OK, but if you have a process or calculation that takes 20 seconds on any somewhat modern CPU, then there are obviously steps that can be broken up, and probably operations that can be run concurrently or even out of order. But this is where the discussion starts to go beyond threads into software design theory and how to manage processes, subprocesses, threads and fibers.
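To make that concrete, here's a rough sketch of both halves of the idea: get the long calculation off the GUI thread, and split it into per-core chunks while you're at it. I'm using modern C++ threading for brevity (the '92-era version would have been OS/2 or Win32 calls), and crunch() is just a hypothetical stand-in for the 20-second workload.

```cpp
// worker_split.cpp - sketch: take the "20 second calculation" off the GUI
// thread AND split it across cores. Modern C++ threading for brevity; the
// '92-era version would have been DosCreateThread or CreateThread plus an event.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the heavy work: sum one slice of the data.
static double crunch(const std::vector<double>& data, size_t lo, size_t hi)
{
    return std::accumulate(data.begin() + lo, data.begin() + hi, 0.0);
}

int main()
{
    std::vector<double> data(10'000'000, 1.0);
    unsigned n = std::max(1u, std::thread::hardware_concurrency());

    // Break the "serialized" loop into per-core chunks, each on its own thread.
    std::vector<std::future<double>> parts;
    size_t chunk = data.size() / n;
    for (unsigned i = 0; i < n; ++i) {
        size_t lo = i * chunk;
        size_t hi = (i == n - 1) ? data.size() : lo + chunk;
        parts.push_back(std::async(std::launch::async, crunch,
                                   std::cref(data), lo, hi));
    }

    // Here we just block on the results; a GUI loop would instead poll each
    // future with wait_for() between window messages, staying responsive.
    double total = 0.0;
    for (auto& f : parts) total += f.get();
    std::printf("total = %f\n", total);
}
```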
But now with more than 4 or 8 cores, the problem is how do you break up the work? 9 women can't have a baby in a month. So if your process is still serialized, you still have to wait with 1 processor doing all the work and the others sitting idle. For example, if you encode a video, it is a very serialized process.
Now, much of the programming experience I have has to do with video codecs and 3D rendering processes. While on the surface video encoding is very serialized, most common codecs like MPEG2, MPEG4 or Windows Media/VC1 can easily be broken down into subprocesses and multiple threads. Look at all the color calculations, sub-pixel filtering, averaging and quantization to be done... Even if you take a brute-force serialized approach, it would still be easy to implement multi-pass encoding in a threaded model. And serialized workflows can be multithreaded in a video codec beyond multiple passes... There's plenty of opportunity to hand off operations on a chunk of data from one thread or CPU to the next when we're talking about assembly of color data, then sub-pixel filtering, then quantization, etc. I would dig deeper than the macroblock level, as macroblock isolation would seem more problematic... This goes far beyond the scope of this post, but I can think of various possible ways this could work, and I know of applications with multithreaded video encoders that work across multiple CPUs on a single video clip. So there's obviously a way.

As for 3D rendering, I've written my own renderers - both real-time and static raytracers/GI renderers. Raytracing and Global Illumination models are very easy to construct with a multithreaded development model. In fact, they make far more sense to build this way... That, combined with how long rendering takes, is why 3D software was some of the first mainstream software to go multithreaded on an industry-wide basis. ...That and database software, another no-brainer.
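To illustrate why rendering parallelizes so naturally (a toy sketch, nothing like a production renderer, and shade_pixel() is a made-up stand-in): every pixel is independent, so each thread can take its own set of scanlines and never talk to the others until the image is assembled.

```cpp
// trace_rows.cpp - toy illustration of why raytracing parallelizes so well:
// every pixel is independent, so each thread takes its own scanlines and
// nobody needs to talk to anybody until the image is assembled.
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

const int W = 640, H = 480;

// Made-up stand-in for the per-pixel work: cast a ray, shade, return grey.
static float shade_pixel(int x, int y)
{
    return float((x ^ y) & 0xFF) / 255.0f; // placeholder "render"
}

int main()
{
    std::vector<float> image(W * H);
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;

    // Interleave scanlines across threads; interleaving balances the load
    // when some rows (say, lots of reflective geometry) cost more than others.
    for (unsigned t = 0; t < n; ++t) {
        pool.emplace_back([&, t] {
            for (int y = int(t); y < H; y += int(n))
                for (int x = 0; x < W; ++x)
                    image[y * W + x] = shade_pixel(x, y);
        });
    }
    for (auto& th : pool) th.join();

    std::printf("rendered %dx%d on %u threads\n", W, H, n);
}
```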
I hear some work has been done to encode macroblocks in parallel, but getting 8 processors to chew on a single video is an interesting problem.
It's been done. I'm not sure where you heard that information, but you could have heard it back in '97, when Sony released their MPEG/DVD Tool library and was promoting its use on quad Pentium Pro workstations. Based on how H.264 works, I can easily see how 16 or 32 cores or threads within the encoding process could be a huge benefit. I'm not as familiar with VC1, but I would imagine there are similar ways to apply a multithreaded design.
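For the shape of it, here's a toy sketch. It has nothing to do with Sony's library or any real codec, and it deliberately ignores the prediction and entropy-coding dependencies between neighboring macroblocks that make true macroblock isolation problematic. It just carves a frame into contiguous bands of macroblock rows, one band per core, the way slice-level splits do.

```cpp
// mb_rows.cpp - toy illustration of slice-style frame parallelism.
// Not a real codec: it ignores the prediction and entropy-coding
// dependencies between neighboring macroblocks and just shows the shape
// of the split - contiguous bands of 16x16 macroblock rows, one per core.
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

const int W = 1920, H = 1088, MB = 16;   // note: HD frames pad 1080 to 1088

// Stand-in for the per-macroblock work (motion search, DCT, quantize...).
static long encode_macroblock(const std::vector<unsigned char>& frame,
                              int mbx, int mby)
{
    long sum = 0;
    for (int y = 0; y < MB; ++y)
        for (int x = 0; x < MB; ++x)
            sum += frame[(mby * MB + y) * W + (mbx * MB + x)];
    return sum; // pretend this is the coded size
}

int main()
{
    std::vector<unsigned char> frame(W * H, 128);
    const int rows = H / MB, cols = W / MB;
    unsigned n = std::max(1u, std::thread::hardware_concurrency());

    std::vector<long> bits(n, 0);             // per-thread tallies, no sharing
    std::vector<std::thread> pool;
    int per = (rows + int(n) - 1) / int(n);   // macroblock rows per slice
    for (unsigned t = 0; t < n; ++t) {
        pool.emplace_back([&, t] {
            int r0 = int(t) * per, r1 = std::min(rows, r0 + per);
            for (int r = r0; r < r1; ++r)     // this thread owns rows r0..r1
                for (int c = 0; c < cols; ++c)
                    bits[t] += encode_macroblock(frame, c, r);
        });
    }
    for (auto& th : pool) th.join();

    long total = 0;
    for (long b : bits) total += b;
    std::printf("encoded %d macroblocks, checksum %ld\n", rows * cols, total);
}
```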
Look at the high-end DVD and Blu-ray authoring tools like Scenarist. Its encoding system is fully multithreaded: not only can it take advantage of multiple render nodes, but also multiple CPUs per node. Each node instance scales to eight threads for the encoding engine, and this works with the H.264 and VC1 codecs in addition to MPEG2. Currently, their software is optimized for AMD64 CPUs, and they recommend node systems with 2 x Dual Core Opterons and primary workstations with either 2 or 4 x Dual Core Opterons. They've sent out a tech note to their users advising that Scenarist is ready for systems with 2 x Quad Core Opterons when AMD begins shipping those chips -- which should be December/January.