
View Full Version : The Basic Idea Behind Multi-Core Coding Is ... ?




astrostu
Jul 26, 2007, 03:15 PM
I'm writing a lot of Java code for my research work. I'm using a machine that has 4 cores, and a lot of the code loops through lists of data points, compares each one to a number, and, depending on the result, either copies the value to another array or moves on to the next one (grossly simplified, but you get the idea).

My question is, to make this take advantage of four processors, is the basic idea that I create 4 threads, have the first thread loop through the first 25% of the values, the second thread the next 25%, and so on, with each one automatically running on a different processor? (And then I combine the lists afterwards.)

Or is it more complicated?



iSee
Jul 26, 2007, 05:46 PM
Yes, that's the right idea. You don't really get a guarantee that each thread will run on a separate processor core, but you can't take advantage of the extra cores without threads (or processes) to run on them.

Before you go to the trouble, though, check one thing to make sure this will really help (maybe you've already determined this):

When you run the same computations using your single-threaded solution, is the CPU usage near 100%? (Activity Monitor, in the Utilities subfolder of the Applications folder, gives you an easy way to monitor this, if you didn't know.)

If so, the multiple threads will probably help you. As you describe it, the threads can really churn away independently until the end, so they should help a lot.

savar
Jul 26, 2007, 09:12 PM
I'm no expert, but I'll throw some wood on the fire:

Multicore is at first glance no different than multi-thread or multi-process. The OS scheduler will distribute threads and processes across the multiple cores as efficiently as it knows how to.*

The most significant difference that I can think of is that each core has its own cache. So whereas two threads running on one core can share the same data in the cache, if the threads are on separate cores then each must access the memory separately.

So to amend my first statement, you want to create threads or processes which operate on logically separable chunks of data. So yeah, chunking the data into fourths will probably use the cores pretty efficiently.

What's more, if two threads are writing to the same part of memory, then each core is going to be constantly invalidating the other core's cache.

So I think that what seems like a simple problem actually becomes fairly complex, even if you ignore the inherent problems with multithreading. Luckily there will be multicore APIs coming out soon...these should abstract out most of the complexity of writing efficiently parallelizable code.
*Well...actually it doesn't. The OS X scheduler is actually still very naive. But it's a good bet that it will be much improved in Leopard.

lazydog
Jul 27, 2007, 03:50 AM
Hi
I'm not an expert either, but I've got a suggestion. Imagine that the first 3 threads finish processing their data much faster than the fourth, for whatever reason. Your program will then have to wait until the fourth one finishes, during which time your 3 other cores will be sitting around doing nothing. So I think it might be better to divide your data set into more segments than you have cores and then have a scheduler that allocates segments to threads. When a thread finishes a segment, it asks the scheduler for another segment.

I think this would work quite well if you can use JDK 5.0. Like I said, I'm not an expert on this sort of thing, but I understand that JDK 5.0 has CAS (compare-and-set) types which do not need synchronisation. This would let you implement a scheduler that allocates segments without needing to synchronise its methods.
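A minimal sketch of that idea, assuming JDK 5.0's java.util.concurrent.atomic package: the scheduler is just an AtomicInteger handing out segment indices (getAndIncrement is built on compare-and-set, so there are no synchronised methods). The class name SegmentDispenser and the segment count are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free "scheduler": hands out segment indices to worker threads.
// getAndIncrement uses CAS under the hood, so no synchronisation needed.
public class SegmentDispenser {
    private final AtomicInteger next = new AtomicInteger(0);
    private final int totalSegments;

    public SegmentDispenser(int totalSegments) {
        this.totalSegments = totalSegments;
    }

    // Returns the next unclaimed segment index, or -1 when none remain.
    // Safe to call from any number of threads concurrently.
    public int nextSegment() {
        int s = next.getAndIncrement();
        return s < totalSegments ? s : -1;
    }
}
```

Each worker would then just loop `while ((s = dispenser.nextSegment()) >= 0) { process(s); }`, so a fast thread naturally picks up more segments than a slow one.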

hope this helps

b e n

Flynnstone
Jul 27, 2007, 07:42 AM
You might even want to consider breaking into 8 "chunks" (or 16). Eight cores are likely to become more common.
The best way to determine is to test. Try 1 through 16 chunks and time it.

robbieduncan
Jul 27, 2007, 08:09 AM
As alluded to above, there are a lot of pitfalls to multi-threaded processing. I know: I am about to start acceptance testing of a system that includes some Java I wrote that uses a massive number of threads (over 120 in one configuration).

The biggest killer in terms of performance is cross-thread synchronisation. You want to avoid this at all costs.

For example, instead of having one global array that the threads copy the "good" items into, have one per thread. Then merge these at the end. This will be much, much faster.
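A sketch of that layout, roughly in the shape of the original poster's problem: each thread filters its own chunk into a private list (no synchronisation in the hot loop), and the lists are merged once at the end. The data, the "good item" test (> 0), and the class name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// One result list per thread, merged after all threads join.
public class PerThreadResults {
    public static List<Integer> collectPositives(final int[] data, int nThreads) {
        final List<List<Integer>> perThread = new ArrayList<List<Integer>>();
        List<Thread> threads = new ArrayList<Thread>();
        int chunk = (data.length + nThreads - 1) / nThreads;  // ceiling division
        for (int t = 0; t < nThreads; t++) {
            final int start = t * chunk;
            final int end = Math.min(start + chunk, data.length);
            final List<Integer> mine = new ArrayList<Integer>(); // private to this thread
            perThread.add(mine);
            threads.add(new Thread(new Runnable() {
                public void run() {
                    for (int i = start; i < end; i++)
                        if (data[i] > 0) mine.add(data[i]);      // no locking here
                }
            }));
        }
        for (Thread t : threads) t.start();
        try {
            for (Thread t : threads) t.join();                   // wait for all chunks
        } catch (InterruptedException ex) {
            throw new RuntimeException(ex);
        }
        List<Integer> merged = new ArrayList<Integer>();         // single merge at the end
        for (List<Integer> part : perThread) merged.addAll(part);
        return merged;
    }
}
```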

Oh, and if you're using Java 1.4 (not sure about 1.5) add this to the command line: -XX:+UseConcMarkSweepGC For me on Solaris this results in a 100% or better performance gain.

astrostu
Jul 27, 2007, 03:06 PM
When you run the same computations using your single-threaded solution, is the CPU usage near 100%?

Yes.


If so, the multiple threads will probably help you. As you describe it, the threads can really churn away independently until the end, so they should help a lot.

That's the hope.


... if two threads are writing to the same part of memory, then each core is going to be constantly invalidating the other core's cache.

The plan is to create a dummy array that each thread will write to (one per thread) and then when they're done to combine them into the final array, avoiding the issue of overwriting.


Imagine that the first 3 threads finish processing their data much faster than the fourth, for whatever reason. Your program will then have to wait until the fourth one finishes, during which time your 3 other cores will be sitting around doing nothing. So I think it might be better to divide your data set into more segments than you have cores and then have a scheduler that allocates segments to threads. When a thread finishes a segment, it asks the scheduler for another segment.

I don't think that will be an issue because I'd effectively be dividing a circle in half (for two threads) or into quadrants (for four threads), and hence each should take the same amount of time.


You might even want to consider breaking into 8 "chunks" (or 16). Eight cores are likely to become more common.
The best way to determine is to test. Try 1 through 16 chunks and time it.

See the above comment. This is something I'll probably try timing and see what happens.


The biggest killer in terms of performance is cross-thread synchronisation. You want to avoid this at all costs.

For example, instead of having one global array that the threads copy the "good" items into, have one per thread. Then merge these at the end. This will be much, much faster.

Yep, that's the plan.


Oh, and if you're using Java 1.4 (not sure about 1.5) add this to the command line: -XX:+UseConcMarkSweepGC For me on Solaris this results in a 100% or better performance gain.

What does that do? I know next to nothing about compiling options.

lazydog
Jul 27, 2007, 04:15 PM
Yes.
I don't think that will be an issue because I'd effectively be dividing a circle in half (for two threads) or into quadrants (for four threads), and hence each should take the same amount of time.


Aren't you assuming that the 4 threads will be running, all the time, on separate cores? It might even be that having more threads than cores improves times. I guess some tests would answer all this.

b e n

Krevnik
Jul 27, 2007, 05:30 PM
The most significant difference that I can think of is that each core has its own cache. So whereas two threads running on one core can share the same data in the cache, if the threads are on separate cores then each must access the memory separately.

Not entirely accurate. The Intel Core line shares cache between the cores on each die. The catch is that Core 2 Duos, for example, have a single cache, while Core 2 Quads (Kentsfield, Clovertown) and dual-processor Xeon systems have two sets of cache, as you actually have two dies on the chip (early Core 2 Quads are two Core 2 Duos in a single package).

So, while your comments should be /assumed/ for the sake of design purposes, it isn't the case in hardware. (But since you can't tell if your cache is shared or not, you should assume it isn't)

ChrisA
Jul 27, 2007, 06:31 PM
This problem has a classic solution. One name for it is "boss and workers". Workers ask the boss for something to do; the boss gives them some data (or just tells them what data to work on, and the worker goes and gets it itself). When a job is done, the worker sends the result to the boss. The key here is to decide how big each job needs to be. You can also adjust the number of workers. The workers can be threads, or they can be processes running on networked computers. The prime example of this is SETI@home, but a lot of animated films are rendered this way too.
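The boss-and-workers pattern can be sketched with JDK 5.0's ExecutorService: the boss submits small jobs to a fixed pool, idle workers pull the next job, and the boss collects results via Futures. The job here (summing a chunk of an array) and the class name are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// "Boss and workers" via a thread pool: the boss hands out chunk-sized
// jobs; whichever worker is free takes the next one.
public class BossWorkers {
    public static long sumInChunks(final int[] data, int chunkSize, int nWorkers) {
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers); // the workers
        try {
            List<Future<Long>> results = new ArrayList<Future<Long>>();
            for (int start = 0; start < data.length; start += chunkSize) {
                final int s = start;
                final int e = Math.min(start + chunkSize, data.length);
                results.add(pool.submit(new Callable<Long>() {   // boss hands out a job
                    public Long call() {
                        long sum = 0;
                        for (int i = s; i < e; i++) sum += data[i];
                        return sum;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : results) total += f.get();     // boss collects results
            return total;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        } finally {
            pool.shutdown();
        }
    }
}
```

Making chunkSize smaller than dataLength / nWorkers is what gives the load balancing: a worker that finishes early simply takes another job instead of idling.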

robbieduncan
Jul 28, 2007, 02:54 AM
What does that do? I know next to nothing about compiling options.

It's a runtime option (you add it to the java command, not javac) that enables the concurrent, multi-threaded garbage collector instead of the default non-concurrent, single-threaded one. What I was seeing with the normal GC was that whenever it ran, all my threads got paused whilst it executed, which resulted in noticeable stalls in processing. With the concurrent GC my threads do not get paused, so processing continues whilst garbage collection occurs.
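Concretely, the flag goes on the launch command like this ("MyAnalysis" is a hypothetical main class standing in for whatever you normally run):

```shell
# Runtime flag for the java launcher (not javac): use the concurrent
# mark-sweep collector so GC pauses don't stall the worker threads.
java -XX:+UseConcMarkSweepGC MyAnalysis
```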

netwalker
Jul 28, 2007, 07:31 AM
Read and learn about concurrency, especially the new java.util.concurrent framework in Java 1.5.

A starting point:
http://java.sun.com/docs/books/tutorial/essential/concurrency/index.html

stadidas
Jul 29, 2007, 08:16 AM
*Well...actually it doesn't. The OS X scheduler is actually still very naive. But it's a good bet that it will be much improved in Leopard.

One of the new developer builds that came out recently had a newly rewritten low-level scheduler to take advantage of multi-core systems. Hopefully massive speed gains will ensue.