Old Jul 26, 2007, 03:15 PM   #1
astrostu
macrumors 6502
 
Join Date: Feb 2007
The Basic Idea Behind Multi-Core Coding Is ... ?

I'm writing a lot of Java code for my research work. I'm using a machine that has 4 cores, and a lot of the code loops through lists of data points, compares each one to a number, and, depending on the result, either copies the value to another array or moves on to the next one (grossly simplified, but you get the idea).

My question is: to make this take advantage of the four cores, is the basic idea that I create 4 threads, have one thread loop through the first 25% of the values, the second thread the next 25%, and so on, and each one will automatically end up on a different core? (And then I combine the lists afterwards.)

Or is it more complicated?
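
Roughly what I'm picturing, as a (probably naive and untested) sketch; the data array and threshold here are just stand-ins for my real inputs:

Code:
import java.util.ArrayList;
import java.util.List;

public class ChunkedFilter {
    public static void main(String[] args) throws InterruptedException {
        // Stand-ins for my real data points and comparison value
        final double[] data = new double[1000000];
        final double threshold = 0.5;
        for (int i = 0; i < data.length; i++) data[i] = Math.random();

        final int nThreads = 4;
        final Thread[] threads = new Thread[nThreads];
        final List<List<Double>> results = new ArrayList<List<Double>>();

        for (int t = 0; t < nThreads; t++) {
            final int start = t * data.length / nThreads;       // this thread's 25% starts here...
            final int end = (t + 1) * data.length / nThreads;   // ...and ends here
            final List<Double> mine = new ArrayList<Double>();  // this thread's own output list
            results.add(mine);
            threads[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = start; i < end; i++) {
                        if (data[i] > threshold) mine.add(data[i]);
                    }
                }
            });
            threads[t].start();
        }

        // Combine the lists afterwards
        List<Double> combined = new ArrayList<Double>();
        for (int t = 0; t < nThreads; t++) {
            threads[t].join();                // wait for thread t to finish
            combined.addAll(results.get(t));
        }
        System.out.println("Kept " + combined.size() + " values");
    }
}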
Old Jul 26, 2007, 05:46 PM   #2
iSee
macrumors 68030
 
Join Date: Oct 2004
Yes, that's the right idea. You don't really get a guarantee that each thread will run on a separate processor core, but you can't take advantage of the extra cores without threads (or processes) to run on them.

Before you go to the trouble, though, check one thing to make sure this will really help (maybe you've already determined this):

When you run the same computations using your single-threaded solution, is the CPU usage near 100%? (Activity Monitor, in the Utilities subfolder of the Applications folder, gives you an easy way to check this, if you didn't know.)

If so, the multiple threads will probably help you. As you describe it, the threads can really churn away independently until the end, so they should help a lot.
Old Jul 26, 2007, 09:12 PM   #3
savar
macrumors 68000
 
Join Date: Jun 2003
Location: District of Columbia
I'm no expert, but I'll throw some wood on the fire:

Multi-core is, at first glance, no different from multi-threaded or multi-process programming. The OS scheduler will distribute threads and processes across the multiple cores as efficiently as it knows how.*

The most significant difference that I can think of is that each core has its own cache. So whereas two threads running on one core can share the same data in the cache, if the threads are on separate cores then each must access the memory separately.

So to amend my first statement, you want to create threads or processes which operate on logically separable chunks of data. So yeah, chunking the data into fourths will probably use the cores pretty efficiently.

What's more, if two threads are writing to the same part of memory, then each core is going to be constantly invalidating the other core's cache.

So I think that what seems like a simple problem actually becomes fairly complex, even if you ignore the inherent problems with multithreading. Luckily there will be multicore APIs coming out soon...these should abstract out most of the complexity of writing efficiently parallelizable code.
*Well...actually it doesn't. The OS X scheduler is actually still very naive. But it's a good bet that it will be much improved in Leopard.
Old Jul 27, 2007, 03:50 AM   #4
lazydog
macrumors 6502a
 
Join Date: Sep 2005
Location: Cramlington, UK
Hi
I'm not an expert either, but I've got a suggestion. Imagine that the first 3 threads finish processing their data much faster than the fourth for whatever reason. Your program will then have to wait until the fourth one finishes, during which time your 3 other cores will be sitting around doing nothing. So I think it might be better to divide your data set into more segments than you have cores and then have a scheduler that allocates segments to threads. When a thread finishes a segment, it asks the scheduler for another segment.

I think this would work quite well if you can use JDK 5.0. Like I said, I'm not an expert on this sort of thing, but I understand that JDK 5.0 has CAS (compare-and-set) types, the atomic classes in java.util.concurrent.atomic, which do not need synchronisation. This would let you implement a scheduler that allocates segments without needing to synchronise its methods.
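
Something like this, just as a rough illustration (the names are made up and I haven't tried it); AtomicInteger does the compare-and-set for you:

Code:
import java.util.concurrent.atomic.AtomicInteger;

// Hands out segment numbers to whichever worker thread asks next.
// getAndIncrement() is lock-free (CAS underneath), so no synchronised methods needed.
public class SegmentDispenser {
    private final AtomicInteger next = new AtomicInteger(0);
    private final int totalSegments;

    public SegmentDispenser(int totalSegments) {
        this.totalSegments = totalSegments;
    }

    // Returns the next unclaimed segment index, or -1 once they have all been handed out.
    public int nextSegment() {
        int s = next.getAndIncrement();
        return (s < totalSegments) ? s : -1;
    }
}

// Each worker thread would then loop along these lines
// (processSegment is a made-up method that works on the seg-th chunk of data):
//
//     int seg;
//     while ((seg = dispenser.nextSegment()) != -1) {
//         processSegment(seg);
//     }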

hope this helps

b e n
Old Jul 27, 2007, 07:42 AM   #5
Flynnstone
macrumors 65816
 
Join Date: Feb 2003
Location: Cold beer land
You might even want to consider breaking the data into 8 "chunks" (or 16). Eight-core machines are likely to become more common.
The best way to find out is to test: try 1 through 16 chunks and time each run.
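
Something along these lines, just as a sketch (the counting workload here is made up, not your real code):

Code:
public class ChunkTiming {
    // Made-up workload: count values above a threshold using the given number of threads.
    static long countAbove(final double[] data, int chunks) throws InterruptedException {
        Thread[] threads = new Thread[chunks];
        final long[] counts = new long[chunks];
        for (int c = 0; c < chunks; c++) {
            final int id = c;
            final int start = c * data.length / chunks;
            final int end = (c + 1) * data.length / chunks;
            threads[c] = new Thread(new Runnable() {
                public void run() {
                    long n = 0;
                    for (int i = start; i < end; i++) if (data[i] > 0.5) n++;
                    counts[id] = n;   // each thread writes only its own slot
                }
            });
            threads[c].start();
        }
        long total = 0;
        for (int c = 0; c < chunks; c++) {
            threads[c].join();
            total += counts[c];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        double[] data = new double[5000000];
        for (int i = 0; i < data.length; i++) data[i] = Math.random();
        for (int chunks = 1; chunks <= 16; chunks++) {
            long t0 = System.currentTimeMillis();
            long kept = countAbove(data, chunks);
            long ms = System.currentTimeMillis() - t0;
            System.out.println(chunks + " chunks: " + ms + " ms (" + kept + " above threshold)");
        }
    }
}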
Old Jul 27, 2007, 08:09 AM   #6
robbieduncan
Moderator
 
Join Date: Jul 2002
Location: London
As alluded to above, there are a lot of pitfalls in multi-threaded processing. I know: I am about to start acceptance testing of a system that includes some Java I wrote that uses a massive number of threads (over 120 in one configuration).

The biggest killer in terms of performance is cross-thread synchronisation. You want to avoid this at all costs.

For example, instead of having one global array that the threads copy the "good" items into, have one per thread. Then merge these at the end. This will be much, much faster.
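
In other words, a skeleton like this (just to show the shape, not your actual data types):

Code:
import java.util.ArrayList;
import java.util.List;

public class MergeSkeleton {
    public static void main(String[] args) {
        // Avoid: one shared list that every thread appends to, e.g.
        //     Collections.synchronizedList(new ArrayList<Double>());
        // Every add() then has to take a lock, so the threads spend time waiting instead of working.

        // Prefer: one private list per thread, so there is no locking while the work runs...
        int nThreads = 4;
        List<List<Double>> perThread = new ArrayList<List<Double>>();
        for (int t = 0; t < nThreads; t++) {
            perThread.add(new ArrayList<Double>());
        }

        // ...and a single cheap merge after all the threads have been join()ed.
        List<Double> merged = new ArrayList<Double>();
        for (List<Double> part : perThread) {
            merged.addAll(part);
        }
        System.out.println("Merged " + merged.size() + " values");
    }
}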

Oh, and if you're using Java 1.4 (not sure about 1.5), add -XX:+UseConcMarkSweepGC to the command line. For me on Solaris this results in a 100% or better performance gain.
Old Jul 27, 2007, 03:06 PM   #7
astrostu
Thread Starter
macrumors 6502
 
Join Date: Feb 2007
Quote:
Originally Posted by iSee View Post
When you run the same computations using your single-threaded solution, is the CPU usage near 100%?
Yes.


Quote:
Originally Posted by iSee View Post
If so, the multiple threads will probably help you. As you describe it, the threads can really churn away independently until the end, so they should help a lot.
That's the hope.


Quote:
Originally Posted by savar View Post
... if two threads are writing to the same part of memory, then each core is going to be constantly invalidating the other core's cache.
The plan is to create a separate array for each thread to write to, and then, when they're all done, combine them into the final array, avoiding the issue of overwriting.


Quote:
Originally Posted by lazydog View Post
Imagine that the first 3 threads finish processing their data much faster than the fourth for whatever reason. Your program will then have to wait until the fourth one finishes, during which time your 3 other cores will be sitting around doing nothing. So I think it might be better to divide your data set into more segments than you have cores and then have a scheduler that allocates segments to threads. When a thread finishes a segment, it asks the scheduler for another segment.
I don't think that will be an issue because I'd effectively be dividing a circle in half (for two threads) or into quadrants (for four threads), and hence each should take the same amount of time.


Quote:
Originally Posted by Flynnstone View Post
You might even want to consider breaking the data into 8 "chunks" (or 16). Eight-core machines are likely to become more common.
The best way to find out is to test: try 1 through 16 chunks and time each run.
See the above comment. This is something I'll probably try timing and see what happens.


Quote:
Originally Posted by robbieduncan View Post
The biggest killer in terms of performance is cross-thread synchronisation. You want to avoid this at all costs.

For example, instead of having one global array that the threads copy the "good" items into, have one per thread. Then merge these at the end. This will be much, much faster.
Yep, that's the plan.


Quote:
Originally Posted by robbieduncan View Post
Oh, and if you're using Java 1.4 (not sure about 1.5), add -XX:+UseConcMarkSweepGC to the command line. For me on Solaris this results in a 100% or better performance gain.
What does that do? I know next to nothing about compiling options.
Old Jul 27, 2007, 04:15 PM   #8
lazydog
macrumors 6502a
 
Join Date: Sep 2005
Location: Cramlington, UK
Quote:
Originally Posted by astrostu View Post
Yes.
I don't think that will be an issue because I'd effectively be dividing a circle in half (for two threads) or into quadrants (for four threads), and hence each should take the same amount of time.
Aren't you assuming that the 4 threads will be running on separate cores all the time? It might even be that having more threads than cores improves times. I guess some tests would answer all this.

b e n
Old Jul 27, 2007, 05:30 PM   #9
Krevnik
macrumors 68020
 
Join Date: Sep 2003
Quote:
Originally Posted by savar View Post
The most significant difference that I can think of is that each core has its own cache. So whereas two threads running on one core can share the same data in the cache, if the threads are on separate cores then each must access the memory separately.
Not entirely accurate. The Intel Core line shares the cache between the cores on each die. The catch is that Core 2 Duos, for example, have a single cache, while Core 2 Quads (Kentsfield, Clovertown) and dual-processor Xeon systems have two sets of cache, as you actually have two dies in the package (early Core 2 Quads are two Core 2 Duo dies on a single chip).

So, while your description should be /assumed/ for design purposes, it isn't always the case in hardware. (But since you can't tell whether your cache is shared or not, you should assume it isn't.)
Old Jul 27, 2007, 06:31 PM   #10
ChrisA
macrumors G4
 
Join Date: Jan 2006
Location: Redondo Beach, California
This problem has a classic solution. One name for it is "boss and workers". Workers ask the boss for something to do; he gives them some data (or just tells them what data to work on, and the worker goes and gets it himself). When a job is done, the worker sends the result to the boss. The key here is to decide how big each job needs to be. You can also adjust the number of workers. The workers can be threads, or they can be processes running on networked computers. The prime example of this is SETI@home, but a lot of animated films are rendered this way too.
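
A rough sketch of the idea in Java, using queues as the boss's to-do list and in-tray (the slicing and the filtering work are made up for illustration):

Code:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BossWorkers {
    public static void main(String[] args) throws InterruptedException {
        final int nWorkers = 4;
        final int nJobs = 16;
        final BlockingQueue<int[]> jobs = new LinkedBlockingQueue<int[]>();   // each job: {start, end}
        final BlockingQueue<Long> results = new LinkedBlockingQueue<Long>();

        final double[] data = new double[1000000];
        for (int i = 0; i < data.length; i++) data[i] = Math.random();

        // Boss: hand out the jobs, here 16 equal slices of the data.
        for (int j = 0; j < nJobs; j++) {
            jobs.add(new int[] { j * data.length / nJobs, (j + 1) * data.length / nJobs });
        }

        // Workers: pull a job, do it, send the result back, repeat until no jobs are left.
        for (int w = 0; w < nWorkers; w++) {
            new Thread(new Runnable() {
                public void run() {
                    int[] job;
                    while ((job = jobs.poll()) != null) {
                        long count = 0;
                        for (int i = job[0]; i < job[1]; i++) if (data[i] > 0.5) count++;
                        results.add(count);
                    }
                }
            }).start();
        }

        // Boss: collect one result per job.
        long total = 0;
        for (int j = 0; j < nJobs; j++) total += results.take();
        System.out.println("Total above threshold: " + total);
    }
}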
Old Jul 28, 2007, 02:54 AM   #11
robbieduncan
Moderator
 
Join Date: Jul 2002
Location: London
Quote:
Originally Posted by astrostu View Post
What does that do? I know next to nothing about compiling options.
It's a runtime option (not a compiler option; you add it to the java command, not the javac command) that enables the concurrent, multi-threaded garbage collector instead of the non-concurrent, single-threaded one that is the default. What I was seeing with the normal GC was that whenever it ran, all my threads got paused while it executed, which resulted in noticeable stalls in processing. With the concurrent GC my threads do not get paused, so processing continues while garbage collection occurs.
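
For example, if your program's main class were called MyAnalysis (a made-up name here), the launch line would look like:

Code:
java -XX:+UseConcMarkSweepGC MyAnalysis

Any other options you already pass to java stay exactly as they are.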
Old Jul 28, 2007, 07:31 AM   #12
netwalker
macrumors newbie
 
Join Date: Jul 2007
Location: The Netherlands
Read up on concurrency, and especially the new concurrency framework (java.util.concurrent) in Java 1.5.

A starting point:
http://java.sun.com/docs/books/tutor...ncy/index.html
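
For example, the filtering loop from this thread could be written against an ExecutorService from that framework, roughly like this (an untested sketch with made-up names):

Code:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorFilter {
    public static void main(String[] args) throws Exception {
        final double[] data = new double[1000000];
        for (int i = 0; i < data.length; i++) data[i] = Math.random();

        // A few more tasks than cores, so a fast task doesn't leave a core idle at the end.
        int nTasks = 8;
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        List<Future<List<Double>>> futures = new ArrayList<Future<List<Double>>>();
        for (int t = 0; t < nTasks; t++) {
            final int start = t * data.length / nTasks;
            final int end = (t + 1) * data.length / nTasks;
            futures.add(pool.submit(new Callable<List<Double>>() {
                public List<Double> call() {
                    List<Double> kept = new ArrayList<Double>();
                    for (int i = start; i < end; i++) if (data[i] > 0.5) kept.add(data[i]);
                    return kept;
                }
            }));
        }

        List<Double> combined = new ArrayList<Double>();
        for (Future<List<Double>> f : futures) combined.addAll(f.get());   // get() waits for the task
        pool.shutdown();
        System.out.println("Kept " + combined.size() + " values");
    }
}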
Old Jul 29, 2007, 08:16 AM   #13
stadidas
macrumors regular
 
Join Date: Feb 2006
Location: Kent, United Kingdom
Quote:
Originally Posted by savar View Post
*Well...actually it doesn't. The OS X scheduler is actually still very naive. But it's a good bet that it will be much improved in Leopard.
One of the new developer builds that came out recently had a newly rewritten low-level scheduler to take advantage of multi-core systems. Hopefully massive speed gains will ensue.
