parallel processing

Discussion in 'Mac Programming' started by farmerdoug, Oct 10, 2008.

  1. farmerdoug macrumors 6502a

    Joined:
    Sep 16, 2008
    #1
    If a Mac has two cores, then appropriately written C code can make good use of both processors, right?
    For example, in

    for (i = 0; i < 10; i += 2) {
        a[i] = b[i];
        a[i+1] = b[i+1];
    }

    The assignments in the loop body can be done in parallel, since neither line depends on the other, right?

    So how do I get this code to run each if block in parallel?
    while (((laserfile1 = readdir(ld)) != NULL) && ((laserfile2 = readdir(ld)) != NULL))
    {
        if (strstr(laserfile1->d_name, "fits") != NULL)
        {
            printf("%s\n", laserfile1->d_name);

            // shift
            strcpy(infile1, laserdirectory);
            strcat(infile1, laserfile1->d_name);
            strcpy(outfile1, laserdirectory);
            strcat(outfile1, laserfile1->d_name);
            shift(infile1, outfile1);

            // cosmic rays
            strcpy(cmd1, codedirectory);
            strcat(cmd1, "cosmicrays input = ");
            strcat(cmd1, laserdirectory);
            strcat(cmd1, laserfile1->d_name);
            strcat(cmd1, "[0] ");
            strcat(cmd1, "output = ");
            strcat(cmd1, laserdirectory);
            strcat(cmd1, laserfile1->d_name);
            strcat(cmd1, "[0] ");
            system(cmd1);

            // subtract darks

            // findlaserspots
            strcpy(spotfile1, psfdirectory);
            strcat(spotfile1, "S");
            strcat(spotfile1, laserfile1->d_name);
            findlaserspots(infile1, spotfile1);

            // make psf images
            strcpy(outfile1, psfdirectory);
            strcat(outfile1, laserfile1->d_name);
            makelaserpsf(spotfile1, infile1, outfile1);

            centroidpsfs(spotfile1, outfile1);
        }
        if (strstr(laserfile2->d_name, "fits") != NULL)
        {
            printf("%s\n", laserfile2->d_name);

            // shift
            strcpy(infile2, laserdirectory);
            strcat(infile2, laserfile2->d_name);
            strcpy(outfile2, laserdirectory);
            strcat(outfile2, laserfile2->d_name);
            shift(infile2, outfile2);

            // cosmic rays
            strcpy(cmd2, codedirectory);
            strcat(cmd2, "cosmicrays input = ");
            strcat(cmd2, laserdirectory);
            strcat(cmd2, laserfile2->d_name);
            strcat(cmd2, "[0] ");
            strcat(cmd2, "output = ");
            strcat(cmd2, laserdirectory);
            strcat(cmd2, laserfile2->d_name);
            strcat(cmd2, "[0] ");
            system(cmd2);

            // subtract darks

            // findlaserspots
            strcpy(spotfile2, psfdirectory);
            strcat(spotfile2, "S");
            strcat(spotfile2, laserfile2->d_name);
            findlaserspots(infile2, spotfile2);

            // make psf images
            strcpy(outfile2, psfdirectory);
            strcat(outfile2, laserfile2->d_name);
            makelaserpsf(spotfile2, infile2, outfile2);

            centroidpsfs(spotfile2, outfile2);
        }
    }

    Thanks
     
  2. Cromulent macrumors 603

    Cromulent

    Joined:
    Oct 2, 2006
    Location:
    The Land of Hope and Glory
    #2
    You need to use POSIX threads. There is tons of C documentation for it on the web.
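    The basic shape looks something like this: a minimal sketch, where process_file is a hypothetical stand-in for your per-file pipeline (shift, cosmicrays, and so on), and the file names are made up for illustration.

    ```c
    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical stand-in for the per-file work (shift, cosmicrays, ...). */
    static void *process_file(void *arg)
    {
        const char *name = arg;
        printf("processing %s\n", name);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        /* Start one thread per file; the OS can schedule them on separate cores. */
        pthread_create(&t1, NULL, process_file, "laser1.fits");
        pthread_create(&t2, NULL, process_file, "laser2.fits");

        /* Wait for both threads to finish before continuing. */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }
    ```

    Compile with -pthread. The join calls matter: without them main can return while the workers are still running.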
     
  3. farmerdoug thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
  4. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #4
    The comment about using POSIX threads was absolutely right; but you also have to think about what you want to run in parallel. Splitting a loop in two parts like that is nonsense, because it takes a thousand times longer for two threads to communicate that they are both finished than it takes to copy ten integers; and it would be even worse if you copied 100 million integers:

    If one core runs
    for (i = 0; i < 100000000; i += 2) a[i] = b[i];
    and the other core runs
    for (i = 0; i < 100000000; i += 2) a[i+1] = b[i+1];

    then both cores access data in the same cache lines, so they will be waiting for each other all the time, and your code will likely run ten times slower than on a single processor. You may think these instructions are independent; the processor thinks differently.

    You need to identify pieces of code that run for a long, long time (a millisecond, or at least tens of microseconds) independently of other code.
     
  5. Berlepsch macrumors 6502

    Berlepsch

    Joined:
    Oct 22, 2007
    #5
    I have to agree with gnasher - the bottleneck in your program is not the processor but the hard disk, and having concurrent threads accessing different files on the disk will probably make your program slower than a single serial task.
     
  6. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #6
    In theory you are right, but there are many ways to do "parallel processing"; using multiple CPUs is just one of them. The multi-core method is absolutely the wrong way to solve the above problem. For your problem you need a CPU with an instruction set designed to allow multiple operations to execute at once, and then you'd have to write the code for it. In the past I've worked with machines that could do multiple assignments at the same time; those machines had multiple paths to RAM.

    If you must use a multi-core machine for this, then you could re-write the loop as

    for (i = 0; i < 10; i += 2)
        a[i] = b[i];

    for (i = 1; i < 10; i += 2)
        a[i] = b[i];

    and then run each loop on its own core.

    But typically on a multi-core machine you'd split the problem up completely differently. For example, you'd run the user interface on one core, network I/O on another, and so on, designing your application as a set of cooperating tasks. You would not break up a loop unless the loop took a LONG time per iteration, for example if you were rendering movie frames, one frame per loop iteration.
     
  7. cube macrumors G5

    Joined:
    May 10, 2004
    #7
  8. AlmostThere macrumors 6502a

    #8
    I would second this. It has proved an effective strategy for me when there is a lot of network latency, where multiple network calls can be run concurrently. If functions like "findlaserspots" and "subtract darks" and "cosmic rays" involve lengthy calculations, this might be a strategy worth investigating.

    On that point, you can also parallelize numerical processing using SIMD such as AltiVec and SSE3 (wikipedia is your friend), e.g. process arrays 4 values at a time, so you only need 4 iterations for an array of 16 numbers.

    Apple offers this functionality through a range of 'Accelerate' vector libraries, which give you plain C library function calls. There are some additional complications, and it works best with large data sets, e.g. images. This is similar to the sort of optimisation that the GPU uses in libraries such as Core Image, and we will likely see broader functionality through the upcoming OpenCL.
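    To make the "4 values at a time" idea concrete without pulling in Accelerate, here is a plain-C sketch: the loop body handles four elements per iteration, which is exactly the shape a compiler (at -O2/-O3) can turn into a single SIMD add on SSE or AltiVec hardware. The array sizes and values are just for illustration.

    ```c
    #include <stdio.h>

    #define N 16

    int main(void)
    {
        float a[N], b[N], c[N];
        int i;

        for (i = 0; i < N; i++) {
            a[i] = (float)i;
            b[i] = 2.0f * i;
        }

        /* Process four values per iteration: only N/4 = 4 trips through
           the loop for an array of 16 numbers.  A vectorising compiler
           can map a body like this onto one SIMD instruction. */
        for (i = 0; i < N; i += 4) {
            c[i]     = a[i]     + b[i];
            c[i + 1] = a[i + 1] + b[i + 1];
            c[i + 2] = a[i + 2] + b[i + 2];
            c[i + 3] = a[i + 3] + b[i + 3];
        }

        printf("c[5] = %g\n", c[5]);  /* 5 + 10 = 15 */
        return 0;
    }
    ```

    The Accelerate framework gives you the same effect through library calls (its vDSP routines), with the vectorisation already done for you.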

    Anyway, probably worth mentioning in a thread about parallel processing.
     
  9. lee1210 macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #9
    Just add a "&" to the end of your commands before calling them with system. This will run them in the background, so your program will continue execution immediately after the calls to system. This will only work if the results of each program you run with system do not depend on the completion of the previous. It sounds like that is the case. You also don't seem to read the results of any of these programs, so it should work fine. This gets you parallel execution without threads. Multi-programming doesn't have to mean multi-threading.
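    Concretely, the change is just one extra strcat before the system() call. A sketch, using echo as a stand-in for the real cosmicrays command line:

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char cmd[1024];

        /* Build the command as before, but append " &" so the shell that
           system() spawns puts it in the background and returns at once. */
        strcpy(cmd, "echo");                 /* stand-in for codedirectory + "cosmicrays ..." */
        strcat(cmd, " processing file1");
        strcat(cmd, " &");
        system(cmd);                         /* returns immediately */

        strcpy(cmd, "echo");
        strcat(cmd, " processing file2");
        strcat(cmd, " &");
        system(cmd);

        /* Both commands now run concurrently.  Caveat: there is no easy
           way to learn when, or whether, a backgrounded command finished,
           so this only works when later steps don't depend on its output. */
        return 0;
    }
    ```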

    -Lee
     
  10. AlmostThere macrumors 6502a

  11. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #11
    Hey, now I see what you need to do: "subtract darks", "cosmic rays", "PSF". I've done my share of image processing. What you need to do is state the problem, not ask about solutions. Your problem is that you have a zillion images to reduce. In this case the model to use is "boss and workers". You set up one "worker" process; its logic is simple: "while (forever) { ask the boss for an image to work on, process that image }". Then you make a boss process which is also simple: "get big stack of files to be processed, set pointer to top of stack, while (forever) { wait for request from worker, give him whatever is on top of stack, decrement pointer }".

    The key here is to run just one "boss" and as many copies of the "worker" as you have computers and cores to run them on. Note that this will work very well on a networked cluster of 100+ rack-mounted "worker" computers. This is how Pixar and the like process animated movies, one frame at a time, using a room full of computers. In other words, "boss and workers" scales well.

    You can implement this with threads, but with jobs this large you don't have to bother. You could use shell scripts or, heck, even IRAF's "CL". The larger the jobs the boss hands out, the lower the overhead. If you are handing out 4k x 4k image files, the overhead will be low. But if you are handing out "stars" (small sub-images) to be centroided, then you have too much overhead.
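    A single-machine version of the boss-and-workers idea can be sketched with pthreads: the "boss" is just a shared stack of file names behind a mutex, and each worker pops jobs until the stack is empty. The file names are hypothetical, and printf stands in for the real reduction pipeline.

    ```c
    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 2

    /* The boss's "big stack of files"; names are made up for illustration. */
    static const char *files[] = { "img1.fits", "img2.fits", "img3.fits", "img4.fits" };
    static int top = 4;                         /* pointer to top of stack */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Worker: ask the boss (the shared stack) for an image, process it,
       and repeat until the stack is empty. */
    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            const char *job = NULL;
            pthread_mutex_lock(&lock);
            if (top > 0)
                job = files[--top];             /* boss hands out top of stack */
            pthread_mutex_unlock(&lock);
            if (job == NULL)
                break;                          /* stack empty: we're done */
            printf("processing %s\n", job);     /* stand-in for the real reduction */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];
        int i;

        for (i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }
    ```

    The same structure scales out to a cluster by replacing the mutex-protected stack with a boss process that hands out file names over the network.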



     
