parallel processing

Discussion in 'Mac Programming' started by farmerdoug, Oct 10, 2008.

  1. farmerdoug macrumors 6502a

    Joined:
    Sep 16, 2008
    #1
    If a Mac has two cores, then appropriately written C code can make good use of both processors, right?
    For example, in

    for (i = 0; i < 10; i += 2) {
        a[i] = b[i];
        a[i+1] = b[i+1];
    }

    The assignments in the loop body can be done in parallel, since neither line depends on the other, right?

    So how do I get this code to run each if block in parallel?
    while (((laserfile1 = readdir(ld)) != NULL) && ((laserfile2 = readdir(ld)) != NULL))
    {
        if (strstr(laserfile1->d_name, "fits") != NULL)
        {
            printf("%s\n", laserfile1->d_name);

            // shift
            strcpy(infile1, laserdirectory);
            strcat(infile1, laserfile1->d_name);
            strcpy(outfile1, laserdirectory);
            strcat(outfile1, laserfile1->d_name);
            shift(infile1, outfile1);

            // cosmic rays
            strcpy(cmd1, codedirectory);
            strcat(cmd1, "cosmicrays input = ");
            strcat(cmd1, laserdirectory);
            strcat(cmd1, laserfile1->d_name);
            strcat(cmd1, "[0] ");
            strcat(cmd1, "output = ");
            strcat(cmd1, laserdirectory);
            strcat(cmd1, laserfile1->d_name);
            strcat(cmd1, "[0] ");
            system(cmd1);

            // subtract darks

            // findlaserspots
            strcpy(spotfile1, psfdirectory);
            strcat(spotfile1, "S");
            strcat(spotfile1, laserfile1->d_name);
            findlaserspots(infile1, spotfile1);

            // make psf images
            strcpy(outfile1, psfdirectory);
            strcat(outfile1, laserfile1->d_name);
            makelaserpsf(spotfile1, infile1, outfile1);

            centroidpsfs(spotfile1, outfile1);
        }
        if (strstr(laserfile2->d_name, "fits") != NULL)
        {
            printf("%s\n", laserfile2->d_name);

            // shift
            strcpy(infile2, laserdirectory);
            strcat(infile2, laserfile2->d_name);
            strcpy(outfile2, laserdirectory);
            strcat(outfile2, laserfile2->d_name);
            shift(infile2, outfile2);

            // cosmic rays
            strcpy(cmd2, codedirectory);
            strcat(cmd2, "cosmicrays input = ");
            strcat(cmd2, laserdirectory);
            strcat(cmd2, laserfile2->d_name);
            strcat(cmd2, "[0] ");
            strcat(cmd2, "output = ");
            strcat(cmd2, laserdirectory);
            strcat(cmd2, laserfile2->d_name);
            strcat(cmd2, "[0] ");
            system(cmd2);

            // subtract darks

            // findlaserspots
            strcpy(spotfile2, psfdirectory);
            strcat(spotfile2, "S");
            strcat(spotfile2, laserfile2->d_name);
            findlaserspots(infile2, spotfile2);

            // make psf images
            strcpy(outfile2, psfdirectory);
            strcat(outfile2, laserfile2->d_name);
            makelaserpsf(spotfile2, infile2, outfile2);

            centroidpsfs(spotfile2, outfile2);
        }
    }

    Thanks
     
  2. Cromulent macrumors 603

    Cromulent

    Joined:
    Oct 2, 2006
    Location:
    The Land of Hope and Glory
    #2
    You need to use POSIX threads. There is tons of C documentation for it on the web.
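    The basic shape looks something like this: a minimal sketch, where process_file is a hypothetical stand-in for your per-file pipeline (shift, cosmicrays, and so on), and the file names are made up for illustration.

    ```c
    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical stand-in for the per-file work (shift, cosmicrays, ...). */
    static void *process_file(void *arg)
    {
        const char *name = arg;
        printf("processing %s\n", name);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        /* Start one thread per file; the OS can schedule them on separate cores. */
        pthread_create(&t1, NULL, process_file, "laser1.fits");
        pthread_create(&t2, NULL, process_file, "laser2.fits");

        /* Wait for both threads to finish before continuing. */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }
    ```

    Compile with -pthread. The join calls matter: without them main can return while the workers are still running.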
     
  3. farmerdoug thread starter macrumors 6502a

    Joined:
    Sep 16, 2008
  4. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #4
    The comment about using POSIX threads was absolutely right; but you also have to think about what you want to run in parallel. Splitting a loop in two parts like that is nonsense, because it takes a thousand times longer for two threads to communicate that they are both finished than it takes to copy ten integers; and it would be even worse if you copied 100 million integers:

    If one core runs
    for (i = 0; i < 100000000; i += 2) a[i] = b[i];
    and the other core runs
    for (i = 0; i < 100000000; i += 2) a[i+1] = b[i+1];

    then both cores access data in the same cache lines, so they will be waiting for each other all the time, and your code will likely run ten times slower than on a single processor. You may think these instructions are independent; the processor thinks differently.

    You need to identify pieces of code that run for a long, long time (a millisecond, or at least tens of microseconds) independently of other code.
     
  5. Berlepsch macrumors 6502

    Berlepsch

    Joined:
    Oct 22, 2007
    #5
    I have to agree with gnasher - the bottleneck in your program is not the processor but the hard disk, and having concurrent threads accessing different files on the disk will probably make your program slower than a single serial task.
     
  6. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #6
    In theory you are right, but there are many ways to do "parallel processing"; using multiple CPUs is just one of them. The multi-core method is absolutely the wrong way to solve the above problem. For your problem you need a CPU with an instruction set designed to allow multiple operations to execute at once, and then you'd have to write the code for it. In the past I've worked with machines that could do multiple assignments at the same time; those machines had multiple paths to RAM.

    If you must use a multi-core machine for this, then you could re-write the loop as

    for (i = 0; i < 10; i += 2)
        a[i] = b[i];

    for (i = 1; i < 10; i += 2)
        a[i] = b[i];

    and then run each loop on its own core.

    But typically on a multi-core machine you'd split the problem up completely differently. For example, you'd run the user interface on one core, network I/O on another, and so on, designing your application as a set of cooperating tasks. You would not break up a loop unless the loop took a LONG time per iteration, for example if you were rendering movie frames, one frame per loop iteration.
     
  7. cube macrumors G5

    Joined:
    May 10, 2004
    #7
  8. AlmostThere macrumors 6502a

    #8
    I would second this. It has proved an effective strategy for me when there is a lot of network latency, where multiple network calls can be run concurrently. If functions like "findlaserspots" and "subtract darks" and "cosmic rays" involve lengthy calculations, this might be a strategy worth investigating.

    On that point, you can also parallelize numerical processing using SIMD such as AltiVec and SSE3 (wikipedia is your friend), e.g. process arrays 4 values at a time, so you only need 4 iterations for an array of 16 numbers.

    Apple offers this functionality through a range of 'Accelerate' vector libraries, which give you plain C library function calls. There are some additional complications, and it works best with large data sets, e.g. images. This is similar to the sort of optimisation that the GPU uses in libraries such as Core Image, and we will likely see broader functionality through the upcoming OpenCL.
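    To make the "4 values at a time" idea concrete without pulling in Accelerate, here is a plain-C sketch: the loop body handles four elements per iteration, which is exactly the shape a compiler (at -O2/-O3) can turn into a single SIMD add on SSE or AltiVec hardware. The array sizes and values are just for illustration.

    ```c
    #include <stdio.h>

    #define N 16

    int main(void)
    {
        float a[N], b[N], c[N];
        int i;

        for (i = 0; i < N; i++) {
            a[i] = (float)i;
            b[i] = 2.0f * i;
        }

        /* Process four values per iteration: only N/4 = 4 trips through
           the loop for an array of 16 numbers.  A vectorising compiler
           can map a body like this onto one SIMD instruction. */
        for (i = 0; i < N; i += 4) {
            c[i]     = a[i]     + b[i];
            c[i + 1] = a[i + 1] + b[i + 1];
            c[i + 2] = a[i + 2] + b[i + 2];
            c[i + 3] = a[i + 3] + b[i + 3];
        }

        printf("c[5] = %g\n", c[5]);  /* 5 + 10 = 15 */
        return 0;
    }
    ```

    The Accelerate framework gives you the same effect through library calls (its vDSP routines), with the vectorisation already done for you.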

    Anyway, probably worth mentioning in a thread about parallel processing.
     
  9. lee1210 macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #9
    Just add a "&" to the end of your commands before calling them with system. This will run them in the background, so your program will continue execution immediately after the calls to system. This will only work if the results of each program you run with system do not depend on the completion of the previous. It sounds like that is the case. You also don't seem to read the results of any of these programs, so it should work fine. This gets you parallel execution without threads. Multi-programming doesn't have to mean multi-threading.
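    Concretely, the change is just one extra strcat before the system() call. A sketch, using echo as a stand-in for the real cosmicrays command line:

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char cmd[1024];

        /* Build the command as before, but append " &" so the shell that
           system() spawns puts it in the background and returns at once. */
        strcpy(cmd, "echo");                 /* stand-in for codedirectory + "cosmicrays ..." */
        strcat(cmd, " processing file1");
        strcat(cmd, " &");
        system(cmd);                         /* returns immediately */

        strcpy(cmd, "echo");
        strcat(cmd, " processing file2");
        strcat(cmd, " &");
        system(cmd);

        /* Both commands now run concurrently.  Caveat: there is no easy
           way to learn when, or whether, a backgrounded command finished,
           so this only works when later steps don't depend on its output. */
        return 0;
    }
    ```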

    -Lee
     
  10. AlmostThere macrumors 6502a

  11. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #11
    Hey, now I see what you need to do: "subtract darks", "cosmic rays", "PSF". I've done my share of image processing. What you need to do is state the problem, not ask about solutions. Your problem is that you have a zillion images to reduce. In this case the model to use is "boss and workers". You set up one "worker" process; its logic is simple: "while (forever) { ask the boss for an image to work on, process that image }". Then you make a boss process which is also simple: "get big stack of files to be processed, set pointer to top of stack, while (forever) { wait for request from worker, give him whatever is on top of stack, decrement pointer }".

    The key here is to run just one "boss" and as many copies of the "worker" as you have computers and cores to run them on. Note that this will work very well on a networked cluster of 100+ rack-mounted "worker" computers. This is how Pixar and the like process animated movies, one frame at a time, using a room full of computers. In other words, "boss and workers" scales well.

    You can implement this with threads, but with jobs this large you don't have to bother. You could use shell scripts or, heck, even IRAF's "CL". The larger the jobs the boss hands out, the lower the overhead. If you are handing out 4k x 4k image files, the overhead will be low. But if you are handing out "stars" (small sub-images) to be centroided, then you have too much overhead.
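    A single-machine version of the boss-and-workers idea can be sketched with pthreads: the "boss" is just a shared stack of file names behind a mutex, and each worker pops jobs until the stack is empty. The file names are hypothetical, and printf stands in for the real reduction pipeline.

    ```c
    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 2

    /* The boss's "big stack of files"; names are made up for illustration. */
    static const char *files[] = { "img1.fits", "img2.fits", "img3.fits", "img4.fits" };
    static int top = 4;                         /* pointer to top of stack */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Worker: ask the boss (the shared stack) for an image, process it,
       and repeat until the stack is empty. */
    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            const char *job = NULL;
            pthread_mutex_lock(&lock);
            if (top > 0)
                job = files[--top];             /* boss hands out top of stack */
            pthread_mutex_unlock(&lock);
            if (job == NULL)
                break;                          /* stack empty: we're done */
            printf("processing %s\n", job);     /* stand-in for the real reduction */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];
        int i;

        for (i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }
    ```

    The same structure scales out to a cluster by replacing the mutex-protected stack with a boss process that hands out file names over the network.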



     
