PDA

View Full Version : Learning threads and distributed computing




MrFusion
Mar 14, 2009, 05:20 AM
Hello everyone,

I want to tackle threads and distributed computing as the next step in learning how to program.
I am looking for a topic or problem that could benefit from threading, but which is not so complicated that a hobbyist programmer can't do it. It's a learning experience, so the advantages and disadvantages of threading should be obvious.

Many people here have or will have a CS degree. Maybe you want to share some idea's or assignments you had in college?

Please don't hold back on the basis of math. I am certain I can handle most mathematical problems. A physics problem would be even better. But in the end it's about learning how to program.

Thanks!



lazydog
Mar 14, 2009, 05:28 AM
You could try your hand at Genetic Algorithms. Have a look at Mona Lisa in 50 triangles (http://rogeralsing.com/2008/12/07/genetic-programming-evolution-of-mona-lisa/). You could certainly use threads and multiple cores in a project similar to this.

b e n

Catfish_Man
Mar 14, 2009, 09:13 AM
The classic "intricacies of threading" problem is the dining philosophers (http://en.wikipedia.org/wiki/Dining_philosophers_problem).

lee1210
Mar 14, 2009, 10:13 AM
The classic "intricacies of threading" problem is the dining philosophers (http://en.wikipedia.org/wiki/Dining_philosophers_problem).

The last solution listed on that page is co-credited to Jayadev Misra (http://www.cs.utexas.edu/~misra/). I'm sure it's of little interest to most, but he taught a class of mine (CS 337) at UT Austin, and it was probably one of the best courses of my college career. Those currently learning to program or even those with a great deal of experience may wish to read the following, which is a transcription of a speech Dr. Misra gave on our last day of class:
http://www.cs.utexas.edu/users/misra/Speeches.dir/337.pdf

-Lee

Edit: As I feared, I am repeating myself. This was originally posted in this thread:
http://forums.macrumors.com/showthread.php?t=571779
I will use any excuse to bring it up, because i think it's a worthwhile read. Also, re-reading that thread I am embarrassed to see i misspelled "collector" at one point. I feel it's not worthwhile to correct now. =)

Cromulent
Mar 14, 2009, 11:26 AM
The last solution listed on that page is co-credited to Jayadev Misra (http://www.cs.utexas.edu/%7Emisra/). I'm sure it's of little interest to most, but he taught a class of mine (CS 337) at UT Austin, and it was probably one of the best courses of my college career. Those currently learning to program or even those with a great deal of experience may wish to read the following, which is a transcription of a speech Dr. Misra gave on our last day of class:
http://www.cs.utexas.edu/users/misra/Speeches.dir/337.pdf

-Lee

Edit: As I feared, I am repeating myself. This was originally posted in this thread:
http://forums.macrumors.com/showthread.php?t=571779
I will use any excuse to bring it up, because i think it's a worthwhile read. Also, re-reading that thread I am embarrassed to see i misspelled "collector" at one point. I feel it's not worthwhile to correct now. =)

Interesting read. Thanks Lee.

lee1210
Mar 14, 2009, 12:51 PM
this may not be the best place to start, but i figured since i posted largely off-topic once, i might try an on-topic post.

Have you written anything particularly long running so far? If so, is there any means by which you might break the tasks down such that they could be done in parallel ? For example, searching a file system for a file. You could make each subdirectory a new node, and process each node in a separate thread. You would need to have a synchronized list of pending nodes, and a main thread that is dispatching work up to N concurrent threads, where N is some maximum. It's normally best to pick an N that's one less than available cores, but for something that is for practice, this may not be necessary.

Otherwise... you could have a system monitor that gathers a number of pieces of information and reports them to a remote system. In serial, this might take 5 minutes to gather all information and transmit it, but if these can all be run in parallel, you might be able to do this in a fraction of the time, etc.

Other common tasks are things like image-recognition/processing. For example, one could break down a 256x256 image into 16 16x16 images, and process each independently. The units would have to communicate partial matches so something that crosses the boundaries isn't missed, etc. but it is parallelizable problem.

Choosing a standard, well-understood problem as others have suggested might be better, though, since you will have a lot of documentation to look at if you run into trouble.

-Lee

MrFusion
Mar 14, 2009, 01:26 PM
Have you written anything particularly long running so far?


Yes. I have a program that is slow and that (I think) could benefit from threading. Before I start rewriting that, I want to play around with threading. Best way to do so is to first do a problem which can be truly parallelized and which is well documented. With a new problem comes new ideas as well.

I can probably start messing around, but then most likely I will be flooding this forum even more with questions about how to do something. I am also interested in learning it the right way (design wise). Any solution I might accidently find is not necessarily the best method.

Thanks for the suggestions everyone. If you have more, I am more than happy to hear them.

lee1210
Mar 14, 2009, 04:06 PM
You should read before you start coding. A lot. Some things you should look into that haven't been mentioned already:
Readers-writers problem
Semaphores (What do P and V mean... not the dutch words, but conceptually)
Mutexes (a special case of a semaphore)
Implementing Semaphores and Mutexes in a particular language you wish to work in
Why you'd use a mutex (mutual exclusion to a resource) vs. a semaphore (atomically incrementing decrementing a value for tracking multiple resource access, etc.)

It may be worthwhile (even if not for this specific purpose) to read some of the writings of Dr. Dijkstra:
http://www.cs.utexas.edu/users/EWD/
Beware that a number are in Dutch, so unless you are a dutch speaker you may have to seek out translations.

I have gathered up some material that I've attached here from my OS class. Hopefully Dr. Emmett Witchel will not mind my posting of these materials, as it is not-for-profit. His page re: the current incarnation of the course these are from can be found here:
http://www.cs.utexas.edu/users/witchel/372/
All but the solutions to lab2 and lab3 are to his credit (or perhaps Dr. Kathryn McKinley, who was teaching the course at the same time).

There are lecture notes available. I am sure there are many other similar courses that may have other materials as well, but i refer to these as they are where i learned these things. The projects included are my own. They were written about 5 years ago, and i don't think i scored particularly well, but hopefully they will be of use.

The materials are not well organized, but i did make a point to find the original assignments for lab2 and lab3 from the wayback machine so you'd have some idea of what that code was meant to do, and so that you could attempt to implement your own solution if desired. The code is all java.

-Lee

lee1210
Mar 14, 2009, 04:24 PM
sorry to double post, but i don't want to edit and re-edit as people are reading the thread... so:

You might want to do something like finding if (and where) in a 2048x2048 grid of black/white pixels (this could just be a 2d array of booleans, in practice) a certain pattern (say a 2x2 block of black pixels) lays. You can then break it down into smaller blocks, and have certain types of tasks that can be distributed to threads. One type of task is:
check block (16x16), which determines if all or part of the pattern falls in a 16x16 grid.
check block (4x4), this could be a secondary task that would check to see if a 4x4 block is the pattern or not. if the previous task had a partial match, this could be used to see if the full pattern existed at that spot
distribute blocks, the "master thread" could be in charge of giving out tasks, and generating new tasks as threads return partial matches.

You could start assuming there would be at most one of the pattern, then generalize from there, etc.

-Lee