
[C++] MacPro slower than MacBook




topcomer
Jul 28, 2009, 02:58 AM
Hi,

I develop C++ code for scientific computing, usually on my MacBook, but my research group has now bought a MacPro and I tried to build and run my code there. It turned out that the speed is comparable for extremely small memory usage, while for larger data (if 30 MB can be considered larger) the MacBook totally outperforms the MacPro. For reasonable data (200 MB) the MacPro is almost stuck.

I then tried to profile using Shark, and found that the following call takes 23% of the runtime of my process:

ml_set_interrupts_enabled

Can someone please help me to understand what is going on?
Thank you!



gnasher729
Jul 28, 2009, 03:24 AM
Google is your friend. Why don't you type "ml_set_interrupts_enabled" into Google and see what happens? Maybe someone had the same question in 2004, posted on the Mac developers site, and got a reply from Eric Schlegel?

(Note: If you don't know who Eric Schlegel is, then I can guarantee he knows 1000 times more about MacOS X than you do. If you know who he is, he very very very likely knows more about it).

darkwing
Jul 28, 2009, 08:14 AM
gnasher729, I did what you said and found something completely irrelevant to OP's question. Sure the person asked why so much time was being used there, but he wasn't having tremendously slow code on a MacPro and fast code on a MacBook. Also, OP clearly states his code is slow on the MacPro while processing data and the google result was about CPU usage during idle.

OP, osx is tremendously slow and inefficient. That aside, is your code written with parallel processing in mind? The fact that the MacPro is going to have more cores than your MacBook, and the call you are seeing heavy usage in, probably indicates you will want to see how you are handling data concurrently. Are you locking a lot of semaphores?

Without more information as to what your code is doing I'm grasping at straws here.

Sayer
Jul 28, 2009, 08:38 AM
Without a lot more detail it's impossible to know where the culprit is. Two different computers may have two totally different installs of the dev tools.

First make sure both environments are identical, e.g. that Xcode is the same version on both and configured the same way (Xcode installs several versions of gcc, g++ and now the LLVM compilers, and also sets the default version to use). Which version of Mac OS X is on each machine?

Is this a command line tool? Is it built with a make file?

More details needed...

gnasher729
Jul 28, 2009, 11:29 AM
gnasher729, I did what you said and found something completely irrelevant to OP's question. Sure the person asked why so much time was being used there, but he wasn't having tremendously slow code on a MacPro and fast code on a MacBook. Also, OP clearly states his code is slow on the MacPro while processing data and the google result was about CPU usage during idle.

Eric Schlegel's response gives a very clear hint what the OP is doing, and what he needs to change. If you don't see it, tough.

AdamN
Jul 28, 2009, 12:58 PM
Ask this question (with the version numbers mentioned as necessary above) on stackoverflow.com - they have many more solid programmers there who can help without being snarky.

Cromulent
Jul 28, 2009, 01:23 PM
Ask this question (with the version numbers mentioned as necessary above) on stackoverflow.com - they have many more solid programmers there who can help without being snarky.

I hardly ever see anyone being snarky on this forum. If you are referring to a comment I made which apparently has now been deleted, I was attempting to get the poster to provide an argument to substantiate their point which I feel was necessary given the rather ridiculous nature of the claim.

parapup
Jul 28, 2009, 01:41 PM
If you are finding that most of the (wall) time is spent in ml_set_interrupts_enabled - it means your program's threads are blocked on something - IO/Condition Var/something else.

Could it be that your program has a scalability glitch somewhere? The Mac Pro has more CPUs than the MacBook Pro, and if your program creates more threads on the Mac Pro and they get stuck most of the time due to synchronization or something else, that may explain the problem you are seeing.

That's just one possibility though - without actually seeing what your program does it is hard to tell what the problem is.

gnasher729
Jul 28, 2009, 03:44 PM
When you see a lot of time spent in 'ml_set_interrupts_enabled' it usually means that a user app is spinning across the kernel boundary. When you trap into the kernel the system needs to disable interrupts a couple of times. Since Shark still tries to sample during these times, it looks like we are spending a lot of time enabling interrupts, when actually Shark is just blinkered and its eyes are uncovered when interrupts are re-enabled.


A good way to slow down an application is to call select () with a zero timeout as often as you can. Or check the event loop as often as possible, etc. Both are also good ways to call ml_set_interrupts_enabled a lot.
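To make the select() point concrete, here is a minimal sketch that counts how many select() calls it takes until a descriptor becomes readable. The helper name wait_until_readable and its timeout convention are made up for this illustration and are not taken from the OP's code: a timeout of 0 makes every call return immediately (busy polling), a negative value blocks without a timeout.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper (not from the original thread): waits until fd is
// readable and returns how many select() calls that took, or -1 on error.
//   timeout_usec == 0  -> each select() returns at once (busy polling)
//   timeout_usec  > 0  -> each select() sleeps up to that long in the kernel
//   timeout_usec  < 0  -> select() blocks until the descriptor is ready
long wait_until_readable(int fd, long timeout_usec) {
    long calls = 0;
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fd, &readable);

        // select() may modify the timeout, so rebuild it on every pass.
        timeval tv;
        tv.tv_sec = 0;
        tv.tv_usec = 0;
        if (timeout_usec > 0) {
            tv.tv_sec = timeout_usec / 1000000;
            tv.tv_usec = timeout_usec % 1000000;
        }

        ++calls;
        const int ready = select(fd + 1, &readable, nullptr, nullptr,
                                 timeout_usec < 0 ? nullptr : &tv);
        if (ready > 0 && FD_ISSET(fd, &readable)) {
            return calls;
        }
        if (ready < 0 && errno != EINTR) {
            return -1;  // real error, give up
        }
        // ready == 0: timed out with nothing to read, so go around again.
    }
}
```

If the data only arrives some milliseconds later, the zero-timeout variant crosses into the kernel over and over while it waits, whereas the blocking variant typically needs a single call. The first pattern is the kind of "spinning across the kernel boundary" described above.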

topcomer
Jul 29, 2009, 09:21 AM
Thanks all for the replies. I'll try to summarize the answers.

Why don't you type "ml_set_interrupts_enabled" into Google and see what happens?

I did that before posting and found that specific discussion, but it wasn't helpful for my case.

That aside, is your code written with parallel processing in mind?

No. However, my code is built on PETSc, which is a library designed for parallel computing. But nowhere in my code is there any use of parallel computation (at least not intentionally).

Which version of Mac OS X are on both? Is this a command line tool? Is it built with a make file?

They are both on OS X 10.5.7; the MacPro has Xcode 3.1, the MacBook Xcode 3.0. I use Eclipse as my IDE and build with a Makefile.

I was attempting to get the poster to provide an argument to substantiate their point which I feel was necessary given the rather ridiculous nature of the claim

Why am I claiming something ridiculous? I'm sorry but we aren't born professors, so I must learn by mistakes.


Could it be that your program has a scalability glitch somewhere? The Mac Pro has more CPUs than the MacBook Pro, and if your program creates more threads on the Mac Pro and they get stuck most of the time due to synchronization or something else, that may explain the problem you are seeing.

That's just one possibility though - without actually seeing what your program does it is hard to tell what the problem is.

Is there any specific information I can provide?

When you see a lot of time spent in 'ml_set_interrupts_enabled' it usually means that a user app is spinning across the kernel boundary. When you trap into the kernel the system needs to disable interrupts a couple of times. Since Shark still tries to sample during these times, it looks like we are spending a lot of time enabling interrupts, when actually Shark is just blinkered and its eyes are uncovered when interrupts are re-enabled.

So is it Shark that causes the slowdown while profiling? Anyway, the code is also slow when I do not profile.

Cromulent
Jul 29, 2009, 10:09 AM
Why am I claiming something ridiculous? I'm sorry but we aren't born professors, so I must learn by mistakes.

I was referring to Darkwing, not you. Sorry I should have made that clearer.

darkwing
Jul 29, 2009, 12:24 PM
topcomer,

Can you use activity monitor and tell us the overall CPU usage on both machines for your program? Also can you tell us what the memory usage of the program looks like? Is swap usage increasing on the MacPro while your program runs?

Thanks.

gnasher729
Jul 29, 2009, 03:50 PM
So is it Shark that causes the slowdown while profiling? Anyway, the code is also slow when I do not profile.

No, Shark isn't slowing down anything. Shark tries to sample about 1000 times per second which instruction is executing. When interrupts are disabled, it can't sample: It will wait until interrupts are enabled again and sample the next instruction after that. And that will be in 'ml_set_interrupts_enabled'. So your code doesn't spend 23% of its time in 'ml_set_interrupts_enabled'. It spends 23% of its time with interrupts disabled, and when they get enabled it attributes the time to 'ml_set_interrupts_enabled'.

Your code is calling something like crazy that eventually enables and disables interrupts, which is likely an enormous waste of time. Try using Shark to find out what calls 'ml_set_interrupts_enabled' so often and find out how to avoid that.
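The attribution effect described above can be shown with a toy model. This is only a sketch of the reasoning, not how Shark is implemented: the Tick struct, the sample_timeline function and the "kernel_work" label are invented for the illustration. A timer fires every few ticks; if interrupts are disabled at that moment, the sample stays pending and is charged to whatever runs on the first tick where interrupts are enabled again.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One unit of simulated time: which function is running and whether
// interrupts are enabled. (Hypothetical model, not Shark's data format.)
struct Tick {
    std::string function;
    bool interrupts_enabled;
};

// The sampling timer fires every `period` ticks. A sample that fires while
// interrupts are disabled stays pending and is recorded on the first later
// tick with interrupts enabled, against the function running at that point.
std::map<std::string, int> sample_timeline(const std::vector<Tick>& timeline,
                                           std::size_t period) {
    std::map<std::string, int> hits;
    bool pending = false;
    for (std::size_t t = 0; t < timeline.size(); ++t) {
        if (period != 0 && t % period == 0) {
            pending = true;  // the timer fires on this tick
        }
        if (pending && timeline[t].interrupts_enabled) {
            ++hits[timeline[t].function];
            pending = false;
        }
    }
    return hits;
}
```

With a timeline of compute (enabled), kernel_work (disabled), kernel_work (disabled), ml_set_interrupts_enabled (enabled) and a period of 2, the second sample fires during kernel_work but is recorded against ml_set_interrupts_enabled. The code with interrupts disabled never shows up in the profile even though that is where the time went.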

topcomer
Jul 30, 2009, 02:49 AM
Thanks for the explanation about interrupts. It would be difficult to locate who calls them though since I never heard about "interrupts" before in my life. I'll try my best.

topcomer,

Can you use activity monitor and tell us the overall CPU usage on both machines for your program? Also can you tell us what the memory usage of the program looks like? Is swap usage increasing on the MacPro while your program runs?

Thanks.

MacBook

CPU ~97% (on a single core I suppose)
2 Threads
No swap increase

MacPro

CPU ~160%
2 Threads
No swap increase

topcomer
Jul 30, 2009, 08:49 AM
I attached the Sampler (in Instruments) to my process and found that 58% of the time is spent in the following call stack:

__semwait_signal
pthread_cond_wait
glvmDoWork
_pthread_start
thread_start

gnasher729
Jul 30, 2009, 11:17 AM
I attached the Sampler (in Instruments) to my process and got that 58% is spent in the following callstack:

__semwait_signal
pthread_cond_wait
glvmDoWork
_pthread_start
thread_start

I guess the idea is that you want to spend 99.9% in glvmDoWork, but not in pthread_cond_wait.

Check in your code how often glvmDoWork calls pthread_cond_wait. It probably does it much, much, much too often. Set a breakpoint on the call to pthread_cond_wait, then step over it, then have a look at how much useful work your code does before it calls pthread_cond_wait the next time.

Basically it looks like your code is spending all its time having one or more threads talking to each other, instead of doing anything useful.
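One way to measure "useful work per wait" without stepping in the debugger is to count the waits directly. The sketch below uses C++11 std::condition_variable rather than raw pthread_cond_wait, and CountingQueue, DrainStats and drain_in_worker are hypothetical names for this example. Nothing here is taken from glvmDoWork or from the OP's program.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical instrumented work queue: it counts how often the consumer
// really had to block, so that number can be compared with the amount of
// work done between waits.
class CountingQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(item);
        }
        ready_.notify_one();
    }

    // Tells the consumer that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until items are queued or the queue is closed, then hands over
    // everything queued so far in one batch. Returns false once the queue is
    // closed and empty.
    bool pop_all(std::vector<int>& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (items_.empty() && !closed_) {
            ++blocking_waits_;  // also counts spurious wakeups
            ready_.wait(lock);
        }
        if (items_.empty()) {
            return false;
        }
        out.assign(items_.begin(), items_.end());
        items_.clear();
        return true;
    }

    std::size_t blocking_waits() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return blocking_waits_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<int> items_;
    bool closed_ = false;
    std::size_t blocking_waits_ = 0;
};

struct DrainStats {
    long sum;
    std::size_t items;
    std::size_t waits;
};

// Runs a worker thread that sums every item until the queue is closed.
// Some other code must call close(), otherwise this never returns.
DrainStats drain_in_worker(CountingQueue& queue) {
    DrainStats stats{0, 0, 0};
    std::thread worker([&queue, &stats] {
        std::vector<int> batch;
        while (queue.pop_all(batch)) {
            for (int value : batch) {
                stats.sum += value;
                ++stats.items;
            }
        }
    });
    worker.join();
    stats.waits = queue.blocking_waits();
    return stats;
}
```

If the wait count comes out close to (or above) the number of items processed, the threads are spending their time signalling each other rather than computing. Taking a whole batch per wakeup, as pop_all does here, is one common way to bring that ratio down.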

lee1210
Jul 30, 2009, 11:22 AM
I haven't dealt with threading much, so I don't have much to add to this... but what has struck me as strange here is that the MacBook has multiple cores, so it should be able to have both threads running just as well as the Mac Pro. Maybe it's that the Mac Pro is going to have plenty of cores available for running your code, while the MacBook might be less likely to have both cores free. Anyhow, just an observation, since there were no single-core MacBooks produced.

-Lee

gnasher729
Jul 30, 2009, 12:08 PM
I haven't dealt with threading much, so I don't have much to add to this... but what has struck me as strange here is that the MacBook has multiple cores, so it should be able to have both threads running just as well as the Mac Pro. Maybe it's that the Mac Pro is going to have plenty of cores available for running your code, while the MacBook might be less likely to have both cores free. Anyhow, just an observation, since there were no single-core MacBooks produced.

Lee, consider a situation where one thread per core is created, but only one task. On the dual core MacBook two threads are talking to each other, one saying "I am busy", the other saying "I've got nothing to do". On the eight core MacPro eight threads are talking to each other, one saying "I am busy", and seven saying "I've got nothing to do". The seven idle threads will, if things are programmed badly enough, keep the one thread from doing any actual work, so the MacPro will end up slower.

It is hard to say what is actually happening, but your CPU time spent in functions like pthread_cond_wait should be very close to zero, not > 50%. So that is a sign that there is much too much communication between threads, and nothing else.
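The "one thread per core, but only one task" scenario can be avoided by never starting more workers than there is work. A minimal sketch under that assumption follows; pick_worker_count and run_tasks are hypothetical helpers written for this example, and this is not how PETSc or any Apple library schedules its threads.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: never use more workers than there are tasks or cores.
std::size_t pick_worker_count(std::size_t tasks, std::size_t cores) {
    if (cores == 0) {
        cores = 1;  // hardware_concurrency() is allowed to report 0
    }
    return std::min(tasks, cores);
}

// Runs every task exactly once. Workers claim tasks through an atomic
// counter and simply exit when none are left, so no thread sits around
// waking the others up just to report that it has nothing to do.
void run_tasks(const std::vector<std::function<void()>>& tasks,
               std::size_t cores = std::thread::hardware_concurrency()) {
    std::atomic<std::size_t> next{0};
    auto worker = [&next, &tasks] {
        for (std::size_t i = next.fetch_add(1); i < tasks.size();
             i = next.fetch_add(1)) {
            tasks[i]();
        }
    };

    const std::size_t count = pick_worker_count(tasks.size(), cores);
    std::vector<std::thread> pool;
    for (std::size_t w = 1; w < count; ++w) {
        pool.emplace_back(worker);  // extra threads only when work exists
    }
    worker();  // the calling thread takes its share as well
    for (std::thread& t : pool) {
        t.join();
    }
}
```

With a single task this starts no extra threads at all, whether the machine has two cores or eight, so the idle-worker chatter described above cannot occur.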