My C program runs slowly, using only 10% CPU. Why not more?

Discussion in 'Mac Programming' started by nino65, Dec 12, 2007.

  1. nino65 macrumors newbie

    Joined:
    Dec 12, 2007
    #1
    Hi all,
    I was a Linux user; I bought a MacBook Pro a month ago and installed Leopard. Everything is fine... but:
    I wrote a C program that reads several data files and does a lot of computation on them. On a Linux box with an Intel Core Duo it used to take 10 to 20 seconds to finish its job, while on my new Intel MacBook Pro it takes several minutes.

    I checked with top and BigTop: the CPU usage stays below 10%, and I can't manage to raise it, not even with renice. On Linux the same process used up to 100% of one CPU.

    I suspect there is a cap on the CPU usage of processes launched by users. Is that right?

    Any ideas? Can anyone help me with some hints?

    thanks in advance
    nino65
     
  2. hhlee macrumors 6502

    Joined:
    May 19, 2005
    #2
    The bottleneck is probably hard disk access, particularly if you're loading a bunch of small files. You could try to cat all the files together (something like cat data/* > all.dat); then the system only has to cache one large file.
     
  3. nino65 thread starter macrumors newbie

    Joined:
    Dec 12, 2007
    #3
    thanks for the fast answer, hhlee,
    I had the same suspicion, and with that in mind I bought the 7200 rpm HD. Still, I can't believe the Linux HD was so much faster than this one as to justify the difference in performance.

    I also checked with Shark, and the most time-consuming part was a loop in which I do some calculations, not the file reading. But maybe Shark does not report on system calls...
     
  4. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #4
    First off, I'm not surprised that Linux is faster. For years there has been an open competition among kernel hackers to squeeze performance out of the code; Mac OS X was written with different goals. Also, Mac OS X runs on the Mach microkernel, and every system call goes through that layer. But 3x slower? My guess would have been 1.5x or thereabouts.

    Leopard has DTrace, which Apple got from Sun. It is the perfect tool if you'd like to figure out what's going on.
    http://www.sun.com/bigadmin/content/dtrace/

    If you want anyone here to help, you will have to post small bits of C code.
     
  5. iSee macrumors 68040

    iSee

    Joined:
    Oct 25, 2004
    #5
    There isn't a default cap on CPU usage of user processes, so something else is bottlenecking your code.

    Are you doing any I/O at all in your computation loops? Any reading from or writing to files or the console, etc.?

    If not, maybe it is memory allocation. Are you doing lots of allocating and freeing of memory in your computation loops? Are there large allocations (ones that take up a large fraction of available RAM)?
     
  6. Gelfin macrumors 68020

    Gelfin

    Joined:
    Sep 18, 2001
    Location:
    Denver, CO
    #6
    Definitely not, and this is easy to check:
    Code:
    int main(void)
    {
       int i = 0;
       while(i >= 0) i++;   /* busy loop: should keep one core near 100% */
       return 0;
    }
    
    This produces 100%ish CPU.

    From what you've described, my first suspicion was a filesystem bottleneck too, but a number of things might be causing your issue, and it's hard to know without seeing your code. Are you reading in a very large amount of data? Even if your primary FS reads are fine, there's always the chance you're swapping yourself into the ground. How does the RAM on the MacBook Pro compare to the Linux box's?
     
  7. Catfish_Man macrumors 68030

    Catfish_Man

    Joined:
    Sep 13, 2001
    Location:
    Portland, OR
    #7
    Shark is your best friend in these things, and it most definitely does report system calls. Learn to use its data mining tools and it should help you pinpoint the issue.
     
  8. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #8
    Have you got Shark installed? Have you tried it? What does it say about your code?
     
  9. nino65 thread starter macrumors newbie

    Joined:
    Dec 12, 2007
    #9
    thanks a lot to everyone,
    I'll try to summarize here the answers to each question.

    First of all, the code is several hundred lines, so I cannot post it here. The Linux box and my MacBook Pro both have an Intel Core Duo with 2 GB of memory. I don't remember the model of the HD in the Linux machine, but it was certainly nothing special.

    - I used Shark, and the time profile (entries above 3%) is:
    11.3%  11.3%  AutoCode           buildForecasts
     9.8%   9.8%  libSystem.B.dylib  __svfscanf_l
     7.0%   7.0%  mach_kernel        ml_set_interrupts_enabled
     4.2%   4.2%  mach_kernel        memcmp
     4.2%   4.2%  mach_kernel        name_cache_unlock
     3.5%   3.5%  mach_kernel        cache_lookup_path
     3.5%   3.5%  libSystem.B.dylib  strtod_l$UNIX2003

    - I verified, with the simple program suggested by Gelfin, that there is no CPU cap. By the way, I had to modify it a bit because it quickly exited when the counter overflowed, at i=2147483648 :D

    - As suggested first by hhlee and then by iSee, Gelfin... the bottleneck SURELY IS the file reading (the program reads about 40,000 files of 100 rows x 2 cols each). I tried to improve the code a bit by "caching" the files in a matrix holding 1024 files at a time, but it did not help very much. I think I have to do what hhlee suggested: join all the files together (or into a few big ones); a sketch of what I have in mind is below.

    - STILL, the surprise (and the mystery) is that on the Linux box it was not 3 but 30 TIMES FASTER!
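
    This is roughly the reading loop I have in mind for the merged file (just a sketch; the file name all.dat and the strict 100-rows-per-original-file ordering are assumptions):
    Code:
    #include <stdio.h>

    #define NFILES 40000
    #define ROWS   100

    int main(void)
    {
        FILE *f;
        double x, y;
        int i, r;

        /* all.dat is assumed to be all 40,000 two-column files
           cat'ed together, in order */
        f = fopen("all.dat", "r");
        if (f == NULL) { perror("fopen"); return 1; }

        for (i = 0; i < NFILES; i++) {
            for (r = 0; r < ROWS; r++) {
                if (fscanf(f, "%lf %lf", &x, &y) != 2) {
                    fprintf(stderr, "bad record: file %d, row %d\n", i, r);
                    fclose(f);
                    return 1;
                }
                /* ...the per-row computation would go here... */
            }
        }
        fclose(f);
        return 0;
    }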

    thanks again to all of you
     
  10. garethlewis2 macrumors 6502

    Joined:
    Dec 6, 2006
    #10
    If you are coming from a Linux background to OS X, you are going to bump into a hell of a lot more bottlenecks than you would ever expect.

    OS X needs to make sure that when you read a file it actually gets that data correctly, so the file-reading code waits until all I/O is finished before returning. In Linux, since it is not an industrial-strength OS, the call returns immediately and spawns a separate thread to read the file; it couldn't give a rat's arse whether the data was read correctly. This is where Linux is trouncing OS X.
     
  11. fimac macrumors member

    Joined:
    Jan 18, 2006
    Location:
    Finland
    #11
    Huh? Of course Linux is industrial strength. Were you being sarcastic? Do you have a link? Are you confusing read with write?

    This may be of interest to the OP: http://sekhon.berkeley.edu/macosx/.
     
  12. stupidregister macrumors member

    Joined:
    Sep 29, 2007
    #12
    Doesn't this imply the opposite? That is, unless slowness is considered a strength.
     
  13. iSee macrumors 68040

    iSee

    Joined:
    Oct 25, 2004
    #13
    Opening 40,000 files is a lot!

    Seek time on hard drives is measured in milliseconds. With an average seek time of 8 ms, and assuming one seek per file, that is 40,000 × 8 ms = 320 seconds just to position the read head over the first sector of data of every file.

    If the Linux box had a desktop hard drive in it, that drive is going to be a lot faster than the laptop hard drive (even a 7200 RPM one) in the MacBook. Other components of the I/O stack are probably faster too.

    But, as the OP points out, something else is going on.

    8-9 ms is a pretty representative average seek time for a desktop hard drive; laptop drives are usually at least a couple of ms slower (though I haven't looked into this for a while).

    So obviously the Linux box is doing a great job of beating the average seek time, while the MacBook is not.

    It's not necessarily OS X's fault, though. You could try OS X with a different file system, for example. And on the Linux box it might be the hard drive's built-in cache that is doing such a good job.

    If I had to guess, though, I'd "blame" OS X. Still, it's not part of the OS X development philosophy to optimize something like this (reading 40,000 small files); they concentrate on optimizing the overall user-level responsiveness of the system.
     
  14. Catfish_Man macrumors 68030

    Catfish_Man

    Joined:
    Sep 13, 2001
    Location:
    Portland, OR
    #14
    Rather than get into pointless platform contention here, let's brainstorm solutions.

    Have you considered mmap()ing the files? What are your read patterns for them?
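
    For example, something roughly like this (an untested sketch; the path is made up, and the fwrite() is just a stand-in for whatever parsing you actually do):
    Code:
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        struct stat st;
        char *p;
        int fd;

        fd = open("data/file00000.txt", O_RDONLY);   /* made-up name */
        if (fd < 0) { perror("open"); return 1; }
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* map the whole (non-empty) file read-only */
        p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* parse the mapped bytes in place instead of fscanf()ing;
           echoing them is just a stand-in */
        fwrite(p, 1, st.st_size, stdout);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }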
     
  15. eastcoastsurfer macrumors 6502a

    Joined:
    Feb 15, 2007
    #15
    That's funny.

    To the OP: are you doing any multi-threading? The BSD code that OS X is based on used to be notoriously slow at threading (IIRC, context switching was extremely slow), to the point where it was recommended that you not use OS X for things like an Apache web server. Of course, they may have fixed these issues by now.
     
  16. MarkCollette macrumors 68000

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #16
    Can you use Shark to differentiate between the time to:
    - Open each file
    - Read the first byte in each file
    - Subsequent reads on each file

    That would let us know if the problem is seeking between files.

    If it is some lack of proper operating-system hard-drive read queuing / read coalescing, then maybe we can reorganise the code a little. You could have one thread for opening and reading files, and another thread for processing the data that has been read in. As soon as one file has been read, the reader thread starts on the next. It will mostly be blocked waiting for I/O, but that's alright, because the processing thread will be busy crunching the previously read data.

    If that appears to help, you can then make the reader thread run constantly, putting its data on a queue that the processing thread grabs work off at its own rate. You'll have to limit the queue size, so that the reading thread doesn't fill RAM with unprocessed data, leaving no room to actually do the processing. A rough sketch follows.
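
    Something along these lines (a minimal sketch using pthreads; the file naming scheme, the 200-numbers-per-file layout, and the stand-in "processing" are my assumptions, not the OP's actual code):
    Code:
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define QUEUE_SLOTS 8      /* bound the queue so reading can't fill RAM */
    #define NFILES 40000
    #define VALS_PER_FILE 200  /* 100 rows x 2 cols */

    typedef struct { double *vals; int n; } Job;

    static Job queue[QUEUE_SLOTS];
    static int head, tail, count, reader_done;
    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

    static void enqueue(Job j)
    {
        pthread_mutex_lock(&mtx);
        while (count == QUEUE_SLOTS)        /* wait for a free slot */
            pthread_cond_wait(&not_full, &mtx);
        queue[tail] = j;
        tail = (tail + 1) % QUEUE_SLOTS;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&mtx);
    }

    static int dequeue(Job *j)
    {
        pthread_mutex_lock(&mtx);
        while (count == 0 && !reader_done)  /* wait for work */
            pthread_cond_wait(&not_empty, &mtx);
        if (count == 0) {                   /* reader finished, queue drained */
            pthread_mutex_unlock(&mtx);
            return 0;
        }
        *j = queue[head];
        head = (head + 1) % QUEUE_SLOTS;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&mtx);
        return 1;
    }

    static void *reader(void *arg)          /* I/O thread: mostly blocked */
    {
        char path[64];
        int i;
        (void)arg;
        for (i = 0; i < NFILES; i++) {
            FILE *f;
            Job j;
            snprintf(path, sizeof path, "data/file%05d.txt", i); /* made-up names */
            f = fopen(path, "r");
            if (f == NULL) continue;
            j.vals = malloc(VALS_PER_FILE * sizeof(double));
            j.n = 0;
            if (j.vals == NULL) { fclose(f); break; }
            while (j.n < VALS_PER_FILE && fscanf(f, "%lf", &j.vals[j.n]) == 1)
                j.n++;
            fclose(f);
            enqueue(j);
        }
        pthread_mutex_lock(&mtx);
        reader_done = 1;                    /* tell the consumer we're done */
        pthread_cond_broadcast(&not_empty);
        pthread_mutex_unlock(&mtx);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        Job j;
        double total = 0;
        int i;

        pthread_create(&t, NULL, reader, NULL);
        while (dequeue(&j)) {               /* crunch while the reader reads */
            for (i = 0; i < j.n; i++)
                total += j.vals[i];         /* stand-in for the real processing */
            free(j.vals);
        }
        pthread_join(t, NULL);
        printf("total = %g\n", total);
        return 0;
    }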
     
  17. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #17
    A thread to read the file??? Are you sure? Show me the code. Where is the thread created?

    You may be thinking of writing. Writing is different: if you don't fsync() after a write, you can't be sure the data is really on the disk. And yes, in Linux there is a process (not a thread) that flushes data back to the disk.
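
    For example (a sketch; the file name and data are made up):
    Code:
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *msg = "results\n";
        int fd = open("out.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return 1;

        if (write(fd, msg, strlen(msg)) < 0)   /* may sit in kernel buffers */
            return 1;
        fsync(fd);    /* blocks until the data has actually reached the disk */
        close(fd);
        return 0;
    }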
     
  18. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #18
    The way I do this is to comment out sections of my code. When I comment out a section and find that my execution time is reduced by 90%, I know where the problem is. Then I try to see if commenting out even fewer lines has the same effect. Of course there are more sophisticated tools, but a few #if 0 ... #endif pairs are very quick to toss in.
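
    Like this (a toy example; the two functions are just placeholders):
    Code:
    #include <stdio.h>

    void fast_part(void) { puts("fast part"); }
    void slow_part(void) { puts("suspect slow part"); }

    int main(void)
    {
        fast_part();
    #if 0   /* suspect section disabled; compare run times with and without */
        slow_part();
    #endif
        return 0;
    }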
     
  19. MarkCollette macrumors 68000

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #19
    Yes, that's a pretty straightforward approach when the steps are not dependent on each other, or when you can hard-code the outputs of a function.

    The problem is that my example (opening, reading the first byte, then reading the rest) is meant to isolate initial seeks from straight read times, and there isn't really a way to measure the read without first doing the initial seek.

    But the reason I've found for using profilers is that they can show you performance bottlenecks you never anticipated.

    I remember one time, programming in Java, it took a really long time to sort the rows of a table. It turned out that one specific object type was really slow to compare, and I just had to wrap it in something that could cache an intermediate calculation. I would never have thought of that.

    Another time, parsing a file seemed fast enough, but I profiled it anyway. It turned out that, in a very specific set of circumstances, some code that reads byte by byte was reading directly from the file. All I had to do was change one line to use buffered I/O, and parsing the whole file became an order of magnitude faster. And it was buried deep in an area I thought was already as fast as possible.

    Really, once your programs hit 100+ KB of source code, profilers are mandatory.
     
  20. nino65 thread starter macrumors newbie

    Joined:
    Dec 12, 2007
    #20
    thank you all for this interesting discussion.

    I must say that I'm not a professional programmer; I'm a researcher (a physicist) who is used to writing his own programs to simulate physical/biological/chemical processes. So, even though I try to write efficient and smart code as much as I can, I'm not at the level of knowing every single detail of the C language or of system calls and so on. I learned how to use Shark just a couple of days ago, and I don't know how to collect the information MarkCollette suggested.

    Being a physicist, I made an experiment that I think gives some hints about the central point.
    I ran a first instance of the program and, when it was more or less halfway, I compiled and ran another instance (with other parameter values). Well, the second instance reached in a few seconds the point the first one was at, and then the two proceeded together (slowly) to the end.

    I think this means it all depends on the cache of the HD, or on how the OS manages it, or something like that; I don't have enough knowledge of that area.

    We do not have the definitive answer yet, but I think we are very close.

    thank you again
     
  21. Eraserhead macrumors G4

    Eraserhead

    Joined:
    Nov 3, 2005
    Location:
    UK
  22. toddburch macrumors 6502a

    Joined:
    Dec 4, 2006
    Location:
    Katy, Texas
    #22
    I'm writing some C++ now to process large files. I've found that when the input files are on a USB flash drive, performance is REALLY bad. When the file is on the HD, it is MUCH faster. The second time I run against the HD, it is VERY, VERY fast. This tells me OS X caching is to be leveraged whenever possible.

    Todd
     
  23. nino65 thread starter macrumors newbie

    Joined:
    Dec 12, 2007
    #23
    Yes, I forgot to say that the Linux machine was a desktop (I don't remember if it was a Dell or an HP).
     
  24. iSee macrumors 68040

    iSee

    Joined:
    Oct 25, 2004
    #24
    OS X does generally cache files that have been read. If you look at the System Memory tab in Activity Monitor, the blue ("Inactive") slice of the pie chart shows the memory currently being used for this.

    If the OP has enough memory and will use the same file sets multiple times, the current program might be good enough--the first time it is run it will be slow, but subsequent passes involving the same files will be much faster.

    nino65, you shouldn't need to overlap execution as you did in your test (assuming you have enough RAM for all the files to be cached); OS X will leave files in its cache until it runs out of space (it will yield its space to apps that allocate memory and, of course, to new file accesses).

    If that doesn't work, then you might want to try hhlee's suggestion of combining the source files into one large one if it doesn't complicate the code too much.

    Also, if you are writing to files, you may want to try keeping the results in memory until processing is finished and then writing them out all at once (hopefully to one file and not many).

    It would be interesting to hear what approach is successful...
     
  25. nino65 thread starter macrumors newbie

    Joined:
    Dec 12, 2007
    #25
    iSee, maybe I am too new to Mac OS, but I don't see any pie chart in System Profiler. Mine looks like this (screenshot attached):
     

    Attached Files:
