How much source code in OS X?

craig1410 · May 17, 2007

Hi,
Just curious - does anyone know how many lines of source code there are in OS X and how this has grown over the last few releases? Wikipedia suggests that Tiger has 86 million lines of source code - is this correct? If so then why does OS X have so much source code compared to Windows Vista which Wikipedia says has something like 50 million lines? I sort of expected a "less is more" situation here with more elegant code and less backwards compatibility.

Also, what language is most of OS X written in? Does this account for some of the difference in source code line count?

I remember being stunned when Windows 2000 went from 20m lines to 40m for Windows XP! At the same time Solaris went from something like 10m to 12m between versions 7 and 8 IIRC.

Thanks,
Craig.

Scarlet Fever · May 17, 2007

i read that as well. I think the difference would be the included drivers for stuff, and the extra coding to get the system really secure.

one example is when Airport takes a second to find an internet connection after a few hours of sleep: once one page finds a connection, the other pages automatically load. It's a tiny feature, but it's a couple of lines of code.

Eidorian · May 17, 2007

http://www.opensource.apple.com/darwinsource/

Time to start counting.

craig1410 · May 18, 2007

Hi,
I still find it surprising that there are so many lines of code compared to some other OS's. Do you know what the 86m lines covers? Does it cover everything from the kernel to all the bundled software? If so then would you agree that it's not fair to compare it directly with Windows Vista which doesn't bundle as much software with the O/S?

Any idea how many lines of code are in the base operating system? I mean the kernel plus all essential subsystems. I have read that you can shrink a base install of OS X down from 20GB or so to well under 10GB if you uninstall extra language packs and printer drivers.

If OS X is based on a unix code base can I assume it is predominantly written in C/C++?

Cheers,
Craig.

janey · May 18, 2007

line count is so...ineffective.

Ignoring traditional/recommended/whatever style for Java...
The same exact code:

1 line

Code:

public class Hi { public static void main (String[] args) { System.out.println("hello world"); }}

vs.

5 lines

Code:

public class Hi	{
	public static void main (String[] args)	{
		System.out.println("hello world");
	}
}

vs.
7 lines

Code:

public class Hi
{
	public static void main (String[] args)	
	{
		System.out.println("hello world");
	}
}

Hmmmm....you get the idea

craig1410 · May 18, 2007

Hi,
Yes I know this can be a problem but there are "standard" ways of counting lines as far as I know using tools built for this purpose. Of course it's difficult to determine which method has been used for OS X versus Vista etc.

That's part of the reason I am curious what language has been used for OS X.

Cheers,
Craig.

Scarlet Fever · May 18, 2007

craig1410 said:
That's part of the reason I am curious what language has been used for OS X.

well OS X is based on UNIX, so to do stuff from a command line level, you would be looking at UNIX

janey · May 18, 2007

craig1410 said:
Yes I know this can be a problem...

That still says almost zilch about anything...except lines of code. I know I dumbed it down in my example, but still. And there really isn't one universal method of counting LOC. There really isn't. Even if there was a good use for counting them.

like I honestly don't get it, how do you pass ANY judgment on any and all code by the amount of lines it has, except that it has that many more lines.

Even if you did, given my above example of a hello world in Java, what about Ruby's extremely concise hello world, which does the same exact thing as the 1/5/7 liner programs in Java:

Code:

puts "hello world"

Given the Ruby example, why don't I just double the lines of code by having separate puts statements for 'hello' and 'world'? Oh wait, how about this equally legitimate and insanely long way of doing a hello world in the same language..

Code:

#!/usr/bin/ruby
class String
    def hi
        puts self
    end
end
"hello world".hi

Would it honestly be fair? 1 line of code does the same thing you could also do in 7 lines which could be counted as 3 or 6 or whatever, depending on how you feel like going around defining a line of code (comments anyone?). Just because Apple is Apple doesn't mean the code is some flawless objet d'art...humans do it...humans also create autogenerators that aren't perfect either...
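To put numbers on that, here is a rough sketch (not how any real LOC tool works; `physical_loc` and `logical_loc` are made-up names for illustration) that counts the three Java layouts from earlier in two different ways. The "logical" count is deliberately crude: it just counts semicolons, so a semicolon inside a string literal would fool it.

```ruby
# The same Java hello-world program in three layouts.
ONE_LINE = 'public class Hi { public static void main (String[] args) { System.out.println("hello world"); }}'

FIVE_LINES = <<~JAVA
  public class Hi {
    public static void main (String[] args) {
      System.out.println("hello world");
    }
  }
JAVA

SEVEN_LINES = <<~JAVA
  public class Hi
  {
    public static void main (String[] args)
    {
      System.out.println("hello world");
    }
  }
JAVA

# Physical count: every non-blank line is one line of code.
def physical_loc(src)
  src.lines.count { |line| !line.strip.empty? }
end

# Crude "logical" count: one per semicolon-terminated statement.
def logical_loc(src)
  src.count(';')
end

[ONE_LINE, FIVE_LINES, SEVEN_LINES].each do |src|
  puts "physical=#{physical_loc(src)} logical=#{logical_loc(src)}"
end
```

The physical count comes out as 1, 5 and 7, while the semicolon count is 1 for all three, so the "size" of the identical program depends entirely on which definition you pick.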

One can't even begin to compare Mac OS X to Vista using other methods, let alone this time-tested-to-be-BS method.

Cromulent · May 18, 2007

craig1410 said:
Any idea how many lines of code are in the base operating system? I mean the kernel plus all essential subsystems. I have read that you can shrink a base install of OS X down from 20GB or so to well under 10GB if you uninstall extra language packs and printer drivers.

3.6GB in fact, if I remember correctly.

ghall · May 18, 2007

I try not to try and wrap my head around computer code, I don't think I have the kind of brain to understand how the few lines of code that are allowing me to type this message, do what they do.

clevin · May 18, 2007

Scarlet Fever said:
and the extra coding to get the system really secure.

there is no such thing as "extra security code"; the more code, the more holes.

jdechko · May 18, 2007

janey said:
...Snip...

Gosh, what a buzzkill.

</joking>

But I wouldn't be surprised if there were more lines of code than 2000/XP/Vista. After all it is the "World's most advanced operating system", so there's bound to be a lot of code.

Queso · May 18, 2007

Wouldn't OSX be coded in a range of different languages, due to its modular nature and the amount of open source components that are included?

Although I would have thought a variant of C would account for most of it, since it's a UNIX.

Cromulent · May 18, 2007

It is very hard to tell where the OS ends and utilities begin within Unix operating systems due to the complete integration with one another. What exactly are they counting to get those 86 million lines?

craig1410 · May 18, 2007

janey said:
That still says almost zilch about anything...except lines of code. I know I dumbed it down in my example

Janey, It sounds like I've managed to wind you up and that was certainly not my intention I assure you. Time for a chill pill...

Please don't think I'm trying to pass any sort of judgement, I am merely trying to understand why OS X is so much better (faster on same hardware, more secure, prettier etc etc) than Vista and yet "appears" to have many more lines of code. This is doubly strange given that OS X has less baggage (i.e. a ground-up rebuild from NeXTSTEP) whereas Windows has roots which stretch back to the 1980s.

Also, what really freaked me out with Windows is the way it went from 20m lines to 40m lines between 2000 and XP. That's a lot of code to churn out for a single increment. As I mentioned earlier, I recall Sun boasting about the fact Solaris had only gone from something like 10m lines to 12m lines between their corresponding releases. To me this made a lot of sense from a stability point of view.

I'm trying to get an idea of where this disparity lies - maybe Apple's developers do make use of lots of whitespace in the code with lots of comments and a more verbose style. Maybe OS X is written in assembly language and Windows is written in super-terse C++. Maybe the windows kernel alone is 40m lines of code (would explain a lot...)

I'm just trying to understand it because I have in the past mocked Microsoft for bloat and I'm not quite sure how to defend OS X in this regard having seen the 86m lines figure.

I hope this clarifies my mission.
Thanks,
Craig.

mkrishnan · May 18, 2007

Craig, one thing to consider is how many server-level functions are in the standard version of OS X. OS X incorporates the Apache code, for instance.

Also, generally, bloat is a hard concept... if you look at open source programs, they're generally very code heavy in comparison to commercial ones. That's because they're written *carefully* and they follow the rules explicitly. MS doesn't trip up so much because they have lots of code. If their system engineering worked, having lots of code would not be problematic. The problem is that the code doesn't match the system model -- shortcuts and tricks make the code respond unpredictably in novel situations. Sometimes those shortcuts lead to less code rather than more. If you write long code that conforms exactly to the system-level specification of what it takes in, what it puts out, and what resources it uses, the code length really only comes in at the level of speed discussions.

elppa · May 18, 2007

Scarlet Fever said:
i read that as well. I think the difference would be the included drivers for stuff, and the extra coding to get the system really secure.

one example is when Airport takes a second to find an internet connection after a few hours of sleep: once one page finds a connection, the other pages automatically load. It's a tiny feature, but it's a couple of lines of code.

Just two. I'd love to be able to code that efficiently

johnee · May 18, 2007

elppa said:
Just two. I'd love to be able to code that efficiently

but those 2 lines run the call stack up to 54532423 entries

(notice how i just used my left hand to hit some random numbers)

savar · May 18, 2007

dynamicv said:
Wouldn't OSX be coded in a range of different languages, due to its modular nature and the amount of open source components that are included?

Although I would have thought a variant of C would account for most of it, since it's a UNIX.

The kernel and most of the POSIX layer is written in C.

Naturally, most of the userland applications are either Objective-C, or will be rewritten to Obj-C in the future.

As others have mentioned, it's really hard to measure lines of code. Especially for a POSIX OS, which has literally hundreds of applications that aren't strictly a part of the OS but without which nobody would want to use it. And you definitely cannot compare across different languages either. The same code in C or Perl would look totally different.

In general, LOC is just intended to convey a sense of scale.* A small shareware program might be 10-50K LOC, an enterprise application 1 million LOC, and an OS 50 million LOC. They are just very loose measurements.

*In some perverse dev shops, LOC is also used to rate programmer effectiveness. The higher the LOC, the higher the bonus.

craig1410 · May 18, 2007

mkrishnan said:
Craig, one thing to consider...

Yes that's a fair point, with open source, peer review certainly should make for cleaner more readable code which might take a few more lines than terse, opaque code. In fact a recent article I read by Dave Jewell was talking about the state of the Windows source code after it was leaked to the internet a few years ago. The conclusion that Dave came to was that it was completely "unmaintainable". Here is a link to the article in question - worth a read! http://www.regdeveloper.co.uk/2007/04/29/vista_end_dream/

Thanks,
Craig.

plinden · May 18, 2007

This is a nice Mac-related (not OS X, but still somewhat related) story about lines of code: Mac Folklore

craig1410 · May 18, 2007

plinden said:
This is a nice Mac-related (not OS X, but still somewhat related) story about lines of code: Mac Folklore

Brilliant!

Krevnik · May 18, 2007

craig1410 said:
Any idea how many lines of code are in the base operating system? I mean the kernel plus all essential subsystems. I have read that you can shrink a base install of OS X down from 20GB or so to well under 10GB if you uninstall extra language packs and printer drivers.

That depends on what you call a base operating system. Darwin is a base operating system, in that it is functional, but it isn't OS X.

For what it is worth... the default install on new machines for 10.4 seems to be about 15GB of space with all the bundled software. I wipe and reinstall OS X on the machine after purchasing, with an install no bigger than 4GB (excluding developer tools).

If OS X is based on a unix code base can I assume it is predominantly written in C/C++?

Yeah. The kernel has something that most OSes don't though: an embedded C++ runtime in kernel space. This is to support the C++ driver model which lets you reuse behavior in generic drivers for your specific ones. I would probably guess that eats up some code, but not enough to explain 30m lines.

slooksterPSV · May 18, 2007

What about commenting? That would account for code. If you made a program that had 5000 lines of commenting and only 100 lines of code it would compile to be very small compared to 5000 lines of code and 100 lines of commenting.

janey · May 18, 2007

clevin said:
there is no such thing as "extra security code"; the more code, the more holes.

Okay, there is a slight correlation in this regard purely because the larger it gets, the harder it becomes to maintain.

craig1410 said:
Please don't think I'm trying to pass any sort of judgement, I am merely trying to understand why OS X is so much better (faster on same hardware, more secure, prettier etc etc) than Vista and yet "appears" to have many more lines of code. This is doubly strange given that OS X has less baggage (i.e. a ground-up rebuild from NeXTSTEP) whereas Windows has roots which stretch back to the 1980s.

craig, i don't think you get my point

You absolutely can't compare anything about OS X to Vista by only lines of code. No matter what you do. Won't mean anything at all in that respect. If you would like to, doing this is going down the wrong path.

And actually, Vista's based on NT, not DOS...so no, the roots stretch to the early 90s or so (NT 3.1 shipped in 1993), actually newer than NeXTSTEP.

craig1410 said:
Also, what really freaked me out with Windows is the way it went from 20m lines to 40m lines between 2000 and XP. That's a lot of code to churn out for a single increment. As I mentioned earlier, I recall Sun boasting about the fact Solaris had only gone from something like 10m lines to 12m lines between their corresponding releases. To me this made a lot of sense from a stability point of view.

I respectfully disagree. Not only did XP have a gazillion more versions than 2k (e.g. tablet pc, mce, home, pro, ia64, x64, embedded, FLP...), XP was meant for everything and everyone under the sun, particularly consumers with their myriad and seemingly infinite combinations of hardware and software configs, while 2k was business/server oriented (Me was the "home" counterpart of 2k, built on MS-DOS). Even if LOC actually meant something in this case, it would only be to show just how much 2k and XP are different. Sun can boast all they want about a trivial number, but again, it can't be compared until you know things in excess detail, like what languages were used, what kind of method of counting LOC was used, how their particular development model worked...

craig1410 said:
I'm just trying to understand it because I have in the past mocked Microsoft for bloat and I'm not quite sure how to defend OS X in this regard having seen the 86m lines figure.

Unless OS X and Vista are identical, I repeat myself: this isn't the way to go. Even then, unless you had direct access to the entire codebases of both OSs and all the programs that come bundled with them and had the herculean amount of time to analyze both of them, you wouldn't know if the extra lines of code were there just because some noob of a programmer wrote them and didn't know better, or they're like that for optimization purposes...something like, say, loop unrolling where one single loop could go from 2 lines to 10 for the sake of using up less resources and speeding up performance.
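For anyone who hasn't seen loop unrolling, here's a minimal sketch in Ruby. It's only meant to show the effect on source size: this is normally something done in C (by hand or by the compiler), and in Ruby it's unlikely to buy any speed. `sum_rolled` and `sum_unrolled` are made-up names for illustration.

```ruby
# Plain loop: short to write, one addition per iteration.
def sum_rolled(values)
  total = 0
  values.each { |v| total += v }
  total
end

# Hand-unrolled by four: same result, several times the line count.
def sum_unrolled(values)
  total = 0
  i = 0
  limit = values.length - (values.length % 4) # largest multiple of 4 that fits
  while i < limit
    total += values[i]
    total += values[i + 1]
    total += values[i + 2]
    total += values[i + 3]
    i += 4
  end
  while i < values.length # mop up the 0-3 leftover elements
    total += values[i]
    i += 1
  end
  total
end
```

Both methods return the same sum for any array of numbers, yet the second one would make the codebase look "bigger" to anything that counts lines.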

savar said:
In general, LOC is just intended to convey a sense of scale....

Okay, another concession as long as you're not comparing a 1000-line app to a 2000-line app

savar said:
In some perverse dev shops, LOC is also used to rate programmer effectiveness. The higher the LOC, the higher the bonus.

Ridiculous

craig1410 said:
Yes that's a fair point, with open source, peer review certainly should make for cleaner more readable code which might take a few more lines than terse, opaque code.

Peer review etc. are not uncommon with closed source apps..if anything it's similar enough in a lot of places. Pair programming, unit testing,...everyone does it, doesn't matter if they're being paid for it or not.

That's not to say that sometimes one has to write ****** code anyway (hey we've all done it...) because there just isn't a nice way around it, so instead it'll be well documented (hopefully)

slooksterPSV said:
What about commenting? That would account for code. If you made a program that had 5000 lines of commenting and only 100 lines of code it would compile to be very small compared to 5000 lines of code and 100 lines of commenting.

As useful as good comments can be, I doubt they count towards most LOC counts cause the stuff added on to lots of the files, like licensing/author/documentation/info, is generally a supplement to the code
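A rough sketch of how a counter could keep comments and code apart, assuming only whole-line comments that start with `#` or `//` (real tools also deal with block comments, string literals and so on; `classify_lines` is a hypothetical helper, not part of any actual tool):

```ruby
# Sort each line of a source string into code, comment, or blank.
# Assumption: a comment is a whole line starting with '#' or '//'.
def classify_lines(src)
  counts = { code: 0, comment: 0, blank: 0 }
  src.each_line do |line|
    text = line.strip
    if text.empty?
      counts[:blank] += 1
    elsif text.start_with?('#', '//')
      counts[:comment] += 1
    else
      counts[:code] += 1
    end
  end
  counts
end

# A header-heavy file: three comment lines, one blank, one line of code.
SAMPLE = <<~SRC
  # Copyright notice
  # Author: someone

  # Print a greeting.
  puts "hello world"
SRC

counts = classify_lines(SAMPLE)
```

For the sample, that gives 1 code line, 3 comment lines and 1 blank line, so a count that includes headers and comments would report five times as much "code" as one that doesn't.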
