View Full Version : What is a "line of code"?




Sydde
Mar 20, 2013, 03:21 PM
I read somewhere that Linux (the kernel) has upwards of 1.7 million lines of code, which got me to wondering if there is some kind of standard for what constitutes a line of code.
… // 3 lines of code

tempVal = someFn( argVal );
retVal = anotherFn( tempVal );
return retVal;

// vs 1 line of code

return anotherFn( someFn( argVal ) );

With modern compilers, the object code could well end up the same for both examples, perhaps even with the routine inlined, and in fact the debug build might optimize out variables to the point that you cannot examine intermediate values in the debugger. It is handy to be able to write expansive, verbose code knowing that it will get streamlined by the compiler (I remember when this was not always the case). My question is whether there is some kind of broad consensus as to how lines of code are counted.

(I realize that Linux, in its original form, was written with heavy use of asm{ … }, so the count of lines was probably rather high relative to the contemporary reality.)



gnasher729
Mar 20, 2013, 03:39 PM
I read somewhere that Linux (the kernel) has upwards of 1.7 million lines of code, which got me to wondering if there is some kind of standard for what constitutes a line of code.
// 3 lines of code

tempVal = someFn( argVal );
retVal = anotherFn( tempVal );
return retVal;

// vs 1 line of code

return anotherFn( someFn( argVal ) );

With modern compilers, the object code could well end up the same for both examples, perhaps even with the routine inlined, and in fact the debug build might optimize out variables to the point that you cannot examine intermediate values in the debugger. It is handy to be able to write expansive, verbose code knowing that it will get streamlined by the compiler (I remember when this was not always the case). My question is whether there is some kind of broad consensus as to how lines of code are counted.

(I realize that Linux, in its original form, was written with heavy use of asm{ }, so the count of lines was probably rather high relative to the contemporary reality.)

A quote from a previous boss: "Any not completely incompetent programmer can double their productivity according to any performance metric, without any actual increase in productivity."

It's four lines and two lines. Comments count when you count lines of code.

What _actually_ counts is how much the code achieves. That's the value. The code itself is not a benefit, it is a cost: it needs to be examined when you look for bugs, and it needs to be modified when specs change. However you look at it, every line of code written is a cost.

That's at its worst when someone uses copy and paste to create code. Take a thousand-line function, copy it, change the name, change two lines, and the inexperienced developer or manager thinks they just created lots of value. What they actually did is create two lines' worth of value, minus a thousand lines' worth of cost.

mslide
Mar 20, 2013, 04:59 PM
My question is whether there is some kind of broad consensus as to how lines of code are counted.

No, there isn't. Some simply count the number of lines in every source file, some don't count blank lines, some don't count comments, some only count lines containing a ";" (assuming a language like C that ends statements with ";"), some count the total number of ";" characters (so a single line with multiple statements counts as multiple lines), etc.
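
For the curious, here is a rough sketch of how a few of those conventions give different answers for the same file. It's Python and purely illustrative; "example.c" is a made-up path and the comment handling is deliberately naive (it ignores /* ... */ block comments):

# Illustrative only: a few of the counting conventions above, applied to one C file.
# "example.c" is a made-up path; block comments are ignored on purpose to keep this short.
with open("example.c") as f:
    lines = f.read().splitlines()

total     = len(lines)                              # plain "wc -l" style
nonblank  = sum(1 for ln in lines if ln.strip())    # skip blank lines
code_only = sum(1 for ln in lines
                if ln.strip() and not ln.strip().startswith("//"))  # also skip // comment lines
semis     = sum(ln.count(";") for ln in lines)      # roughly one per statement

print(total, nonblank, code_only, semis)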

In the end, it doesn't really matter. The metric is only useful in the sense that it gives you a ballpark figure for how much code there is. Honestly, the exact number is pretty useless.

When I'm doing this, I tend to just run every file through "wc -l" and be done with it. Even then, I only do it when I'm curious to see the order of magnitude. In other words, am I dealing with hundreds, thousands, millions, etc. I don't care about the actual number.

xStep
Mar 20, 2013, 07:18 PM
What _actually_ counts is how much the code achieves. That's the value. The code itself is not a benefit, it is a cost: it needs to be examined when you look for bugs, and it needs to be modified when specs change. However you look at it, every line of code written is a cost.

A famous Apple story regarding lines of code.
Summary: it's hard to measure progress by lines of code. -2000 Lines Of Code (http://folklore.org/StoryView.py?story=Negative_2000_Lines_Of_Code.txt)

In the PBS documentary Triumph of the Nerds (http://en.wikipedia.org/wiki/Triumph_of_the_Nerds), Steve Ballmer of Microsoft spoke about their partnership with IBM. The IBM team would keep talking about KLOCs (thousands of lines of code) as a good thing: the more, the better. Steve thought this was nuts. Read his response at the Wikipedia entry (http://en.wikipedia.org/wiki/Source_lines_of_code). BTW, Triumph of the Nerds is a great documentary, and the book it's based on digs much deeper into the stories behind Silicon Valley and the PC revolution.

As for a standard for counting lines of code: I've never heard of one in the 25 years I've been in the computing business. The only reason to count lines is to satisfy one's curiosity.

When I'm personally curious I may do what mslide mentioned, or I may add some extra passes to remove blank lines and comment-only lines. I think the last time I did this was several years ago, to compare a system I had worked on for 9 years to the original code from when I walked in the door. I was curious how much I had added. Added, because we did add to the system. In some cases I had ripped out chunks of code to shrink individual source files. :D

lee1210
Mar 20, 2013, 07:54 PM
One of my greatest achievements working in software was culling thousands, maybe tens of thousands, of lines of unused code from a system. The president of the company told me this was not worthwhile, because there "are no bugs in code that doesn't run". The truth is that there are bugs, but you don't know whether that code will ever run, so you don't know whether you should fix them. Our build was faster, our greps were faster, and it was easier to follow what was going on.

Other ways of reducing LoC, such as removing duplication, reducing complexity, and increasing modularity, are a joy. One way to look at LoC is as a measure of how expensive a system is to maintain and enhance.

-Lee

ArtOfWarfare
Mar 20, 2013, 08:18 PM
The way I count how many lines of code there are is I scroll to the bottom and look at the gutter on the left side. Bam. Number of lines of code. I don't allow code to go beyond the 80th column.

I try to get between 50 and 500 lines of code in each file. If there are fewer, it suggests that the file could be merged with others; if there are many more, it's probably time to refactor into multiple files.

It helps ensure my code is easy to read. I used to allow myself to put thousands of lines in each file and to use however many columns, but I've come to realize that it makes reading the code and interpreting it suck.

Also, don't copy and paste blocks of code. If you're tempted to, it's probably a better idea to cut that block, paste it into its own function, and use that function multiple times.
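
For instance (a trivial, made-up sketch in Python; the function name and data are invented), the block that would otherwise be pasted into several places becomes one function that gets called from each of them:

# Made-up example of the idea: the formerly copy-pasted block becomes one function.
def normalize_name(raw):
    # This cleanup used to be duplicated everywhere a name was read in.
    return " ".join(raw.strip().split()).title()

first = normalize_name("  ada   lovelace ")   # "Ada Lovelace"
second = normalize_name("GRACE HOPPER")       # "Grace Hopper"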

Sydde
Mar 20, 2013, 08:45 PM
I try to get between 50 and 500 lines of code in each file. If there are fewer, it suggests that the file could be merged with others; if there are many more, it's probably time to refactor into multiple files.

Sorry, that approach does not make sense to me, primarily because I mostly use Objective-C. A file should be logically consistent, so adding functions from another file, IMHO, should only be done if there is a sort of theme that makes them go together. A file with 20 or 30 lines of code should be fine if the function(s) "belong together"; making files make sense is more important to me than making them adhere to an arbitrary size standard. And, of course, in Objective-C, you sometimes subclass an object and only add one or two methods/overrides, and adding other stuff to the .m file would only facilitate confusion.

Breaking up large files, otoh, is usually a good idea, when possible.

ChrisA
Mar 21, 2013, 01:33 AM
I read somewhere that Linux (the kernel) has upwards of 1.7 million lines of code, which got me to wondering if there is some kind of standard for what constitutes a line of code.

As someone said, "The problem with standards is that there are so many of them."

Yes, there are lots of ways to count code. As it turns out, one is just as good as any other, as long as you always count using the same method. In other words, "LOC" is a relative measure. It should NEVER be used as an absolute unit.

For example, you find it cost you $1,000 to write 100 lines. Count them any way you like. The only use of counting is so that next time you might know how much it might cost to write, say, 125 lines. You could then guess $1,250.

Using it as an absolute is pointless. Saying Linux has 1,000,000 lines is of no use until you compare it with something else that was counted the same way.

What I always did was eliminate comments and then simply count semicolons. That works as well as anything else. Some count every end-of-line character, and others remove blank lines.

Some count "code volume" and try to assign a complexity value to each line, so "a = b;" counts as 1 but "if(a<b){" counts higher.

I've found, after doing this for years, that none of them are very accurate and that counting semicolons works well enough.

The only good motivation for this is cost estimates. We would look at how many lines other projects used and what they cost per line. There are better ways: Google "COCOMO". It was a decent approach, but still only good enough for a rough order of magnitude. Some of the projects I worked on were around 1M lines, some as small as 50K lines. Estimating was never good. What really happens is you write code until you use up the budget. If the budget was big, the customer got some really nice error handling and testing. If the budget was small, he got rather limited robustness.
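
For the curious, the basic COCOMO model is simple enough to sketch in a few lines of Python. The constants below are the textbook "organic project" values as best I remember them, so treat the whole thing as a rough illustration rather than something authoritative:

# Basic COCOMO, "organic" project class: rough order-of-magnitude estimates only.
# The coefficients are the published textbook values as I recall them; verify before relying on them.
def cocomo_organic(kloc):
    effort_pm = 2.4 * kloc ** 1.05          # estimated effort in person-months
    schedule_mo = 2.5 * effort_pm ** 0.38   # estimated calendar schedule in months
    return effort_pm, schedule_mo

effort, months = cocomo_organic(50)  # e.g. a 50 KLOC project
print("%.0f person-months, roughly %.0f months of schedule" % (effort, months))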

ArtOfWarfare
Mar 21, 2013, 04:35 AM
Sorry, that approach does not make sense to me, primarily because I mostly use Objective-C. A file should be logically consistent, so adding functions from another file, IMHO, should only be done if there is a sort of theme that makes them go together. A file with 20 or 30 lines of code should be fine if the function(s) "belong together"; making files make sense is more important to me than making them adhere to an arbitrary size standard. And, of course, in Objective-C, you sometimes subclass an object and only add one or two methods/overrides, and adding other stuff to the .m file would only facilitate confusion.

Breaking up large files, otoh, is usually a good idea, when possible.

Obviously I do whatever makes for the best code design - I wouldn't randomly merge things together - but I just consider it to be a code smell if you have dozens of files with just one or two functions in each of them. It's not necessarily a problem, but it's a simple indicator that my design choices may not be as good as they should be.

robbieduncan
Mar 21, 2013, 05:16 AM
We had a very short-lived metric at work around lines of code. We had a web app at the time that had lots of graphics. We counted each graphic as so many lines, based on the old saying "a picture is worth a thousand words" :D

firewood
Mar 21, 2013, 06:52 AM
In the big scheme of things, it's another meaningless management metric.

In the small scheme of things, for a developer or team of developers who use a very similar coding and commenting style, and who stick to code that is uniformly necessary to solving the problem (a big if), it's a slightly more objective and in some cases more accurate measure than asking the average developer for their subjective opinion ("yup, it's 90% coded" when they're only 10% of the way toward the first alpha).

Some coders are decent or even good at estimation; most aren't. In that case a bad metric may be better than a worse metric.

MisterMe
Mar 21, 2013, 09:29 AM
Want to talk about lines of code? Talk about APL.

ArtOfWarfare
Mar 21, 2013, 11:23 AM
It seems to me that the length of the resulting assembly code is much more worth talking about than the length of the higher-level code... a line of high-level code could turn into a single assembly instruction or into dozens of them. More instructions means the code takes longer to execute (ignoring the variance in how long different assembly instructions take... I feel like that variance is a lot less than the variance between lines in a higher-level language).

Of course... loops... function calls... maybe the number of instructions that actually get executed is more worth talking about than the number of instructions in the binary.

gnasher729
Mar 21, 2013, 11:50 AM
It seems to me that the length of the resulting assembly code is much more worth talking about than the length of the higher-level code... a line of high-level code could turn into a single assembly instruction or into dozens of them. More instructions means the code takes longer to execute (ignoring the variance in how long different assembly instructions take... I feel like that variance is a lot less than the variance between lines in a higher-level language).

Of course... loops... function calls... maybe the number of instructions that actually get executed is more worth talking about than the number of instructions in the binary.

Consider C++ with tons of inlined functions, where some developer puts all the code into the header file, leading to an explosion of assembler code. Or template code: every little sort operation in C++ generates code for the complete sort algorithm.

----------

One of my greatest achievements working in software was culling thousands, maybe tens of thousands, of lines of unused code from a system. The president of the company told me this was not worthwhile, because there "are no bugs in code that doesn't run". The truth is that there are bugs, but you don't know whether that code will ever run, so you don't know whether you should fix them. Our build was faster, our greps were faster, and it was easier to follow what was going on.

Let's say there is a function that you think needs to change its behaviour. So you examine who calls it and whether the callers are affected. And three of the five callers are in dead code that is never executed. So you waste hours figuring out why this dead code uses the function in a weird way (and eventually figure out it is because the code is dead and wouldn't work anymore, because it wasn't maintained). You wasted your time on that dead code.

That's why lines of code cost you. All the time.

Brian Y
Mar 21, 2013, 11:53 AM
I've been working on a project here for 6 months. When I started working on it, it was 47,000 lines of code (just measured by wc -l).

Now, after 6 months of work, it's at 23,000. Just under half what it was. Performance is now about 1.5x what it was for the vast majority of it, and its maintainability has been increased drastically.

So that works out to -4,000 lines of code per month, if you measure my productivity in LOC.

xStep
Mar 21, 2013, 10:56 PM
We had a very short-lived metric at work around lines of code. We had a web app at the time that had lots of graphics. We counted each graphic as so many lines, based on the old saying "a picture is worth a thousand words" :D

LOL! I'll have to keep that method in my back pocket. ;)

subsonix
Mar 22, 2013, 10:14 PM
It seems to me that the length of the resulting assembly code is much more worth talking about than the length of the higher-level code...

Why? If that is what you are interested in, then why not just talk about the size of the executable? Lines of code is just a rough estimate of the magnitude of a project.

firewood
Mar 23, 2013, 07:05 PM
I've been working on a project here for 6 months. When I started working on it, it was 47,000 lines of code (just measured by wc -l).

Now, after 6 months of work, it's at 23,000. Just under half what it was. Performance is now about 1.5x what it was for the vast majority of it, and its maintainability has been increased drastically.

So that works out to -4,000 lines of code per month, if you measure my productivity in LOC.

Wrong measure. You need to average that with all the coding and all the months that had been done before you. It's like the old standard of 1 line of code per day as average productivity. What that really means is someone writing a few hundred lines before lunch, and then spending the rest of the year in meetings, doing specs, fixing bugs, throwing it away and rewriting it because of a change in requirements or a bug requiring a new architecture to fix, more meetings, more reviews, etc. Average those few hundred morning LOC over all of that and you end up with 1.5 LOC/day by corporate project end of life.

Brian Y
Mar 24, 2013, 05:06 AM
Wrong measure. You need to average that with all the coding and all the months that had been done before you. It's like the old standard of 1 line of code per day as average productivity. What that really means is someone writing a few hundred lines before lunch, and then spending the rest of the year in meetings, doing specs, fixing bugs, throwing it away and rewriting it because of a change in requirements or a bug requiring a new architecture to fix, more meetings, more reviews, etc. Average those few hundred morning LOC over all of that and you end up with 1.5 LOC/day by corporate project end of life.

Yep, I was trying to show how inadequate LOC is as a measure.

gnasher729
Mar 24, 2013, 05:19 AM
Why? If that is what you are interested in, then why not just talk about the size of the executable? Lines of code is just a rough estimate of the magnitude of a project.

Size of the executable can grow by adding functionality, by hiring clumsy programmers who write inefficient code, by using language features that lead to code explosion. (Recent personal experience: By using graphics designers who can't use their tools and turn a simple icon into a 50 KB file).

subsonix
Mar 24, 2013, 08:51 AM
Size of the executable can grow by adding functionality, by hiring clumsy programmers who write inefficient code, by using language features that lead to code explosion. (Recent personal experience: By using graphics designers who can't use their tools and turn a simple icon into a 50 KB file).

Absolutely, or by using (or not using) dynamic libraries, by highly optimized code with loop unrolling, or by the fact that x86 uses a variable-size instruction set. I was simply questioning why assembly instructions would be a more useful measure than lines of code for the purpose of getting a ballpark figure on project size.

firewood
Mar 24, 2013, 09:49 AM
I was simply questioning why assembly instructions would be a more useful measure than lines of code for the purpose of getting a ballpark figure on project size.

You could measure this against actual project size (say, in hundreds of man-years). Big companies have, and used to report source LOC rather than binary size as a (very approximate) scale of productivity.

tekboi
Mar 24, 2013, 11:29 PM
I was once told by a professor that:

"As a Real World Programmer. You won't have to write more than 3 lines of code"


is this true in any way?

Sydde
Mar 25, 2013, 12:16 AM
I was once told by a professor that:

"As a Real World Programmer. You won't have to write more than 3 lines of code"

is this true in any way?

Given that "As a Real World Programmer." is not a complete sentence, it is hard to assess. What I suspect the professor might have been trying to say is that there are only three distinct lines of code that make up the bulk of a program, that you will write them many thousands of times with slight variations.

A bit of an exaggeration, but not too far off from reality.

firewood
Mar 25, 2013, 01:59 AM
"As a Real World Programmer. You won't have to write more than 3 lines of code"


I think some large government or aerospace project took the actual number of lines of code in the finished product and divided it by the man-years of salary and contract time that they had paid for since the project started, and it came out to somewhere between 3 and 10 lines of code per day. That's what happens when software teams have to spend 90%+ of their time in multiple meetings, doing specs and project charts and reviews and process documentation, etc.

gnasher729
Mar 25, 2013, 01:29 PM
You could measure this against actual project size (say, in hundreds of man-years). Big companies have, and used to report source LOC rather than binary size as a (very approximate) scale of productivity.

There is a quote earlier in this thread from Steve Ballmer, of all people, making fun of IBM for doing exactly that. And "Any not completely incompetent programmer can double their productivity according to any performance metric, without any actual increase in productivity" is not by Ballmer, but still true.

There are plenty of horror stories of good developers with a brain dead manager, who write excellent code that falls short of the performance metric of the day, and then give up and start writing ****** code that exceeds everyone else's in performance metrics.

dejo
Mar 25, 2013, 01:33 PM
Want to talk about lines of code? Talk about APL.

Heh heh, I remember that. In university, we would fail our APL assignments sometimes if they weren't coded in a single line. And the nephew of the guy who invented that language (http://en.wikipedia.org/wiki/Kenneth_E._Iverson) was our TA.

notjustjay
Mar 25, 2013, 01:46 PM
Given that "As a Real World Programmer." is not a complete sentence, it is hard to assess. What I suspect the professor might have been trying to say is that there are only three distinct lines of code that make up the bulk of a program, that you will write them many thousands of times with slight variations.

A bit of an exaggeration, but not too far off from reality.

It is true that we seem to solve the same problems over and over again, just in different applications and in different languages.

firewood
Mar 25, 2013, 02:58 PM
And "Any not completely incompetent programmer can double their productivity according to any performance metrics, without any increase in productivity. ".

That's why it's not a performance metric. It's a measure of relative project size for large projects, after the fact. If a programming team padded their work by 2x on two projects, they might still end up with about the same order of LOC on each.

softwareguy256
Apr 1, 2013, 10:40 PM
Sorry dudes, but as a professional and entrepreneur in the business making large amounts of cash, LOC matters.

I mean, you could write a crappy 10k-line toy app and sell 5000 copies a year at $1.99 before the 30% take and taxes; big freaking deal, you made less than $10k a year.

If you want to scale and make the big bucks you're gonna need features. And that's going to take many lines of code. We're talking 500k to millions. Go ahead, cheat 10x by writing in the most verbose fashion; most of you will get tired and quit waaay before the 100k mark. It takes top talent to produce even 100k lines of code in 1 year that is coherent and stable enough to be built on for years 2, 3, 4, and on.

softwareguy256
Apr 1, 2013, 10:56 PM
Think about what you said: your greatest achievement is getting paid to add no new features that will increase sales and help cover your salary. Money must grow on trees, or this is a fast path to a pink slip.

One of my greatest achievements working in software was culling thousands, maybe tens of thousands, of lines of unused code from a system. The president of the company told me this was not worthwhile, because there "are no bugs in code that doesn't run". The truth is that there are bugs, but you don't know whether that code will ever run, so you don't know whether you should fix them. Our build was faster, our greps were faster, and it was easier to follow what was going on.

Other ways of reducing LoC, such as removing duplication, reducing complexity, and increasing modularity, are a joy. One way to look at LoC is as a measure of how expensive a system is to maintain and enhance.

-Lee

lee1210
Apr 1, 2013, 11:47 PM
Think about what you said: your greatest achievement is getting paid to add no new features that will increase sales and help cover your salary. Money must grow on trees, or this is a fast path to a pink slip.

Improving build performance, increasing dev productivity.
Reducing time to find and fix bugs.
Reducing surface area to verify when implementing features.
Clarifying and documenting true dependencies/structure.
Gaining insights into system structure, increasing effectiveness.
Cleaning up a big mess, making it easier for developers to understand system structure, decreasing training time for new devs.

If you're deciding who gets pink slips, and you hand them out to those that take initiative to solidify a system's foundation, improve tools and procedures, and gain deep system knowledge and expertise, I weep for the state of a large system under your tutelage after a few years.

Adding features is great. Making a system easier and safer to add features to is better. You may disagree; more power to you.

-Lee

Sander
Apr 2, 2013, 02:43 AM
My guess is that softwareguy256 works in a small company. Maybe even running his own one-man show. That's great; the paying customers obviously couldn't care less about the build efficiency or the amount of dead code; they pay for features.

If I started a new company, I'd probably hire a few guys like softwareguy256, who clearly are feature-oriented. Features sell.

However, if my company grows beyond, say, 10 people, and we're making software which still pays our rent 10 years from now, then I would be desperately looking for a lee1210: Someone who is not too proud to clean up other people's mess and makes sure we can still add "selling features" next year, and the years after that.

Features are an asset, code is a liability.

notjustjay
Apr 2, 2013, 12:44 PM
Sorry dudes, but as a professional and entrepreneur in the business making large amounts of cash, LOC matters.

I mean, you could write a crappy 10k-line toy app and sell 5000 copies a year at $1.99 before the 30% take and taxes; big freaking deal, you made less than $10k a year.

If you want to scale and make the big bucks you're gonna need features. And that's going to take many lines of code. We're talking 500k to millions. Go ahead, cheat 10x by writing in the most verbose fashion; most of you will get tired and quit waaay before the 100k mark. It takes top talent to produce even 100k lines of code in 1 year that is coherent and stable enough to be built on for years 2, 3, 4, and on.

What? :confused:

You're talking about two different things here: the price which you choose to sell an app for to be profitable, and the features you include in your app.

If you want to write code professionally then absolutely you want to write good code with good features to sell your product. (Edit: originally I wrote "lots of features" but I realized this is not necessarily true; sometimes you just want the code to do one thing and do it in a really bombproof way).

LOC should not be the metric by which you determine whether you have been successful. It can, however, help you estimate the cost of the labour involved (tools such as SEER are designed for this), since obviously higher LOC means higher cost to develop and maintain the code. This can influence the price you sell the app at, but not necessarily.

ArtOfWarfare
Apr 3, 2013, 01:57 PM
Metric for measuring performance:

Features / (LOC * Development Time)

Although it's hard to define a feature, and it's hard to define development time... i.e., I want to include time spent relearning what the written code does months later in this time...
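
As a toy illustration (every number below is invented, and "features" is doing a lot of work as a unit), a sketch of that metric in Python:

# Half-serious sketch of Features / (LOC * Development Time); all inputs are made up.
def productivity(features, loc, dev_months):
    return features / (loc * dev_months)

# A verbose implementation vs. a tighter one delivering the same features in the same time:
print(productivity(features=10, loc=20000, dev_months=6))  # ~8.3e-05
print(productivity(features=10, loc=8000, dev_months=6))   # ~2.1e-04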

firewood
Apr 4, 2013, 02:47 PM
Metric for measuring performance:

Features / (LOC * Development Time)
...

Unless you end up with quick&dirty code that is too terse to be maintainable. See the obfuscated code contest for extreme examples.

All these measures are only for "on average" and "assuming everything else is roughly equal".

Sydde
Apr 6, 2013, 12:04 PM
I thought "features" was a synonym for "bugs"?

ArtOfWarfare
Apr 6, 2013, 04:46 PM
I thought "features" was a synonym for "bugs"?

""features"" is a synonym for "bugs". "features" is not.

Nightarchaon
Apr 7, 2013, 02:16 PM
So the upshot is:

"How long is a line of code?" = "How long is a piece of string?"

MattInOz
Apr 8, 2013, 07:36 PM
""features"" is a synonym for "bugs". "features" is not.

Or this version's "bugs" are the next version's "features".