
Bill McEnaney

Apr 29, 2010
Is the "wc" command buggy? If I run it on a three-line text file whose third line doesn't end with '\n', wc says that the file contains only two lines. Just for fun, I wrote my own "wc" command that seems to work correctly, partly because it calls fgets.
 

Have you checked to see if the man page covers this scenario? I'm on the iPad so I can't easily check.

B
 
Even if wc's author decided to make the machine stop counting lines when it sees the last '\n', my text file is still three lines long, so wc's line count is still incorrect. That's why I wonder why anyone would put a program into production when he knows it'll print some incorrect results. My program adds 1 to the line counter each time the computer reads a new line with fgets. I don't need to look for new-line characters because fgets ensures that each nonempty string it returns will contain one.
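For comparison, here is a minimal C sketch of the two counting rules being argued about. It is not the OP's actual program; the function names and the 256-byte buffer are illustrative assumptions. count_newlines counts '\n' bytes, which is how POSIX describes wc -l. count_fgets_lines counts lines read with fgets and treats an unterminated final line as a line. One caveat on the fgets claim above: fgets does not add a '\n' to an unterminated last line, and a line longer than the buffer comes back in several pieces, so the sketch only counts a line when a chunk ends in '\n' or when EOF arrives with a partial line pending.

```c
#include <stdio.h>
#include <string.h>

/* wc -l style: count newline characters, nothing else. */
long count_newlines(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            n++;
    }
    return n;
}

/* fgets style: count lines, including a final line with no '\n'.
   A line longer than the buffer arrives in several chunks, so count
   only when a chunk ends the line, or when EOF leaves a partial line.
   Assumes plain text (no NUL bytes, which would confuse strlen). */
long count_fgets_lines(FILE *fp)
{
    char buf[256];            /* illustrative size */
    long n = 0;
    int pending = 0;          /* 1 while inside a line that has not ended */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            n++;
            pending = 0;
        } else {
            pending = 1;
        }
    }
    if (pending)              /* unterminated last line */
        n++;
    return n;
}
```

On a three-line file whose third line has no trailing newline, the first function reports 2 and the second reports 3; add the trailing newline and both report 3.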
 

You seem to be missing the fact that wc has been around since early Unix and defines the line that way.

http://man.cat-v.org/unix-1st/1/wc

wc provides a count of the words, text lines, and roff control lines for each argument file.
A text line is a sequence of characters not beginning with "." and ended by a new-line.
A roff control line is a line beginning with ".".
A word is a sequence of characters bounded by the beginning of a line, by the end of a line, or by a blank or a tab.
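To make the quoted definitions concrete, here is a minimal C sketch of the counting rules as that 1st Edition page words them. It illustrates the wording only and is not the historic wc source; the struct and function names are hypothetical, "blank" is assumed to mean a space character, and because the page doesn't say whether an unterminated roff line counts, the sketch requires a new-line for both kinds of line.

```c
#include <stddef.h>

/* Counts as described by the quoted 1st Edition wc page (hypothetical names). */
struct wc1_counts {
    long words;
    long text_lines;   /* lines not starting with '.', ended by a new-line */
    long roff_lines;   /* lines starting with '.'; this sketch also requires a new-line */
};

struct wc1_counts wc1_count(const char *s, size_t len)
{
    struct wc1_counts c = {0, 0, 0};
    int at_line_start = 1;
    int line_is_roff = 0;
    int in_word = 0;

    for (size_t i = 0; i < len; i++) {
        char ch = s[i];
        if (at_line_start) {             /* first character decides the line type */
            line_is_roff = (ch == '.');
            at_line_start = 0;
        }
        if (ch == '\n') {                /* a line only counts once its new-line arrives */
            if (line_is_roff)
                c.roff_lines++;
            else
                c.text_lines++;
            at_line_start = 1;
            in_word = 0;
        } else if (ch == ' ' || ch == '\t') {
            in_word = 0;                 /* blank or tab ends a word */
        } else if (!in_word) {
            in_word = 1;                 /* start of a new word */
            c.words++;
        }
    }
    return c;
}
```

Under these rules, ".TH WC\nhello world\nlast line" gives one roff control line, one text line, and six words; the unterminated "last line" contributes words but no line, which matches the behavior being discussed in this thread.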

Changing that behavior could be devastating to many utilities that depend on wc giving them the documented behavior.

B
 
I learn something every day. I've used Unix or derivatives since 1980, and this is the first time I found out that wc skipped roff control lines. :) Or maybe they removed that feature before 1980.
 
You seem to be missing the fact that wc has been around since early unix and defines the line that way.
I know that partly because for the past 10 years, I've used Unix almost exclusively, especially Solaris 8, i.e., SunOS. Unix is the operating system I know best. I've even read some of the kernel's source code. I bought an iMac because it's a Unix box and because I couldn't afford another Sun workstation.

But I seem to remember that the Solaris and GNU versions of wc would count my file's third line.
 
Changing that behavior could be devastating to many utilities that depend on wc giving them the documented behavior.
That's another good point. Just now, when I ran "head -3 words.txt" on my three-line file to see how many lines the computer would print, it printed two of them.
 
But I seem to remember that the Solaris and GNU versions of wc would count my file's third line.
These pages seem to remember differently:

EDIT: Having said that, I do agree the behavior can be confusing, so scripters just need to be aware of the situation and test their code as the need arises.
Code:
$ printf 'x' > ~/Desktop/one-char
$ od -cb ~/Desktop/one-char
0000000    x
          170
0000001
$ grep -c $'\n' ~/Desktop/one-char
1

$ awk '{x+=1} END {print x}' ~/Desktop/one-char
1

$ cat -n ~/Desktop/one-char |tail -1
     1	x

$ wc -l ~/Desktop/one-char
       0 /Users/halito/Desktop/one-char

#

$ printf 'x\n' > ~/Desktop/one-char-plus-newline
$ od -cb ~/Desktop/one-char-plus-newline
0000000    x  \n
          170 012
0000002
$ grep -c $'\n' ~/Desktop/one-char-plus-newline
1

$ awk '{x+=1} END {print x}' ~/Desktop/one-char-plus-newline
1

$ cat -n ~/Desktop/one-char-plus-newline |tail -1
     1	x

$ wc -l ~/Desktop/one-char-plus-newline
       1 /Users/halito/Desktop/one-char-plus-newline

#

$ printf 'x\ny' > ~/Desktop/one-char-plus-newline-plus-another-char
$ od -cb ~/Desktop/one-char-plus-newline-plus-another-char
0000000    x  \n   y
          170 012 171
0000003
$ grep -c $'\n' ~/Desktop/one-char-plus-newline-plus-another-char
2

$ awk '{x+=1} END {print x}' ~/Desktop/one-char-plus-newline-plus-another-char
2

$ cat -n ~/Desktop/one-char-plus-newline-plus-another-char |tail -1
     2	y

$ wc -l ~/Desktop/one-char-plus-newline-plus-another-char
       1 /Users/halito/Desktop/one-char-plus-newline-plus-another-char
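The pattern in that transcript can be reproduced on the exact bytes shown by od. Below is a minimal C sketch, with hypothetical helper names, of the two rules the tools appear to follow: wc -l counts '\n' bytes, while awk and cat -n (judging by their output above) treat trailing data with no '\n' as one more line.

```c
#include <stddef.h>

/* wc -l rule: number of '\n' bytes in the data. */
size_t newline_count(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    return n;
}

/* awk / cat -n rule as observed above: the newline count, plus one
   if the data is non-empty and does not end in '\n'. */
size_t record_count(const char *buf, size_t len)
{
    size_t n = newline_count(buf, len);
    if (len > 0 && buf[len - 1] != '\n')
        n++;                  /* unterminated last line */
    return n;
}
```

For the three inputs x, x\n and x\ny, newline_count gives 0, 1, 1 (the wc -l results) and record_count gives 1, 1, 2 (the awk results).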


Perhaps Perl is more consistent (and/or assiduous) when it comes to this sort of thing.


EDIT #2: Hmm, seems to me that grep is dead wrong above... it should agree with wc.

Why does it see a newline here?
Code:
$ printf 'x' > ~/Desktop/one-char
$ grep -c $'x' ~/Desktop/one-char
1
$ grep -c $'a' ~/Desktop/one-char
0
$ grep -c $'\t' ~/Desktop/one-char
0
$ grep -c $'\n' ~/Desktop/one-char
1
Huh? It saw no "a" and no tab... but it found a newline. Where?
 