A buggy "wc" command?

Discussion in 'macOS' started by Bill McEnaney, Jan 29, 2011.

  1. Bill McEnaney, Jan 29, 2011
    Last edited: Jan 29, 2011

    Bill McEnaney macrumors 6502

    Joined:
    Apr 29, 2010
    #1
    Is the "wc" command buggy? If I run it on a three-line text file whose third line doesn't end with a '\n', the computer says that the file contains only two lines. Just for fun, I wrote my own "wc" command that seems to work correctly, partly because it calls fgets.
     
  2. balamw Moderator

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #2
    Have you checked to see if the man page covers this scenario? I'm on the iPad so I can't easily check.

    B
     
  3. Nermal Moderator

    Staff Member

    Joined:
    Dec 7, 2002
    Location:
    New Zealand
    #3
    It does.

     
  4. Bill McEnaney thread starter macrumors 6502

    Joined:
    Apr 29, 2010
    #4
    Even if wc's author decided to make the machine stop counting lines when it sees the last '\n', my text file is still three lines long. So wc's line count is still incorrect. That's why I wonder why anyone would put a program into production when he knows it'll print some incorrect results. My program adds 1 to the line counter each time the computer reads a new line with fgets. I don't need to look for new-line characters because fgets ensures that each nonempty string it returns will contain one.
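
    A minimal sketch of the two counting rules being compared here, assuming a throwaway temp file in place of the three-line file from post #1. wc -l counts newline characters, while a record-oriented reader also counts a final line that has no newline. awk is used below only as a stand-in for that second rule; it is not the fgets program described above, which isn't shown in the thread.

```shell
# Stand-in for the three-line file from post #1 (hypothetical temp file).
f=$(mktemp)
printf 'one\ntwo\nthree' > "$f"   # third line has no trailing '\n'

# wc -l counts newline characters only.
newlines=$(wc -l < "$f" | tr -d ' ')

# awk counts records, including a final unterminated one.
records=$(awk 'END { print NR }' "$f")

echo "newlines=$newlines records=$records"
rm -f "$f"
```

    With the file built as above, this should print newlines=2 records=3, which is the same disagreement described in this post.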
     
  5. balamw Moderator

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #5
    You seem to be missing the fact that wc has been around since early Unix and defines a line that way.

    http://man.cat-v.org/unix-1st/1/wc

    Changing that behavior could be devastating to many utilities that depend on wc giving them the documented behavior.

    B
     
  6. talmy macrumors 601

    Joined:
    Oct 26, 2009
    Location:
    Oregon
    #6
    I learn something every day. I've used Unix or derivatives since 1980 and this is the first time I found out that wc skipped roff control lines. :) Or maybe they removed that feature before 1980.
     
  7. Bill McEnaney, Jan 29, 2011
    Last edited: Jan 29, 2011

    Bill McEnaney thread starter macrumors 6502

    Joined:
    Apr 29, 2010
    #7
    I know that, partly because for the past 10 years I've used Unix almost exclusively, especially Solaris 8, i.e., SunOS. Unix is the operating system I know best. I've even read some of the kernel's source code. I bought an iMac because it's a Unix box and because I couldn't afford another Sun workstation.

    But I seem to remember that the Solaris and GNU versions of wc would count my file's third line.
     
  8. Bill McEnaney thread starter macrumors 6502

    Joined:
    Apr 29, 2010
    #8
    That's another good point. Just now, when I ran "head -3 words.txt" on my three-line file to see how many lines the computer would print, it printed two of them.
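
    One way to check what head really emits, sketched with a hypothetical temp file in place of words.txt: count the bytes that come out of it. If the unterminated third line is included, the byte count matches the whole file, and the line may simply be easy to miss on a terminal because the prompt is printed right after it with no line break. head -n 3 is used here as the POSIX spelling of head -3.

```shell
# Hypothetical stand-in for words.txt: three lines, last one unterminated.
f=$(mktemp)
printf 'one\ntwo\nthree' > "$f"

# Size of the whole file in bytes.
file_bytes=$(wc -c < "$f" | tr -d ' ')

# Bytes and newline characters that head passes through.
head_bytes=$(head -n 3 "$f" | wc -c | tr -d ' ')
head_newlines=$(head -n 3 "$f" | wc -l | tr -d ' ')

echo "file=$file_bytes head=$head_bytes newlines=$head_newlines"
rm -f "$f"
```

    If head_bytes equals file_bytes, all three lines were printed even though only two newline characters went by.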
     
  9. Hal Itosis, Jan 29, 2011
    Last edited: Jan 30, 2011

    Hal Itosis macrumors 6502a

    Joined:
    Feb 20, 2010
    #9
    These pages seem to remember differently:

    EDIT: Having said that, I do agree the behavior can be confusing, so scripters just need to be aware of the situation and test their code as the need arises.
    Code:
    $ printf 'x' > ~/Desktop/one-char
    $ od -cb ~/Desktop/one-char
    0000000    x
              170
    0000001
    $ grep -c $'\n' ~/Desktop/one-char
    1
    
    $ awk '{x+=1} END {print x}' ~/Desktop/one-char
    1
    
    $ cat -n ~/Desktop/one-char |tail -1
         1	x
    
    $ wc -l ~/Desktop/one-char
           0 /Users/halito/Desktop/one-char
    
    #
    
    $ printf 'x\n' > ~/Desktop/one-char-plus-newline
    $ od -cb ~/Desktop/one-char-plus-newline
    0000000    x  \n
              170 012
    0000002
    $ grep -c $'\n' ~/Desktop/one-char-plus-newline
    1
    
    $ awk '{x+=1} END {print x}' ~/Desktop/one-char-plus-newline
    1
    
    $ cat -n ~/Desktop/one-char-plus-newline |tail -1
         1	x
    
    $ wc -l ~/Desktop/one-char-plus-newline
           1 /Users/halito/Desktop/one-char-plus-newline
    
    #
    
    $ printf 'x\ny' > ~/Desktop/one-char-plus-newline-plus-another-char
    $ od -cb ~/Desktop/one-char-plus-newline-plus-another-char
    0000000    x  \n   y
              170 012 171
    0000003
    $ grep -c $'\n' ~/Desktop/one-char-plus-newline-plus-another-char
    2
    
    $ awk '{x+=1} END {print x}' ~/Desktop/one-char-plus-newline-plus-another-char
    2
    
    $ cat -n ~/Desktop/one-char-plus-newline-plus-another-char |tail -1
         2	y
    
    $ wc -l ~/Desktop/one-char-plus-newline-plus-another-char
           1 /Users/halito/Desktop/one-char-plus-newline-plus-another-char
    

    Perhaps Perl is more consistent (and/or assiduous) when it comes to this sort of thing.


    EDIT #2: Hmm, seems to me that grep is dead wrong above... it should agree with wc.

    Why does it see a newline here?
    Code:
    $ printf 'x' > ~/Desktop/one-char
    $ grep -c $'x' ~/Desktop/one-char
    1
    $ grep -c $'a' ~/Desktop/one-char
    0
    $ grep -c $'\t' ~/Desktop/one-char
    0
    $ grep -c $'\n' ~/Desktop/one-char
    1
    
    Huh? It saw no "a" and no tab... but it found a newline. Where?
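
    A possible explanation, offered as a sketch rather than a verdict: grep treats a newline inside the pattern argument as a separator between patterns, so a pattern consisting only of a newline becomes a list of empty patterns, and an empty pattern matches every line. Under that reading, grep -c $'\n' counts lines (including an unterminated last one) rather than newline characters. The snippet compares it with an explicitly empty pattern. The temp file and variable names are made up for illustration, and the newline is held in a variable so the snippet does not depend on $'...' quoting.

```shell
# Hypothetical stand-in for ~/Desktop/one-char: one character, no newline.
f=$(mktemp)
printf 'x' > "$f"

# A literal newline, the same value bash produces for $'\n'.
nl='
'

# Newline-only pattern: read by grep as a list of empty patterns.
newline_pattern=$(grep -c "$nl" "$f")

# Explicitly empty pattern, which matches every line.
empty_pattern=$(grep -c '' "$f")

# Actual number of newline characters in the file.
newline_chars=$(wc -l < "$f" | tr -d ' ')

echo "newline_pattern=$newline_pattern empty_pattern=$empty_pattern newline_chars=$newline_chars"
rm -f "$f"
```

    If the two grep counts agree while newline_chars is 0, grep never found a newline in the file; it matched the line itself.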
     
