Perl Counting Question

Discussion in 'Mac Programming' started by fivetoadsloth, Feb 3, 2011.

  1. fivetoadsloth macrumors 65816

    Joined:
    Aug 15, 2006
    #1
    I'm relatively new to this, so any help, or a pointer to a useful source, would be appreciated.

    I'm trying to read in a .txt file, count how many times certain patterns occur (say 1.0, 2.0, 3.0, etc.), and then print out how many times each one occurred.


    I tried writing some code, but it's more of a jumble of things I found online and in a book than anything that actually makes sense.


    Thanks a lot!
     
  2. balamw Moderator

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #2
    If this is all you are doing and the pattern appears only once per line, you can just use "grep -c", since it counts matching lines rather than individual matches.

    It would still help to see what you have tried so far. Can you post something that comes close?

    EDIT: does this help? http://www.rocketaware.com/perl/perlfaq4/How_can_I_count_the_number_of_oc.htm
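    A sketch in the spirit of that perlfaq4 answer, counting every occurrence rather than every matching line (which is what grep -c reports): repeat a //g match and count how often it succeeds. count_substring is a hypothetical helper name, and note that a plain substring match would also find 1.0 inside values such as 11.0 or 1.05.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal
# substring by looping over successive //g matches.
sub count_substring {
    my ($string, $needle) = @_;
    my $count = 0;
    # \Q...\E quotes the needle, so the "." in "1.0" is matched literally.
    $count++ while $string =~ /\Q$needle\E/g;
    return $count;
}

my $line = "1.0 2.0 1.0 3.0 1.0";
print count_substring($line, "1.0"), "\n";
```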

    B
     
  3. lee1210 macrumors 68040

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #3
    I think you could use split:
    Code:
    my @words = split(/[\s,]+/, $line);
    then:
    Code:
    foreach my $word (@words) {
      $counts{$word}++ if $matchList{$word};
    }
    where %counts is a hash holding the counts, and %matchList is a hash whose keys are the words to match.

    Then:

    Code:
    print "The counts are: \n";
    foreach my $key (keys %counts){
      print "$key : $counts{$key}\n";
    }
    You'd need to do this for every line, so it would have to go inside a loop that reads the file line by line.
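    Putting those fragments together, one runnable version might look like the sketch below. The entries in %matchList and the sample data are made up for illustration, count_matches is a hypothetical helper name, and the split here uses only whitespace and commas as separators so that tokens like 1.0 stay intact. An in-memory string stands in for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical set of tokens to count; only these keys are tallied.
my %matchList = ('1.0' => 1, '2.0' => 1, '3.0' => 1);

# Tally occurrences of the wanted tokens in the lines read from $fh.
sub count_matches {
    my ($fh, $wanted) = @_;
    my %counts;
    while (my $line = <$fh>) {
        # Split on runs of whitespace or commas; "." is not a separator.
        my @words = split /[\s,]+/, $line;
        foreach my $word (@words) {
            $counts{$word}++ if $wanted->{$word};
        }
    }
    return \%counts;
}

# Demo on an in-memory "file" (made-up sample data).
my $sample = "1.0 2.0 1.0, 4.0\n3.0 1.0 2.0\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my $counts = count_matches($fh, \%matchList);
close $fh;

print "The counts are: \n";
foreach my $key (sort keys %$counts) {
    print "$key : $counts->{$key}\n";
}
```

    Reading a real file instead only changes the open line, e.g. open my $fh, '<', 'data.txt' with whatever file name applies.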

    If you give us some sample text and a more detailed description, we may be able to guide you better.

    -Lee
     
  4. fivetoadsloth thread starter macrumors 65816

    Joined:
    Aug 15, 2006
    #4
    Thanks a lot for your help.

    I'll try to post something in a bit. I actually saw that site but wasn't sure how to implement it in my case. If I had a text file like:

    1 0 1 1 1 0 0 1 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 1 1 1
    1 1 0 1 0 1 0 0 1 1 0 0 1 1 etc

    (This continues over many lines, not just one.)

    I'd like to print the number of 0s and 1s.
     
  5. lee1210 macrumors 68040

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #5
    This is quick and dirty, and surely someone more adept at perl could do it in one line, but:
    Code:
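    #!/usr/bin/perl
    use strict;
    use warnings;
    
    # Open the input file, stopping with an error message if that fails.
    open(my $fh, '<', 'oneszeros.txt') or die "Cannot open oneszeros.txt: $!";
    
    my %counts;
    while (my $line = <$fh>) {
        chomp($line);
        # split ' ' splits on runs of whitespace and skips leading spaces.
        my @words = split(' ', $line);
        foreach my $word (@words) {
            $counts{$word}++;
        }
    }
    close($fh);
    
    foreach my $key (sort keys %counts) {
        print "$key : $counts{$key}\n";
    }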
    
    This assumes that you have a file named oneszeros.txt in the current directory.

    -Lee
     
  6. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #6
    In Perl this is easy. I will not write it for you, but you can code to this outline:

    0) The main data structure will be a "hash". The hash keys are the words you find in the input file, and the value associated with each key is the number of times you've seen that word.

    Hint: the relevant Perl syntax might look like this:
    my %wordlist;
    $wordlist{ $word } += 1;
    print "$word was seen $wordlist{ $word } times so far\n";

    The outline:
    1) Open the file.
    2) Read a line, then "split" the line into "words".
    3) For each word, increment its value in the hash. Incrementing creates the hash entry if it is not there yet, so there's no need to test whether it already exists.
    4) When you are all done, simply print the hash table. Maybe sort by value so the most frequent words come first.
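    One way the outline above could be coded, as a sketch: word_counts is a hypothetical helper name, and an in-memory string stands in for the file from step 1. Sorting by count (descending) and then by word is one choice for step 4.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Steps 0, 2 and 3: read lines from $fh, split them into words, and
# count each word in a hash (word => number of times seen).
sub word_counts {
    my ($fh) = @_;
    my %wordlist;
    while (my $line = <$fh>) {
        foreach my $word (split ' ', $line) {
            $wordlist{$word} += 1;    # creates the entry if it is missing
        }
    }
    return %wordlist;
}

# Step 1 would normally open a real file; a string stands in here.
my $text = "1 0 1 1\n0 1 1\n";
open my $fh, '<', \$text or die "Cannot open input: $!";
my %wordlist = word_counts($fh);
close $fh;

# Step 4: most frequent words first, ties broken alphabetically.
foreach my $word (sort { $wordlist{$b} <=> $wordlist{$a} || $a cmp $b } keys %wordlist) {
    print "$word was seen $wordlist{$word} times\n";
}
```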
     
  7. balamw Moderator

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #7
    Since the other two suggestions are word based, I figured I'd follow up on the site I linked using the character approach.

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    my $total = 0;
    while (my $line = <>) {
        # tr/1// returns the number of "1" characters in $line.
        my $count = ($line =~ tr/1//);
        $total += $count;
    }
    print "There are $total 1 characters in the file\n";
    
    This requires you to provide the input on stdin, i.e.:

    Code:
    cat oneszeros.txt | ./countones.pl
    Not quite one line, but still readable and debuggable, since $line and $count can be inspected for each line.
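    The same tr/// trick extends to counting several characters in one pass. In the sketch below, count_chars is a hypothetical helper that tallies both 0s and 1s, with an in-memory string standing in for oneszeros.txt. Because it counts characters, data like 10 or 1.0 would contribute individual digits rather than whole values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In scalar context tr/// returns how many characters it matched; with an
# empty replacement list the string itself is left unchanged.
sub count_chars {
    my ($fh) = @_;
    my ($zeros, $ones) = (0, 0);
    while (my $line = <$fh>) {
        $zeros += ($line =~ tr/0//);
        $ones  += ($line =~ tr/1//);
    }
    return ($zeros, $ones);
}

my $data = "1 0 1 1 1 0 0 1\n1 1 0 1\n";
open my $fh, '<', \$data or die "Cannot open input: $!";
my ($zeros, $ones) = count_chars($fh);
close $fh;
print "0: $zeros\n1: $ones\n";
```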

    B
     
  8. dmi macrumors regular

    Joined:
    Dec 21, 2010
    #8
    What are the patterns you are trying to count?
    I don't see any 1.0, 2.0, 3.0, etc in the text file.
     
  9. dmi macrumors regular

    Joined:
    Dec 21, 2010
    #9
    If the pattern you are trying to count is "any string of non-white space characters separated by whitespace" then you might use something like
    perl -lne '$c{$_}++for/(\S+)/g;END{print "$_: $c{$_}" for keys %c}' file.txt
     
  10. fivetoadsloth thread starter macrumors 65816

    Joined:
    Aug 15, 2006
    #10
    I just used 0s and 1s for a quick example. They could easily have been 0.0, 1.0, 2.0, etc. Sorry for any confusion!

    Thanks a lot, this definitely helped.

    Hm, thanks. I'm not sure that is necessarily the pattern I am trying to count, though.
     
  11. dmi macrumors regular

    Joined:
    Dec 21, 2010
    #11
    An easier way to provide the input on stdin, without invoking an additional program, is
    Code:
    ./countones.pl < oneszeros.txt
    But you are not even required to provide the input on stdin.
    You could just name the input file(s) on the command line, where they end up in @ARGV:
    Code:
    ./countones.pl oneszeros.txt
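    Since <> reads each file named on the command line in turn, the script can also report per-file totals: Perl sets $ARGV to the name of the file currently being read ("-" when it is reading stdin). A sketch for the same count-the-1s task, where ones_per_file is a hypothetical helper name and the printing only runs when file names are given:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count "1" characters separately for each input.
# <> reads each file named in @ARGV in turn (or STDIN if @ARGV is empty),
# and $ARGV holds the name of the file currently being read.
sub ones_per_file {
    my %ones;
    while (my $line = <>) {
        $ones{$ARGV} += ($line =~ tr/1//);
    }
    return \%ones;
}

# Only run when file names are given, e.g. ./countones.pl a.txt b.txt
if (@ARGV) {
    my $ones = ones_per_file();
    print "$_: $ones->{$_}\n" for sort keys %$ones;
}
```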
     
  12. dmi macrumors regular

    Joined:
    Dec 21, 2010
    #12
    So what are you trying to count?
     
  13. balamw Moderator

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #13
    Thanks for that. Most of the Perl I ever wrote was used as a pipe between some other code and "mail", so I tend to make everything fit that model. (That's also why I don't tend to use "<", but I do like the ARGV model in this case.)

    B
     
  14. Bill McEnaney macrumors 6502

    Joined:
    Apr 29, 2010
    #14
    I think Python would let me give a string's count method a regular expression. If I ever knew the correct syntax, I forgot it. But I'm thinking of something like this:

    Code:
    line.count([['A'..'Z'])
    Maybe Perl's object-oriented features allow something like that? Perl can treat a text file's contents as one huge string. So if that language allows what I think we can do in Python, a one-line Perl program may teach the machine to count the kinds of patterns OP needs to count.
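    For reference, Python's str.count takes a plain substring rather than a regex (len(re.findall(...)) is the usual Python route). In Perl, the idea can be sketched by slurping the whole file into one string and counting the //g matches. count_matches_in and slurp are hypothetical helper names, the digits-period-digits pattern is just an example, and an in-memory string stands in for the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count every match of $re in $text: a //g match assigned to the empty
# list (), evaluated in scalar context, gives the number of matches.
# Use (?:...) groups only; capturing groups would change the count.
sub count_matches_in {
    my ($text, $re) = @_;
    my $n = () = $text =~ /$re/g;
    return $n;
}

# Read a whole filehandle into one string by undefining $/ locally.
sub slurp {
    my ($fh) = @_;
    local $/;
    return scalar <$fh>;
}

my $data = "1.0 2.5 10.25\nfoo 123.456\n";
open my $fh, '<', \$data or die "Cannot open input: $!";
my $text = slurp($fh);
close $fh;

# Example pattern: digits, a period, more digits (e.g. "123.456").
print count_matches_in($text, qr/\d+\.\d+/), "\n";
```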
     
  15. dmi macrumors regular

    Joined:
    Dec 21, 2010
    #15
    That doesn't look much like a regular expression,
    but do you want to write a Perl object that takes an array ref of strings to be interpreted as regular expressions to count?
     
  16. Bill McEnaney, Feb 4, 2011
    Last edited: Feb 4, 2011

    Bill McEnaney macrumors 6502

    Joined:
    Apr 29, 2010
    #16
    I'm only a beginner in regular expressions, so I'm not surprised that I wrote something that doesn't look much like one. But I want a way OP can count substrings that match the pattern a regular expression describes. Maybe I should have written "line.count([A-Z, a-z])". I want a regexp that will match one or more digits followed by a period and another sequence of one or more digits. That expression should match, say, "123.456".
     
  17. balamw Moderator

    balamw

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #17
    Code:
    -?\d+\.?\d*
    One or more digits, followed by zero or one periods, followed by zero or more digits. I threw in the optional negative sign, and you could wrap it in whitespace....
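    To tie this back to the original question (how many times each of 1.0, 2.0, 3.0... occurs), that pattern can pull every number out of each line while a hash tallies them. A sketch assuming this pattern and made-up sample data; tally_numbers is a hypothetical name. The keys are the matched strings, so 1.0, 1 and 1.00 would be counted separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally each distinct number matched by -?\d+\.?\d* in the lines of $fh.
sub tally_numbers {
    my ($fh) = @_;
    my %counts;
    while (my $line = <$fh>) {
        # A //g match with no capture groups returns every match in list context.
        $counts{$_}++ for $line =~ /-?\d+\.?\d*/g;
    }
    return \%counts;
}

my $data = "1.0 2.0 1.0 3.0\n2.0 1.0 -1.5\n";
open my $fh, '<', \$data or die "Cannot open input: $!";
my $counts = tally_numbers($fh);
close $fh;

# Print in numeric order of the values found.
print "$_ : $counts->{$_}\n" for sort { $a <=> $b } keys %$counts;
```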

    B
     
  18. Bill McEnaney macrumors 6502

    Joined:
    Apr 29, 2010
    #18
    Great. Thanks, B. I hope that helps solve OP's problem.
     
  19. dmi macrumors regular

    Joined:
    Dec 21, 2010
    #19
    Code:
    / \d+ # one or more digits
     \.     # followed by a period
     \d+  # and another sequence of one or more digits
    /x
    which is not quite the same as
    zero or one minus signs, followed by one or more digits, followed by zero or one periods, followed by zero or more digits, although both match "123.456".
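    A quick way to see the difference between the two patterns is to try them on a few strings. Both are anchored with \A...\z here (an addition for this comparison) so that the whole string must match; the sample strings are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two patterns from the thread, anchored so the whole string must match.
my $strict_re = qr/\A \d+ \. \d+ \z/x;     # digits, period, digits
my $loose_re  = qr/\A -?\d+\.?\d* \z/x;    # optional "-", digits, optional ".", optional digits

foreach my $s ("123.456", "123", "123.", "-1.5", ".5") {
    printf "%-8s strict: %-3s loose: %s\n", $s,
        ($s =~ $strict_re ? "yes" : "no"),
        ($s =~ $loose_re  ? "yes" : "no");
}
```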

    nor is it the same as
    perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new( qr([A-Z, a-z]) )->explain'
    The regular expression:

    (?-imsx:[A-Z, a-z])

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      [A-Z, a-z]               any character of: 'A' to 'Z', ',', ' ',
                               'a' to 'z'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
     
  20. Bill McEnaney macrumors 6502

    Joined:
    Apr 29, 2010
    #20
    Good stuff, but what do "i," "m," "s," and "x" each mean?
     
  21. dmi macrumors regular

    Joined:
    Dec 21, 2010
    #21
    perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new( qr/[A-Z, a-z]/imsx )->explain'
    The regular expression:

    (?imsx:[A-Z, a-z])

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?imsx:                  group, but do not capture (case-insensitive)
                             (with ^ and $ matching start and end of
                             line) (with . matching \n) (disregarding
                             whitespace and comments):
     
  22. dmi macrumors regular

    Joined:
    Dec 21, 2010
    #22
    see also
    perldoc perlre
    ...
    Modifiers

    Matching operations can have various modifiers. Modifiers that relate
    to the interpretation of the regular expression inside are listed
    below. Modifiers that alter the way a regular expression is used by
    Perl are detailed in "Regexp Quote-Like Operators" in perlop and "Gory
    details of parsing quoted constructs" in perlop.

    m Treat string as multiple lines. That is, change "^" and "$" from
    matching the start or end of the string to matching the start or
    end of any line anywhere within the string.

    s Treat string as single line. That is, change "." to match any
    character whatsoever, even a newline, which normally it would not
    match.

    Used together, as /ms, they let the "." match any character
    whatsoever, while still allowing "^" and "$" to match,
    respectively, just after and just before newlines within the
    string.

    i Do case-insensitive pattern matching.

    If "use locale" is in effect, the case map is taken from the
    current locale. See perllocale.

    x Extend your pattern's legibility by permitting whitespace and
    comments.
    ...
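    A tiny demonstration of the four modifiers on a made-up two-line string; the variable names are arbitrary, and the trailing comments give the value each expression produces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Line one\nline two";

# i: case-insensitive, so /^line/i matches "Line" at the start of the string.
my $i  = () = $text =~ /^line/ig;     # 1: without m, ^ only matches at the string start
# m: ^ also matches just after each newline.
my $im = () = $text =~ /^line/img;    # 2
# s: "." also matches "\n", so "one.line" can span the line break.
my $s_no  = $text =~ /one.line/  ? 1 : 0;    # 0
my $s_yes = $text =~ /one.line/s ? 1 : 0;    # 1
# x: whitespace and #-comments inside the pattern are ignored.
my $x = $text =~ / two \z   # "two" at the very end of the string
                 /x ? 1 : 0;                 # 1

print "i=$i im=$im s_no=$s_no s_yes=$s_yes x=$x\n";
```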
     
