macOS Perl Counting Question

fivetoadsloth · Feb 3, 2011

I'm relatively new to this, so if anyone could help or direct me to a useful source that would be appreciated.

I'm trying to read in a .txt file, count how many times certain patterns occur (say 1.0, 2.0, 3.0, etc.), and then print how many times each one occurred.

I tried writing some code, but it's more of a jumble of things I found online and in a book than anything that actually makes sense.

Thanks a lot!

balamw · Feb 3, 2011

fivetoadsloth said:
I tried writing some code, but it's more of a jumble of things I found online and in a book than anything that actually makes sense.

If this is all you are doing and the pattern appears only once per line you can just use "grep -c".

It would still help to see what you have tried so far. Can you post something that comes close?

EDIT: does this help? http://www.rocketaware.com/perl/perlfaq4/How_can_I_count_the_number_of_oc.htm

B

lee1210 · Feb 3, 2011

I guess I would think you could use split:

Code:

my @words = split(/[ ,]+/, $line);  # no '.' in the class, so "1.0" stays one word

then:

Code:

foreach my $word (@words) {
  $counts{$word}++ if $matchList{$word};
}

where counts is a hash (%counts), and %matchList is a hash of the words to match.

Then:

Code:

print "The counts are: \n";
foreach my $key (keys %counts){
  print "$key : $counts{$key}\n";
}

You'd need to count each line, so that would have to be in a loop reading lines.
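
Putting the three pieces together in a loop might look like the sketch below. It is only an illustration: the sample lines and the list of values to match are made up, and it splits on whitespace alone so that a value like 1.0 stays in one piece.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; a real script would read lines from a file instead.
my @lines = (
    "1.0 2.0 3.0 1.0",
    "2.0 4.0 1.0",
);

# Only the words listed here are counted (an assumed list, for illustration).
my %matchList = map { $_ => 1 } ('1.0', '2.0', '3.0');
my %counts;

for my $line (@lines) {
    # split ' ' breaks the line on runs of whitespace.
    for my $word (split ' ', $line) {
        $counts{$word}++ if $matchList{$word};
    }
}

print "The counts are:\n";
for my $key (sort keys %counts) {
    print "$key : $counts{$key}\n";
}
```

Here 4.0 is ignored because it is not in %matchList.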

If you give us some sample text and a somewhat more detailed description, we may be able to guide you better.

-Lee

fivetoadsloth · Feb 3, 2011

balamw said:
If this is all you are doing and the pattern appears only once per line you can just use "grep -c".

It would still help to see what you have tried so far. Can you post something that comes close?

EDIT: does this help? http://www.rocketaware.com/perl/perlfaq4/How_can_I_count_the_number_of_oc.htm

B

Thanks a lot for your help.

I'll try to post something in a bit. I actually saw that site but wasn't sure how to implement it in my case. If I had a text file like:

1 0 1 1 1 0 0 1 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 1 1 1
1 1 0 1 0 1 0 0 1 1 0 0 1 1 etc

(This continues over many lines, not just one.)

I'd like to print the number of 0s and 1s.

lee1210 · Feb 3, 2011

fivetoadsloth said:
Thanks a lot for your help.

I'll try to post something in a bit. I actually saw that site but wasn't sure how to implement it in my case. If I had a text file like:

1 0 1 1 1 0 0 1 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 1 1 1
1 1 0 1 0 1 0 0 1 1 0 0 1 1 etc

(This continues over many lines, not just one.)

I'd like to print the number of 0s and 1s.

This is quick and dirty, and surely someone more adept at perl could do it in one line, but:

Code:

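#!/usr/bin/perl
use strict;
use warnings;

# Open the input file, stopping with a message if that fails.
open(my $fh, '<', 'oneszeros.txt') or die "Cannot open oneszeros.txt: $!";

my %counts;
while (my $line = <$fh>) {
	chomp($line);
	# split ' ' skips leading whitespace and treats runs of spaces as one separator.
	my @words = split(' ', $line);
	foreach my $word (@words) {
		$counts{$word}++;
	}
}

# Print each distinct word with the number of times it was seen.
foreach my $key (sort keys %counts) {
	print "$key : $counts{$key}\n";
}

close($fh);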

This assumes that you have a file oneszeros.txt.

-Lee

ChrisA · Feb 3, 2011

fivetoadsloth said:
I'm relatively new to this, so if anyone could help or direct me to a useful source that would be appreciated.

I'm trying to read in a .txt file, count how many times certain patterns occur (say 1.0, 2.0, 3.0, etc.), and then print how many times each one occurred.

I tried writing some code, but it's more of a jumble of things I found online and in a book than anything that actually makes sense.

Thanks a lot!

In Perl this is easy. I will not write it for you but you can code for this outline:

0) The main data structure will be a "hash". The hash keys are the words you find in the input file. The "value" associated with each key is the count of the number of times you've seen the word.

Hint: interesting Perl syntax might look like this
my %wordlist;
$wordlist{ $word } += 1;
print "$word was seen $wordlist{$word} times so far\n";

This is an outline
1) open the file
2) Read a line, then "split" the line into "words"
3) For each word, place it in the hash and increment the value in the hash. Note that incrementing creates the hash entry if it is not there, so don't test if it is already there.
4) When you are all done, simply print the hash table. Maybe sort by value so the most frequent words are first.
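
A minimal sketch of that outline follows. The sample data is invented and is read from an in-memory string so the script runs on its own; a real script would open the actual file name instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the input file.
my $sample = "1 0 1 1 1 0 0 1\n1 1 0 1 0\n";

# 1) Open the "file" (here an in-memory string, so the sketch is self-contained).
open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";

my %wordlist;
# 2) Read a line, then split it into words.
while (my $line = <$fh>) {
    # 3) Increment the count for each word; Perl creates missing entries.
    $wordlist{$_}++ for split ' ', $line;
}
close($fh);

# 4) Print the table, most frequent word first.
for my $word (sort { $wordlist{$b} <=> $wordlist{$a} } keys %wordlist) {
    print "$word was seen $wordlist{$word} times\n";
}
```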

balamw · Feb 3, 2011

fivetoadsloth said:
I actually saw that site but wasn't sure how to implement it in my case.

Since the other two suggestions are word based, I figured I'd follow up on the site I linked using the character approach.

Code:

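#!/usr/bin/perl
use strict;
use warnings;

my $total = 0;
while (my $line = <>) {
    # tr/1// changes nothing; it just returns how many 1 characters it saw.
    my $count = ($line =~ tr/1//);
    $total += $count;
}
print "There are $total 1 characters in the file\n";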

This requires you to provide the input on stdin, i.e.

Code:

cat oneszeros.txt | ./countones.pl

Not quite one line, but still readable and debuggable as $line and $count can be seen for each line.
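
The same tr idea can be stretched to count both digits in one pass. This is a sketch with made-up sample lines in place of stdin. Because tr counts characters rather than words, it only suits single-character values: a value like 1.0 or 10 would add to both counts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines; the real script would read them with <> instead.
my @lines = ("1 0 1 1 0\n", "0 1 1\n");

my ($zeros, $ones) = (0, 0);
for my $line (@lines) {
    # With an empty replacement list, tr only counts the listed characters.
    $zeros += ($line =~ tr/0//);
    $ones  += ($line =~ tr/1//);
}

print "There are $zeros 0 characters and $ones 1 characters\n";
```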

B

dmi · Feb 3, 2011

fivetoadsloth said:
I'm trying to read in a .txt file, count how many times certain patterns occur (say 1.0, 2.0, 3.0, etc.), and then print how many times each one occurred.

fivetoadsloth said:
If I had a text file like:
1 0 1 1 1 0 0 1 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 1 1 1
1 1 0 1 0 1 0 0 1 1 0 0 1 1 etc

What are the patterns you are trying to count?
I don't see any 1.0, 2.0, 3.0, etc in the text file.

dmi · Feb 3, 2011

If the pattern you are trying to count is "any string of non-white space characters separated by whitespace" then you might use something like
perl -lne '$c{$_}++for/(\S+)/g;END{print "$_: $c{$_}" for keys %c}' file.txt

fivetoadsloth · Feb 3, 2011

dmi said:
What are the patterns you are trying to count?
I don't see any 1.0, 2.0, 3.0, etc in the text file.

I just used 0s and 1s for a quick example. They could easily have been 0.0, 1.0, 2.0, etc. Sorry for any confusion!

balamw said:
Since the other two suggestions are word based, I figured I'd follow up on the site I linked using the character approach.

Code:

#!/usr/bin/perl

$total = 0;
while ($line = <>) {
    $count = ($line =~ tr/1//);
    $total += $count;
}
print "There are $total 1 characters in the file\n";

This requires you to provide the input on stdin, i.e.

Code:

cat oneszeros.txt | ./countones.pl

Not quite one line, but still readable and debuggable as $line and $count can be seen for each line.

B

Thanks a lot, this definitely helped.

dmi said:
If the pattern you are trying to count is "any string of non-white space characters separated by whitespace" then you might use something like
perl -lne '$c{$_}++for/(\S+)/g;END{print "$_: $c{$_}" for keys %c}' file.txt

Hm, thanks. I'm not sure if that is necessarily the pattern I am trying to count though.

dmi · Feb 3, 2011

balamw said:
This requires you to provide the input on stdin, i.e.

Code:

cat oneszeros.txt | ./countones.pl

An easier way to provide the input on stdin without needing to invoke additional programs is

Code:

./countones.pl < oneszeros.txt

But you are not even required to provide the input on stdin.
You could just name the input file(s) in ARGV:

Code:

./countones.pl oneszeros.txt

dmi · Feb 3, 2011

fivetoadsloth said:
I'm not sure if that is necessarily the pattern I am trying to count though.

So what are you trying to count?

balamw · Feb 3, 2011

dmi said:
An easier way to provide the input ...

Thanks for that. Most of the Perl I ever wrote was used as a pipe between some other code and "mail", so I tend to make everything fit that model. (Also why I don't tend to use "<", but I do like the ARGV model in this case).

B

Bill McEnaney · Feb 3, 2011

I think Python would let me give a string's count method a regular expression. If I ever knew the correct syntax, I forgot it. But I'm thinking of something like this:

Code:

line.count([['A'..'Z'])

Maybe Perl's object-oriented features allow something like that? Perl can treat a text file's contents as one huge string. So if that language allows what I think we can do in Python, a one-line Perl program may teach the machine to count the kinds of pattern OP needs to count.
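
For what it's worth, Perl strings have no count method, but a regular expression match with /g in list context returns every match, and a common idiom turns that list into a number. A small sketch, with an invented sample string standing in for the file contents:

```perl
use strict;
use warnings;

# Hypothetical sample text; a real script could read a whole file into $text.
my $text = "Alpha beta Gamma\nDELTA epsilon\n";

# The match with /g returns every match in list context. Assigning that list
# to the empty list () and then to a scalar yields the number of matches.
my $capitals = () = $text =~ /[A-Z]/g;

print "Found $capitals capital letters\n";
```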

dmi · Feb 3, 2011

Bill McEnaney said:
Code:

line.count([['A'..'Z'])

That doesn't look much like a regular expression,
but do you want to write a perl object that takes an array ref of a list of strings to be interpreted as regular expressions to count?

Bill McEnaney · Feb 4, 2011

dmi said:
That doesn't look much like a regular expression,
but do you want to write a perl object that takes an array ref of a list of strings to be interpreted as regular expressions to count?

I'm only a beginner in regular expressions. So I'm not surprised that I wrote something that doesn't look much like one. But I want a way OP can count substrings that match the pattern a regular expression describes. Maybe I should have written "line.count([A-Z, a-z])". I want a regexp that will match one or more digits followed by a period and another sequence of one or more digits. That expression should match, say, "123.456".

balamw · Feb 4, 2011

Bill McEnaney said:
I want a regexp that will match one or more digits followed by a period and another sequence of one or more digits. That expression should match, say, "123.456".

Code:

-?\d+\.?\d*

One or more digits followed by zero or one periods followed by zero or more digits. I threw in the optional negative sign, and you could wrap it in whitespace....
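
As a rough sketch of how that pattern could be used for the original counting problem, the following tallies every number it finds in a line. The sample line is invented, and the %counts hash is just an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical sample line with a stray word mixed in.
my $line = "1.0 2.0 -3.5 1.0 abc 2.0 1.0";

my %counts;
# In list context, a /g match with no capture groups returns every matched string.
$counts{$_}++ for $line =~ /-?\d+\.?\d*/g;

# Print the numbers in ascending numeric order with their counts.
for my $num (sort { $a <=> $b } keys %counts) {
    print "$num : $counts{$num}\n";
}
```

The word abc contains no digits, so it is skipped.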

B

Bill McEnaney · Feb 4, 2011

balamw said:
Code:

-?\d+\.?\d*

One or more digits followed by zero or one periods followed by zero or more digits. I threw in the optional negative sign, and you could wrap it in whitespace....

B

Great. Thanks, B. I hope that helps solve OP's problem.

dmi · Feb 5, 2011

Bill McEnaney said:
one or more digits followed by a period and another sequence of one or more digits. That expression should match, say, "123.456".

Code:

/ \d+   # one or more digits
  \.    # followed by a period
  \d+   # and another sequence of one or more digits
/x

which is not quite the same as
zero or one minus signs followed by one or more digits followed by zero or one periods followed by zero or more digits, although both match "123.456"

nor is it the same as
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new( qr([A-Z, a-z]) )->explain'
The regular expression:

(?-imsx:[A-Z, a-z])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  [A-Z, a-z]             any character of: 'A' to 'Z', ',', ' ',
                         'a' to 'z'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Bill McEnaney said:
"line.count([A-Z, a-z])"

Bill McEnaney · Feb 5, 2011

dmi said:
(?-imsx:[A-Z, a-z])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  [A-Z, a-z]             any character of: 'A' to 'Z', ',', ' ',
                         'a' to 'z'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Good stuff, but what do "i," "m," "s," and "x" each mean?

dmi · Feb 5, 2011

Bill McEnaney said:
Good stuff, but what do "i," "m," "s," and "x" each mean?

perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new( qr/[A-Z, a-z]/imsx )->explain'
The regular expression:

(?imsx:[A-Z, a-z])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?imsx:                  group, but do not capture (case-insensitive)
                         (with ^ and $ matching start and end of
                         line) (with . matching \n) (disregarding
                         whitespace and comments):

dmi · Feb 5, 2011

see also
perldoc perlre
...
Modifiers

Matching operations can have various modifiers. Modifiers that relate
to the interpretation of the regular expression inside are listed
below. Modifiers that alter the way a regular expression is used by
Perl are detailed in "Regexp Quote-Like Operators" in perlop and "Gory
details of parsing quoted constructs" in perlop.

    m   Treat string as multiple lines. That is, change "^" and "$" from
        matching the start or end of the string to matching the start or
        end of any line anywhere within the string.

    s   Treat string as single line. That is, change "." to match any
        character whatsoever, even a newline, which normally it would not
        match.

        Used together, as /ms, they let the "." match any character
        whatsoever, while still allowing "^" and "$" to match,
        respectively, just after and just before newlines within the
        string.

    i   Do case-insensitive pattern matching.

        If "use locale" is in effect, the case map is taken from the
        current locale. See perllocale.

    x   Extend your pattern's legibility by permitting whitespace and
        comments.
...
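
A short sketch of what each modifier changes in practice. The sample strings and variable names are made up for illustration; each result is 1 when the pattern matches and 0 when it does not, except for the two /m lines, which count matches.

```perl
use strict;
use warnings;

# Hypothetical two-line sample string.
my $text = "first line\nSecond line\n";

# /m: ^ matches at the start of every line, not only at the start of the string.
my $without_m = () = $text =~ /^\w+/g;     # 1 match: "first"
my $with_m    = () = $text =~ /^\w+/mg;    # 2 matches: "first", "Second"

# /s: . also matches a newline, so the match can span both lines.
my $dot_plain = $text =~ /first.*Second/  ? 1 : 0;    # 0
my $dot_s     = $text =~ /first.*Second/s ? 1 : 0;    # 1

# /i: case-insensitive matching.
my $case_i = $text =~ /second/i ? 1 : 0;              # 1

# /x: whitespace inside the pattern is ignored.
my $spaced_x = "123.456" =~ / \d+ \. \d+ /x ? 1 : 0;  # 1

print "m: $without_m vs $with_m, s: $dot_plain vs $dot_s, ",
      "i: $case_i, x: $spaced_x\n";
```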
