
fivetoadsloth

macrumors 65816
Original poster
Aug 15, 2006
I'm relatively new to this, so if anyone could help or point me to a useful source, that would be appreciated.

I'm trying to read in a .txt file, count the number of times certain patterns occur (say 1.0, 2.0, 3.0, etc.), and then print out how many times each one occurred.


I tried writing some code, but it's more of a jumble of things I found online and in a book than anything that actually makes sense.


Thanks a lot!
 
I think you could use split:
Code:
my @words = split(/[\s,]+/, $line);   # split on whitespace and commas, so "1.0" stays one word

then:
Code:
foreach my $word (@words) {
  $counts{$word}++ if $matchList{$word};
}

where %counts is a hash that holds the tallies, and %matchList is a hash whose keys are the words to match (e.g. my %matchList = map { $_ => 1 } qw(1.0 2.0 3.0);).

Then:

Code:
print "The counts are: \n";
foreach my $key (keys %counts){
  print "$key : $counts{$key}\n";
}

You'd need to do this for every line, so the split and the counting loop would go inside a loop that reads the file line by line.

If you give us some sample text and a more detailed description, we may be able to guide you better.

-Lee
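Putting those three pieces together inside a line-reading loop gives something like the following. This is only a sketch: it assumes the file names are given on the command line and that the patterns are the literal tokens 1.0, 2.0 and 3.0; the %matchList contents and the count_matches helper are illustrative, not from the thread. It splits on whitespace and commas only, so a token like 1.0 is not broken apart at its period.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of tokens to count; a hash gives fast "is this one of them?" lookups.
my %matchList = map { $_ => 1 } qw(1.0 2.0 3.0);

# Count how often each token listed in %$match appears in the lines read from $fh.
sub count_matches {
    my ($fh, $match) = @_;
    my %counts;
    while (my $line = <$fh>) {
        chomp $line;
        # Split on runs of whitespace or commas (not periods, so "1.0" stays whole).
        foreach my $word (split /[\s,]+/, $line) {
            $counts{$word}++ if $match->{$word};
        }
    }
    return \%counts;
}

# Process each file named on the command line.
foreach my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    my $counts = count_matches($fh, \%matchList);
    close $fh;
    print "The counts for $file are:\n";
    foreach my $key (sort keys %$counts) {
        print "$key : $counts->{$key}\n";
    }
}
```

Run it as ./count.pl data.txt (the script name is hypothetical).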
 
If this is all you are doing and the pattern appears at most once per line, you can just use "grep -c" (it counts matching lines, not individual matches).

It would still help to see what you have tried so far. Can you post something that comes close?

EDIT: does this help? http://www.rocketaware.com/perl/perlfaq4/How_can_I_count_the_number_of_oc.htm
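To make the "once per line" caveat concrete, here is a small sketch contrasting what grep -c reports (lines that match) with the total number of occurrences. The pattern /1/, the sample lines and the line_and_match_counts name are made up for illustration. The my $n = () = ... line is a common Perl idiom: //g in list context returns every match, and a list assignment in scalar context yields the number of elements on its right-hand side.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return (number of lines containing a match, total number of matches).
sub line_and_match_counts {
    my ($re, @lines) = @_;
    my ($matching_lines, $total) = (0, 0);
    foreach my $line (@lines) {
        # Count every match of $re on this line.
        my $n = () = $line =~ /$re/g;
        $matching_lines++ if $n;   # what "grep -c" would report
        $total += $n;              # every occurrence, even several per line
    }
    return ($matching_lines, $total);
}

my @sample = ("1 0 1 1\n", "0 0 0\n", "1 0\n");
my ($lines, $total) = line_and_match_counts(qr/1/, @sample);
print "lines with a 1: $lines, total 1s: $total\n";
```

Run as is, it prints "lines with a 1: 2, total 1s: 4", so the two numbers differ as soon as a line holds more than one match.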

B
Thanks a lot for your help.

I'll try to post something in a bit. I actually saw that site but wasn't sure how to implement it in my case. If I had a text file like:

1 0 1 1 1 0 0 1 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 1 1 1
1 1 0 1 0 1 0 0 1 1 0 0 1 1 etc

(This continues over many lines, not just one.)

I'd like to print the number of 0s and 1s.
 
This is quick and dirty, and surely someone more adept at perl could do it in one line, but:
Code:
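#!/usr/bin/perl
use strict;
use warnings;

# Open the input file, stopping with an error message if it can't be read.
open(my $fh, '<', "oneszeros.txt") or die "Can't open oneszeros.txt: $!";

my %counts;
while (my $line = <$fh>) {
	chomp($line);
	# split ' ' splits on any run of whitespace and ignores leading whitespace
	my @words = split(' ', $line);
	foreach my $word (@words) {
		$counts{$word}++;
	}
}
close($fh);

foreach my $key (sort keys %counts) {
	print "$key : $counts{$key}\n";
}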

This assumes that the file oneszeros.txt is in the current directory.

-Lee
 
In Perl this is easy. I won't write it for you, but you can code to this outline:

0) The main data structure will be a "hash". The hash keys are the words you find in the input file. The "value" associated with each key is the number of times you've seen that word.

Hint: the interesting Perl syntax might look like this
my %wordlist;
$wordlist{ $word } += 1;
print "$word was seen $wordlist{ $word } times so far\n";

This is an outline
1) Open the file.
2) Read a line, then "split" the line into "words".
3) For each word, increment its value in the hash. Incrementing a key that isn't there yet creates it, so there is no need to test whether it already exists.
4) When you are all done, simply print the hash table. Maybe sort by value so the most frequent words come first.
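Following that outline step by step gives roughly the sketch below, including the optional sort so the most frequent words come first. The %wordlist name comes from the hint; the tally and by_frequency helpers and taking file names from the command line are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Steps 2 and 3: split each line into words and bump each word's count.
sub tally {
    my %wordlist;
    foreach my $line (@_) {
        foreach my $word (split ' ', $line) {   # ' ' splits on any run of whitespace
            $wordlist{$word} += 1;              # creates the entry the first time
        }
    }
    return %wordlist;
}

# Step 4: words ordered by descending count, ties broken alphabetically.
sub by_frequency {
    my %wordlist = @_;
    return sort { $wordlist{$b} <=> $wordlist{$a} || $a cmp $b } keys %wordlist;
}

# Step 1: open each file named on the command line and read all of its lines.
foreach my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    my %wordlist = tally(<$fh>);
    close $fh;
    print "$_ was seen $wordlist{$_} times\n" for by_frequency(%wordlist);
}
```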
 
I actually saw that site but wasn't sure how to implement it in my case.

Since the other two suggestions are word-based, I figured I'd follow up on the site I linked using the character approach.

Code:
#!/usr/bin/perl
use strict;
use warnings;

my $total = 0;
while (my $line = <>) {
    my $count = ($line =~ tr/1//);   # tr with an empty replacement list just counts the 1s
    $total += $count;
}
print "There are $total 1 characters in the file\n";

This requires you to provide the input on stdin, i.e.

Code:
cat oneszeros.txt | ./countones.pl

Not quite one line, but still readable and debuggable, since $line and $count can be inspected for each line.

B
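The same tr/// trick extends to counting both the 0s and the 1s in one pass. A sketch under the same assumptions; the count_digits name is hypothetical. Note that tr/// counts characters, not tokens, so a token like 10 contributes one 1 and one 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count '0' and '1' characters in the given lines; tr/// returns how many it matched.
sub count_digits {
    my ($zeros, $ones) = (0, 0);
    foreach my $line (@_) {
        $zeros += ($line =~ tr/0//);   # empty replacement list: count, don't change
        $ones  += ($line =~ tr/1//);
    }
    return ($zeros, $ones);
}

# Read every line of the files named on the command line.
if (@ARGV) {
    my ($zeros, $ones) = count_digits(<>);
    print "There are $zeros 0 characters and $ones 1 characters\n";
}
```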
 
I'm trying to read in a .txt file and then count the number of instances certain patterns occur, say 1.0, 2.0, 3.0, etc and then print out the number each occurred.
If I had a text file like:
1 0 1 1 1 0 0 1 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 1 1 1
1 1 0 1 0 1 0 0 1 1 0 0 1 1 etc
What are the patterns you are trying to count?
I don't see any 1.0, 2.0, 3.0, etc in the text file.
 
If the pattern you are trying to count is "any string of non-whitespace characters separated by whitespace", then you might use something like
perl -lne '$c{$_}++for/(\S+)/g;END{print "$_: $c{$_}" for keys %c}' file.txt
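Spelled out as a script, that one-liner does roughly the following. This is a sketch: the -n loop becomes an explicit loop over the lines, the END block becomes code after it, and the keys are sorted here, which the one-liner does not do. The count_tokens name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count every run of non-whitespace characters in the given lines.
sub count_tokens {
    my %c;
    foreach my $line (@_) {
        $c{$_}++ for $line =~ /(\S+)/g;   # list-context //g returns every captured token
    }
    return %c;
}

# The one-liner's END block, run over the files named on the command line.
if (@ARGV) {
    my %c = count_tokens(<>);
    print "$_: $c{$_}\n" for sort keys %c;
}
```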
 
What are the patterns you are trying to count?
I don't see any 1.0, 2.0, 3.0, etc in the text file.

I just used 0s and 1s as a quick example. They could easily have been 0.0, 1.0, 2.0, etc. Sorry for any confusion!

Since the other two suggestions are word-based, I figured I'd follow up on the site I linked using the character approach.

Thanks a lot, this definitely helped.

If the pattern you are trying to count is "any string of non-whitespace characters separated by whitespace", then you might use something like
perl -lne '$c{$_}++for/(\S+)/g;END{print "$_: $c{$_}" for keys %c}' file.txt

Hm, thanks. I'm not sure if that is necessarily the pattern I am trying to count though.
 
This requires you to provide the input on stdin, i.e.

Code:
cat oneszeros.txt | ./countones.pl

An easier way to provide the input on stdin without needing to invoke additional programs is
Code:
./countones.pl < oneszeros.txt
But you are not even required to provide the input on stdin.
You could just name the input file(s) on the command line; they end up in @ARGV, and <> reads them in turn:
Code:
./countones.pl oneszeros.txt
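When several files are named on the command line, <> reads them one after another and sets $ARGV to the name of the file it is currently reading. A sketch that uses this to report a count of 1 characters per file; the per-file breakdown and the count_ones_per_file name are illustrative additions, not something asked for in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count '1' characters per input file; <> reads the files in @ARGV in turn
# and sets $ARGV to the name of the file currently being read.
sub count_ones_per_file {
    my @files = @_;
    return () unless @files;   # with an empty @ARGV, <> would read stdin instead
    local @ARGV = @files;
    my %ones_in;               # file name => number of '1' characters
    while (my $line = <>) {
        $ones_in{$ARGV} += ($line =~ tr/1//);
    }
    return %ones_in;
}

my %ones_in = count_ones_per_file(@ARGV);
print "$_: $ones_in{$_}\n" for sort keys %ones_in;
```

Usage would then be ./countones.pl a.txt b.txt, with one line of output per file.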
 
An easier way to provide the input ...

Thanks for that. Most of the Perl I ever wrote was used as a pipe between some other code and "mail", so I tend to make everything fit that model. (Also why I don't tend to use "<", but I do like the ARGV model in this case).

B
 
I think Python would let me give a string's count method a regular expression. If I ever knew the correct syntax, I forgot it. But I'm thinking of something like this:

Code:
line.count([['A'..'Z'])

Maybe Perl's object-oriented features allow something like that? Perl can treat a text file's contents as one huge string. So if the language allows what I think we can do in Python, a one-line Perl program may teach the machine to count the kinds of patterns the OP needs to count.
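For what it's worth, Python's str.count takes a plain substring, not a regular expression. Perl doesn't need an object method for this: //g in list context returns every match, and the () = idiom turns that list into a count. A sketch, assuming each file fits comfortably in memory; the slurp and count_matches names, and the capital-letter example, are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read an entire file into one string by undefining the input record separator.
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    local $/;          # undef $/ makes <$fh> return the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Number of non-overlapping matches of $re in $text.
sub count_matches {
    my ($text, $re) = @_;
    my $n = () = $text =~ /$re/g;
    return $n;
}

# Example: count the capital letters in each file named on the command line.
for my $file (@ARGV) {
    printf "%s: %d\n", $file, count_matches(slurp($file), qr/[A-Z]/);
}
```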
 
Code:
line.count([['A'..'Z'])
That doesn't look much like a regular expression,
but do you want to write a Perl object that takes an array ref of a list of strings to be interpreted as regular expressions to count?
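One way such an object could look, purely as a sketch: PatternCounter and its new/count methods are hypothetical names, not an existing CPAN module. The constructor compiles each string once with qr//, and count returns how many times each pattern matches in a given string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package PatternCounter;   # hypothetical class, for illustration only

# Constructor: takes an array ref of regular-expression strings.
sub new {
    my ($class, $patterns) = @_;
    my @compiled = map { [ $_, qr/$_/ ] } @$patterns;   # keep source text and compiled regex
    return bless { patterns => \@compiled }, $class;
}

# Return a hash ref mapping each pattern string to its match count in $text.
sub count {
    my ($self, $text) = @_;
    my %counts;
    for my $p (@{ $self->{patterns} }) {
        my ($source, $re) = @$p;
        $counts{$source} = () = $text =~ /$re/g;
    }
    return \%counts;
}

package main;

my $counter = PatternCounter->new([ '[A-Z]', '\d+\.\d+' ]);
my $counts  = $counter->count("Temps: 1.0, 2.5 and 30 Celsius");
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```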
 
I'm only a beginner in regular expressions, so I'm not surprised that I wrote something that doesn't look much like one. But I want a way the OP can count substrings that match the pattern a regular expression describes. Maybe I should have written "line.count([A-Z, a-z])". I want a regexp that will match one or more digits followed by a period and another sequence of one or more digits. That expression should match, say, "123.456".
 
I want a regexp that will match one or more digits followed by a period and another sequence of one or more digits. That expression should match, say, "123.456".

Code:
-?\d+\.?\d*

One or more digits followed by zero or one periods followed by zero or more digits. I threw in the optional negative sign, and you could wrap it in whitespace....

B
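Plugged into a counting loop like the ones earlier in the thread, B's pattern gives a tally of every number in the input. A sketch; the tally_numbers name is hypothetical, and because the pattern also accepts "7" and "1.", tokens like 1 and 1.0 end up as separate keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally every number matched by B's pattern: optional minus, digits, optional '.', digits.
sub tally_numbers {
    my %counts;
    for my $line (@_) {
        $counts{$_}++ for $line =~ /(-?\d+\.?\d*)/g;
    }
    return %counts;
}

# Read the files named on the command line and print the tallies in numeric order.
if (@ARGV) {
    my %counts = tally_numbers(<>);
    print "$_ : $counts{$_}\n" for sort { $a <=> $b || $a cmp $b } keys %counts;
}
```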
 
Great. Thanks, B. I hope that helps solve OP's problem.
 
one or more digits followed by a period and another sequence of one or more digits. That expression should match, say, "123.456".
Code:
/ \d+ # one or more digits
 \.     # followed by a period
 \d+  # and another sequence of one or more digits
/x

which is not quite the same as
zero or one minus signs followed by one or more digits followed by zero or one periods followed by zero or more digits (-?\d+\.?\d*), although both match "123.456"

nor is it the same as the [A-Z, a-z] in "line.count([A-Z, a-z])", which matches just one character:
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new( qr([A-Z, a-z]) )->explain'
The regular expression:

(?-imsx:[A-Z, a-z])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  [A-Z, a-z]               any character of: 'A' to 'Z', ',', ' ',
                           'a' to 'z'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 
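To see those three patterns side by side, here is a sketch that tests each one against a few made-up strings. The ^ and $ anchors are added so that each pattern has to match the whole string; the matching_patterns name and the sample strings are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three patterns discussed above, anchored so they must match the entire string.
my %patterns = (
    loose  => qr/^-?\d+\.?\d*$/,          # optional minus, digits, optional '.', digits
    strict => qr/^ \d+     # one or more digits
                   \.      # followed by a period
                   \d+     # and another sequence of one or more digits
                 $/x,
    class  => qr/^[A-Z, a-z]$/,           # exactly one letter, comma or space
);

# Return the sorted names of the patterns that match $s in full.
sub matching_patterns {
    my ($s) = @_;
    return sort { $a cmp $b } grep { $s =~ $patterns{$_} } keys %patterns;
}

for my $s ('123.456', '-7', '42.', ',', 'ab') {
    my @names = matching_patterns($s);
    printf "%-10s %s\n", "'$s'", @names ? join(', ', @names) : '(none)';
}
```

Only "123.456" satisfies both numeric patterns; "-7" and "42." pass the loose one alone, and the character class accepts a lone comma but not a two-letter word.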
Good stuff, but what do "i," "m," "s," and "x" each mean?
 
Good stuff, but what do "i," "m," "s," and "x" each mean?
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new( qr/[A-Z, a-z]/imsx )->explain'
The regular expression:

(?imsx:[A-Z, a-z])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?imsx:                  group, but do not capture (case-insensitive)
                         (with ^ and $ matching start and end of
                         line) (with . matching \n) (disregarding
                         whitespace and comments):
 
See also
perldoc perlre
...
Modifiers

Matching operations can have various modifiers. Modifiers that relate
to the interpretation of the regular expression inside are listed
below. Modifiers that alter the way a regular expression is used by
Perl are detailed in "Regexp Quote-Like Operators" in perlop and "Gory
details of parsing quoted constructs" in perlop.

m Treat string as multiple lines. That is, change "^" and "$" from
matching the start or end of the string to matching the start or
end of any line anywhere within the string.

s Treat string as single line. That is, change "." to match any
character whatsoever, even a newline, which normally it would not
match.

Used together, as /ms, they let the "." match any character
whatsoever, while still allowing "^" and "$" to match,
respectively, just after and just before newlines within the
string.

i Do case-insensitive pattern matching.

If "use locale" is in effect, the case map is taken from the
current locale. See perllocale.

x Extend your pattern's legibility by permitting whitespace and
comments.
...
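A small sketch of what each modifier changes, run against a made-up two-line string; each entry pairs a pattern with and without the modifier, and the matches helper is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "first line\nSecond line\n";

# 1 if $re matches somewhere in $text, else 0.
sub matches { my ($re) = @_; return $text =~ $re ? 1 : 0 }

my @checks = (
    # i: case-insensitive, so 'second' matches 'Second'.
    [ 'second (no i)'      => qr/second/ ],
    [ 'second /i'          => qr/second/i ],
    # m: ^ may also match just after an embedded newline (start of line 2).
    [ '^Second (no m)'     => qr/^Second/ ],
    [ '^Second /m'         => qr/^Second/m ],
    # s: . may match the newline, so one pattern can span both lines.
    [ 'line.Second (no s)' => qr/line.Second/ ],
    [ 'line.Second /s'     => qr/line.Second/s ],
    # x: whitespace and comments inside the pattern are ignored.
    [ 'first line /x'      => qr/ first \s+ line   # the first two words
                                /x ],
);

printf "%-20s %d\n", $_->[0], matches($_->[1]) for @checks;
```

Each pattern without its modifier prints 0, and each one with it prints 1.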
 