
LtRammstein

macrumors 6502a
Original poster
Jun 20, 2006
570
0
Denver, CO
Hey all,

I'm trying to teach myself proper Perl parsing methods, but I keep running into issues, especially on this one.

I am trying to parse content from a website that is stored in $contents. I am specifically looking for a series of 6 digits. I have an array filled with 6-digit elements (for example, 990146). What I want to do is parse the $contents variable line by line and pull out the lines that contain that 6-digit number.

What functions/methods/routines should/can I use to do this?

EDIT: this is what I have so far:

Code:
sub ContentParser($$)
{
    my @ContentArray = "";
    my $content = $_[0];
    my $model = $_[1];
    while(<$_[0]>)
    {
        chomp($_[0]);
        if($_[0] =~ /($model)/)
        {
            print "Item model is: $model\tItem is: $_\n";
        }
    }
}

Its output is:

Code:
Item model is: 991779	Item is: <!DOCTYPE
Item model is: 991779	Item is: html
Item model is: 991779	Item is: PUBLIC
Item model is: 991779	Item is: -//W3C//DTD XHTML 1.0 Transitional//EN
Item model is: 991779	Item is: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd>
Item model is: 991779	Item is: <html
Item model is: 991779	Item is: xmlns=http://www.w3.org/1999/xhtml>
Item model is: 991779	Item is: <head
^C

I am quite confused on what's going on with this. Any help is greatly appreciated.
 

LtRammstein

macrumors 6502a
Original poster
Jun 20, 2006
570
0
Denver, CO
I've done the regex a little bit, but it's either being too greedy or not greedy enough.

I'll try the split function and see what it can do. Thanks.
 

robbieduncan

Moderator emeritus
Jul 24, 2002
25,611
893
Harrogate
Well, I suggested split because you said that the entire content was in a scalar variable. Your while loop is sort of doing that for you, but not quite, as it seems to be splitting on whitespace...
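The whitespace splitting is most likely down to how Perl parses the angle brackets. `<$fh>` is a line read only when the thing inside is a bareword filehandle or a simple scalar; anything else, such as `<$_[0]>`, is treated as a call to glob(), which breaks its argument into whitespace-separated words. If the goal is a real line-by-line read of a string, one option is to open a filehandle on the string itself. Below is a minimal sketch of that; the name `matching_lines` and the sample page are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $content that contain $model.
sub matching_lines {
    my ($content, $model) = @_;
    my @hits;

    # Open a read-only, in-memory filehandle on the string.
    open my $fh, '<', \$content
        or die "Cannot open in-memory filehandle: $!";

    # $fh is a simple scalar, so <$fh> reads one line at a time.
    while (my $line = <$fh>) {
        chomp $line;
        # \Q...\E makes the model match literally, not as a regex.
        push @hits, $line if $line =~ /\Q$model\E/;
    }
    close $fh;
    return @hits;
}

# Made-up sample page, only to show the call.
my $page = "<html>\n<td>990146</td>\n<td>123456</td>\n</html>\n";
print "Line: $_\n" for matching_lines($page, '990146');
```

For a page that already sits in memory, split(/\n/, ...) does the same job with less ceremony; the filehandle version mainly helps if the code should later read from a real file without changing the loop.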
 

LtRammstein

macrumors 6502a
Original poster
Jun 20, 2006
570
0
Denver, CO
Thanks for the help!

I think I got it working now.

Code:
sub ContentParser($$)
{
    my $content = $_[0];
    my $model = $_[1];

    # Break the page into lines, then print every line that contains the model number.
    my @ContentArray = split(/\n/, $content);

    foreach my $line (@ContentArray)
    {
        if($line =~ /($model)/)
        {
            print "Model: $model\t Line: $line\n";
        }
    }

#   while(<$_[0]>)
#   {
#       chomp($_[0]);
#       $line = split(/\n/);
#       if($line =~ /($model)/)
#       {
#           print "Item model is: $model\tItem is: $_\n";
#       }
#   }
}
 

ChOas

macrumors regular
Nov 24, 2006
139
0
The Netherlands
You could also do something like this:

Code:
sub getContentLines {
 my ($content,$model) = @_;
 return grep /$model/, split /\n/, $content;
}

Which you can then use in your main program like:

Code:
my @contentLines = getContentLines($yourPage, '991779');

But then you might as well just do this and skip the whole subroutine:

Code:
my @contentLines = grep /$model/, split /\n/, $yourPage;

And if you are looking for multiple models:

Code:
 my $model = join '|', ('991779','991780','991781');
 my @contentLines = grep /$model/, split /\n/, $yourPage;
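One caveat with a plain /$model/ match: it also hits longer numbers that merely contain the model, so 991779 would match a line holding 9917790. Since the original question starts from an array of 6-digit models, another way is to pull every standalone 6-digit run out of each line and look it up in a hash. This is only a sketch; `lines_for_models` and the sample data are made-up names, and it assumes the models are always exactly six digits.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $content that contain any of the
# models in @$models as a standalone six-digit number.
sub lines_for_models {
    my ($content, $models) = @_;

    # Hash lookup avoids building one big alternation regex.
    my %wanted = map { $_ => 1 } @$models;

    my @hits;
    for my $line (split /\n/, $content) {
        # Every run of exactly six digits, not part of a longer number.
        my @found = $line =~ /(?<!\d)(\d{6})(?!\d)/g;
        push @hits, $line if grep { $wanted{$_} } @found;
    }
    return @hits;
}

# Made-up sample data, only to show the call.
my @models = ('991779', '991780', '991781');
my $page   = "item 991779 in stock\nserial 9917790\nitem 991780\n";
print "Line: $_\n" for lines_for_models($page, \@models);
```

The lookarounds `(?<!\d)` and `(?!\d)` are what keep a 7-digit serial from counting as a model; drop them if substring matches are actually wanted.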

Loads of ways :D
 