
reclusivemonkey

Not sure whether this is the right forum to post in.

I should also preface this by saying that I know nothing about Perl, so please keep it basic!

I have written the following script to download data from a website. It works, but there are large gaps in the output (I want a nice, neat table to go into Geektool on my desktop). I want to add
Code:
s/\t||\r||\n||\f//g
which worked fine when I used a mixture of curl and Perl, but wherever I try it in the script it either doesn't work or breaks things. I've been googling and reading Perl tutorials for about a week now, but I just can't get a hook in anywhere. I would be very grateful if anyone could explain where I need to put the substitution. Thanks in advance.

Code:
#!/usr/bin/perl 
use strict;
use warnings;

use HTML::TableExtract;
use WWW::Mechanize;

my $url = "http://www.example.com";
my $mech = WWW::Mechanize->new();

$mech->agent_alias( 'Mac Safari' );
$mech->get( $url );

my $te = HTML::TableExtract->new( headers => [qw(Company Salary)] );
$te->parse($mech->content);

foreach my $row ($te->rows) {
        print join(' - ', @$row);
        }

I tried posting this to perl.beginners on Usenet, but I guess it must be too basic, as the mods don't seem to be approving it :-S
 
Your posting is a little vague in spots. "Doesn't work or breaks things" isn't very descriptive. Also, seeing the problematic code (and sample input and output, along with the desired output) would be more helpful in figuring out what's wrong.

Use "|" for alternatives. "||" has an empty alternative between the two "|". It won't cause problems in this case, but it is less efficient, since the empty alternative produces a pointless zero-length match at nearly every position in the string. Since you're only matching single characters, you can use a character class ("[]") rather than alternatives. You could even use the tr/// operator: tr/\t\r\n\f/ /s.
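To make the difference concrete, here is a small self-contained sketch comparing the forms on one value. The sample string is made up for illustration; the substitutions are the ones discussed above, plus a character class with "+" (as used in the fragment further down) to show how runs get collapsed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up cell text containing a tab, a CR/LF pair and a trailing form feed.
my $cell = "Acme\tWidgets\r\nLtd\f";

# Alternation: single "|" between alternatives, no empty branch.
(my $alt = $cell) =~ s/\t|\r|\n|\f/ /g;

# Character class: matches any one of the listed characters.
(my $class = $cell) =~ s/[\t\r\n\f]/ /g;

# Adding "+" replaces each run of those characters with one space.
(my $class_run = $cell) =~ s/[\t\r\n\f]+/ /g;

# tr///: each listed character becomes a space, and /s squeezes
# consecutive translated characters down to one.
(my $tr = $cell) =~ tr/\t\r\n\f/ /s;

print "[$alt]\n[$class]\n[$class_run]\n[$tr]\n";
```

The first two results are identical (the CR/LF pair becomes two spaces); the last two collapse that pair into a single space.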

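Replacing with a single space is probably better than removing the characters: if a cell's text is split across lines, deleting the line break would run the words on either side together.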
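A minimal illustration of that point, using a hypothetical cell value where the page's HTML put a line break between two words:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cell text split across two lines in the page source.
my $cell = "Senior\nDeveloper";

(my $removed  = $cell) =~ s/[\t\r\n\f]+//g;   # deleting runs the words together
(my $replaced = $cell) =~ s/[\t\r\n\f]+/ /g;  # a space keeps them separate

print "$removed\n$replaced\n";
```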

Try the following in your script (replace your loop with the fragment below). If it doesn't do what you want, post sample input and output along with the desired output. If there's anything in the fragment that you don't understand, feel free to ask. Note that the map will alter the data stored in $te->rows, because inside map $_ is an alias for each element of @$row rather than a copy. If you don't want that, assign $_ to a variable local to the block and alter that variable instead.

Code:
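...
foreach my $row ($te->rows) {
    # $_ is aliased to each cell, so this alters the data in $te->rows
    my @row = map {
        $_ = '' unless defined;   # treat empty (undef) cells as ''
        s/[\t\r\n\f]+/ /g;        # collapse tab/CR/LF/FF runs to one space
        s/^\s+|\s+$//g;           # trim leading/trailing whitespace
        $_;
    } @$row;
    print join(' - ', @row), "\n";   # one table row per line
}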
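If you would rather leave $te->rows unchanged, as suggested in the note before the fragment, the cleanup can work on a copy of each cell. The sketch below is self-contained so it runs without a network connection or HTML::TableExtract: the @rows data and the clean_cells helper are hypothetical stand-ins for $te->rows and the loop body.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for what $te->rows returns: a list of array references.
# The data here is invented for illustration.
my @rows = (
    [ "  Acme\tCorp ", "\n42,000\r\n" ],
    [ "Widget\nWorks", " 38,500 " ],
);

# Hypothetical helper: returns cleaned copies of the cells,
# leaving the original strings untouched.
sub clean_cells {
    return map {
        my $cell = defined $_ ? $_ : '';  # copy, so the original is not altered
        $cell =~ s/[\t\r\n\f]+/ /g;       # collapse whitespace runs to one space
        $cell =~ s/^\s+|\s+$//g;          # trim the ends
        $cell;
    } @_;
}

my @lines = map { join(' - ', clean_cells(@$_)) } @rows;
print "$_\n" for @lines;
```

On Perl 5.14 or later, the s///r modifier is another way to get this behaviour, since it returns a modified copy instead of changing the original string.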
 