Perl script: where to add substitution?

Discussion in 'Mac Programming' started by reclusivemonkey, May 4, 2009.

  1. reclusivemonkey macrumors 6502

    Joined:
    Jun 2, 2008
    Location:
    Sowerby Bridge, West Yorkshire, UK
    #1
    Not sure whether this is the right forum to post in.

    I would also like to preface this with the fact I know nothing about perl, so please keep it basic!

    I have created the following script to download data from a website. It works fine but there are large gaps in the data (I want a nice neat table to go into Geektool on my desktop). I want to add
    Code:
    s/\t||\r||\n||\f//g
    which worked fine when I used a mixture of curl and perl but wherever I try it in the script it either doesn't work or breaks things. I've been googling and reading perl tuts for about a week now but I just can't get a hook in anywhere. I would be very grateful if anyone could explain where I need to put the substitution. Thanks in advance.

    Code:
    #!/usr/bin/perl 
    use strict;
    use warnings;
    
    use HTML::TableExtract;
    use WWW::Mechanize;
    
    my $url = "http://www.example.com";
    my $mech = WWW::Mechanize->new();
    
    $mech->agent_alias( 'Mac Safari' );
    $mech->get( $url );
    
    my $te = HTML::TableExtract->new( headers => [qw(Company Salary)] );
    $te->parse($mech->content);
    
    foreach my $row ($te->rows) {
            print join(' - ', @$row);
            }
    
    I tried posting this to perl.beginners on usenet but I guess it must be too basic as the mods don't seem to be adding it :-S
     
  2. kanenas macrumors newbie

    Joined:
    Jun 20, 2008
    #2
    Your post is a little vague in spots. "Doesn't work or breaks things" isn't very descriptive. Seeing the problematic code, along with sample input, actual output and desired output, would make it much easier to figure out what's wrong.

    Use "|" for alternatives; "||" puts an empty alternative between the two "|". It won't cause problems in this case, but it is less efficient, because the empty alternative also matches between every pair of non-[\t\n\r\f] characters. Since you're only matching single characters, you can use a character class ("[]") rather than alternatives. You could even use the tr/// operator: tr/\t\r\n\f/ /s.
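
    To make the three options concrete, here is a minimal sketch that applies each one to the same string. The sample value in $cell is made up for illustration; in the real script the cells would come from $te->rows.

```perl
use strict;
use warnings;

# Hypothetical sample cell (an assumption, not data from the original site).
my $cell = "Acme\t\tCorp\r\n";

# Alternation with a single "|" between the alternatives.
(my $alt = $cell) =~ s/\t|\r|\n|\f/ /g;

# Character class: matches the same characters, one at a time.
(my $class = $cell) =~ s/[\t\r\n\f]/ /g;

# tr/// with /s: translate each listed character to a space and
# squeeze every run of translated characters into a single space.
(my $squeezed = $cell) =~ tr/\t\r\n\f/ /s;

print "[$alt]\n";
print "[$class]\n";
print "[$squeezed]\n";
```

    The first two leave one space per replaced character, so runs of tabs or line breaks become runs of spaces. The tr/// version with /s collapses each run to a single space, which is probably closer to what you want for a neat table.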

    Replacing with a single space is probably better than removing the characters.

    Try the following in your script (replace your loop with the fragment below). If it doesn't do what you want, post sample input and output along with the desired output. If there's anything about the fragment that you don't understand, feel free to ask questions. Note that the data stored in $te->rows will be altered by the map, because $_ is an alias for each element. If you don't want that, assign $_ to a variable local to the block and alter that variable instead.

    Code:
    ...
    my @row;
    foreach my $row ($te->rows) {
        @row = map { s/[\t\r\n\f]+/ /g; s/^\s+|\s+$//g; $_; } @$row;
        print join(' - ', @row), "\n";
    }
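
    If you'd rather leave the data in $te->rows untouched, the copy-to-a-local-variable approach could look roughly like this. It is only a sketch: @rows and its contents are made-up stand-ins for $te->rows so the fragment runs on its own, and @lines is a hypothetical helper that collects the output.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $te->rows: one array reference per table row.
my @rows = (
    [ "  Acme\n\tCorp ", "\r\n50,000\t" ],
    [ "Widgets\fLtd",    " 42,000 "     ],
);

my @lines;
foreach my $row (@rows) {
    my @clean = map {
        my $cell = $_;               # copy, so the original cell is not modified
        $cell =~ s/[\t\r\n\f]+/ /g;  # collapse runs of tabs and line breaks to one space
        $cell =~ s/^\s+|\s+$//g;     # trim leading and trailing whitespace
        $cell;
    } @$row;
    push @lines, join(' - ', @clean);
}

print "$_\n" for @lines;
```

    In the real script you would loop over $te->rows instead of @rows. Because each cell is copied into $cell before the substitutions run, the extracted table keeps its original whitespace.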
     
  3. reclusivemonkey thread starter macrumors 6502

    Joined:
    Jun 2, 2008
    Location:
    Sowerby Bridge, West Yorkshire, UK
    #3
    Thanks, but the guys at Perl Monks already sorted me out.
     
