Help with RegEx expression

Discussion in 'Mac Programming' started by macfaninpdx, Apr 11, 2007.

  1. macrumors regular

    Joined:
    Mar 6, 2007
    #1
    I am wondering if anyone can help me form a Unix RegEx command that will parse a text file. I am familiar with Unix commands, but the regular expression complexity always seems to evade my comprehension.

    Here is what I want to do:
    Read a text file (see file format below) and load into some javascript arrays. The text file looks like this:
    Code:
    A bunch of text that can be ignored is at the top of the file.
    Blah blah blah.
    
    ----------------------------------------------------------------
    Filepath: [COLOR="Red"]/full/path/to/a/filename.ext[/COLOR]
    Filename: [COLOR="red"]filename[/COLOR]
    Title: filename [COLOR="red"]"Some Title"[/COLOR]
    File Contents:
    A bunch of other stuff I can ignore.
    
    ----------------------------------------------------------------
    Filepath: [COLOR="Blue"]/full/path/to/a/filename2.ext[/COLOR]
    Filename: [COLOR="Blue"]filename2[/COLOR]
    Title: filename [COLOR="Blue"]"Some Other Title"[/COLOR]
    File Contents:
    A bunch of other stuff I can ignore.
    
    
    So the end result will be that I will have three arrays. The "Red" text above will be the first element of the arrays, "Blue" will be second, etc. It will look like this:
    Code:
    array1[0] = /full/path/to/a/filename.ext;
    array1[1] = /full/path/to/a/filename2.ext;
    
    array2[0] = filename;
    array2[1] = filename2;
    
    array3[0] = "Some Title"
    array3[1] = "Some Other Title"
    
    I realize that I am in need of a lot of help, so I appreciate any advice you can give. I also realize that a small script, or at least a couple of commands, will probably be necessary. I am poring over the man pages now, and scouring the web for examples as you read this.

    But if it is simple enough for someone to reply, I would tremendously appreciate it. :)

    Thanks in advance!
     
  2. macrumors 68000

    savar

    Joined:
    Jun 6, 2003
    Location:
    District of Columbia
    #2
    Do you want the script to generate JavaScript code, or do you want it to generate JavaScript arrays?

    In either case, the regexes are pretty simple. These are Perl-style regexes:

    /^Filepath: (.*)$/
    /^Filename: (.*)$/
    /^Title: (.*)$/

    Explanation of the first one (the other two are very similar): Match a line that begins with the text "Filepath: " and capture the part between the ": " and the end of the line.

    In Perl, for example, after executing "/^Filepath: (.*)$/", the information on that line would be extracted into a variable called $1. I imagine there is something similar in JavaScript too. I still don't quite understand what you're trying to do.
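
    For reference, a minimal sketch of the JavaScript counterpart to Perl's $1, assuming the line has already been read into a string. The variable names and the sample line are illustrative only; the sample is copied from the file format in post #1.

```javascript
// Hypothetical sample line, copied from the file format shown in post #1.
var line = "Filepath: /full/path/to/a/filename.ext";

// match() returns null when the line does not match. Otherwise index 0 holds
// the whole match and index 1 holds the first parenthesized group (Perl's $1).
var match = line.match(/^Filepath: (.*)$/);
var filepath = match ? match[1] : null;

console.log(filepath);
```

    The null check matters because calling match[1] on a non-matching line would throw.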
     
  3. thread starter macrumors regular

    Joined:
    Mar 6, 2007
    #3
    It might make more sense if I give the bigger picture. I am creating a Widget. In the JavaScript code of the widget, I will be using a Unix shell command to read and parse a text file. The stdout will be stored in a JavaScript variable or a JavaScript array, whichever is easier depending on the output.

    So, in a Widget JavaScript I can do something like this:
    Code:
    var myresult = widget.system("/bin/egrep '<regex expression here>' ~/inputfile.txt").outputString;
    I will look at the Perl expressions you listed. If the variable myresult (above) is a text string, I can split it into an array, which will be perfect.

    Thanks.
     
  4. thread starter macrumors regular

    Joined:
    Mar 6, 2007
    #4
    Got it working

    OK, using the perl suggestion above, I came up with the following:
    Code:
    var myresult = widget.system("/usr/bin/egrep '^Filepath: (.*)$' inputfile.txt", null).outputString;
    // Split on newlines. A capturing group such as /(\n)/ would also insert the "\n" separators into the array.
    var myarray = myresult.split(/\n/);
    Thanks for your help. The reason I was having so many problems in the first place was that the text file had Mac line endings instead of Unix, so I had to use tr to convert it. Then my results were much easier to understand. ;)
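
    As an alternative to tr, the line endings could also be normalized on the JavaScript side. This is only a sketch, and it assumes the whole file has been read into a string first (egrep itself would still see a file with classic Mac endings as one long line). The helper name normalizeLineEndings and the sample text are hypothetical.

```javascript
// Hypothetical helper: convert classic Mac (\r) and Windows (\r\n) line
// endings to Unix (\n) so that line-based regexes and split("\n") behave.
function normalizeLineEndings(text) {
  return text.replace(/\r\n?/g, "\n");
}

// Hypothetical sample text using classic Mac line endings.
var macText = "Filepath: /a/b.ext\rFilename: b\r";
var normalizedLines = normalizeLineEndings(macText).split("\n");

console.log(normalizedLines);
```

    The pattern /\r\n?/ matches a carriage return with an optional following newline, so Windows-style pairs collapse to one newline rather than two.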

    One more question: The output above returns the entire line, including the "Filepath: " text. Is there a way I can return the line not including the "Filepath: "? I can always clean it up afterward using a replace, but I am always trying to learn more.
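
    One possible approach, sketched under the assumption that the egrep output is kept as-is and the label is dropped afterward in JavaScript using the capture group. The string assigned to myresult below is a hypothetical stand-in for what widget.system(...).outputString would return; the names labelPattern, outputLines, and paths are illustrative.

```javascript
// Hypothetical stand-in for the egrep output returned by widget.system().
var myresult =
  "Filepath: /full/path/to/a/filename.ext\n" +
  "Filepath: /full/path/to/a/filename2.ext\n";

// The parenthesized group captures everything after the "Filepath: " label.
var labelPattern = /^Filepath: (.*)$/;
var paths = [];

var outputLines = myresult.split("\n");
for (var i = 0; i < outputLines.length; i++) {
  var m = outputLines[i].match(labelPattern);
  if (m) {
    paths.push(m[1]); // keep only the captured part
  }
  // Non-matching lines, including the empty string after the final
  // newline, are skipped.
}

console.log(paths);
```

    On the shell side, a sed command such as sed -n 's/^Filepath: //p' inputfile.txt should print only the part after the label, which would avoid the cleanup step entirely.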

    Thanks.
     
