Help with RegEx expression




macfaninpdx
Apr 11, 2007, 12:48 PM
I am wondering if anyone can help me form a Unix RegEx command that will parse a text file. I am familiar with Unix commands, but the regular expression complexity always seems to evade my comprehension.

Here is what I want to do:
Read a text file (see the format below) and load the values into some JavaScript arrays. The text file looks like this:
A bunch of text that can be ignored is at the top of the file.
Blah blah blah.

----------------------------------------------------------------
Filepath: /full/path/to/a/filename.ext
Filename: filename
Title: filename "Some Title"
File Contents:
A bunch of other stuff I can ignore.

----------------------------------------------------------------
Filepath: /full/path/to/a/filename2.ext
Filename: filename2
Title: filename "Some Other Title"
File Contents:
A bunch of other stuff I can ignore.



So the end result will be three arrays. The values from the first record (the file path, the file name and the title) will be the first element of each array, the values from the second record will be the second, and so on. It will look like this:

array1[0] = "/full/path/to/a/filename.ext";
array1[1] = "/full/path/to/a/filename2.ext";

array2[0] = "filename";
array2[1] = "filename2";

array3[0] = "Some Title";
array3[1] = "Some Other Title";
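For comparison, here is a minimal sketch that builds the three arrays in JavaScript alone, assuming the whole file has already been read into a string (for example from the outputString of a widget.system call). The function name parseRecords and the sample text are hypothetical, and the Title pattern assumes the title is the double-quoted part of the line.

```javascript
// Hypothetical helper: collects the Filepath, Filename and quoted Title
// values from text laid out like the sample file. Lines matching none of
// the three patterns (header, "File Contents" sections) are skipped.
function parseRecords(text) {
  var array1 = []; // Filepath values
  var array2 = []; // Filename values
  var array3 = []; // the double-quoted part of each Title line

  // Split on Windows (\r\n), classic Mac (\r) or Unix (\n) line endings.
  var lines = text.split(/\r\n|\r|\n/);

  for (var i = 0; i < lines.length; i++) {
    var m;
    if ((m = /^Filepath: (.*)$/.exec(lines[i])) !== null) {
      array1.push(m[1]);
    } else if ((m = /^Filename: (.*)$/.exec(lines[i])) !== null) {
      array2.push(m[1]);
    } else if ((m = /^Title: [^"]*"(.*)"/.exec(lines[i])) !== null) {
      array3.push(m[1]);
    }
  }
  return { array1: array1, array2: array2, array3: array3 };
}

// Hypothetical sample shaped like the file above.
var sample = [
  "A bunch of text that can be ignored is at the top of the file.",
  "----------------------------------------------------------------",
  "Filepath: /full/path/to/a/filename.ext",
  "Filename: filename",
  'Title: filename "Some Title"',
  "File Contents:",
  "A bunch of other stuff I can ignore.",
  "----------------------------------------------------------------",
  "Filepath: /full/path/to/a/filename2.ext",
  "Filename: filename2",
  'Title: filename "Some Other Title"',
  "File Contents:"
].join("\n");

var result = parseRecords(sample);
console.log(result.array1[1]); // /full/path/to/a/filename2.ext
console.log(result.array3[1]); // Some Other Title
```

Matching line by line keeps each pattern as simple as the Perl-style ones in the thread, at the cost of assuming every record contains all three lines.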


I realize that I am in need of a lot of help, so I appreciate any advice you can give. I also realize that a small script, or at least a couple of commands, will probably be necessary. I am poring over the man pages now and scouring the web for examples as you read this.

But if it is simple enough for someone to reply, I would tremendously appreciate it. :)

Thanks in advance!



savar
Apr 11, 2007, 02:27 PM
Do you want the script to generate JavaScript code, or do you want it to fill JavaScript arrays directly?

In either case, the regexes are pretty simple. These are Perl-style regexes:

/^Filepath: (.*)$/
/^Filename: (.*)$/
/^Title: (.*)$/

Explanation of the first one (the other two are very similar): match a line that begins with the text "Filepath: " and capture everything after the ": " up to the end of the line.

In Perl, for example, after matching a line against /^Filepath: (.*)$/, the captured text would be available in a variable called $1. I imagine there is something similar in JavaScript too. I still don't quite understand what you're trying to do.
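In JavaScript, RegExp.prototype.exec plays the role of Perl's $1: it returns an array whose element [1] holds the text captured by the first group, or null when there is no match. A small sketch with made-up sample strings:

```javascript
// A single line: element [1] of the exec() result is the first (...) group.
var line = "Filepath: /full/path/to/a/filename.ext";
var match = /^Filepath: (.*)$/.exec(line);
if (match !== null) {
  console.log(match[1]); // /full/path/to/a/filename.ext
}

// A whole file in one string: the "m" flag lets ^ and $ match at every
// line break, and the "g" flag makes exec() resume after the previous
// match each time it is called.
var text = "Filepath: /a/one.ext\nFilename: one\nFilepath: /b/two.ext\n";
var pattern = /^Filepath: (.*)$/gm;
var paths = [];
var m;
while ((m = pattern.exec(text)) !== null) {
  paths.push(m[1]);
}
console.log(paths.join(", ")); // /a/one.ext, /b/two.ext
```

Because "." does not match a line break, (.*) stops at the end of each line, so only the path is captured.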

macfaninpdx
Apr 11, 2007, 02:40 PM
Quote: "I still don't quite understand what you're trying to do."

It might make more sense if I give the bigger picture. I am creating a Widget. In the JavaScript code of the widget, I will be using a Unix shell command to read and parse a text file. The stdout will be stored in a Javascript variable, or a Javascript array, whichever is easier depending on the output.

So, in a widget's JavaScript I can do something like this:
var myresult = widget.system("/usr/bin/egrep '<regex expression here>' ~/inputfile.txt", null).outputString;

I will look at the Perl expressions you listed. If the variable myresult (above) is a text string, I can split it into an array, which will be perfect.

Thanks.

macfaninpdx
Apr 11, 2007, 04:59 PM
OK, using the perl suggestion above, I came up with the following:
var myresult = widget.system("/usr/bin/egrep '^Filepath: (.*)$' inputfile.txt", null).outputString;
var myarray = myresult.split("\n");

Thanks for your help. The reason I was having so many problems in the first place was that the text file had classic Mac line endings (CR) instead of Unix ones (LF), so I had to use tr to convert it. After that, my results were much easier to understand. ;)
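A sketch of the same line-ending cleanup done in JavaScript instead of with tr, assuming the output is already in a string (the sample value is hypothetical). It also shows a side effect of splitting on a pattern with a capturing group such as /(\n)/: the separators themselves end up as array elements.

```javascript
// Hypothetical output with classic Mac (\r) line endings and a trailing break.
var raw = "Filepath: /a/one.ext\rFilepath: /b/two.ext\r";

// Turn \r\n and lone \r into \n, doing the same job as the tr step.
var text = raw.replace(/\r\n?/g, "\n");

// A capturing group in the separator keeps the "\n" separators in the result.
console.log(text.split(/(\n)/).length); // 5

// Splitting on a plain "\n" keeps only the lines; filter() drops the empty
// string produced by the trailing line break.
var lines = text.split("\n").filter(function (s) { return s !== ""; });
console.log(lines.length); // 2
```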

One more question: the command above returns the entire line, including the "Filepath: " text. Is there a way to return the line without the "Filepath: " prefix? I can always clean it up afterward using a replace, but I am always trying to learn more.

Thanks.
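egrep prints every matching line in full, so one option is to strip the prefix afterward in JavaScript. A minimal sketch, assuming the output is already in myresult (the sample value here is made up): with the m flag, ^ matches at the start of each line, so a single replace() removes the prefix from every line.

```javascript
// Hypothetical egrep output: every line still starts with "Filepath: ".
var myresult = "Filepath: /a/one.ext\nFilepath: /b/two.ext\n";

// Remove the prefix from each line, then split into an array and drop
// the empty element left by the trailing line break.
var paths = myresult.replace(/^Filepath: /gm, "")
                    .split("\n")
                    .filter(function (s) { return s !== ""; });

console.log(paths[0]); // /a/one.ext
console.log(paths[1]); // /b/two.ext
```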