
wrldwzrd89

I could use a little help here. Alright, let's say I have this here HTML file, and I want to extract some of the things in brackets (which are just placeholders - in the HTML files I'm actually extracting data from there will be actual content where the brackets are) using grep, then send that data to a file.

I'm not entirely sure which regular expressions to use. Also, for a given piece of data I don't want the regular expression to return more than one match.

Oh, ignore all the broken links and images in the linked-to HTML file - they're supposed to be broken.
 
wrldwzrd89 said:
I could use a little help here. Alright, let's say I have this here HTML file, and I want to extract some of the things in brackets (which are just placeholders - in the HTML files I'm actually extracting data from there will be actual content where the brackets are) using grep, then send that data to a file.

I'm not entirely sure which regular expressions to use. Also, for a given piece of data I don't want the regular expression to return more than one match.

Oh, ignore all the broken links and images in the linked-to HTML file - they're supposed to be broken.

First off, the best regex tutorial I've ever read:
http://www.regular-expressions.info/tutorial.html

What you're looking for is something like this:
\[.*?\]

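This matches a pair of square brackets with or without stuff in the middle. The ? makes * ungreedy, so it returns the shortest match possible instead of running on to the last ] on the line. The brackets must be escaped because an unescaped [ starts a character class in a regex. One caveat: the ungreedy *? is Perl-style syntax, so plain POSIX grep won't treat it that way. With GNU grep you can switch on Perl-compatible mode with grep -P; otherwise \[[^]]*] gets the same result, since [^]]* can never run past the first closing bracket.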
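
Here's a rough sketch of how that could look on the command line. Plain grep doesn't understand the ungreedy *? form, so this uses the negated character class [^]]* instead, which stops at the first closing bracket. The file names (sample.html, all_matches.txt, first_match.txt) and the sample content are made up for illustration, and -o is a common extension found in GNU and BSD grep rather than something POSIX guarantees.

```shell
#!/bin/sh
# Hypothetical sample standing in for the real HTML file.
printf '%s\n' '<td>[Name]</td><td>[Price]</td>' '<p>[Notes]</p>' > sample.html

# -o prints only the matched text, one match per output line.
# [^]]* means "any run of characters that are not ]", so each match
# ends at the first closing bracket.
grep -o '\[[^]]*]' sample.html > all_matches.txt

# To keep just one match, take the first line of that output.
grep -o '\[[^]]*]' sample.html | head -n 1 > first_match.txt
```

For a specific piece of data, you'd probably anchor the pattern on the surrounding markup rather than on the brackets alone, then cut the result down with head -n 1 in the same way.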
 