grep, regular expressions, HTML files

Discussion in 'Mac Programming' started by wrldwzrd89, Aug 3, 2006.

  1. macrumors G4


    Jun 6, 2003
    Solon, OH
    I could use a little help here. Let's say I have this HTML file, and I want to use grep to extract some of the things in brackets, then send that data to a file. The brackets are just placeholders: in the HTML files I'm actually extracting data from, there will be real content where the brackets are.

    I'm not entirely sure which regular expressions to use. Also, for a given piece of data I don't want the regular expression to return more than one match.

    Oh, ignore all the broken links and images in the linked-to HTML file - they're supposed to be broken.
  2. macrumors 68000


    Jun 6, 2003
    District of Columbia
    First off, the best regex tutorial I've ever read:

    What you're looking for is something like this:

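    This matches a pair of square brackets with or without stuff in the middle. The ? makes the * ungreedy, so it returns the shortest match possible. Note that this is Perl-style syntax: plain grep has no ungreedy quantifiers, though GNU grep accepts them with -P. I think the brackets must be escaped, since they have special meaning in a regex.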
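
For illustration, here is a minimal sketch of that idea using plain grep. It is not the expression from the reply above, which did not survive in this copy of the thread. Instead of an ungreedy quantifier it uses a negated bracket expression, which gives the same shortest-match behavior without Perl-style syntax. The sample HTML string, the variable names, and the output path are all assumptions made up for the example.

```shell
# Hypothetical sample input; the real HTML file from the thread is not shown.
sample='<h1>[title]</h1><p>[author] wrote [summary]</p>'

# Hypothetical output location for the extracted data.
out_file="${TMPDIR:-/tmp}/extracted.txt"

# -o prints each match on its own line instead of the whole matching line.
# \[     matches a literal opening bracket (escaped, since [ is special).
# [^]]*  matches any run of characters that are not a closing bracket.
# ]      matches the closing bracket (not special outside a bracket expression).
all_matches=$(printf '%s\n' "$sample" | grep -o '\[[^]]*]')

# Keep only the first match, so a given piece of data is returned once.
first_match=$(printf '%s\n' "$all_matches" | head -n 1)

# Send the result to a file.
printf '%s\n' "$first_match" > "$out_file"
```

Because `[^]]*` can never run past a closing bracket, each match stops at the first `]` it meets, which is the effect the ungreedy `*?` was meant to achieve. Piping through `head -n 1` is one simple way to limit the output to a single match; `grep -m 1` would not do the same thing, since it limits matching lines rather than individual matches.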
