grep, regular expressions, HTML files

Discussion in 'Mac Programming' started by wrldwzrd89, Aug 3, 2006.

  1. macrumors G4

    wrldwzrd89

    Joined: Jun 6, 2003
    Location: Solon, OH
    #1
    I could use a little help here. Alright, let's say I have this here HTML file, and I want to extract some of the things in brackets (which are just placeholders - in the HTML files I'm actually extracting data from there will be actual content where the brackets are) using grep, then send that data to a file.

    I'm not entirely sure which regular expressions to use. Also, for a given piece of data I don't want the regular expression to return more than one match.

    Oh, ignore all the broken links and images in the linked-to HTML file - they're supposed to be broken.
     
  2. macrumors 68000

    savar

    Joined: Jun 6, 2003
    Location: District of Columbia
    #2
    First off, the best regex tutorial I've ever read:
    http://www.regular-expressions.info/tutorial.html

    What you're looking for is something like this:
    \[.*?\]

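    This matches a pair of square brackets with or without stuff in the middle. The ? makes * ungreedy (lazy): it returns the shortest match possible instead of running on to the last ] on the line. The brackets must be escaped, since an unescaped [ starts a character class in a regex. One caveat: *? is a Perl-style feature, and plain grep (POSIX basic or extended syntax) doesn't support it. If your grep has no Perl-compatible mode, a negated character class such as \[[^]]*] gives the same result.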
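
A rough sketch of how this could look with grep itself, using a negated character class in place of the lazy quantifier. It assumes a grep that supports the -o option (GNU and BSD grep both do, but it isn't in POSIX). The function names and the sample text are made up for illustration; they aren't from the HTML file in question.

```shell
#!/bin/sh

# Print every [bracketed] token found on standard input, one per line.
# Pattern: a literal [, then any run of characters that are not ],
# then the closing ] (which needs no escape outside a bracket expression).
# -o makes grep print only the matched text instead of the whole line.
extract_brackets() {
    grep -o '\[[^]]*]'
}

# Print only the first bracketed token, for when a single match is wanted.
first_bracket() {
    extract_brackets | head -n 1
}

# Hypothetical sample input standing in for the real HTML file.
sample='<h1>[title]</h1>
<p>[author] wrote this on [date].</p>'

printf '%s\n' "$sample" | extract_brackets
printf '%s\n' "$sample" | first_bracket
```

To send the results to a file, redirect the output, e.g. `extract_brackets < page.html > matches.txt`, where both file names are placeholders.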
     
