Regular Expression problem grabbing data from html page

Discussion in 'iOS Programming' started by petersz98, Jul 7, 2009.

  1. petersz98 macrumors newbie

    Joined:
    Jul 1, 2009
    #1
    I have downloaded the RegexKitLite program to use for Regular Expressions.

    What I need to do is capture from a web page all the data inside the first div whose class is definition.

    <div class='definition'> capture data between these tags </div>

    I've solved the problem of how to get the web page loaded into a string but cannot figure out the correct regular expression. I use the regular expression
    (?<=<div class='definition'>).*?(?=</div>). This works when I test it on a test string in the program such as :-

    NSString *test = @"gfhfghfg<div class='definition'>test5678</div>llkklkkl";
    NSLog(@"Match: %@", [test stringByMatching:regex capture:0]);

    Result is :- Match: test5678

    When I use it on the web page it doesn't work for some reason:-

    NSData *response = [NSURLConnection sendSynchronousRequest: theRequest returningResponse: &resp error: &err];
    NSString * theString = [[NSString alloc] initWithData:response encoding:NSUTF8StringEncoding];

    NSString* regex = @"<div class='definition'>(.*?)</div>";
    NSLog(@"Match: %@", [theString stringByMatching:regex capture:0]);

    Result is :- Match: (null)

    However if I change regex = @"<div class='definition'>(.*?)"

    Result is :- Match: <div class='definition'>

    which is not much use!
    Could it be a problem with the encoding, since the data is converted to UTF8?

    Is anyone an expert on regular expressions, or has anyone had a similar problem?
    Thanks.:(
     
  2. ChOas macrumors regular

    Joined:
    Nov 24, 2006
    Location:
    The Netherlands
    #2
    I would REALLY advise against using regular expressions to parse html or xml. It is a bag of hurt you really don't want to open.

    Would something like this fit your needs? http://touchtank.wordpress.com/

    Otherwise, please show me your source document and I will have a look so your regex will work on your sample data.
     
  3. petersz98 thread starter macrumors newbie

    Joined:
    Jul 1, 2009
    #3
    Here's my code if you can fix it.

    //NSString *FeedURL2=@"http://www.urbandictionary.com/iphone/search/random";
    //NSString *WordToSearch = txtSearch.text;
    NSString *WordToSearch = @"dog";
    NSString *msg = @"search?term=";

    msg = [msg stringByAppendingString:WordToSearch];
    NSString *URL = @"http://www.urbandictionary.com/iphone/";
    NSString *FeedURL = [URL stringByAppendingString:msg];

    NSURLRequest *theRequest = [NSURLRequest requestWithURL:[NSURL URLWithString:FeedURL]];
    NSURLResponse *resp = nil;
    NSError *err = nil;
    NSData *response = [NSURLConnection sendSynchronousRequest: theRequest returningResponse: &resp error: &err];
    NSString *theString = [[NSString alloc] initWithData:response encoding:NSUTF8StringEncoding];

    if (theString != nil)
    {

    NSString* regex = @"(?<=<div class='definition'>).*?(?=</div>)";
    //NSString *test = @"dsdsa<div class='definition'>test5678hgfgdcxz</div>fghfghhg<div class='definition'>222222</div>";
    NSLog(@"Match: %@", [theString stringByMatching:regex capture:0]);
    }
     
  4. ChOas macrumors regular

    Joined:
    Nov 24, 2006
    Location:
    The Netherlands
    #4
    Aha!

    Ah... well... your regex breaks because you have \n's in the text, and by default . does not match a newline.

    Since you seem to know where the stuff is you are looking for there is not even a need for a regex here. It is just token matching. And since I just found out about these nifty scanner things you can do this:

    Code:
    	if (theString != nil)
    	{
    		NSString *openTag = @"<div class='definition'>";
    		NSScanner *myScanner = [NSScanner scannerWithString:theString];
    		BOOL found = NO;
    		while (![myScanner isAtEnd]) {
    			NSString *answer = nil;
    			// Skip ahead to the next opening tag. The return value is ignored on
    			// purpose: it is NO when the scanner is already sitting on the tag.
    			[myScanner scanUpToString:openTag intoString:NULL];
    			// Consume the tag, then collect everything up to the closing tag.
    			if ([myScanner scanString:openTag intoString:NULL] &&
    			    [myScanner scanUpToString:@"</div>" intoString:&answer]) {
    				NSLog(@"Possible answer: %@", answer);
    				found = YES;
    			}
    		}
    		if (!found) NSLog(@"No definition for %@ found", WordToSearch);
    	}
    
    Which prints:


    Which means you probably have some more parsing to do :) but this should get you going, right?
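
    If you would rather keep a regex, the \n problem can also be handled by letting . match line separators. Below is a minimal sketch of that idea, not the code used in this thread: it swaps RegexKitLite for Foundation's NSRegularExpression (which needs iOS 4 or later), and FirstDefinition is a hypothetical helper name. With RegexKitLite, putting the inline flag (?s) at the start of the pattern should have the same effect, since both are built on ICU, but I have not tested that here.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): returns the text inside the first
// <div class='definition'>...</div>, or nil when there is no match.
static NSString *FirstDefinition(NSString *html)
{
    if (html == nil) return nil;

    NSError *error = nil;
    // DotMatchesLineSeparators lets "." match "\n" as well, so a definition
    // that spans several lines is still captured.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<div class='definition'>(.*?)</div>"
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Bad pattern: %@", error);
        return nil;
    }

    NSTextCheckingResult *match =
        [regex firstMatchInString:html
                          options:0
                            range:NSMakeRange(0, [html length])];
    if (match == nil) return nil;

    // Capture group 1 is the text between the tags; group 0 would be the whole match.
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

    You would call it as NSLog(@"Match: %@", FirstDefinition(theString)); in place of the stringByMatching: line. It still has the usual weakness of regex on html: a nested div inside the definition will cut the match short at the first </div>.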
     
  5. petersz98 thread starter macrumors newbie

    Joined:
    Jul 1, 2009
    #5
     
