
petersz98

macrumors newbie
Original poster
Jul 1, 2009
4
0
I have downloaded the RegexKitLite library to use for regular expressions.

What I need to do is capture, from a web page, all the data inside the first div whose class is definition:

<div class='definition'> capture data between these tags </div>

I've solved the problem of getting the web page loaded into a string, but I cannot figure out the correct regular expression. I use the regular expression
(?<=<div class='definition'>).*?(?=</div>). This works when I run it against a test string in the program, such as :-

NSString *test = @"gfhfghfg<div class='definition'>test5678</div>llkklkkl";
NSLog(@"Match: %@", [test stringByMatching:regex capture:0]);

Result is :- Match: test5678

When I use it on the web page it doesn't work for some reason:-

NSData *response = [NSURLConnection sendSynchronousRequest: theRequest returningResponse: &resp error: &err];
NSString * theString = [[NSString alloc] initWithData:response encoding:NSUTF8StringEncoding];

NSString* regex = @"<div class='definition'>(.*?)</div>";
NSLog(@"Match: %@", [theString stringByMatching:regex capture:0]);

Result is :- Match: (null)

However if I change regex = @"<div class='definition'>(.*?)"

Result is :- Match: <div class='definition'>

which is not much use!
Could it be a problem with the encoding? The response is converted to UTF-8.

Is anyone an expert on regular expressions, or has anyone had a similar problem?
Thanks. :(
 

ChOas

macrumors regular
Nov 24, 2006
139
0
The Netherlands
I would REALLY advise against using regular expressions to parse HTML or XML. It is a bag of hurt you really don't want to open.

Would something like this fit your needs? http://touchtank.wordpress.com/

Otherwise, please show me your source document and I will have a look, so I can get your regex working on your sample data.
 

petersz98

macrumors newbie
Original poster
Jul 1, 2009
4
0
Here's my code if you can fix it.

//NSString *FeedURL2=@"http://www.urbandictionary.com/iphone/search/random";
//NSString *WordToSearch = txtSearch.text;
NSString *WordToSearch = @"dog";
NSString *msg = @"search?term=";

msg = [msg stringByAppendingString:WordToSearch];
NSString *URL = @"http://www.urbandictionary.com/iphone/";
NSString *FeedURL = [URL stringByAppendingString:msg];

NSURLRequest *theRequest = [NSURLRequest requestWithURL:[NSURL URLWithString:FeedURL]];
NSURLResponse *resp = nil;
NSError *err = nil;
NSData *response = [NSURLConnection sendSynchronousRequest: theRequest returningResponse: &resp error: &err];
NSString *theString = [[NSString alloc] initWithData:response encoding:NSUTF8StringEncoding];

if (theString != nil)
{

NSString* regex = @"(?<=<div class='definition'>).*?(?=</div>)";
//NSString *test = @"dsdsa<div class='definition'>test5678hgfgdcxz</div>fghfghhg<div class='definition'>222222</div>";
NSLog(@"Match: %@", [theString stringByMatching:regex capture:0]);
}
 

ChOas

macrumors regular
Nov 24, 2006
139
0
The Netherlands
Aha!

Ah... well... your regex breaks because you have \n's in the text.
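
To illustrate the newline point: RegexKitLite uses the ICU regex engine, where . does not match line terminators unless the "dot matches all" flag is on, so .*? gives up at the first line break inside the div. With RegexKitLite, putting (?s) at the front of the pattern should turn that flag on. Below is a minimal sketch of the same idea using Foundation's NSRegularExpression instead (a swapped-in API that needs a newer SDK than this thread assumes); FirstDefinition is a hypothetical helper name, not something from your code.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): returns the text inside the
// first <div class='definition'>...</div> in html, or nil if none is found.
static NSString *FirstDefinition(NSString *html) {
    NSError *error = nil;
    // NSRegularExpressionDotMatchesLineSeparators is the option form of a
    // leading (?s): it lets . match newlines as well.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<div class='definition'>(.*?)</div>"
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:&error];
    if (regex == nil) {
        return nil; // the pattern failed to compile
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:html
                          options:0
                            range:NSMakeRange(0, [html length])];
    if (match == nil) {
        return nil; // no definition div in the text
    }
    // Capture group 1 is the text between the tags.
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

Calling NSLog(@"Match: %@", FirstDefinition(theString)); would then log the first definition even when it spans several lines. It still only handles this exact spelling of the tag.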

Since you seem to know where the stuff you are looking for is, there is not even a need for a regex here. It is just token matching. And since I just found out about these nifty scanner things, you can do this:

Code:
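	if (theString != nil)
	{
		NSString *answer = nil;
		NSScanner *myScanner = [NSScanner scannerWithString:theString];
		BOOL found = NO;
		while (![myScanner isAtEnd]) {
			// Skip ahead to the next opening tag. The return value is ignored
			// because it is NO when the scanner already sits on the tag.
			[myScanner scanUpToString:@"<div class='definition'>" intoString:NULL];
			// Consume the tag itself; if it is not there, we ran out of text.
			if (![myScanner scanString:@"<div class='definition'>" intoString:NULL]) break;
			// Grab everything up to the closing tag, newlines included.
			if ([myScanner scanUpToString:@"</div>" intoString:&answer]) {
				NSLog(@"Possible answer: %@", answer);
				found = YES;
			}
		}
		if (!found) NSLog(@"No definition for %@ found", WordToSearch);
	}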

Which prints:

2009-07-07 16:57:54.003 PeterTest[21663:20b] Possible answer: Not a <a href="/iphone/search?term=cat">cat</a>.
<br/>
<br/>Gotta love Blackadder.
<br/>
2009-07-07 16:57:54.004 PeterTest[21663:20b] Possible answer: a guy who hits and runs, as in he tells girls what they wanna hear to get in their panties and as soon as he gets the *****, he's gone. unable to commit to one woman. DOG.
2009-07-07 16:57:54.018 PeterTest[21663:20b] Possible answer: Man's best friend, next to <a href="/iphone/search?term=TV">TV</a>.


Which means you probably have some more parsing to do :) but this should get you going, right?
 

petersz98

macrumors newbie
Original poster
Jul 1, 2009
4
0
Thanks, yes, that's better than using regular expressions, which are a nightmare!
 