Old Jun 16, 2013, 12:45 PM   #1
ArtOfWarfare
macrumors 603
Join Date: Nov 2007
NSRegularExpression to capture C function arguments?

I wrote this NSRegularExpression for detecting Core Graphics C functions:

Code:
NSString *regexString = @"([_a-zA-Z][_0-9a-zA-Z]*)\\(context(?:,(-?[0-9]*.?[0-9]+))*\\);";
NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:regexString options:0 error:nil];
But it's not picking up individual arguments like I want.

For example, I want this string

Code:
CGContextMoveToPoint(context,0,100);
to have the following subranges captured:

Code:
CGContextMoveToPoint
0
100
But instead right now it picks up:

Code:
CGContextMoveToPoint
0,100
Why is it picking up that middle comma? I explicitly set it aside in its own non-capturing group so it wouldn't be picked up and placed in any of the groups.

Here's my code using the regular expression:

Code:
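[regex enumerateMatchesInString:codeString options:0 range:NSMakeRange(0, codeString.length) usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    for (NSUInteger i = 1, n = result.numberOfRanges; i < n; i++) {
        NSRange range = [result rangeAtIndex:i];
        // A group that took no part in the match (e.g. a call with no arguments after context) has location NSNotFound.
        if (range.location == NSNotFound) continue;
        NSLog(@"%@", [codeString substringWithRange:range]);
    }
}];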
(It starts with i = 1 because rangeAtIndex:0 always has the range of the entire matched string, whereas indexes 1 through numberOfRanges - 1 are supposed to have the ranges for the individual capture groups.)
__________________
Don't tell me Macs don't last: 2007 iMac, 2007 Mac Mini, 2008 MacBook Air, all Vintage.
(iMac obsoletion: April 28, 2015, MBA: October 14, 2015, Mac Mini: March 9, 2016)

Last edited by ArtOfWarfare; Jun 16, 2013 at 01:33 PM.
Old Jun 16, 2013, 01:13 PM   #2
subsonix
macrumors 68040
 
Join Date: Feb 2008
Have you confirmed that the regex works as intended by itself?
Old Jun 16, 2013, 01:19 PM   #3
ArtOfWarfare
Thread Starter
macrumors 603
Join Date: Nov 2007
Quote:
Originally Posted by subsonix View Post
Have you confirmed that the regex works as intended by itself?
How do you mean? When I give it a small text file full of code it's able to extract all of the CG function calls, it just messes up finding those individual arguments.
Old Jun 16, 2013, 01:25 PM   #4
subsonix
macrumors 68040
 
Join Date: Feb 2008
Quote:
Originally Posted by ArtOfWarfare View Post
How do you mean? When I give it a small text file full of code it's able to extract all of the CG function calls, it just messes up finding those individual arguments.
I looked at what you said here:

Quote:
Originally Posted by ArtOfWarfare View Post
to have the following subranges captured:

Code:
CGContextMoveToPoint
0
100
But instead right now it picks up:

Code:
CGContextMoveToPoint
0,100
And suspected the regex, but perhaps I missed your point.
Old Jun 16, 2013, 01:32 PM   #5
ArtOfWarfare
Thread Starter
macrumors 603
Join Date: Nov 2007
Quote:
Originally Posted by subsonix View Post
I looked at what you said here:

And suspected the regex, but perhaps I missed your point.
I'm confused. The regex was in the first code block of my original post. It's the NSString that's passed into NSRegularExpression's initWithPattern method. It's picking up the function calls fine. The issue I'm having is with the individual capture groups it reports. I want it to separately report each argument, not give me a single string with all of the arguments in it.
Old Jun 16, 2013, 01:40 PM   #6
subsonix
macrumors 68040
 
Join Date: Feb 2008
Quote:
Originally Posted by ArtOfWarfare View Post
The issue I'm having is with the individual capture groups it reports. I want it to separately report each argument, not give me a single string with all of the arguments in it.
Yes, and perhaps that is down to the regular expression itself. I just asked whether you have confirmed that it does break the string down into each individual argument.
Old Jun 16, 2013, 01:49 PM   #7
ArtOfWarfare
Thread Starter
macrumors 603
Join Date: Nov 2007
Quote:
Originally Posted by subsonix View Post
Yes, and perhaps that is down to the regular expression itself. I just asked whether you have confirmed that it does break the string down into each individual argument.
I wrote it thinking it would, but as far as I can tell it doesn't. If it did, everything would work and I wouldn't be asking about it.

Code:
(?:,(-?[0-9]*.?[0-9]+))*
I put the inner ()'s around it so that it would be a separate capture group. The outer ()'s have the ?: prefix so that the , won't also get a capture group.
Old Jun 16, 2013, 02:10 PM   #8
subsonix
macrumors 68040
 
Join Date: Feb 2008
Quote:
Originally Posted by ArtOfWarfare View Post
I wrote it thinking it would, but as far as I can tell it doesn't. If it did, everything would work and I wouldn't be asking about it.
I usually test them on their own, for example in the terminal. There could also be a problem elsewhere.

Quote:
Originally Posted by ArtOfWarfare View Post
Code:
(?:,(-?[0-9]*.?[0-9]+))*
I put the inner ()'s around it so that it would be a separate capture group. The outer ()'s have the ?: prefix so that the , won't also get a capture group.
Wouldn't you need exactly one more group like the inner group, separated by a comma?
Old Jun 16, 2013, 06:44 PM   #9
JoshDC
macrumors regular
 
Join Date: Apr 2009
Looks like your first problem is with the dot here:

Code:
(-?[0-9]*.?[0-9]+)
I don't think it's matching what you think it does.
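To make the dot problem concrete: an unescaped . matches any character, the comma included, so the number sub-pattern can swallow 0,100 as one "number". A minimal sketch, assuming a made-up helper named CaptureGroupOne (not part of Foundation) that returns whatever group 1 captured in the first match:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration: returns the text captured by group 1
// in the first match of `pattern` in `input`, or nil if nothing was captured.
static NSString *CaptureGroupOne(NSString *pattern, NSString *input) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Bad pattern: %@", error);
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:input options:0 range:NSMakeRange(0, input.length)];
    if (match == nil || match.numberOfRanges < 2) {
        return nil;
    }
    NSRange range = [match rangeAtIndex:1];
    // NSNotFound means the group did not take part in the match.
    return (range.location == NSNotFound) ? nil : [input substringWithRange:range];
}
```

With the original sub-pattern, CaptureGroupOne(@"(-?[0-9]*.?[0-9]+)", @"0,100") gives 0,100. Escaping the dot, CaptureGroupOne(@"(-?[0-9]*\\.?[0-9]+)", @"0,100"), gives 0, while -1.5 is still matched whole.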

I'm also not convinced this will ever work. From a few small tests it looks like ICU's regular expressions (the engine behind NSRegularExpression) do not support a variable number of capture groups. It's explained well, albeit for Java, in this Stack Overflow post.
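The capture-group limitation can be sketched like this. The number of groups is fixed by the pattern, so a group inside a repeated (?: ... )* only reports the last thing it matched. CapturedGroups is a made-up helper name for illustration, and the pattern in the note below is the original one with the dot escaped:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration: returns one string per capture group
// for the first match of `pattern` in `input`. Groups that did not take part
// in the match are reported as @"". Returns an empty array if nothing matches.
static NSArray *CapturedGroups(NSString *pattern, NSString *input) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:input options:0 range:NSMakeRange(0, input.length)];
    NSMutableArray *groups = [NSMutableArray array];
    if (match == nil) {
        return groups;
    }
    // Index 0 is the whole match; the groups start at index 1.
    for (NSUInteger i = 1; i < match.numberOfRanges; i++) {
        NSRange range = [match rangeAtIndex:i];
        [groups addObject:(range.location == NSNotFound) ? @"" : [input substringWithRange:range]];
    }
    return groups;
}
```

Called with @"([_a-zA-Z][_0-9a-zA-Z]*)\\(context(?:,(-?[0-9]*\\.?[0-9]+))*\\);" and CGContextMoveToPoint(context,0,100); this returns two entries, CGContextMoveToPoint and 100. The 0 is matched but overwritten by the next pass through the repeat.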

I would strongly recommend looking at approaches other than regular expressions for this kind of task. The expression you're using already requires a fair level of effort to understand, and doesn't handle all valid function calls (whitespace is an obvious omission). You're going to end up with an unwieldy regular expression that still doesn't accurately match all valid calls while rejecting all invalid ones.

NSScanner might be worth a look, but you'll likely end up requiring a proper parser.
Old Jun 17, 2013, 01:09 AM   #10
ArtOfWarfare
Thread Starter
macrumors 603
Join Date: Nov 2007
Quote:
Originally Posted by JoshDC View Post
Looks like your first problem is with the dot here:

Code:
(-?[0-9]*.?[0-9]+)
I don't think it's matching what you think it does.
D'oh! I forgot . is a metacharacter in regex.

Quote:
I'm also not convinced this will ever work. From a few small tests it looks like ICU's regular expressions (the engine behind NSRegularExpression) do not support a variable number of capture groups. It's explained well, albeit for Java, in this Stack Overflow post.

I would strongly recommend looking at approaches other than regular expressions for this kind of task. The expression you're using already requires a fair level of effort to understand, and doesn't handle all valid function calls (whitespace is an obvious omission). You're going to end up with an unwieldy regular expression that still doesn't accurately match all valid calls while rejecting all invalid ones.

NSScanner might be worth a look, but you'll likely end up requiring a proper parser.
I'm looking into NSScanner right now... I feel like this should be a simple enough task for NSScanner + NSRegularExpression to make short work of, and I shouldn't need anything like ParseKit (which is quite powerful, but the documentation available for it is very sparse and often entirely incorrect).
Old Jun 17, 2013, 03:17 PM   #11
chown33
macrumors 603
 
Join Date: Aug 2009
Quote:
Originally Posted by ArtOfWarfare View Post
D'oh! I forgot . is a metacharacter in regex.
Google search terms: regex buddy mac os


Quote:
I'm looking into NSScanner right now... I feel like this should be a simple enough task for NSScanner + NSRegularExpression to make short work of, and I shouldn't need anything like ParseKit (which is quite powerful, but the documentation available for it is very sparse and often entirely incorrect).
I don't understand why you'd build a parser from scratch. As already pointed out above, an existing C parser might be a better fit. For example, googling c interpreter finds this C interpreter:
http://code.google.com/p/picoc/
It's entirely in C, has a makefile, and is offered under the BSD license. Even if you don't use it, it can be an example of how to write a C parser.

Pretty much any parser that produces an AST would work. And finding one of those for C, even if it's in lex, yacc, bison, etc. would get you farther along than a scanner and reg-ex.

Heed this quote:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Also see:
http://www.codinghorror.com/blog/200...-problems.html

I've worked on a past project that was started using reg-ex for parsing both C and assembly language. It was one of the most complex monstrosities I've ever seen, and its performance was abysmal. It would take anywhere from many seconds to several minutes to parse what seemed like relatively simple things, and it was still incomplete. The reg-ex was eventually scrapped and replaced with a proper parser (lex & yacc), and it became much simpler and far faster.
Old Jun 17, 2013, 05:08 PM   #12
ArtOfWarfare
Thread Starter
macrumors 603
Join Date: Nov 2007
Quote:
Originally Posted by chown33 View Post
Google search terms: regex buddy mac os
That app doesn't exist for OS X, but I used the lite web version, regexpal.com. It doesn't highlight subexpressions, which makes it of limited value for this particular issue.

I just found reggy which seems like it might be particularly helpful... also, it's free.

Quote:
I don't understand why you'd build a parser from scratch.
I thought this task was sufficiently simple that I didn't need a more robust parser than regex. But as I'm progressing, I'm sensing that I was wrong and that I do need something better like you mentioned.

Quote:
As already pointed out above, an existing C parser might be a better fit. For example, googling c interpreter finds this C interpreter:
http://code.google.com/p/picoc/
It's entirely in C, has a makefile, and is offered under the BSD license. Even if you don't use it, it can be an example of how to write a C parser.

Pretty much any parser that produces an AST would work. And finding one of those for C, even if it's in lex, yacc, bison, etc. would get you farther along than a scanner and reg-ex.
Thanks, I'll look into that.
Old Jun 17, 2013, 05:34 PM   #13
subsonix
macrumors 68040
 
Join Date: Feb 2008
Quote:
Originally Posted by chown33 View Post
Pretty much any parser that produces an AST would work. And finding one of those for C, even if it's in lex, yacc, bison, etc. would get you farther along than a scanner and reg-ex.
Syntax trees are powerful in that they represent an expression precisely, including nesting, but they are also unnecessarily complex if not strictly needed, IMO.

Quote:
Originally Posted by ArtOfWarfare View Post
I thought this task was sufficiently simple that I didn't need a more robust parser than regex. But as I'm progressing, I'm sensing that I was wrong and that I do need something better like you mentioned.
You could also simplify the regex to match anything between the parentheses, then make sense of it once you have the individual components. Adding support for variables and hex values in the arguments, for example, would mean that anything but C reserved symbols would be legal.
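A rough sketch of that idea, assuming calls with no nested parentheses (an argument like CGRectMake(0,0,1,1) would not match). One group captures the whole argument list, which is then split on commas. SplitCall is a made-up name for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration: parses one call of the form
// name(arg, arg, ...); and returns the function name followed by its
// arguments, or nil if `call` does not have that shape.
static NSArray *SplitCall(NSString *call) {
    // Group 1: the identifier. Group 2: everything between the parentheses.
    NSString *pattern = @"([_a-zA-Z][_0-9a-zA-Z]*)\\s*\\(([^()]*)\\)\\s*;";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:call options:0 range:NSMakeRange(0, call.length)];
    if (match == nil) {
        return nil;
    }
    NSMutableArray *parts =
        [NSMutableArray arrayWithObject:[call substringWithRange:[match rangeAtIndex:1]]];
    NSString *argumentList = [call substringWithRange:[match rangeAtIndex:2]];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    // An empty argument list means a call like name(); with no arguments.
    if ([argumentList stringByTrimmingCharactersInSet:whitespace].length == 0) {
        return parts;
    }
    for (NSString *argument in [argumentList componentsSeparatedByString:@","]) {
        [parts addObject:[argument stringByTrimmingCharactersInSet:whitespace]];
    }
    return parts;
}
```

For CGContextMoveToPoint(context, 0, 100); this returns CGContextMoveToPoint, context, 0 and 100 as separate strings. Validating each argument (number, variable, hex value) is left to whatever consumes the array.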
Old Jun 18, 2013, 12:17 AM   #14
ArtOfWarfare
Thread Starter
macrumors 603
Join Date: Nov 2007
First screenshot of my first working prototype (attached to this post)

I got it working with a tiny bit of regex (to remove comments and whitespace) coupled with this NSScanner code:

Code:
    NSMutableArray *commands = [[NSMutableArray alloc] init];
    NSScanner *scanner = [NSScanner scannerWithString:string];

    while (![scanner isAtEnd]) {
        // Everything up to the opening parenthesis is the command name.
        NSString *commandName = nil;
        [scanner scanUpToAndOverString:@"(" intoString:&commandName];
        NSMutableArray *arguments = [[NSMutableArray alloc] init];
        while (![scanner scanOverString:@")"]) {
            NSString *argument = nil;
            [scanner scanOverString:@","];
            // An argument runs up to the next comma or closing parenthesis.
            if (![scanner scanUpToCharactersFromSet:[NSCharacterSet characterSetWithCharactersInString:@",)"] intoString:&argument]) {
                // Nothing was scanned, so argument is still nil and addObject: would throw.
                NSLog(@"Error! Missing argument or unfinished command!");
                break;
            }
            [arguments addObject:argument];
            if ([scanner isAtEnd]) {
                NSLog(@"Error! Command unfinished!");
                break;
            }
        }
        if (![scanner scanOverString:@";"]) {
            NSLog(@"Error! Command not terminated by semicolon!");
            break;
        }
        [commands addObject:[[CGLCommand alloc] initWithString:commandName
                                                  andArguments:arguments]];
    }
I also added these methods to NSScanner in a category... they're both just convenience methods so that I don't have to have NULLs and repeated strings all over my code.

Code:
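// Scans past the given string and discards it. Returns NO if it isn't next.
- (BOOL)scanOverString:(NSString *)string
{
    return [self scanString:string intoString:NULL];
}

// Scans everything before endString into *string, then scans past endString itself.
- (BOOL)scanUpToAndOverString:(NSString *)endString
                   intoString:(NSString **)string
{
    [self scanUpToString:endString intoString:string];
    return [self scanOverString:endString];
}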
I've tested it a bit and it's quite snappy with code of this length. I haven't tested how much code you have to type before it starts getting bogged down.
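The regex that removes comments and whitespace isn't shown above. One way it might look, as a sketch only: the function name StripCommentsAndWhitespace and the pattern are assumptions, not the code from this project. It deletes all whitespace outright, which is fine for bare numeric arguments but would mangle string literals.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration: removes /* block */ comments,
// // line comments, and all whitespace from `source`.
static NSString *StripCommentsAndWhitespace(NSString *source) {
    // Alternatives, tried in order: a block comment (shortest possible),
    // a line comment up to the end of the line, or a run of whitespace.
    NSString *pattern = @"/\\*.*?\\*/|//[^\\n]*|\\s+";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:NULL];
    return [regex stringByReplacingMatchesInString:source
                                           options:0
                                             range:NSMakeRange(0, source.length)
                                      withTemplate:@""];
}
```

The result can be handed straight to the scanner loop, which then never has to deal with spaces between arguments.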
Attached Thumbnails: FirstScreenshot.png (41.2 KB)

Last edited by ArtOfWarfare; Jun 18, 2013 at 12:23 AM.