Question to the pros about deconstructing NSStrings




neptunet
Dec 29, 2011, 10:34 PM
Hello,

I'm new to the Mac Programming forum but I've been reading MacRumors for years.

My question is about how to take apart an NSString. I'm reading the Apple reference docs and there's some confusing stuff about NSRange, Scanners, blocks, and not to mention that all of the "substring" methods start with "range".

So what I'd like to do is look at my NSString and count the number of non-numerical characters. Then, I'd like to look at the first character of my NSString, determine if it's a number or not, make note of that, and move on to the next and so on, to the end of the string.

What NSObjects/methods/scanners/ranges should I be looking at to do that?

Thank you guys :)



lee1210
Dec 29, 2011, 11:00 PM
characterAtIndex: is probably what you want.

-Lee
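
A minimal sketch of the characterAtIndex: approach for the counting part of the question. The helper name CountNonDigits is made up for illustration, and it treats each UTF-16 code unit as one "character", which is fine for digits and colons but not for composed character sequences.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): counts the UTF-16 code units in
// `string` that are not decimal digits, visiting them with characterAtIndex:.
static NSUInteger CountNonDigits(NSString *string) {
    NSCharacterSet *digits = [NSCharacterSet decimalDigitCharacterSet];
    NSUInteger count = 0;
    for (NSUInteger i = 0; i < [string length]; i++) {
        unichar c = [string characterAtIndex:i];
        if (![digits characterIsMember:c]) {
            count++; // anything outside the digit set, e.g. a colon
        }
    }
    return count;
}
```

Note that decimalDigitCharacterSet also contains digits from other scripts, so this is slightly broader than a plain '0'..'9' check.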

JoshDC
Dec 30, 2011, 10:46 AM
characterAtIndex: should be fine for your case, but Apple recommends (see WWDC 2011's "Advanced Text Processing" session) using the string enumeration method enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByComposedCharacterSequences as one option. The main reason is that using characterAtIndex: requires extra effort to correctly handle composed character sequences, which I guess is what you mean by character.

kainjow
Dec 30, 2011, 12:46 PM
If you need backwards compatibility, use substringWithRange: in a loop, with the range's length set to 1.

Sydde
Dec 30, 2011, 01:39 PM
Seems to me that NSScanner would be the most efficient way to go. Just use -scanUpToCharactersFromSet:intoString: to find the first digit, then you could use one of the number-scanning methods (like -scanDecimal:) to capture a numeric value, or -scanCharactersFromSet:intoString: to collect the digits into a string. You could use -scanLocation with accumulating variables or an NSMutableIndexSet if you need to keep track of character counts. I have taken the attitude that the less you muck around directly with NSString contents, the better off you are.

neptunet
Dec 30, 2011, 06:11 PM
Quote (JoshDC): characterAtIndex: should be fine for your case, but Apple recommends (see WWDC 2011's "Advanced Text Processing" session) using the string enumeration method enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByComposedCharacterSequences as one option. The main reason is that using characterAtIndex: requires extra effort to correctly handle composed character sequences, which I guess is what you mean by character.

Actually, what is a composed character sequence? In my case I want to check for colons in a string of digits. This may be a dumb question, but what exactly does that string enumeration method do?

Quote (kainjow): If you need backwards compatibility, use substringWithRange: in a loop, with the range's length set to 1.

Ohh, a range of one. I'm still not sure exactly what that means, but would that be like using characterAtIndex:? What did you mean by backwards compatibility?

Quote (Sydde): Seems to me that NSScanner would be the most efficient way to go. Just use -scanUpToCharactersFromSet:intoString: to find the first digit, then you could use one of the number-scanning methods (like -scanDecimal:) to capture a numeric value, or -scanCharactersFromSet:intoString: to collect the digits into a string. You could use -scanLocation with accumulating variables or an NSMutableIndexSet if you need to keep track of character counts. I have taken the attitude that the less you muck around directly with NSString contents, the better off you are.

Does scanUpToCharactersInSet mean it will read the characters sequentially (into another string?) up until it hits my colon?

jiminaus
Dec 30, 2011, 06:43 PM
Quote (neptunet): Actually, what is a composed character sequence?

An accented character, for example á, could be encoded in two different ways in Unicode. One way would be to encode the precomposed character U+00E1 (Latin Small Letter A with Acute). Another way would use the composed character sequence of U+0061 (Latin Small Letter A) followed by U+0301 (Combining Acute Accent). They are visually the same, but a literal, code-unit-by-code-unit comparison does not treat them as equal.
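
To make the two encodings concrete, here is a small sketch. The helper names EqualAfterNormalizing and FirstSequenceLength are invented for this example; the NSString methods they call (precomposedStringWithCanonicalMapping, rangeOfComposedCharacterSequenceAtIndex:) are standard Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when two strings match after both are normalized
// to precomposed form, so U+00E1 and "a" + U+0301 compare as equal.
static BOOL EqualAfterNormalizing(NSString *a, NSString *b) {
    return [[a precomposedStringWithCanonicalMapping]
        isEqualToString:[b precomposedStringWithCanonicalMapping]];
}

// Hypothetical helper: length, in UTF-16 code units, of the composed
// character sequence starting at index 0. Assumes `s` is not empty.
static NSUInteger FirstSequenceLength(NSString *s) {
    return [s rangeOfComposedCharacterSequenceAtIndex:0].length;
}
```

With @"\u00E1" and @"a\u0301" as inputs, the first string has length 1 and the second has length 2, which is why stepping through with characterAtIndex: would see the accent as a separate "character".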

chown33
Dec 30, 2011, 06:46 PM
Quote (neptunet): Actually, what is a composed character sequence?

Essentially, it's a single character, such as the letter 'A', along with all subsequent combining accents and other combining forms.

Here's a bunch of Frequently Pasted Links, though some are doubtless obsolete:
Minimum Knowledge of Charsets - Joel on Software
http://www.joelonsoftware.com/articles/Unicode.html
Additional links:
http://lists.apple.com/archives/cocoa-dev/2008/Aug/msg00940.html

UTF-16 for Processing:
http://www.unicode.org/notes/tn12/
Canonical Equivalence in Applications:
http://www.unicode.org/notes/tn5/
UAX #15: Unicode Normalization:
http://www.unicode.org/reports/tr15/

Sydde
Dec 30, 2011, 07:46 PM
Quote (neptunet): Does scanUpToCharactersInSet mean it will read the characters sequentially (into another string?) up until it hits my colon?

If you use the decimal-digit character set (+[NSCharacterSet decimalDigitCharacterSet]) and -scanCharactersFromSet:intoString:, it will fill the string with the characters it finds in the set until it runs into a character that is not in the set. Then you can get to the next numeric digit with -scanUpToCharactersFromSet:intoString:, back and forth until you run out of string. Read the docs on those methods and on NSCharacterSet.
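
The back-and-forth described above could look roughly like this. It is only a sketch: DigitRuns is a made-up helper name, and the choice to return the runs as an array of strings is an assumption, not something from the thread.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every run of decimal digits found in `string`,
// in order, skipping whatever separates them (colons, spaces, letters).
static NSArray *DigitRuns(NSString *string) {
    NSCharacterSet *digits = [NSCharacterSet decimalDigitCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:string];
    [scanner setCharactersToBeSkipped:nil]; // no implicit whitespace skipping
    NSMutableArray *runs = [NSMutableArray array];
    while (![scanner isAtEnd]) {
        NSString *run = nil;
        // Advance to the next digit; returns NO if we are already on one.
        [scanner scanUpToCharactersFromSet:digits intoString:NULL];
        // Collect consecutive digits; returns NO only at the end of the string.
        if ([scanner scanCharactersFromSet:digits intoString:&run]) {
            [runs addObject:run];
        }
    }
    return runs;
}
```

After each pass the scanner sits either on a digit or at the end of the string, so the loop always makes progress.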

JoshDC
Dec 30, 2011, 07:56 PM
On re-reading your request and seeing Sydde's suggestion, I think that may be the way to go.

The method I suggested will go through each composed character sequence and perform the block, for example:

[@"hello" enumerateSubstringsInRange:NSMakeRange(0, [@"hello" length])
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    NSLog(@"%@", substring);
}];

Will print:

h
e
l
l
o

Then it'll be up to you to do something based on the value of substring. It's a little heavy-handed for a number of cases, and I think this is one of them.

seepel
Dec 31, 2011, 06:55 AM
I think in a case like this, to give you the best solution, it would help to know what the end goal is. As you can see from previous posts, there are a few ways to do this, each with certain strengths. So what do you want to end up with at the end? And what do you want to do with it? So far I would vote for NSScanner as your best bet.

neptunet
Mar 1, 2012, 03:06 PM
In reply to that last post,

Thanks for asking about that. I'm trying to work with video timecode. As far as I know the only way to input timecode is into a string with digits and colons. Once I have the string, I want to see what the timecode is. So I need to read the digits out of the string and discard the colons. When I need to display the timecode, I'll put the colons back in.

Reading over the suggestions so far, I realize that I don't know what a range or scanner is. Could someone be so kind as to explain? Especially the range. I've read the class references but I'm still clueless. Sounds like range is the actual memory range? Why would I ever want to know that just to figure out what kind of character it is? *head explodes*

I noticed in C# (unrelated I know) there are some nice properties like

char.IsDigit(char)
char.IsPunctuation(char)

Wow, how useful that would be right now! Is there anything that simple in Objective-C?

Thanks again for the wonderful help. :)
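
On the range question: an NSRange is not a memory range. It is just a location and a length counted in string positions, so NSMakeRange(3, 2) on "01:02:03:04" means "start at position 3, take 2 characters", which is "02". The closest thing to char.IsDigit is asking an NSCharacterSet whether it contains the character. A rough sketch, where CharacterIsInSet is a made-up helper name:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when the one-character slice of `string` at
// `index` belongs to `set`. Assumes index < [string length].
static BOOL CharacterIsInSet(NSString *string, NSUInteger index,
                             NSCharacterSet *set) {
    // {location, length}: start at `index`, take exactly 1 position.
    NSRange one = NSMakeRange(index, 1);
    NSString *piece = [string substringWithRange:one];
    // rangeOfCharacterFromSet: reports NSNotFound when nothing matches.
    return [piece rangeOfCharacterFromSet:set].location != NSNotFound;
}
```

Passing [NSCharacterSet decimalDigitCharacterSet] gives a digit test, and [NSCharacterSet punctuationCharacterSet] gives a punctuation test that should match the colon.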

neptunet
Mar 1, 2012, 05:41 PM
Oh, am I talking about "tokenizing"? My numbers are delimited by colons. So I just want to get the numbers. It's funny how a little jargon can help.

Sydde
Mar 2, 2012, 12:14 AM
Well, if you are working with timecodes, it might be easier (and faster in the code) to just get the raw timecode data and convert it mathemagically to usable numbers. According to Wikipedia, timecodes are stored in BCD, meaning each byte is two decimal digits, which you can convert to an int or whatever with a little simple math:
timeValueComponent = ( timeByte >> 4 ) * 10 + ( timeByte & 0x0F );
The frame number might require more conversion, though. QuickTime can provide you with this raw data in a QTTime record; I'm not sure how it works in AV Foundation.
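
For reference, the BCD arithmetic above wrapped in a function. DecodeBCDByte is a hypothetical name, and the sketch assumes a plain packed-BCD byte with no flag bits mixed in, so any extra conversion needed for the frame byte is not handled.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes one packed-BCD byte (two decimal digits, tens
// in the high nibble, ones in the low nibble) into its numeric value.
// Assumes both nibbles are valid decimal digits (0-9); nothing is checked.
static unsigned int DecodeBCDByte(uint8_t timeByte) {
    unsigned int tens = (unsigned int)(timeByte >> 4);   // high nibble
    unsigned int ones = (unsigned int)(timeByte & 0x0F); // low nibble
    return tens * 10u + ones;
}
```

For example, the byte 0x59 has a high nibble of 5 and a low nibble of 9, so it decodes to 59.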

hchung
Mar 2, 2012, 12:39 PM
Quote (neptunet): Oh, am I talking about "tokenizing"? My numbers are delimited by colons. So I just want to get the numbers. It's funny how a little jargon can help.

Try this....

NSArray* timeElements = [timecodeString componentsSeparatedByString:@":"];

This gets you an array of strings from your timecode.
Then you'd get [timeElements objectAtIndex:0] for your hours, [timeElements objectAtIndex:1] for your minutes, and so on.
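
Building on that, here is a sketch of the full round trip: split on colons, turn each piece into a number, and later put the colons back in for display. The helper names TimecodeFields and TimecodeString are invented here, the two-digit zero padding is an assumption about the display format, and there is no validation of the input.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a string like "11:22:33:44" on colons and
// returns the pieces as NSNumber objects. Non-numeric pieces become 0.
static NSArray *TimecodeFields(NSString *timecodeString) {
    NSArray *timeElements = [timecodeString componentsSeparatedByString:@":"];
    NSMutableArray *fields =
        [NSMutableArray arrayWithCapacity:[timeElements count]];
    for (NSString *element in timeElements) {
        [fields addObject:[NSNumber numberWithInteger:[element integerValue]]];
    }
    return fields;
}

// Hypothetical helper: rebuilds the display string, zero-padding each field
// to two digits and joining the fields with colons.
static NSString *TimecodeString(NSArray *fields) {
    NSMutableArray *parts = [NSMutableArray arrayWithCapacity:[fields count]];
    for (NSNumber *field in fields) {
        [parts addObject:[NSString stringWithFormat:@"%02ld",
                                                    (long)[field integerValue]]];
    }
    return [parts componentsJoinedByString:@":"];
}
```

Keeping the fields as numbers in between means you can do arithmetic on them before formatting them again.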

In the future, if you're not sure how to ask about what you want to do, it helps to describe the overall problem statement first, because we might be able to provide easier ways to accommodate your task. :) "I have video timecode strings and want to break them up into pieces; they're delimited by colons like 11:22:33:44" versus "I want to read a string character by character, determine if number or punctuation, and then record that and move on" will get you very different results.

PatrickCocoa
Mar 2, 2012, 06:57 PM
In Cocoa, there is almost always a method that does what you want. Your programming technique needs to change from:

"what sequence of atomic level operations can I string together to create a procedure that does what I want"

to

"where has Steve hidden the secret method in Cocoa that does what I want".