creative text file parsing, revisited




dj.mooky
Jul 8, 2009, 04:02 PM
This references the post at http://forums.macrumors.com/showthread.php?t=734883

So I have successfully used regexes to accomplish my tasks, and I give a sincere thank-you shout-out to all those who helped me, but I have run into a new issue.

I'm not finding much on Google about this that would serve my purposes anyway, so on to the punchline.

I have a text file that I am parsing for information. I am adding support for a new format this time; however, the sly foxes have written Unicode characters into their txt files.

I have been breaking the text files down on runs of four newlines ("\n"). So at every "\n\n\n\n" I break the string into an array; from there I split on every single newline to get the line-by-line information, and generally parse from there. However, this is not working because of a Unicode character... specifically "\Ufeff", which seems to be some kind of invisible spacing character.

This character comes at the beginning of every line, so I am getting errors and am unable to break the string into multiple strings on "\n", because it seems to treat "\n\Ufeff" as a single character and will not break at the \n without taking the \Ufeff along with it. Furthermore, when I attempt
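The two-level split described above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name RecordsFromText is hypothetical, and the real parsing that happens after the split is not shown.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): split the text into
// records on four consecutive newlines, then split each record into lines.
static NSArray *RecordsFromText(NSString *text) {
    NSMutableArray *records = [NSMutableArray array];
    for (NSString *record in [text componentsSeparatedByString:@"\n\n\n\n"]) {
        // Each record becomes an array of its individual lines.
        [records addObject:[record componentsSeparatedByString:@"\n"]];
    }
    return records;
}
```

Any stray character sitting between the newlines (such as U+FEFF) would keep the four-newline separator from matching, which fits the symptom described.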

anArray = [aString componentsSeparatedByString:@"\n\n\n\n\Ufeff"]

it tosses an error saying that \U is an "incomplete universal character name \Ufeff"

Has anyone dealt with anything like this, and come across a fancy way to remove this specific unicode character? It is really turning into a thorn in my side right now.

Thanks in advance



dj.mooky
Jul 8, 2009, 07:01 PM
Update:

So I've discovered that my issue is that the \Ufeff character only shows up in the UTF-16 version of the file; if I open it in TextEdit and save it as a UTF-8 file, it works perfectly. So while I do want to figure out how to parse a UTF-16 file without any manual steps, does anyone know of a good way to convert the string once you import it from a file?

Thanks

HiRez
Jul 8, 2009, 08:11 PM
So while I do want to figure out how to parse a UTF-16 file without any manual steps, does anyone know of a good way to convert the string once you import it from a file?

You can convert it using a number of different NSString methods, such as dataUsingEncoding:allowLossyConversion: followed by initWithData:encoding:
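A minimal sketch of that round trip, assuming ARC and a string that has already been read into memory. The helper name RoundTripThroughUTF8 is hypothetical; the two NSString methods are the ones named above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an illustration, not a definitive implementation):
// encode the string as UTF-8 bytes, then build a new string from those bytes.
// Assumes ARC; under manual reference counting the result would need autorelease.
static NSString *RoundTripThroughUTF8(NSString *source) {
    NSData *utf8Data = [source dataUsingEncoding:NSUTF8StringEncoding
                            allowLossyConversion:NO];
    if (utf8Data == nil) {
        return nil; // the string could not be encoded without loss
    }
    return [[NSString alloc] initWithData:utf8Data
                                 encoding:NSUTF8StringEncoding];
}
```

A round trip like this only changes the byte encoding. U+FEFF is a valid character in UTF-8 as well, so any occurrences already inside the string may survive the conversion and still need to be removed separately.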

dj.mooky
Jul 8, 2009, 09:36 PM
You can convert it using a number of different NSString methods, such as dataUsingEncoding:allowLossyConversion: followed by initWithData:encoding:

Excellent thanks... off to the races

I don't suppose I could get you to give me an example of that code in action? I am getting NSStrings that throw selector errors for initWithBytes:length:encoding: type of things.... I know this has to be simpler than I am making it... if push comes to shove I may just write an AppleScript to save them as UTF-8 files in TextEdit....

But I'm convinced there has to be an easier way


FOR HARK! there was a better way


anArray = [aString componentsSeparatedByString:@"\uFEFF"];
aNewString = [anArray componentsJoinedByString:@""];

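My problem was that my "u" was capitalized: "\U" expects eight hex digits ("\U0000FEFF"), while "\u" takes four, and the case of "FEFF" itself makes no difference.... this successfully removed every BOM from the file and allowed regular parsing in my app...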
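For anyone landing here later, the same cleanup can probably be done in one call instead of splitting and rejoining. This is a sketch, not the code from my app: the helper name StringByStrippingBOM is made up, and NSLiteralSearch is used on the assumption that an exact character match is the safest option for an invisible character.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: delete every U+FEFF (BOM) character from the text.
static NSString *StringByStrippingBOM(NSString *text) {
    // "\uFEFF" is the four-digit escape; the eight-digit form is "\U0000FEFF".
    return [text stringByReplacingOccurrencesOfString:@"\uFEFF"
                                           withString:@""
                                              options:NSLiteralSearch
                                                range:NSMakeRange(0, [text length])];
}
```

After that, splitting on "\n\n\n\n" and "\n" should behave the way it does for the UTF-8 files.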

Much thanks for all the help though, I always appreciate it, and enjoy learning better or different ways to do things.

Until next time, hasta luego mis amigos