creative text file parsing, revisited

Discussion in 'Mac Programming' started by dj.mooky, Jul 8, 2009.

  1. macrumors newbie

    Joined:
    Feb 26, 2008
    #1
    This references the post from http://forums.macrumors.com/showthread.php?t=734883

    So I have successfully used regexes to accomplish my tasks, and I give a sincere thank-you shout-out to all those who helped me, but I have run into a new issue.

    I'm not seeing much on Google about this, at least nothing that would serve my purposes. Anyway, on to the punchline.

    I have a text file that I am parsing for information. I am adding a new format that I am going to support this time; however, the sly foxes have written Unicode characters into their txt files.

    I have been breaking down the text files at every run of 4 newlines, so every "\n\n\n\n" splits the string into an array. From there I break each piece down at every single newline to get the line-by-line information, and generally parse from there. However, this is not working because of a Unicode character, specifically "\Ufeff", which seems to be a character that dictates to use standard spacing something something text.

    This code comes at the beginning of every line, so I am getting errors and am unable to break the string into multiple strings based upon "\n", because it assumes that "\n\Ufeff" is a single character and will not break it off at the \n without taking the \Ufeff along with it. Furthermore, when I attempt to
    Code:
    anArray = [aString componentsSeparatedByString:@"\n\n\n\n\Ufeff"]  
    
    it tosses an error saying that \U is an "incomplete universal character name \Ufeff"

    Has anyone dealt with anything like this, and come across a fancy way to remove this specific unicode character? It is really turning into a thorn in my side right now.

    Thanks in advance
     
  2. thread starter macrumors newbie

    Joined:
    Feb 26, 2008
    #2
    Update:

    So I've discovered that my issue is that the code \Ufeff only exists in UTF-16, and if I go into TextEdit and save-as to a UTF-8 file, it works perfectly. So while I do still want to figure out how to parse a UTF-16 file without any extra steps, does anyone know of a good way to downgrade the string once you import it from a file?

    Thanks
     
  3. macrumors 601

    HiRez

    Joined:
    Jan 6, 2004
    Location:
    Western US
    #3
    You can convert it using a number of different NSString methods, such as dataUsingEncoding:allowLossyConversion: followed by initWithData:encoding:.
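
    For anyone after an example of that round trip, here is a minimal sketch, assuming ARC and Foundation. The helper name RoundTripThroughUTF8 is made up for illustration; only the two NSString methods named above come from the thread. Keep in mind that an NSString holds characters rather than bytes, so this changes the byte encoding you get out, and it should not be expected to remove a U+FEFF character that is already inside the string.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): encode a string to UTF-8 bytes,
// then build a new NSString from those bytes. Assumes ARC.
static NSString *RoundTripThroughUTF8(NSString *input) {
    // dataUsingEncoding: returns nil if the string cannot be encoded without loss.
    NSData *utf8Data = [input dataUsingEncoding:NSUTF8StringEncoding
                           allowLossyConversion:NO];
    if (utf8Data == nil) {
        return nil;
    }
    // Decode the UTF-8 bytes back into a string object.
    return [[NSString alloc] initWithData:utf8Data
                                 encoding:NSUTF8StringEncoding];
}
```

    The NSData in the middle is what you would write to disk if a UTF-8 file is the goal; the string that comes back should compare equal to the one that went in.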
     
  4. thread starter macrumors newbie

    Joined:
    Feb 26, 2008
    #4
    Excellent, thanks... off to the races

    I don't suppose I could get you to give me an example of that code in action? I am getting NSStrings that throw selector errors for initWithBytes:length:encoding: type of things.... I know this has to be simpler than I am making it... if push comes to shove I may just write an AppleScript to save them as UTF-8 files in TextEdit....

    But I'm convinced there has to be an easier way.


    FOR HARK! there was a better way

    Code:
    anArray = [aString componentsSeparatedByString:@"\uFEFF"];
    // Rejoin the pieces with nothing in between, which drops every U+FEFF.
    aNewString = [anArray componentsJoinedByString:@""];
    
    My problem was my "u" was capitalized, and my "FEFF" was not.... This successfully removed every BOM from the file and let my app parse it normally...
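
    The same cleanup can also be done in one call before splitting. This is a sketch under assumptions, not what was actually run here: StripBOMs and SplitRecords are hypothetical names, and stringByReplacingOccurrencesOfString:withString:options:range: with NSLiteralSearch is swapped in for the split-and-rejoin above. As for the escape itself, in C-family string literals \u expects exactly four hex digits and \U expects eight, which is why "\Ufeff" was reported as incomplete; the case of the hex digits does not matter.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: delete every U+FEFF (byte order mark) character.
static NSString *StripBOMs(NSString *input) {
    // NSLiteralSearch asks for an exact character-by-character match.
    return [input stringByReplacingOccurrencesOfString:@"\uFEFF"
                                            withString:@""
                                               options:NSLiteralSearch
                                                 range:NSMakeRange(0, [input length])];
}

// Hypothetical helper: strip BOMs, then split on the four-newline record
// separator described earlier in the thread.
static NSArray *SplitRecords(NSString *input) {
    return [StripBOMs(input) componentsSeparatedByString:@"\n\n\n\n"];
}
```

    Each element of the array returned by SplitRecords(aString) can then be split on a single "\n" for the line-by-line parsing.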

    Much thanks for all the help though, I always appreciate it, and enjoy learning better or different ways to do things.

    Until next time, hasta luego mis amigos
     
