NSString decoding issue

Discussion in 'iOS Programming' started by nashyo, Jul 22, 2012.

  1. nashyo macrumors 6502

    Joined:
    Oct 1, 2010
    Location:
    Bristol
    #1
    I'm scraping a website and my output seems to be encoded. I'm trying to decode but I'm having no luck. Can anyone help?

    Code:
    NSString *modified_first = [hit_ stringByReplacingOccurrencesOfString:@"\n" withString:@""];
    NSString *modified_second = [modified_first stringByReplacingOccurrencesOfString:@" " withString:@""];
    const char *c = [modified_second cStringUsingEncoding:NSUTF8StringEncoding];
    NSString *modified_third = [[NSString alloc] initWithCString:c encoding:NSUTF8StringEncoding];
    Output:
    "
    SalePrice:\U00a389,950.00
    "
     
  2. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #2
    I assume your question is about the \U00a3.

    Have you looked at the actual characters in the strings, either the starting string (hit_) or the resulting one? If not, then how do you know that what you perceive to be "encoded" is really present in the string or not? It might be that \U00a3 is simply how the NSString itself outputs, or it might be how your debug console outputs when certain characters appear.

    Have you looked at the characters in modified_second and compared them to those in modified_third? Are you certain there will be a difference? I ask because UTF-8 is a lossless, fully reversible encoding. So if you get a C string in UTF-8 and then make an NSString from those bytes, the result should be the same as the string you started with. Whatever you're trying to do by getting a C string and then making another NSString is likely having no effect. You didn't say what you're trying to do, and I can't guess what it might be, since it looks like a no-op to me.
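
    A minimal sketch of both checks, assuming ARC and a Foundation command-line target. The literal below is a hypothetical stand-in for the scraped text (the real hit_ is not shown in the thread). It round-trips the string through a UTF-8 C string the same way the first post does, compares the result with the original, and then prints each UTF-16 code unit so you can see what the string really contains.

    ```objective-c
    #import <Foundation/Foundation.h>

    int main(void) {
        @autoreleasepool {
            // Hypothetical stand-in for the scraped text; \u00a3 is the pound sign.
            NSString *original = @"SalePrice:\u00a389,950.00";

            // Same round trip as in the first post: NSString -> UTF-8 bytes -> NSString.
            const char *utf8 = [original cStringUsingEncoding:NSUTF8StringEncoding];
            NSString *roundTripped = [[NSString alloc] initWithCString:utf8
                                                              encoding:NSUTF8StringEncoding];

            // UTF-8 is lossless, so the two strings should compare equal (logs 1).
            NSLog(@"equal after round trip: %d", [original isEqualToString:roundTripped]);

            // Print every UTF-16 code unit to see the actual characters in the string.
            for (NSUInteger i = 0; i < [original length]; i++) {
                unichar unit = [original characterAtIndex:i];
                NSLog(@"index %lu: U+%04X", (unsigned long)i, (unsigned int)unit);
            }
        }
        return 0;
    }
    ```

    If the pound sign shows up as a single U+00A3 unit rather than as a backslash followed by letters and digits, the string is already decoded and there is nothing left to convert.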
     
  3. nashyo thread starter macrumors 6502

    Joined:
    Oct 1, 2010
    Location:
    Bristol
    #3
    Apologies for not being clear.

    The \U00a3 is supposed to be '£', as it appears on the website (sale price).

    The modified_first and modified_second strings strip out the \n and @" " characters. That part works.

    However, my last attempt, which tries to decode the \U00a3, is failing.

    Sure, I could replace it with '£', but I would rather decode it.

    I'm using the c string approach because another forum suggested it. However, it doesn't work and I am wondering if anyone knows what I am doing wrong.

    I don't know why the above methods are not working.

    ----------

    The class method stringWithUTF8String: requires a const char *. Should I convert the scraped HTML content into data, then convert back from data to a string?
     
  4. nashyo thread starter macrumors 6502

    Joined:
    Oct 1, 2010
    Location:
    Bristol
    #4
    Figured it out, thanks to chown33.

    The debugger was showing \U00a3, but at run time (when sent to a UILabel) it appears as '£'.
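
    For anyone who lands here with the same symptom, here is a small sketch of one common way the escape appears. The thread does not say exactly how the string was printed, so the collection logging below is an assumption, and the literal is a hypothetical stand-in for the scraped text. Logging the string directly prints its characters, while logging an NSArray goes through the array's description, which escapes non-ASCII characters in the \Uxxxx form.

    ```objective-c
    #import <Foundation/Foundation.h>

    int main(void) {
        @autoreleasepool {
            // Hypothetical stand-in for the scraped text; \u00a3 is the pound sign.
            NSString *price = @"SalePrice:\u00a389,950.00";

            // Logging the string itself prints the pound sign as a real character.
            NSLog(@"%@", price);

            // Logging a collection prints the collection's description, which
            // escapes the pound sign as \U00a3. The stored string is unchanged.
            NSLog(@"%@", @[price]);

            // The pound sign occupies one UTF-16 unit, not the six characters
            // of a literal "\U00a3" sequence, so the length confirms what is stored.
            NSLog(@"length = %lu", (unsigned long)[price length]);
        }
        return 0;
    }
    ```

    The escape is only a display format for debugging output. Assigning the same string to a label's text property shows '£', which matches what was seen at run time.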
     
