High ASCII encoding differences Mac and Windows

Discussion in 'Mac Programming' started by anandds, Feb 9, 2010.

  1. anandds macrumors newbie

    Joined:
    May 8, 2009
    #1
    Hi,

    We have a client-server application where the client runs on Mac and the server runs on Windows.

    The client sends some Mac file pathnames to the server, and the server displays them to the user. I have problems sending the high-ASCII characters.

    For example, I send the HIRAGANA LETTER BO ぼ from the Mac system, but in Windows the server displays this as HIRAGANA LETTER HO ほ followed by a junk character (looks like a dot in the UI).

    The Mac client UI displays this correctly; it's only the Windows display that has the problem.

    Any help would be great.

    Thanks,
    Anand
     
  2. robbieduncan Moderator emeritus

    robbieduncan

    Joined:
    Jul 24, 2002
    Location:
    London
    #2
    Are you sure they are using ASCII? Seems like Unicode is much more likely. I imagine the Mac is sending as UTF-16. What is the server interpreting it as?
     
  3. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #3
    There's no such thing as "high ASCII". ASCII is a 7-bit code. Anything outside the range 0x00-0x7F isn't ASCII.


    HFS+ filenames are stored on disk in "a form very nearly the same as Unicode Normalization Form D (NFD)":
    http://en.wikipedia.org/wiki/HFS_Plus

    Look at the article, read the links, and then refer to the references.

    One of the references will be this table:
    http://developer.apple.com/mac/library/technotes/tn/tn1150table.html

    You will almost certainly need to understand Unicode, normalization, and composed vs. decomposed forms in order to solve this.
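
To make composed vs. decomposed forms concrete, here is a minimal sketch in Python using the standard `unicodedata` module. Python is only used for illustration (the original client is a Mac application), and the names `composed`, `decomposed` and `describe` are made up for this example.

```python
import unicodedata

composed = "\u307c"  # HIRAGANA LETTER BO as a single precomposed code point
decomposed = unicodedata.normalize("NFD", composed)  # canonical decomposition


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


print(describe(composed))    # one entry: HIRAGANA LETTER BO
print(describe(decomposed))  # two entries: HIRAGANA LETTER HO, then a combining mark

# Recomposing with NFC gives back the single precomposed code point.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The decomposed form is HIRAGANA LETTER HO followed by U+3099 (COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK), which is consistent with the "ほ followed by a junk character" symptom if the receiving side does not compose the pair.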
     
  4. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #4
    On the Mac, POSIX filenames are UTF-8 encoded, using canonically decomposed Unicode. Go to www.unicode.org to find out what that means. While the Mac handles this fine, whatever code you are using on your Windows box doesn't (which means it is broken: any process that cannot handle combining character sequences does not conform to the Unicode standard). Convert the Unicode text to canonically precomposed Unicode, either on the Macintosh or on Windows.
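
A small sketch of what this means at the byte level, assuming the path travels as UTF-8 (the thread never says which encoding the client actually uses on the wire). It is written in Python with the standard `unicodedata` module purely for illustration, and the file name is hypothetical.

```python
import unicodedata

name = "\u307c.txt"  # hypothetical file name starting with HIRAGANA LETTER BO

# The same visible name, encoded as UTF-8 in decomposed and precomposed form.
nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")
nfc_bytes = unicodedata.normalize("NFC", name).encode("utf-8")

print(nfd_bytes.hex(" "))  # HO (e3 81 bb) + combining mark (e3 82 99) + ".txt"
print(nfc_bytes.hex(" "))  # BO (e3 81 bc) + ".txt"
print(nfd_bytes == nfc_bytes)  # False: canonically equivalent text, different bytes
```

Both byte sequences are valid UTF-8 for canonically equivalent text, so a receiver that compares or renders them without normalizing can treat them as different strings.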
     
  5. anandds thread starter macrumors newbie

    Joined:
    May 8, 2009
    #5
    Looks like we had to normalize the path name (CFStringNormalize) before sending it across to the Windows box. I normalize using the kCFStringNormalizationFormC argument, and it seems to work fine now.

    At the same time, we deal with Mac path names (Japanese hiragana) coming in from the Windows box to the Mac. Here again, we have to normalize the paths before storing them in the Mac filesystem, but in this direction we have to use the argument kCFStringNormalizationFormD.
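
The same two-way normalization can be sketched in Python. This is an illustration under assumptions, not the code used in the application: `path_for_windows` and `path_for_mac` are hypothetical helper names, UTF-8 on the wire is assumed, and `unicodedata` stands in for CFStringNormalize. HFS+ uses a form "very nearly the same as" NFD, so standard NFD is only an approximation of what the filesystem stores.

```python
import unicodedata


def path_for_windows(mac_path: str) -> bytes:
    """Compose the path (NFC) before sending it to the Windows server."""
    return unicodedata.normalize("NFC", mac_path).encode("utf-8")


def path_for_mac(wire_bytes: bytes) -> str:
    """Decompose an incoming path (NFD) before handing it to the Mac side."""
    return unicodedata.normalize("NFD", wire_bytes.decode("utf-8"))


# A decomposed Mac name: HIRAGANA LETTER HO + combining voiced sound mark.
mac_name = "\u307b\u3099.txt"
wire = path_for_windows(mac_name)
print(wire.hex(" "))                   # starts with e3 81 bc, the precomposed BO
print(path_for_mac(wire) == mac_name)  # True: the round trip restores the decomposed form
```

Normalizing once at the boundary in each direction keeps the rest of the code free of assumptions about which form a given string is in.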

    It's actually Unicode, but people conventionally call it high ASCII, meaning higher ASCII values or whatever it means.

    Thanks for the help.
     
  6. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #6
    Anyone calling it "high ASCII" doesn't have the slightest clue what they are talking about.

    And you may have to be careful about Macintosh filenames that are not allowed in the Windows filesystem. On the Macintosh, all characters are allowed in a filename except the nul character and the slash character ("/"). In Windows, many characters are not allowed. You might have fun if I name a file "*.*" on the Macintosh and you try to create a file with that name on a Windows machine.
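
A rough sketch of a pre-flight check for this, in Python for illustration. `WINDOWS_RESERVED` and `invalid_for_windows` are hypothetical names; the set covers the commonly documented reserved characters (< > : " / \ | ? *) plus control characters, and deliberately ignores other Windows rules such as reserved device names like CON or trailing dots and spaces.

```python
# Characters that Windows does not allow in a file name component.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def invalid_for_windows(name):
    """Return the sorted, de-duplicated characters in name that Windows rejects."""
    return sorted(
        {ch for ch in name if ch in WINDOWS_RESERVED or ord(ch) < 32}
    )


print(invalid_for_windows("*.*"))         # ['*']
print(invalid_for_windows("\u307c.txt"))  # []: non-ASCII letters are not the problem
```

What to do with a rejected name (refuse it, escape it, or substitute characters) is an application decision that the check itself does not settle.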
     
  7. Sydde macrumors 68020

    Sydde

    Joined:
    Aug 17, 2009
    #7
    Would it help in such a case to use the NSString method -stringByAddingPercentEscapesUsingEncoding: to convert to a standard URL? How would a Windows box handle the translation back to a filename?
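
To see what percent-escaping does and does not change, here is a small sketch using Python's standard `urllib.parse` in place of the NSString method (an illustrative substitution; the file names are hypothetical and UTF-8 is assumed as the encoding).

```python
import unicodedata
from urllib.parse import quote, unquote

decomposed_name = "\u307b\u3099.txt"  # HO + combining mark, as a Mac might supply it

# Percent-escape every byte outside the unreserved set, then reverse it.
escaped = quote(decomposed_name, safe="")
restored = unquote(escaped)

print(escaped)                      # %E3%81%BB%E3%82%99.txt
print(restored == decomposed_name)  # True: the escape is lossless
print(restored == unicodedata.normalize("NFC", restored))  # False: still decomposed

print(quote("*.*", safe=""))        # %2A.%2A
```

Escaping makes the name safe to transport and sidesteps reserved characters such as `*`, but the decoded string is still in decomposed form, so normalization is needed separately.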
     
  8. savar macrumors 68000

    savar

    Joined:
    Jun 6, 2003
    Location:
    District of Columbia
    #8
    Holy smokes.

    Unicode is not conventionally called "high ASCII". Please seek the path of character encoding enlightenment:

    http://www.joelonsoftware.com/articles/Unicode.html
     
  9. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #9
    High ASCII is like unicorn tears.

    IF ascii had more bits
    AND those bits were standardized
    THEN high ascii would exist.

    IF an albino equid had a single horn extending from its forehead
    AND it wept tears for emotional reasons instead of eye irritation
    THEN unicorn tears would exist.

    The logic is impeccable.
     
