Oh, man -- I am sooo confused (UTF-8 quest.)

Discussion in 'Web Design and Development' started by idea_hamster, Jul 14, 2006.

  1. idea_hamster macrumors 65816

    idea_hamster

    Joined:
    Jul 11, 2003
    Location:
    NYC, or thereabouts
    #1
    OK. So I was working on a really small website project that I am doing as a favor for a friend of a relative with a single text file on my HD and a copy on a webserver.

    Part of the site includes the job title of someone who works at the Louvre Museum, and the person wanted the title in French. This meant that the word "Museum" should be "Musée."

    When I viewed the file that I was working on in a browser (Safari or Firefox), the accented characters would be broken, much like they are in this post. But -- when I viewed the same file from the webserver, the letters looked fine! :eek:

    :confused: Now, I suppose on that pragmatic level of my mind, I don't care how this works if it does, in fact, work. At the same time, I can't quite understand how a remotely hosted file would be more robust than a locally hosted one. :confused:

    Any thoughts?
     
  2. frankblundt macrumors 65816

    frankblundt

    Joined:
    Sep 19, 2005
    Location:
    South of the border
    #2
    Maybe it's your browser - the characters aren't broken in your post for me. Or on any other similar sites (I do wine sites so they're stuffed full of French and German characters).

    I presume you included the encoding in the page header?
    I just use the Latin 1 set (ISO-8859-1) - seems to work fine.
    like the one for this page: <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

    You might also (not that it will make any difference with this issue) include <span lang="fr"> tags around those terms (it helps text readers pronounce the words correctly for the visually impaired)
     
  3. idea_hamster thread starter macrumors 65816

    idea_hamster

    Joined:
    Jul 11, 2003
    Location:
    NYC, or thereabouts
    #3
    These thoughts crossed my mind too. That's why I tried both Safari and Firefox, and in Firefox tried just about every View -> Character Encoding option that I could find. The only one that worked was "Western (MacRoman)" and while I use a Mac, I don't really think it's reasonable to code that into a page.

    Generally, I use the UTF-8 because I understand (understood?) that to be the most international and most multi-language compatible --

    Code:
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    -- specifically.

    So I was surprised to see simple accents not working. And I did, in fact, try the <span lang="fr"></span> tags -- to (as you surmised) no avail.
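
    The symptom described above (only "Western (MacRoman)" rendering correctly) is consistent with the file's bytes being MacRoman while the page declares UTF-8. That is an assumption about this particular file, not something the thread confirms. A minimal Python sketch of how the same word reads when the declared encoding and the saved bytes disagree; the helper name decode_as is hypothetical:

```python
# Sketch: one word, saved in different encodings, then read back
# under a different (possibly wrong) assumption.

def decode_as(data: bytes, encoding: str) -> str:
    """Decode bytes, substituting U+FFFD for anything invalid."""
    return data.decode(encoding, errors="replace")

word = "Musée"

mac_bytes = word.encode("mac_roman")   # b'Mus\x8ee'
utf8_bytes = word.encode("utf-8")      # b'Mus\xc3\xa9e'
latin1_bytes = word.encode("latin-1")  # b'Mus\xe9e'

# MacRoman bytes read as UTF-8: 0x8E is not a valid start byte,
# so the accented letter becomes the replacement character.
broken_as_utf8 = decode_as(mac_bytes, "utf-8")

# UTF-8 bytes read as MacRoman: the two-byte sequence turns into
# two unrelated characters.
broken_as_mac = decode_as(utf8_bytes, "mac_roman")

# Matching encodings on both sides round-trip cleanly.
intact = decode_as(mac_bytes, "mac_roman")

for label, raw in [("mac_roman", mac_bytes),
                   ("utf-8", utf8_bytes),
                   ("latin-1", latin1_bytes)]:
    print(label, raw)
```

    The point is only that the accent survives when the reader's assumption matches the bytes on disk; which encoding the editor actually used has to be checked in the editor itself.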

    Update -- turns out that there are more problems with the way I was going about that project, so I'm taking a new tack that obviates my (non-)problem. Thanks all.
     
  4. radiantm3 macrumors 65816

    radiantm3

    Joined:
    Oct 16, 2005
    Location:
    San Jose, CA
    #4
    Did you try using

    & eacute ;

    (without the spaces in between them) instead of typing the actual character in? For the web, you want to html encode characters that are not common.
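
    For anyone generating such references rather than typing them by hand, here is a rough Python sketch using the standard html module. The function name to_named_entities is made up for illustration, and it deliberately leaves &, < and > alone, so it is not a full HTML escaper:

```python
import html
from html.entities import codepoint2name

def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters with HTML entity references.

    Uses a named entity when one exists, otherwise a numeric one.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                       # plain ASCII stays as-is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. &eacute;
        else:
            out.append(f"&#{cp};")               # numeric fallback
    return "".join(out)

encoded = to_named_entities("Musée")

# The built-in error handler gives numeric references only.
numeric = "Musée".encode("ascii", "xmlcharrefreplace")

# html.unescape reverses either form.
decoded = html.unescape(encoded)
```

    Because the result is pure ASCII, it reads the same under UTF-8, ISO-8859-1 or MacRoman, which is why entities sidestep the encoding mismatch.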
     
  5. peterparker macrumors regular

    peterparker

    Joined:
    Mar 12, 2005
    Location:
    Houston
    #5
    Couple of things:
    - what application are you using to save the file?
    - when it is not working locally, you mean opening the file in a browser directly, not through HTTP?

    First thing to do is make sure the file is being saved with a character encoding of UTF-8, if that is what you want. UTF-8 is a variable-width encoding, so characters will take from 1 to 4 bytes when saved. And whatever application is reading the file must know that it is UTF-8 to read it correctly. ISO-8859-1 (Latin 1) is a fixed-width encoding at 1 byte per character, which makes it harder to goof when reading.
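
    The width difference is easy to check. A small Python sketch, with hypothetical helper names utf8_width and latin1_width:

```python
from typing import Optional

def utf8_width(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

def latin1_width(ch: str) -> Optional[int]:
    """Bytes in ISO-8859-1, or None if the character is not in that set."""
    try:
        return len(ch.encode("latin-1"))
    except UnicodeEncodeError:
        return None

# One sample per UTF-8 width: ASCII, accented Latin, euro sign, emoji.
samples = ["e", "\u00e9", "\u20ac", "\U0001F600"]
widths = [(utf8_width(c), latin1_width(c)) for c in samples]
```

    The accented letter is 2 bytes in UTF-8 and 1 byte in Latin 1, while the euro sign and the emoji have no Latin 1 form at all.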

    An HTTP server can provide the character encoding in a response header, and you can also provide the meta tag as you said. Both help a browser render characters correctly.
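
    A file opened straight from disk has no HTTP response header, so the browser has only the meta tag or its own guess to go on. As a sketch of what the header carries, this Python snippet pulls the charset out of a Content-Type value using the standard email.message module. The function name is hypothetical and the header values are examples, not what any particular server sends:

```python
from email.message import Message
from typing import Optional

def charset_from_content_type(value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased,
    or None when the header does not name one."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()

with_charset = charset_from_content_type("text/html; charset=UTF-8")
without_charset = charset_from_content_type("text/html")
```

    When the server does name a charset, browsers give it priority over the meta tag, which is one way a remotely served copy can render differently from the local one.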

    All of this just means there are plenty of places things can go awry. And yes, Unicode is "the most international and most multi-language compatible". UTF-8 is just one way of encoding Unicode, and it saves space. A fixed-width encoding such as UTF-32 takes 4 bytes per character, which can waste a lot of space.
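
    The space argument can be put in numbers, assuming the fixed-width 4-byte form in question is UTF-32. A short Python sketch; the sample phrase is chosen only for illustration:

```python
text = "Musée du Louvre"

char_count = len(text)                     # code points in the phrase
utf8_size = len(text.encode("utf-8"))      # 1 byte each, 2 for the accent
# The "-be" variant is used so no byte-order mark is added to the total.
utf32_size = len(text.encode("utf-32-be")) # always 4 bytes per code point
latin1_size = len(text.encode("latin-1"))  # always 1 byte per character

print(char_count, utf8_size, utf32_size, latin1_size)  # prints: 15 16 60 15
```

    For mostly-Western text, UTF-8 costs about the same as Latin 1 and roughly a quarter of UTF-32.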
     
  6. peterparker macrumors regular

    peterparker

    Joined:
    Mar 12, 2005
    Location:
    Houston
    #6
    Another option is HTML entity references, if your content is only for display in a web browser.
     
  7. idea_hamster thread starter macrumors 65816

    idea_hamster

    Joined:
    Jul 11, 2003
    Location:
    NYC, or thereabouts
    #7
    I was using SubEthaEdit for my markup and just opening that file using Firefox, so if that means it wasn't going through HTTP, then you're right on.

    I think that's what the problem was. I think the &_; idea would probably be the best solution. I guess it's a n00b's curse not to think of that!
     
  8. mnkeybsness macrumors 68030

    mnkeybsness

    Joined:
    Jun 25, 2001
    Location:
    Moneyapolis, Minnesota
    #8
    Think about UTF-8 this way... if you can't see the character on your keyboard, you need to use HTML Entities
     
  9. idea_hamster thread starter macrumors 65816

    idea_hamster

    Joined:
    Jul 11, 2003
    Location:
    NYC, or thereabouts
    #9
    That sounds like a pretty sound rule -- it's got to help in the long run. Besides, it's probably good to get familiar with them. Thx.
     
