
ychaturvedi

macrumors newbie
Original poster
May 23, 2011
16
0
hi,

I am unable to save a file with xmlSaveFormatFileEnc from the libxml2 library on Mac OS X when the path contains a special international character.

I am working on a cross-platform application and am trying to save a project file whose path is:

/home/yogesh/Electrónica/newproj.proj

xmlSaveFormatFileEnc works fine on Windows and Linux, but on Mac OS X it fails and returns -1.
Code:
xmlSaveFormatFileEnc( ( const char * )sProjFile.char_str(), pXML, "ASCII", 1 );
xmlFreeDoc( pXML );

I can't understand why this happens, because the same call works on the Mac too if I use a simple path without special characters, such as /home/yogesh/test/newproj.proj

I think this is an encoding issue on the Mac, but I am not sure.

Mac OS X version: 10.6

Yogesh
 

jiminaus

macrumors 65816
Dec 16, 2010
1,449
1
Sydney
In which 8-bit encoding is sProjFile, or which 8-bit encoding is returned by char_str()?

I understand the POSIX functions (which I presume libxml2 would be using) expect decomposed Unicode encoded in UTF-8.
 

gnasher729

Suspended
Nov 25, 2005
17,980
5,565
To be clear, it is an encoding issue with your code, not with the Macintosh.

Is your path in UTF-8 format? Does the library modify it or reject it?

Just wondering what that "ASCII", 1 at the end of your call means?
 

jiminaus

macrumors 65816
Dec 16, 2010
1,449
1
Sydney
Just wondering what that "ASCII", 1 at the end of your call means?

This is about which encoding to use when writing out the XML text representation. The 1 is a boolean true to turn on indentation.

(Makes you appreciate Objective-C naming, doesn't it.)

UPDATE:
So I fired up Xcode and, starting with the tree2.c sample code from libxml2, played around with filenames.

It wants a UTF-8 string. It doesn't matter if the characters are decomposed or composed.

For example, the following successfully saves an XML document into /var/tmp/Electrónica.xml
Code:
    const char *path = "/var/tmp/Electr\xc3\xb3nica.xml";
    xmlSaveFormatFileEnc(path, doc, "ASCII", 1);
C3 B3 is the UTF-8 sequence for Unicode character 00F3 LATIN SMALL LETTER O WITH ACUTE.
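
For anyone wondering where C3 B3 comes from, here is a minimal C++ sketch of the UTF-8 encoding rule. The helper name encodeUtf8 is my own invention, not part of libxml2 or wxWidgets, and it does not reject surrogates or out-of-range values. It also illustrates the composed versus decomposed point: U+00F3 encodes to C3 B3, while a plain o followed by U+0301 COMBINING ACUTE ACCENT encodes to 6F CC 81.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Hypothetical helper for illustration; input is assumed to be a valid
// code point (no surrogate or range checking in this sketch).
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx (plain ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encodeUtf8(0xF3) yields the two bytes used in the hard-coded path above.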

(Interesting thing, while playing around with encodings I got filenames with embedded NULLs, which /usr/bin/ls showed, but Finder truncated the name at the NULL.)
 

ychaturvedi

macrumors newbie
Original poster
May 23, 2011
16
0
Thanks for your reply.

I understand that this is an encoding issue, and the example given by jiminaus works fine. In my case the file name is supplied by the user, so how can I check whether the path contains a special character? The example uses C3 B3 for the special character, but is there a method to detect special characters in a file name and convert them to the proper UTF-8 sequence?

I am a newbie, so this may be a foolish question.

I am using the wxWidgets library.

Thanks in advance
Yogesh
 

ychaturvedi

macrumors newbie
Original poster
May 23, 2011
16
0
hi,

Sorry, I forgot to add one thing. The reply says this is an encoding issue and not related to the Mac, but my code works on Linux and Windows without any changes, whether or not the path contains a special character. The problem occurs only on the Mac, when the path contains a special character.


Yogesh
 

gnasher729

Suspended
Nov 25, 2005
17,980
5,565
So your code isn't portable. How exactly do you put a "special" character into the path? Does your Windows code handle Cyrillic characters, like АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩ? Is the path in UTF-8? Do you know what UTF-8 means? Can you post the bytes in the path in hexadecimal?

When I said the problem is with the encoding and not with the Macintosh, what I meant was that it is absolutely well documented that paths on the Macintosh are supposed to be in UTF-8, so if you pass in a path that isn't UTF-8 encoded, then the problem is in _your_ code and _you_ need to fix it. My code has no problems writing files with Cyrillic, Kanji, Devanagari, Hebrew, Arabic, whatever characters in the path.
 

jiminaus

macrumors 65816
Dec 16, 2010
1,449
1
Sydney
Try:

Code:
xmlSaveFormatFileEnc( sProjFile.mb_str(wxConvFile), pXML, "ASCII", 1 );

This assumes you've compiled wxWidgets in Unicode mode.

The parameter to wxString's mb_str method is a reference to a wxMBConv object. wxConvFile is a wxMBConv object appropriate for file names.

If that doesn't work, try wxConvUTF8 instead of wxConvFile. But this might not work on Linux and most likely won't work on Windows, so you might need to conditionally compile it.

But also gnasher729 points out a deficiency of using 8-bit char file paths under Windows. If I'm using an English locale, I won't be able to create a path containing Cyrillic characters, because the Western European code page under Windows doesn't have those.
 

ychaturvedi

macrumors newbie
Original poster
May 23, 2011
16
0
hi,
Thanks to both of you for helping me resolve this issue.

To gnasher729:
1. Your code isn't portable.
Reply: I don't know exactly, but it works on Windows and Linux.

2. Does your Windows code handle Cyrillic characters, like АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩ?
Reply: I created a folder with the name given above and tried to create an XML file in it with my code. It works fine on Linux (Ubuntu) but does not work on Windows.

3. Is the path in UTF-8?
Reply: I don't know how to check or set this. The current application uses code like the following.

This function is used to open or save a file: the first if branch opens an existing file, and the else branch saves a new one.
Code:
bool MyProjectXMLFile::Open( const wchar_t* pcProjFile, bool bReadOnly, bool bNew )
{
  wxString sProjFile = wxString( pcProjFile );

  if ( !bNew )
  {
    pXML = ( xmlDocPtr ) xmlParseFile( wxString( pcProjFile ).char_str() );

    if ( pXML )
    {
      sProjFileLoc = wxString( pcProjFile );

      return( ParseXML( pcProjFile ) );
    }
  }
  else if ( bNew )
  {
    sProjFileLoc = wxString( pcProjFile );

    xmlNodePtr root_node = NULL;
    xmlDtdPtr dtd = NULL;

    pXML = xmlNewDoc( BAD_CAST "1.0" );
    root_node = xmlNewNode( NULL, BAD_CAST "PRJ" );

    xmlDocSetRootElement( pXML, root_node );

    //wxString AppPath = ueApp()->GetAppPath();
    wxString dtdName = sProjFile;
    wxString pPath = sProjFile;
    pPath = pPath.BeforeLast( wxFileName::GetPathSeparator() );
    pPath.Append( wxFileName::GetPathSeparator() );
    dtdName = dtdName.AfterLast( wxFileName::GetPathSeparator() );
    dtdName.RemoveLast( 3 );
    dtdName.Append( _T( "dtd" ) );

    dtd = xmlCreateIntSubset( pXML, BAD_CAST "PRJ", NULL, BAD_CAST ( char * ) dtdName.char_str() );

    xmlSaveFormatFileEnc( ( const char * )sProjFile.mb_str(wxConvUTF8), pXML, "ASCII", 1 );

    xmlFreeDoc( pXML );
    return( ParseXML( pcProjFile ) );
  }

  // Reached only when an existing file could not be parsed
  return false;
}

4. Do you know what UTF-8 means?
Reply: As I said, it is my application that deals with the encoding, and I am a newbie on the Mac too. As far as I know, UTF-8 is a multibyte encoding in which each character is encoded in as little as one byte and as many as four bytes. Most Western European languages need fewer than two bytes per character, and UTF-8 is compatible with ASCII: single-byte ASCII characters keep their encoded value in UTF-8.

5. Can you post the bytes in the path in hexadecimal?
/home/psidm/electronica/Electrónica/myapp.prj:
2f686f6d652f707369646d2f656c656374726f6e6963612f456c65637472f36e696361

6. What I meant was that it is absolutely well documented that paths on the Macintosh are supposed to be in UTF-8.
Reply: The code in the section above is the only place where I pass the path. Is there anything wrong with it?

To jiminaus:

I tried the code you gave, and with it I am able to save the file at a location whose path contains a special character, as above.

But when I reload the file with the open project option, pXML is NULL:
pXML = ( xmlDocPtr ) xmlParseFile( wxString( pcProjFile ).char_str() );

Am I doing anything wrong? Is char_str() OK here, or do we need to change this to mb_str(wxConvUTF8) as well?

Yogesh
 

jiminaus

macrumors 65816
Dec 16, 2010
1,449
1
Sydney
Of course you need to do that same thing with loading!


BTW If you want proper internationalization on all 3 platforms you might need something like this for saving.
Code:
#if defined(WIN32) || defined(_WIN32)
    FILE *file = _wfopen(pcProjFile, _T("wb"));
#else
    FILE *file = fopen(sProjFile.mb_str(wxConvUTF8), "w");
#endif
    if (file != NULL)  // fopen fails if the path is wrong or not writable
    {
        xmlOutputBufferPtr buf = xmlOutputBufferCreateFile(file, NULL);
        xmlSaveFormatFileTo(buf, pXML, "ASCII", 1);
        // buf already closed by xmlSaveFormatFileTo
        fclose(file);
    }

And something like this for loading:
Code:
#if defined(WIN32) || defined(_WIN32)
    FILE *file = _wfopen(pcProjFile, _T("rb"));
    int fd = file ? _fileno(file) : -1;
#else
    FILE *file = fopen(sProjFile.mb_str(wxConvUTF8), "r");
    int fd = file ? fileno(file) : -1;
#endif
    if (fd != -1)  // only parse if the file could be opened
    {
        pXML = xmlReadFd(fd, NULL, "ASCII", 0);
        fclose(file);
    }
 

ychaturvedi

macrumors newbie
Original poster
May 23, 2011
16
0
hi,
Thanks for the reply. I understand, and I tried your suggestions, but they failed. I want to ask one more thing: do we need to change the encoding in every function that deals with the XML file? For example, as I asked in my last post, do we need to change

pXML = ( xmlDocPtr ) xmlParseFile( wxString( pcProjFile ).char_str() );
to
pXML = ( xmlDocPtr ) xmlParseFile( wxString( pcProjFile ).mb_str(wxConvUTF8) );

In the same way, the code I posted uses this call:
dtd = xmlCreateIntSubset( pXML, BAD_CAST "PRJ", NULL, BAD_CAST ( char * ) dtdName.char_str() );

Do we need to change char_str() to mb_str(wxConvUTF8) there and in every function that deals with the XML file, or does this change matter only when saving or loading the file?

Thanks in advance
Yogesh
 

jiminaus

macrumors 65816
Dec 16, 2010
1,449
1
Sydney
You need to understand that every time you convert a Unicode string to an 8-bit string you're doing a conversion. If you convert to UTF-8, you can preserve all Unicode characters. If you convert to a legacy 8-bit code page, only the characters in that code page can be converted and the presence of other characters in the Unicode string may cause the conversion to fail and NULL to be returned.
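
To make that concrete, here is a small standalone C++ sketch, independent of wxWidgets. The function names toLatin1 and toUtf8 are hypothetical, chosen only for this illustration. The Latin-1 conversion fails as soon as it meets a character outside that code page, whereas the UTF-8 conversion can represent any code point (this sketch assumes valid code points and does no surrogate checking).

```cpp
#include <string>

// Convert code points to ISO 8859-1 (Latin-1).
// Returns false when a character has no Latin-1 representation,
// which mirrors a failed conversion to a legacy code page.
bool toLatin1(const std::u32string& text, std::string& out)
{
    out.clear();
    for (char32_t cp : text) {
        if (cp > 0xFF) {
            out.clear();
            return false;   // e.g. Cyrillic is not in Latin-1
        }
        out += static_cast<char>(cp);
    }
    return true;
}

// Convert code points to UTF-8. Every code point can be represented.
std::string toUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With these, "Electrónica" converts to both encodings (one byte F3 for the ó in Latin-1, two bytes C3 B3 in UTF-8), but a Cyrillic folder name converts only to UTF-8.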

The default argument to wxString's char_str() and mb_str() is wxConvLibc. It will convert the Unicode string using the std C function wcstombs. The result of this conversion is platform and locale specific.

Every time you go from a wchar_t* string or wxString to a char* string, you need to consider what type of conversion is appropriate. If the char* string is going to be passed to a function, it is the requirements of that function that determine the type of conversion necessary.

On Mac OS X (and seemingly Linux as well), file-related functions require a UTF-8 string, so wxConvUTF8 is appropriate. But on Windows, file-related functions require a string encoded in the code page of the user's locale, so wxConvLibc is appropriate. That's why there's the wxConvFile, so you get the appropriate converter for file names on the platform being compiled for.

Other functions may want a Latin 1 encoding, in which case a wxConvISO8859_1 converter is appropriate.

When it comes to libxml2, xmlChar* is defined in libxml/xmlstring.h to be a UTF-8 string. So you should use wxConvUTF8 when you're converting to an xmlChar* string.

BTW I just found that wxWidgets 2.8.4 added utf8_str() to wxString. A little easier to use.
 

ychaturvedi

macrumors newbie
Original poster
May 23, 2011
16
0
hi,

Thank you very much for your help.

So when I am working on the Mac and use
xmlSaveFormatFileEnc( ( const char * )sProjFile.mb_str(wxConvUTF8), pXML, "ASCII", 1 );

I am converting sProjFile to UTF-8, which is fine. But what does "ASCII" mean here? When I save the file with this parameter, the XML file is created with
Code:
<?xml version="1.0" encoding="ASCII"?>

Does this mean the encoding is still ASCII and we have only changed the file name to UTF-8?

In my case the file saves properly, but when I try to open it with
pXML = ( xmlDocPtr ) xmlParseFile( wxString( pcProjFile ).mb_str(wxConvUTF8) );

pXML is still NULL, meaning the file could not be opened.
I am confused about which encoding we need to specify here, because the XML file says it is ASCII and we try to open it as UTF-8.

Yogesh
 

jiminaus

macrumors 65816
Dec 16, 2010
1,449
1
Sydney
There are two different things being encoded here, with two different encodings.

First, the file name is being encoded as UTF-8. This is because of sProjFile.mb_str(wxConvUTF8).

Secondly, the XML document inside the file is being encoded as ASCII. This is because of the "ASCII" encoding parameter of xmlSaveFormatFileEnc. If you want the XML document inside the file to be encoded as UTF-8, specify that.
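
To illustrate what an ASCII-encoded document can still carry: as far as I know, when libxml2 serializes to an encoding that cannot represent a character, it writes a numeric character reference instead, so no information is lost. The following C++ sketch only mimics that idea. The function escapeNonAscii is hypothetical and is not libxml2 code; it also ignores markup characters such as < and &.

```cpp
#include <string>

// Write code points as 7-bit ASCII text, replacing anything outside
// ASCII with a decimal numeric character reference such as &#243;.
// Illustration only: markup characters (<, >, &) are not escaped here.
std::string escapeNonAscii(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            out += "&#" + std::to_string(static_cast<unsigned long>(cp)) + ";";
        }
    }
    return out;
}
```

So a value like Electrónica stored inside the document would appear as Electr&#243;nica in an ASCII file, regardless of how the file's own name is encoded on disk.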

A NULL result from xmlParseFile does not just occur because the file couldn't be found. It can also occur if the file does not contain a well-formed document.

Try 3 things, in turn. Stop if/when one of these doesn't work.

Firstly. Try replacing wxString(pcProjFile).mb_str(wxConvUTF8) in the call to xmlParseFile with a hard-coded path to a file with only ASCII characters (no "international" characters) in its path. Does that work?

Secondly. Rename the file to include a non-ASCII character. Change the hard-coded path by manually encoding the new path as UTF-8. http://www.utf8-chartable.de/ can help you find the UTF-8 encoding of a Unicode character. Does that work? If not, what did you rename the file to, and what hard-coded path did you use?

Thirdly. Revert the hardcoded string back to wxString(pcProjFile).mb_str(wxConvUTF8) in the call to xmlParseFile. Log the result of wxString(pcProjFile).mb_str(wxConvUTF8) as hexadecimal immediately before that. Is the logged out hexadecimal as expected?
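
For the hexadecimal logging in the third step, something like this standalone C++ helper would do. The name toHex is my own; feed it the bytes you get back from mb_str and write the result to whatever log you use.

```cpp
#include <string>

// Render each byte of a string as two lowercase hex digits,
// separated by spaces, so the exact encoding can be inspected.
std::string toHex(const std::string& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char b : bytes) {
        if (!out.empty()) {
            out += ' ';
        }
        out += digits[b >> 4];      // high nibble
        out += digits[b & 0x0F];    // low nibble
    }
    return out;
}
```

If the ó shows up as c3 b3 the path is UTF-8; a lone f3 would mean it is still in a Latin-1 style code page.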
 

ychaturvedi

macrumors newbie
Original poster
May 23, 2011
16
0
hi,
Sorry I could not reply earlier; there was a problem with my net connection.

I tried what you suggested, and the results are:
Code:
Firstly. Try replacing wxString(pcProjFile).mb_str(wxConvUTF8) in the call to xmlParseFile with a hard-coded path to file with only ascii characters (no "international" characters) in its path. Does that work?
1. I created the path /home/psidm/Electronica/testproject.prj, a simple path without any special character.
Result: I am able to save and open the file.
2. I then changed the path to include a special character, so it became
/home/psidm/Electrónica/testproject.prj
Result: Now I am unable to open the file.

Second try:
1. I first created a path with a special character: /home/psidm/Electrónica/testproject.prj
Result: The project saves, but opening the file fails.
2. I changed the name to simple characters:
/home/psidm/Electronica/testproject.prj
Result: Opening the existing file fails.

Yogesh
 

jiminaus

macrumors 65816
Dec 16, 2010
1,449
1
Sydney
Sorry, I need to clarify.

When the file's path was /home/psidm/Electronica/testproject.prj, the following code worked?
Code:
pXML = xmlParseFile("/home/psidm/Electronica/testproject.prj");

But when you renamed the folder to Electrónica, the following code didn't work?
Code:
pXML = xmlParseFile("/home/psidm/Electr\xc3\xb3nica/testproject.prj");

Just to be clear, this kind of code does work for me, at least on my Mac with Mac OS X 10.6.7, Xcode 4.0.2, libxml2 2.7.8, iconv 1.13.1, 64-bit build.

However if I corrupt the XML file then I get NULL, but I also get the following output on stderr.
Code:
/var/tmp/Electrónica.xml:19: parser error : Premature end of data in tag root line 18

^
/var/tmp/Electrónica.xml:19: parser error : Premature end of data in tag root line 3

^

Try adding some error handling and reporting code. Maybe something like this:
Code:
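    pXML = xmlParseFile( sProjFile.mb_str(wxConvFile) );

    if ( pXML )
    {
      sProjFileLoc = sProjFile;
      return( ParseXML( pcProjFile ) );
    }
    else
    {
       // xmlGetLastError() returns NULL if libxml2 recorded no error
       xmlErrorPtr errorPtr = xmlGetLastError();
       wxString msg = wxT("Unknown error");
       if ( errorPtr )
       {
           // libxml2 strings are char*; convert them before formatting in a Unicode build
           msg = wxString::Format(
               wxT("File: %s\nLine: %d\nMessage: %s\n\n(Domain: %d Code: %d)"),
               wxString( errorPtr->file ? errorPtr->file : "", wxConvUTF8 ).c_str(),
               errorPtr->line,
               wxString( errorPtr->message ? errorPtr->message : "", wxConvUTF8 ).c_str(),
               errorPtr->domain, errorPtr->code );
       }
       wxMessageBox( msg, wxT("Load Error"), wxOK|wxICON_EXCLAMATION );
       return false;  // Or do whatever is appropriate to signal that the load failed
    }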

This should tell you why you're getting NULL.
 

ychaturvedi

macrumors newbie
Original poster
May 23, 2011
16
0
hi,
Hey, you are right:
Code:
A NULL result from xmlParseFile does not just occur because the file couldn't be found. It can also occur if the file does not contain a well-formed document.

I did some more debugging and found the place where the XML file content is created:

xmlNewChild( node, NULL, BAD_CAST ( ( char * )key.char_str() ), BAD_CAST ( ( char * )sValue.char_str() ) );

That is the problem: sValue actually holds the project path, and sValue.char_str() breaks it. I changed the call to
xmlNewChild( node, NULL, BAD_CAST ( ( char * )key.char_str() ), BAD_CAST ( ( const char * )sValue.mb_str(wxConvUTF8)) );

and parse the file using
pXML = ( xmlDocPtr ) xmlParseFile( wxString( pcProjFile ).mb_str(wxConvUTF8) );

Now it saves and opens the file with a special character in the path:
/home/psidm/Electrónica/testproject.prj

This is just the first working result with the project file; I need to do more testing.

You are really great and know this topic well. Thanks again for your help.

One more question:

Which approach makes the code more solid? Either we keep

xmlSaveFormatFileEnc( ( const char * )sProjFile.mb_str(wxConvUTF8), pXML, "ASCII", 1 );
as it is,

or change it to
xmlSaveFormatFileEnc( ( const char * )sProjFile.mb_str(wxConvUTF8), pXML, "UTF8", 1 );

Thanks again
Yogesh
 

jiminaus

macrumors 65816
Dec 16, 2010
1,449
1
Sydney
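Undoubtedly UTF-8 is better than ASCII. Non-Unicode encodings should only be used when interfacing with legacy code. There's no reason not to use a Unicode encoding when it's available to you. I would spell the encoding name "UTF-8", with the hyphen, since that is the registered name and it is what ends up in the XML declaration.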
 

gnasher729

Suspended
Nov 25, 2005
17,980
5,565
2. I then changed the path to include a special character, so it became
/home/psidm/Electrónica/testproject.prj
Result: Now I am unable to open the file.

So what is the path, with every character printed in hexadecimal? Especially the ó? You are having problems with encodings. That means it doesn't matter which characters you _see_, what matters is the actual bytes.
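
A quick way to test the bytes is a structural UTF-8 check like the C++ sketch below. The function isValidUtf8 is a hypothetical helper, and it only checks lead bytes and continuation bytes; it is not a complete validator (it does not reject every overlong or surrogate sequence). The hexadecimal posted earlier contains f3 followed directly by 6e, which fails this check: f3 would have to start a four-byte sequence, so that path is not UTF-8 and looks like Latin-1 instead.

```cpp
#include <cstddef>
#include <string>

// Structural UTF-8 check: every lead byte must be followed by the
// right number of continuation bytes (10xxxxxx).
// Sketch only; some overlong and surrogate forms are not rejected.
bool isValidUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (lead < 0x80) {
            extra = 0;                          // ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;                          // 2-byte sequence
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;                          // 3-byte sequence
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;                          // 4-byte sequence
        } else {
            return false;                       // stray continuation or invalid lead
        }
        if (i + extra >= s.size()) {
            return false;                       // sequence runs past the end
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;                   // not a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

Running it on the path bytes right before the libxml2 call tells you immediately whether the conversion produced UTF-8.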
 