
cthesky

macrumors member
Original poster
Aug 21, 2011
91
0
Hi,

I am using libxml2 to parse XML files in my iPhone project.

libxml2 uses a number of callback functions. One of them is the charactersFoundSAX function, which gets called when characters are found. I noticed that this function is called not only between the start element and end element of an XML file, but also before the start element and after the end element. Is that correct? Is that the actual behaviour of the charactersFoundSAX callback function?

I also noticed that between the start and end element of an XML file, the charactersFoundSAX function is mostly called once, but sometimes it gets called several times. Why? Does the number of times charactersFoundSAX gets called depend on the number or length of the characters?

I hope someone can give me some ideas and opinions. Any comments regarding parsing XML using libxml2 are welcome. :)

Thanks a lot. :)
 

jiminaus

macrumors 65816
Dec 16, 2010
1,449
1
Sydney
One of them is charactersFoundSAX function.

Is this function set as your SAX handler's characters member? I'm going to assume it is. If it's not then none of the following may be true.


I noticed that this function is called not only between the start element and end element of an XML file, but also before the start element and after the end element. Is that correct? Is that the actual behaviour of the charactersFoundSAX callback function?

Yes, it's correct. I find I pretty much need to build a state machine when using SAX. I'd start in a state where I ignore characters until my startElement callback is called for an element whose text content I'm interested in.


I also noticed that between the start and end element of an XML file, the charactersFoundSAX function is mostly called once, but sometimes it gets called several times. Why? Does the number of times charactersFoundSAX gets called depend on the number or length of the characters?

The number of calls to your characters callback will, as you've seen, be unpredictable. libxml2 does not buffer all the text and then call your characters callback. It calls the callback as soon as it has some text available.

You'll most likely see this happen when parsing XML streamed over a network. Some of the text will arrive in a packet, and libxml2 will immediately call your characters callback. Then as more packets arrive with more text, libxml2 continues to call your characters callback for the text in each packet.

You need to buffer the text if necessary. A technique I use for elements with pure text content is to create a buffer when my startElement callback is called, collect text into the buffer as my characters callback is called, and then only when my endElement callback is called do I process the text in the buffer.
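The buffering technique can be sketched roughly as follows. Real code for this thread would be C against libxml2; this sketch uses Python's standard xml.sax module instead, which follows the same start/characters/end callback model, purely to keep the example short and self-contained. The class name TitleCollector, the helper parse_in_chunks and the sample chunks are made up for illustration. Feeding the document in pieces stands in for text arriving in separate network packets.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Buffers the text of each <title> element until its end tag arrives."""

    def __init__(self):
        super().__init__()
        self.buffer = None        # None means "not inside a <title>"
        self.titles = []          # fully assembled texts, one per element
        self.character_calls = 0  # how often characters() fired inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self.buffer = []      # create the buffer on the start tag

    def characters(self, content):
        if self.buffer is not None:
            self.character_calls += 1
            self.buffer.append(content)  # collect only, do not process yet

    def endElement(self, name):
        if name == "title":
            # Only now is the text known to be complete.
            self.titles.append("".join(self.buffer))
            self.buffer = None


def parse_in_chunks(chunks):
    """Hypothetical helper: feeds the parser piece by piece, like packets."""
    handler = TitleCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return handler


handler = parse_in_chunks(["<Song><title>Song ", "number ", "one</title></Song>"])
print(handler.titles)           # the assembled text, however it was delivered
print(handler.character_calls)  # may be more than 1; the exact count is up to the parser
```

The point is that the code processing the text never depends on how many times the characters callback fired, only on the buffer contents at the end tag.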
 

cthesky

macrumors member
Original poster
Aug 21, 2011
91
0
Hi Jiminaus,

Thanks a lot for your reply. It is really helpful and helps me understand the charactersFoundSAX callback function better. :)

Is this function set as your SAX handler's characters member?
Yes, this function is set as my SAX handler's characters member.

Yes, it's correct. I find I pretty much need to build a state machine when using SAX. I'd start in a state where I ignore characters until my startElement callback is called for an element whose text content I'm interested in.
Jiminaus, I have a question about the state machine you mentioned in your reply. Can I say that this state machine controls when charactersFoundSAX gets called? For example, with this state machine, would charactersFoundSAX only get called between the start and end element of an XML file, or only after the startElement callback was called for an element whose text content I'm interested in? Am I right?

Thanks. :)
 

jiminaus

macrumors 65816
Dec 16, 2010
1,449
1
Sydney
Jiminaus, I have a question about the state machine you mentioned in your reply. Can I say that this state machine controls when charactersFoundSAX gets called? For example, with this state machine, would charactersFoundSAX only get called between the start and end element of an XML file, or only after the startElement callback was called for an element whose text content I'm interested in? Am I right?

The state machine would work the opposite way. You can't control when libxml2 calls your characters callback, you can only control what you do in that callback.

Note that I'm using the term state machine in a very abstract way, more as a design device than as an implementation device.

Unless you've got a complex XML structure, you may not need to implement an actual state machine. It should just be as simple as setting a boolean to true in your startElement callback, checking that boolean in your characters callback, and resetting the boolean to false in your endElement callback.
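As a rough sketch of the boolean approach: the parser still calls the characters callback for everything, and the flag only decides what the callback does with it. This again uses Python's standard xml.sax rather than libxml2, just to show the pattern; the element names name and artist and the class NameHandler are hypothetical.

```python
import xml.sax


class NameHandler(xml.sax.ContentHandler):
    """Keeps text only while inside a <name> element."""

    def __init__(self):
        super().__init__()
        self.in_name = False  # the boolean described above
        self.pieces = []      # text seen while the flag was set
        self.ignored = 0      # characters() calls that were ignored

    def startElement(self, name, attrs):
        if name == "name":
            self.in_name = True       # set the flag on the start tag

    def characters(self, content):
        if self.in_name:
            self.pieces.append(content)
        else:
            self.ignored += 1         # whitespace, other elements' text, etc.

    def endElement(self, name):
        if name == "name":
            self.in_name = False      # reset the flag on the end tag


doc = b"<Song>\n  <name>Song 1</name>\n  <artist>Somebody</artist>\n</Song>"
handler = NameHandler()
xml.sax.parseString(doc, handler)
text = "".join(handler.pieces)
print(text)
```

The callback is still invoked for the whitespace between elements and for the artist text; the flag simply makes it drop them.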
 

cthesky

macrumors member
Original poster
Aug 21, 2011
91
0
The state machine would work the opposite way. You can't control when libxml2 calls your characters callback, you can only control what you do in that callback.

Note that I'm using the term state machine in a very abstract way, more as a design device than as an implementation device.

Unless you've got a complex XML structure, you may not need to implement an actual state machine. It should just be as simple as setting a boolean to true in your startElement callback, checking that boolean in your characters callback, and resetting the boolean to false in your endElement callback.

Hi Jiminaus,

Thanks a lot for your reply. :) I understand what you mean. :)

I have another question. Suppose every element of an XML file carries only attributes.

For example, let's say I have this XML file:

<Song>
<title id = "0001" name = "Song 1"/>
<title id = "0002" name = "Song 2"/>
<title id = "0003" name = "Song 3"/>
</Song>

I noticed that libxml2 will not call the charactersFoundSAX callback function between the start element and end element (in this case, between <title> and </title>). Am I right?

If this is the actual behaviour of the charactersFoundSAX callback function, can I say that in this case I don't need to do anything in charactersFoundSAX, and that all I need to do is handle things in the startElement and endElement callback functions? Am I correct?

Thanks. :)
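For reference, the attribute-only case can be sketched like this, assuming the XML shown above. It uses Python's standard xml.sax rather than libxml2, only to illustrate the callback pattern; the class name SongListHandler is made up. The attributes arrive with the start-element callback, and the sketch also records any text reported inside a title element so that the "no characters in between" observation can be checked rather than assumed.

```python
import xml.sax


class SongListHandler(xml.sax.ContentHandler):
    """Reads id/name attributes from <title> in the start-element callback."""

    def __init__(self):
        super().__init__()
        self.songs = []              # (id, name) pairs read from attributes
        self.text_inside_title = []  # any text reported inside <title>
        self._in_title = False

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            # Attributes are handed over here, not via characters().
            self.songs.append((attrs.get("id"), attrs.get("name")))

    def characters(self, content):
        if self._in_title:
            self.text_inside_title.append(content)

    def endElement(self, name):
        if name == "title":
            self._in_title = False


doc = b"""<Song>
<title id = "0001" name = "Song 1"/>
<title id = "0002" name = "Song 2"/>
<title id = "0003" name = "Song 3"/>
</Song>"""

handler = SongListHandler()
xml.sax.parseString(doc, handler)
print(handler.songs)
print(handler.text_inside_title)
```

The characters callback may still fire for the line breaks between the title elements, so it can be left empty or made to ignore them, but it cannot be assumed never to run.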
 