How to get data containing HTML tags from an XML file while parsing

Discussion in 'iOS Programming' started by psudheer28, Aug 25, 2010.

  1. macrumors newbie

    Joined:
    Aug 10, 2009
    #1
    Hi all,

    Thanks in advance.

    I am using NSXMLParser to parse an XML file in my application.

    My XML file looks like this:

    <item>
        <ID>123456</ID>
        <category>Films</category>
        <Heading>HollyWood films</Heading>
        <Author>samule</Author>
        <imageFull>http://tree_one.jpg</imageFull>
        <contentFull>
            New York, the stars will fly to Las Vegas for another one. On New Year’s eve no shoot because it’s been left free for partying.
            <strong>Costly choices</strong>
            <b>A source</b>
            adds that Sajid wants to make up for the missed family
            <br>time by allowing them to have
            a blast without bothering about anything.
        </contentFull>
        <PubDate>Monday, 14 December 2009</PubDate>
    </item>

    My question: when I read the content of the <contentFull> tag, it is not copied into the string.
    I think the content is lost because of the HTML tags inside it.

    How can I solve this, so that the parser ignores the HTML tags but they still take effect as written in the XML file (bold, break, strong, etc.)?

    Please guide me. Is this possible with NSXMLParser?
     
  2. macrumors 6502

    Joined:
    Jun 22, 2010
    Location:
    @
    #2
    No XML parser is going to be happy with inter-mixed XML and HTML elements.

    Are you parsing an RSS feed? RSS feeds typically include HTML as CDATA, for example:

    Code:
    <?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
        <title>Foo</title>
        <entry>
            <content type="html" xml:base="http://example.com/" xml:lang="en"><![CDATA[
    <strong>Costly choices</strong>
    <b>A source</b> adds that Sajid wants to make up for the missed family
    <br>time by allowing them to have a blast without bothering about anything.
            ]]></content>
        </entry>
    </feed>
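
    To illustrate the CDATA approach on the parsing side, here is a minimal sketch of an NSXMLParser delegate that collects the raw HTML of a <contentFull> element. The class name ContentCollector and the helper ContentFromXML are made up for this example, and the sketch assumes ARC and a UTF-8 feed. When the delegate implements parser:foundCDATA:, the CDATA bytes arrive there as NSData instead of being reported as child elements.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate (not from the thread): gathers the text of
// <contentFull>, whether it arrives as characters or as a CDATA block.
@interface ContentCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *content;
@property (nonatomic, assign) BOOL insideContent;
@end

@implementation ContentCollector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"contentFull"]) {
        self.content = [NSMutableString string];
        self.insideContent = YES;
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"contentFull"]) {
        self.insideContent = NO;
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (self.insideContent) {
        [self.content appendString:string];
    }
}

// The HTML inside <![CDATA[ ... ]]> is delivered here, markup included.
- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock {
    if (!self.insideContent) {
        return;
    }
    NSString *html = [[NSString alloc] initWithData:CDATABlock
                                           encoding:NSUTF8StringEncoding];
    if (html != nil) {
        [self.content appendString:html];
    }
}

@end

// Hypothetical helper: returns the collected content, or nil if parsing fails.
static NSString *ContentFromXML(NSString *xml) {
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    ContentCollector *collector = [[ContentCollector alloc] init];
    parser.delegate = collector;
    return [parser parse] ? collector.content : nil;
}
```

    The string that comes back still contains the HTML tags, so it can be handed to something that renders HTML (a web view, for instance) rather than shown as plain text.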
    
    Hope this helps :)
     
  3. thread starter macrumors newbie

    Joined:
    Aug 10, 2009
    #3
    Hi ianray,

    Thanks for the reply.

    I don't know XML (the language), but I am using that XML file for parsing.

    I googled this and found some information about CDATA, but I didn't understand how to use it.

    Does the format of the XML file have to change, or do I have to change my code?

    This is my code for XML parsing:

    Code:
    - (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI
     qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict {
        if (nil != qualifiedName) {
            elementName = qualifiedName;
        }
        if ([elementName isEqualToString:@"item"]) {
            // A new <item> starts: create the object that will hold its fields.
            self.currentItem = [[[BlogRss alloc] init] autorelease];
        } else if ([elementName isEqualToString:@"ID"] ||
                   [elementName isEqualToString:@"category"] ||
                   [elementName isEqualToString:@"imageFull"] ||
                   [elementName isEqualToString:@"Heading"] ||
                   [elementName isEqualToString:@"Author"] ||
                   [elementName isEqualToString:@"contentFull"] ||
                   [elementName isEqualToString:@"PubDate"]) {
            // Start a fresh buffer for elements whose text is kept.
            self.currentItemValue = [NSMutableString string];
        } else {
            // Any other element: stop collecting text.
            self.currentItemValue = nil;
        }
    }
    
    
    
    - (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
        if (nil != qName) {
            elementName = qName;
        }
        if ([elementName isEqualToString:@"ID"]) {
            self.currentItem.ID = self.currentItemValue;
        } else if ([elementName isEqualToString:@"category"]) {
            self.currentItem.category = self.currentItemValue;
        } else if ([elementName isEqualToString:@"Heading"]) {
            self.currentItem.heading = self.currentItemValue;
        } else if ([elementName isEqualToString:@"Author"]) {
            self.currentItem.author = self.currentItemValue;
        } else if ([elementName isEqualToString:@"imageFull"]) {
            self.currentItem.imageUrl = self.currentItemValue;
        } else if ([elementName isEqualToString:@"contentFull"]) {
            self.currentItem.content = self.currentItemValue;
        } else if ([elementName isEqualToString:@"PubDate"]) {
            self.currentItem.pubDate = self.currentItemValue;
        } else if ([elementName isEqualToString:@"item"]) {
            [selectedNewsArray addObject:self.currentItem];
        }
    }
    
    - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    	if(nil != self.currentItemValue){
    		[self.currentItemValue appendString:string];
    	}
    }


    thank you
     
  4. macrumors regular

    Joined:
    Aug 26, 2010
    #4
    In your didStartElement, where you init a new currentItemValue, you will need to ignore the <strong>, <b>, and <br> start elements and keep appending to the string until you hit didEndElement for </contentFull>.
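
    A rough sketch of that idea follows. In the code posted in #3, didStartElement sets currentItemValue to nil for every element it does not list, so the first <strong> or <b> inside <contentFull> throws away the text collected so far. The version below keeps the buffer alive until </contentFull>. The class name InlineTagSkipper and the helper PlainTextFromXML are invented for the example, ARC is assumed, and it only works when the nested markup is well-formed XML (for example <br/> rather than a lone <br>).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate (not from the thread): keeps appending text while
// inside <contentFull>, ignoring any nested inline elements.
@interface InlineTagSkipper : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *currentItemValue;
@property (nonatomic, copy) NSString *content;
@property (nonatomic, assign) BOOL insideContent;
@end

@implementation InlineTagSkipper

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"contentFull"]) {
        self.currentItemValue = [NSMutableString string];
        self.insideContent = YES;
    }
    // Nested tags such as <strong> or <b> fall through on purpose:
    // the buffer is left untouched so their text keeps accumulating.
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"contentFull"]) {
        self.content = self.currentItemValue;
        self.insideContent = NO;
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (self.insideContent) {
        [self.currentItemValue appendString:string];
    }
}

@end

// Hypothetical helper: returns the text of <contentFull> with the inline
// tags dropped, or nil if the document does not parse.
static NSString *PlainTextFromXML(NSString *xml) {
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    InlineTagSkipper *skipper = [[InlineTagSkipper alloc] init];
    parser.delegate = skipper;
    return [parser parse] ? skipper.content : nil;
}
```

    Note that this drops the formatting: the tags themselves are never added to the string, only the text between them.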


    edit:

    I wanted to add that KVC (key-value coding) is your friend when working with XML. The example you posted is eek.
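
    As a sketch of what the KVC suggestion could look like: the long if/else chain in didEndElement can be replaced by a lookup table from element name to property key, followed by a single setValue:forKey: call. FeedItem stands in for the poster's BlogRss class, whose real declaration is not shown in the thread; ElementToKeyMap and StoreElementValue are likewise hypothetical names, and ARC is assumed.

```objective-c
#import <Foundation/Foundation.h>

// Stand-in for the poster's BlogRss model; the property names follow the
// ones used in the posted didEndElement code.
@interface FeedItem : NSObject
@property (nonatomic, copy) NSString *ID;
@property (nonatomic, copy) NSString *category;
@property (nonatomic, copy) NSString *heading;
@property (nonatomic, copy) NSString *author;
@property (nonatomic, copy) NSString *imageUrl;
@property (nonatomic, copy) NSString *content;
@property (nonatomic, copy) NSString *pubDate;
@end

@implementation FeedItem
@end

// Maps XML element names to the model's property keys.
static NSDictionary *ElementToKeyMap(void) {
    return @{
        @"ID"          : @"ID",
        @"category"    : @"category",
        @"Heading"     : @"heading",
        @"Author"      : @"author",
        @"imageFull"   : @"imageUrl",
        @"contentFull" : @"content",
        @"PubDate"     : @"pubDate",
    };
}

// Stores value on item if elementName is a known field; returns NO for
// element names that are not in the map, so nothing is set for them.
static BOOL StoreElementValue(FeedItem *item, NSString *elementName, NSString *value) {
    NSString *key = ElementToKeyMap()[elementName];
    if (key == nil) {
        return NO;
    }
    [item setValue:value forKey:key];
    return YES;
}
```

    Inside didEndElement this would reduce to one call such as StoreElementValue(self.currentItem, elementName, self.currentItemValue), plus the existing handling for the closing item element.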
     
  5. macrumors 6502

    Joined:
    Sep 18, 2009
    #5
    You might want to read this. That way, you can use XPath and extract all the data you want really quickly.
     
  6. macrumors 6502

    Joined:
    Jun 22, 2010
    Location:
    @
    #6
    If I'm not mistaken, that strategy will not work. A lone "br" tag is not valid XML, and NSXMLParser will fail.

    Code:
    2010-08-26 20:49:30.634 PhonePlayground[13456:207] didStartElement contentFull
    2010-08-26 20:49:30.634 PhonePlayground[13456:207] didStartElement b
    2010-08-26 20:49:30.635 PhonePlayground[13456:207] didEndElement b
    2010-08-26 20:49:30.635 PhonePlayground[13456:207] didStartElement br
    2010-08-26 20:49:30.637 PhonePlayground[13456:207] parseErrorOccurred Error Domain=NSXMLParserErrorDomain Code=76 "The operation couldn’t be completed. (NSXMLParserErrorDomain error 76.)"
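
    For anyone who wants to reproduce this, here is a small sketch that feeds a fragment to NSXMLParser and returns the resulting error. The helper name ParseErrorForXML is made up and ARC is assumed. As far as I know, code 76 in NSXMLParserErrorDomain is NSXMLParserTagNameMismatchError, which fits: the parser reaches </contentFull> while <br> is still open.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: parses the given XML text and returns nil on
// success, or the parser's NSError when the document is not well-formed.
static NSError *ParseErrorForXML(NSString *xml) {
    NSXMLParser *parser = [[NSXMLParser alloc]
        initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    // No delegate is needed just to check well-formedness.
    BOOL ok = [parser parse];
    return ok ? nil : parser.parserError;
}

// A lone <br> is never closed, so the document is not well-formed XML.
static NSString *const kLoneBreak =
    @"<contentFull><b>A source</b> adds<br>time</contentFull>";

// The self-closing form <br/> is well-formed and parses without error.
static NSString *const kClosedBreak =
    @"<contentFull><b>A source</b> adds<br/>time</contentFull>";
```

    Calling ParseErrorForXML(kLoneBreak) should give back an error in NSXMLParserErrorDomain, while ParseErrorForXML(kClosedBreak) should return nil.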
    
     
