How to convert/decode HTML entities

pashik
Nov 25, 2008, 03:36 AM
Hello.

I parse XML using NSXMLParser, but I get XML-encoded strings that look like this:

<![CDATA[We & #8217;ve been ....]]>

(I inserted a space between & and #8217; otherwise the forum converts it to the actual symbol.)

The text extracted from this looks like We & #8217;ve been .... instead of We’ve been ....

How can I avoid/remove/convert all such HTML-encoded entities?

Thanks for any tips.

Here is the code I use:

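// Download the feed and decode it as UTF-8.
NSData *dat = [NSData dataWithContentsOfURL:[NSURL URLWithString:@"someurl"]];
NSString *data = [[[NSString alloc] initWithData:dat encoding:NSUTF8StringEncoding] autorelease];

// XMLParser is a custom class (not shown), not part of Foundation.
XMLParser *parser = [[XMLParser alloc] init];
[parser parseXMLData:data];

// Presumably from inside -parseXMLData: (renamed so it does not clash with `parser` above).
NSData *ndata = [data dataUsingEncoding:NSUTF8StringEncoding];
NSXMLParser *xmlParser = [[NSXMLParser alloc] initWithData:ndata];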



robbieduncan
Nov 25, 2008, 04:31 AM
NSString has methods to both encode and decode HTML entities. Look at the NSString documentation.

pashik
Nov 25, 2008, 05:46 AM
> NSString has methods to both encode and decode HTML entities. Look at the NSString documentation.

There is only the NSStringEncoding parameter for working with strings.
Can you tell me which method/function decodes HTML entities?

robbieduncan
Nov 25, 2008, 05:49 AM
Did you even look at the NSString documentation as I suggested? The ability to read and understand the documentation is a key programming skill, and one that you appear to be lacking.

Anyway the method I was referring to is this one (http://developer.apple.com/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html#//apple_ref/occ/instm/NSString/stringByReplacingPercentEscapesUsingEncoding:).

pashik
Nov 25, 2008, 08:50 AM
> Did you even look at the NSString documentation as I suggested? The ability to read and understand the documentation is a key programming skill, and one that you appear to be lacking.
>
> Anyway the method I was referring to is this one (http://developer.apple.com/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html#//apple_ref/occ/instm/NSString/stringByReplacingPercentEscapesUsingEncoding:).

It's strange: I really did look at it, and all I saw there were string encoding parameters and CFURLCreateStringByReplacingPercentEscapes.

But anyway, thanks for the link.

xsmasher
Nov 26, 2008, 11:38 AM
> Did you even look at the NSString documentation as I suggested? The ability to read and understand the documentation is a key programming skill, and one that you appear to be lacking.
>
> Anyway the method I was referring to is this one (http://developer.apple.com/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html#//apple_ref/occ/instm/NSString/stringByReplacingPercentEscapesUsingEncoding:).

That is for URL encoding/decoding, not HTML entity encoding/decoding.
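
To illustrate the distinction, here is a minimal sketch. The helper name PercentDecode is hypothetical and only wraps the NSString method linked above (which newer SDKs mark as deprecated). It turns URL escapes such as %20 back into characters, but a string with no percent escapes, including one holding an HTML entity, comes back unchanged.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): undoes URL percent-escaping only.
static NSString *PercentDecode(NSString *s) {
    return [s stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
}

// PercentDecode(@"a%20b")        gives @"a b"
// PercentDecode(@"We&#8217;ve")  gives @"We&#8217;ve" (the entity is not touched)
```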

Here's HerbertHansen's solution, from the Apple boards:
http://discussions.apple.com/message.jspa?messageID=8064367#8064367


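#import <Foundation/Foundation.h>

// The interface was not included in the post; this declaration is inferred from the implementation.
@interface MREntitiesConverter : NSObject {
    NSMutableString *resultString;
}
@property (nonatomic, retain) NSMutableString *resultString;
- (NSString *)convertEntiesInString:(NSString *)s;
@end

@implementation MREntitiesConverter
@synthesize resultString;

- (id)init {
    self = [super init];
    if (self) {
        resultString = [[NSMutableString alloc] init];
    }
    return self;
}

// NSXMLParser delegate callback: character references arrive here already decoded.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
    [self.resultString appendString:s];
}

- (NSString *)convertEntiesInString:(NSString *)s {
    if (s == nil) {
        NSLog(@"ERROR : Parameter string is nil");
        return nil;
    }
    // Discard text left over from a previous call.
    [self.resultString setString:@""];
    // Wrap the text in a dummy element so it parses as an XML document.
    NSString *xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
    NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
    NSXMLParser *xmlParse = [[NSXMLParser alloc] initWithData:data];
    [xmlParse setDelegate:self];
    [xmlParse parse];
    [xmlParse release];
    // Return an autoreleased copy so the caller does not own it.
    return [NSString stringWithString:self.resultString];
}

- (void)dealloc {
    [resultString release];
    [super dealloc];
}
@end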
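
The converter above hands the text to an XML parser, so input that is not well-formed XML (a bare &, or an HTML-only named entity such as &nbsp;) may make the parse stop early. As an alternative sketch, not taken from the thread, decimal references like & #8217; (written here with a space, as in the first post) can be decoded directly with NSScanner. The function name DecodeDecimalEntities is hypothetical, and the sketch assumes only decimal references within the Basic Multilingual Plane need handling; hex references and named entities are left as they are.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): decodes decimal numeric character
// references of the form "&#NNNN;" and copies everything else unchanged.
static NSString *DecodeDecimalEntities(NSString *input) {
    NSMutableString *out = [NSMutableString string];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    [scanner setCharactersToBeSkipped:nil];   // keep whitespace exactly as it is

    while (![scanner isAtEnd]) {
        // Copy plain text up to the next "&#" (or to the end of the string).
        NSString *chunk = nil;
        if ([scanner scanUpToString:@"&#" intoString:&chunk]) {
            [out appendString:chunk];
        }
        if ([scanner isAtEnd]) {
            break;
        }

        NSUInteger start = [scanner scanLocation];
        int code = 0;
        if ([scanner scanString:@"&#" intoString:NULL] &&
            [scanner scanInt:&code] &&
            [scanner scanString:@";" intoString:NULL] &&
            code > 0 && code <= 0xFFFF) {
            // Well-formed reference: append the character it stands for.
            unichar c = (unichar)code;
            [out appendString:[NSString stringWithCharacters:&c length:1]];
        } else {
            // Not a decimal reference: keep the "&#" literally and continue after it.
            [scanner setScanLocation:start + 2];
            [out appendString:@"&#"];
        }
    }
    return out;
}
```

Passing the string from the first post (without the extra space) should give back the text with a real right single quotation mark in place of the reference. The function creates only autoreleased objects, so it does not depend on the manual retain/release calls used in the class above.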