NSXMLParser and Windows encoding.. How?




Soulstorm
Oct 16, 2009, 11:35 AM
I have an application that needs to read Greek characters encoded as NSWindowsCP1253StringEncoding. NSXMLParser refuses to parse data in this encoding, but I need it, since my application loads its data from pages that use it.

Creating an NSString with dataUsingEncoding didn't work...

Is there any way to convert NSWindowsCP1253StringEncoding to NSUTF8StringEncoding?



PhoneyDeveloper
Oct 16, 2009, 11:43 AM
Creating an NSString with dataUsingEncoding didn't work...

Why not? How not?

If that doesn't work then your text isn't in the specified encoding, or you did something wrong.

Soulstorm
Oct 16, 2009, 12:53 PM
You are right, I should be more specific. I was angry at myself because Apple rejected my app update for the second time, and I wasn't thinking at the moment :)

Suppose you have already downloaded the data in NSData format:

Here is my code:

NSString *str = [[[NSString alloc]initWithData:self.xmlData encoding:NSWindowsCP1253StringEncoding]autorelease];
rssParser = [[NSXMLParser alloc]initWithData:[str dataUsingEncoding:NSUTF8StringEncoding]];

And here is how I handle the errors:

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
    // -code returns an NSInteger, so cast it and print it with %ld
    NSString * errorString = [NSString stringWithFormat:@"Unable to download story feed from web site (Error code %ld )", (long)[parseError code]];
    NSLog(@"error parsing XML: %@", errorString);

    UIAlertView * errorAlert = [[UIAlertView alloc] initWithTitle:@"Error loading content" message:errorString delegate:self cancelButtonTitle:@"OK" otherButtonTitles:nil];
    [errorAlert show];
    [errorAlert release];
}

The parser reports error code 31 (NSXMLParserUnknownEncodingError), which is an unknown encoding error. I am positive that the content is in this encoding, because I pull my data from a site, and the feed from that site starts with the following declaration:


<?xml version="1.0" encoding="windows-1253" ?>
<rss version="2.0">
..............................
<channel>

dejo
Oct 16, 2009, 01:48 PM
Have you tried using NSString's canBeConvertedToEncoding: to ensure it can be converted without any loss of information?

PhoneyDeveloper
Oct 16, 2009, 02:02 PM
Why are you converting the string to utf-8? If you convert it to utf-8 but the data says it's in windows-1253 the data will almost certainly be wrong.

I haven't used the xml parser but if it accepts an NSData* then just give it the NSData* that you get from the network.

Alternatively if the xml parser doesn't accept windows-1253 and you must convert it to utf-8 then I think you need to modify the xml code so it says that it's utf-8.
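
For anyone trying the second option, here is a minimal sketch of what "convert to UTF-8 and make the XML say so" could look like. The function name UTF8DataFromCP1253XML is made up for illustration, the 100-character search window is an arbitrary assumption, and the code uses manual retain/release like the rest of the thread (under ARC the autorelease call would be dropped).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): decodes windows-1253 bytes,
// rewrites the encoding named in the XML declaration, and returns UTF-8 bytes.
static NSData *UTF8DataFromCP1253XML(NSData *rawData)
{
    NSString *xml = [[[NSString alloc] initWithData:rawData
                                           encoding:NSWindowsCP1253StringEncoding] autorelease];
    if (xml == nil) {
        return nil; // the bytes were not valid windows-1253
    }

    // Only the declaration should name the encoding, so limit the search to
    // the first 100 characters (an assumption) to leave the feed text alone.
    NSRange head = NSMakeRange(0, MIN((NSUInteger)100, [xml length]));
    NSString *fixed = [xml stringByReplacingOccurrencesOfString:@"windows-1253"
                                                     withString:@"UTF-8"
                                                        options:NSCaseInsensitiveSearch
                                                          range:head];
    return [fixed dataUsingEncoding:NSUTF8StringEncoding];
}
```

The result would then be handed to the parser in place of the raw download, e.g. [[NSXMLParser alloc] initWithData:UTF8DataFromCP1253XML(self.xmlData)], after checking that it is not nil.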

I assume that your str string isn't nil.

dejo
Oct 16, 2009, 02:11 PM
What happens if you just do this?:

rssParser = [[NSXMLParser alloc] initWithData:self.xmlData];

Soulstorm
Oct 16, 2009, 07:53 PM
The parser reports error 31, which means an unknown encoding. That's why I am trying to convert it to UTF-8.

As for NSString's canBeConvertedToEncoding:, how can I use that? I am receiving NSData from the site, and converting that data to an NSString requires me to specify the encoding, which defeats the purpose, right?

This is my code:
//
// FeedURLConnection.m
// RSSTest2
//
// Created by Christos Sotiriou on 10/10/09.
// Copyright 2009 Tei of Pireus. All rights reserved.
//

#import "FeedURLConnection.h"
#import "NewsPapersSingleton.h"
#import <CFNetwork/CFNetwork.h>

@implementation FeedURLConnection
@synthesize stories, xmlFeedConnection, xmlData, url;

- (id) init
{
self = [super init];
if (self != nil) {

}
return self;
}

- (id) initWithURL:(NSString *)urlString
{
self = [super init];
if (self != nil) {
self.url = urlString;
}
return self;
}

- (void)connectAndParse
{
NSURLRequest *feedURLRequest = [NSURLRequest requestWithURL:[NSURL URLWithString:self.url]];
//NSURLResponse *response;
//NSData *data = [NSURLConnection sendSynchronousRequest:feedURLRequest returningResponse:&response error:NULL];
//[self.xmlData setData:data];
self.xmlFeedConnection = [[[NSURLConnection alloc]initWithRequest:feedURLRequest delegate:self]autorelease];
}
#pragma mark -
#pragma mark NSURLConnection delegate methods
- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response {
//NSLog(@"Did Receive Response with name: %@", [response textEncodingName]);
self.xmlData = [NSMutableData data];
[myResponce release]; // this delegate method can fire more than once (e.g. on redirects)
myResponce = [response retain];
}

- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {
//NSLog(@"did receive data! %@ with length: %i", [[[NSString alloc]initWithData:data encoding:NSASCIIStringEncoding]autorelease], [data length]);

//[xmlData appendData:[self dataFromData:data withEncoding:[myResponce textEncodingName]]];
[xmlData appendData:data];
//if ([[[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding]canBeConvertedToEncoding:NSUTF8StringEncoding]) {
// NSLog(@"yes, it can!");
//}

}

// I found this on Apple. It maps the IANA charset name reported by the response
// (e.g. "windows-1253") to an NSStringEncoding, falling back to UTF-8, decodes the
// data with it, and then re-encodes the string in that same encoding.
- (NSData *)dataFromData:(NSData *)data withEncoding:(NSString *)encoding
{
NSStringEncoding nsEncoding = NSUTF8StringEncoding;
if (encoding) {
CFStringEncoding cfEncoding = CFStringConvertIANACharSetNameToEncoding((CFStringRef)encoding);
if (cfEncoding != kCFStringEncodingInvalidId) {
nsEncoding = CFStringConvertEncodingToNSStringEncoding(cfEncoding);
}
}
NSString *formattedString = [[[NSString alloc]initWithData:data encoding:nsEncoding]autorelease];
NSLog(@"%@", formattedString); // never pass downloaded text as the format string
return [formattedString dataUsingEncoding:nsEncoding]; // already autoreleased; an extra retain would leak
}

- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error {
[UIApplication sharedApplication].networkActivityIndicatorVisible = NO;
if ([error code] == kCFURLErrorNotConnectedToInternet) {
// if we can identify the error, we can present a more precise message to the user.
NSDictionary *userInfo = [NSDictionary dictionaryWithObject:NSLocalizedString(@"No Connection Error", @"Error message displayed when not connected to the Internet.") forKey:NSLocalizedDescriptionKey];
NSError *noConnectionError = [NSError errorWithDomain:NSCocoaErrorDomain code:kCFURLErrorNotConnectedToInternet userInfo:userInfo];
[self handleError:noConnectionError];
} else {
// otherwise handle the error generically
[self handleError:error];
}
self.xmlFeedConnection = nil;
[[NSNotificationCenter defaultCenter]postNotificationName:FEED_TABLEVIEW_NEEDS_REFRESH_NOTIFICATION object:self];
}

- (void)handleError:(NSError *)error {
NSString *errorMessage = [error localizedDescription];
UIAlertView *alertView = [[UIAlertView alloc] initWithTitle:NSLocalizedString(@"Error Title", @"Title for alert displayed when download or parse error occurs.") message:errorMessage delegate:nil cancelButtonTitle:@"OK" otherButtonTitles:nil];
[alertView show];
[alertView release];
}

- (void)connectionDidFinishLoading:(NSURLConnection *)connection {
self.xmlFeedConnection = nil;
//[UIApplication sharedApplication].networkActivityIndicatorVisible = NO;
// Spawn a thread to fetch the earthquake data so that the UI is not blocked while the application parses the XML data.
//
// IMPORTANT! - Don't access UIKit objects on secondary threads.
//
//[NSThread detachNewThreadSelector:@selector(parseXMLFileAtURL:) toTarget:self withObject:sel];
// earthquakeData will be retained by the thread until parseEarthquakeData: has finished executing, so we no longer need
// a reference to it in the main thread.
//self.xmlData = nil;
//NSLog(@"content: %@", [[[NSString alloc]initWithData:self.xmlData encoding:NSUTF8StringEncoding]autorelease]);
[NSThread detachNewThreadSelector:@selector(parseXMLFileAtURL:) toTarget:self withObject:self.url];
}

#pragma mark -
#pragma mark NSXMLParser Delegations

- (void)parserDidStartDocument:(NSXMLParser *)parser{
NSLog(@"found file and started parsing");

}

- (void)parseXMLFileAtURL:(NSString *)URL
{
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc]init];

stories = [[NSMutableArray alloc] init];

//you must then convert the path to a proper NSURL or it won't work
//NSURL *xmlURL = [NSURL URLWithString:URL];

// here, for some reason you have to use NSClassFromString when trying to alloc NSXMLParser, otherwise you will get an object not found error
// this may be necessary only for the toolchain
//NSString *xmlString = @"SADfsd";
//[xmlString dataUsingEncoding:NSWindowsCP1250StringEncoding];
//rssParser = [[NSXMLParser alloc] initWithContentsOfURL:xmlURL];
//NSString *str = [[[NSString alloc]initWithData:self.xmlData encoding:NSWindowsCP1253StringEncoding]autorelease];
//NSLog(str);

//rssParser = [[NSXMLParser alloc]initWithData:self.xmlData];
//NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:self.url]];
//NSString *result = [NSString stringWithContentsOfURL:[NSURL URLWithString:self.url] encoding:NSUTF8StringEncoding error:NULL];

rssParser = [[NSXMLParser alloc]initWithData:self.xmlData];
//rssParser = [[NSXMLParser alloc]initWithData:[result dataUsingEncoding:NSUTF8StringEncoding]];
// Set self as the delegate of the parser so that it will receive the parser delegate methods callbacks.
[rssParser setDelegate:self];

// Depending on the XML document you're parsing, you may want to enable these features of NSXMLParser.
[rssParser setShouldProcessNamespaces:YES];
[rssParser setShouldReportNamespacePrefixes:YES];
[rssParser setShouldResolveExternalEntities:YES];

[rssParser parse];

[pool release];
}

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
NSString * errorString = [NSString stringWithFormat:@"Unable to download story feed from web site (Error code %ld )", (long)[parseError code]];
NSLog(@"error parsing XML: %@", errorString);

UIAlertView * errorAlert = [[UIAlertView alloc] initWithTitle:@"Error loading content" message:errorString delegate:self cancelButtonTitle:@"OK" otherButtonTitles:nil];
[errorAlert show];
[errorAlert release];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict{
//NSLog(@"found this element: %@", elementName);
[currentElement release]; // release the previous copy before replacing it
currentElement = [elementName copy];
if ([elementName isEqualToString:@"item"]) {
// clear out our story item caches...
item = [[NSMutableDictionary alloc] init];
currentTitle = [[NSMutableString alloc] init];
currentDate = [[NSMutableString alloc] init];
currentSummary = [[NSMutableString alloc] init];
currentLink = [[NSMutableString alloc] init];
}

}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName{
//NSLog(@"ended element: %@", elementName);
if ([elementName isEqualToString:@"item"]) {
// save values to an item, then store that item into the array...
[item setObject:currentTitle forKey:@"title"];
[item setObject:currentLink forKey:@"link"];
[item setObject:currentSummary forKey:@"summary"];
[item setObject:currentDate forKey:@"date"];

[stories addObject:[[item copy] autorelease]]; // copy returns an owned object, so balance it
[item release];
//NSLog(@"adding story: %@", currentTitle);
}

}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string{
//NSLog(@"found characters: %@", string);
// save the characters for the current item...
if ([currentElement isEqualToString:@"title"]) {
[currentTitle appendString:string];
} else if ([currentElement isEqualToString:@"link"]) {
[currentLink appendString:string];
} else if ([currentElement isEqualToString:@"description"]) {
[currentSummary appendString:string];
//NSLog(string);
} else if ([currentElement isEqualToString:@"pubDate"]) {
[currentDate appendString:string];
}

}

- (void)parserDidEndDocument:(NSXMLParser *)parser {

//[activityIndicator stopAnimating];
//[activityIndicator removeFromSuperview];

//NSLog(@"all done!");
//NSLog(@"stories array has %d items", [stories count]);
//[[NSNotificationCenter defaultCenter]postNotificationName:FEED_TABLEVIEW_NEEDS_REFRESH_NOTIFICATION object:self];
//[newsTable reloadData];
[self performSelectorOnMainThread:@selector(returnToMainThreadWithNotificationPosted) withObject:nil waitUntilDone:NO];
}

- (void)returnToMainThreadWithNotificationPosted
{
NSLog(@"all done!");
NSLog(@"stories array has %lu items", (unsigned long)[stories count]);
[[NSNotificationCenter defaultCenter]postNotificationName:FEED_TABLEVIEW_NEEDS_REFRESH_NOTIFICATION object:self];
}

#pragma mark -
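- (void) dealloc
{
[url release];
[stories release];
[xmlFeedConnection release];
[xmlData release];
[rssParser release]; // allocated in parseXMLFileAtURL:
[myResponce release]; // retained in connection:didReceiveResponse:
[currentElement release]; // copied in parser:didStartElement:...
[super dealloc];
}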

@end

Soulstorm
Oct 16, 2009, 08:10 PM
Seems every developer has problems with Windows encodings, but I think I have found a workaround. I will post back with results.

PhoneyDeveloper
Oct 16, 2009, 09:49 PM
One suggestion:

Download a response from your server and write it out to a file. Then make that file work with the XMLParser. You can open the file with a text editor like TextWrangler and it will tell you the encoding of the file. You can inspect the file to see if it looks OK or if there are obvious problems with characters that aren't of the specified encoding.
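
A rough sketch of dumping the downloaded bytes to a file for inspection. The function name DumpFeedData and the file name feed-dump.xml are placeholders chosen for illustration; the Documents directory is assumed to exist, as it does in an iPhone app sandbox.

```objc
#import <Foundation/Foundation.h>

// Hypothetical debugging helper: writes the raw, untouched response bytes to
// the Documents directory so the file can be opened in a text editor.
// Returns the path on success, nil on failure.
static NSString *DumpFeedData(NSData *data)
{
    NSArray *dirs = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory,
                                                        NSUserDomainMask, YES);
    NSString *path = [[dirs objectAtIndex:0]
                      stringByAppendingPathComponent:@"feed-dump.xml"];
    BOOL ok = [data writeToFile:path atomically:YES];
    NSLog(@"%@ %lu bytes: %@", ok ? @"Wrote" : @"Failed to write",
          (unsigned long)[data length], path);
    return ok ? path : nil;
}
```

Calling DumpFeedData(self.xmlData) from connectionDidFinishLoading: would capture exactly what the parser is about to receive, before any string conversion changes the bytes.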

Soulstorm
Oct 17, 2009, 04:12 AM
Thanks. Apple has an example for OS X called "XMLBrowser". It requests an XML document from a server and then explores its properties. I tried it with the page I'm having problems with, and it works without any problem.

However, Apple uses NSXMLDocument in this example, which isn't available on the iPhone. Nevertheless, NSXMLParser should implement the same encoding mechanics.

In the data I accept from the server, I will try to erase the header that specifies the encoding of the file, and I will check if that makes a difference.

PhoneyDeveloper
Oct 17, 2009, 12:43 PM
Does the parser throw the error before parsing anything or does it throw the error somewhere in the middle?
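
One way to answer that question is to log where the parser was when it gave up. The sketch below uses NSXMLParser's lineNumber and columnNumber; the class name ErrorLocationLogger is invented for the example, and it declares the formal NSXMLParserDelegate protocol, which is an assumption about the SDK in use (older SDKs treated the delegate methods as informal).

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that only reports where parsing stopped.
@interface ErrorLocationLogger : NSObject <NSXMLParserDelegate>
@end

@implementation ErrorLocationLogger

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError
{
    // lineNumber/columnNumber describe the parser's position when it failed.
    NSLog(@"Parse error %ld in %@ at line %ld, column %ld",
          (long)[parseError code], [parseError domain],
          (long)[parser lineNumber], (long)[parser columnNumber]);

    if ([parseError code] == NSXMLParserUnknownEncodingError) {
        NSLog(@"The parser did not recognise the declared encoding.");
    }
}

@end
```

If the reported position is on the first line, the failure most likely happens at the XML declaration, before any element is parsed.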

dejo
Oct 17, 2009, 01:13 PM
As for NSString's canBeConvertedToEncoding, how can I use that? I am accepting NSData from the site, and converting that data to nsstring requires me to specify the encoding, which defies the purpose, right?
Well, if the data from the site is using NSWindowsCP1253StringEncoding, you shouldn't need to check whether you can convert it to an NSString with the same encoding. There should be no problem there. If there is, then I would suspect the remote file is not in that encoding, although there may be other reasons for it not converting. You should check that your string is not nil before proceeding because, as the doc says, it "returns nil if the initialization fails for some reason (for example if data does not represent valid data for encoding)."

Then, if you still need to convert to NSUTF8StringEncoding, you should be able to ensure that's possible without loss of information by checking canBeConvertedToEncoding:, like so:
if ([str canBeConvertedToEncoding:NSUTF8StringEncoding]) ...
Most of this is just speculation since, without access to the remote file itself, there's not much I can do to actually test out what I'm suggesting.
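
Put together, the two checks might look like the sketch below. The function name CheckedUTF8Data is hypothetical, and the code follows the thread's manual retain/release style.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes windows-1253 bytes and returns them as UTF-8,
// or nil (with a log message) if either step cannot be done safely.
static NSData *CheckedUTF8Data(NSData *rawData)
{
    NSString *str = [[[NSString alloc] initWithData:rawData
                                           encoding:NSWindowsCP1253StringEncoding] autorelease];
    if (str == nil) {
        NSLog(@"Data is not valid windows-1253");
        return nil;
    }
    if (![str canBeConvertedToEncoding:NSUTF8StringEncoding]) {
        NSLog(@"String cannot be converted to UTF-8 without loss");
        return nil;
    }
    // Note: any XML declaration inside the text still names windows-1253;
    // this function does not change the text itself.
    return [str dataUsingEncoding:NSUTF8StringEncoding];
}
```

A nil result from the first check would point at the remote file not really being in the declared encoding.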

Soulstorm
Oct 18, 2009, 11:03 AM
It seems that the fault was in the header. No matter what encoding the data was actually in, the parser always went by the declaration line "<?xml version="1.0" encoding="windows-1253" ?>". Removing that line from the string before the parser touched it solved the problem.
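
For later readers, a minimal sketch of that fix, not the exact code from the app. The function name StringByRemovingXMLDeclaration is made up, and it assumes the declaration, if present, is the very first thing in the string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the XML text without its leading
// <?xml ... ?> declaration, or the text unchanged if there is none.
static NSString *StringByRemovingXMLDeclaration(NSString *xml)
{
    if (![xml hasPrefix:@"<?xml"]) {
        return xml; // nothing to strip
    }
    NSRange end = [xml rangeOfString:@"?>"];
    if (end.location == NSNotFound) {
        return xml; // malformed declaration; leave the text as it is
    }
    return [xml substringFromIndex:NSMaxRange(end)];
}
```

The intended use is to decode the download with NSWindowsCP1253StringEncoding, pass the string through this function, and give the parser [stripped dataUsingEncoding:NSUTF8StringEncoding]. With no declaration and no byte order mark, an XML parser falls back to UTF-8, which then matches the bytes it receives.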