Parsing ISO-8859-1 with NSXMLParser


I am using NSXMLParser and am wondering how I can parse ISO-8859-1 correctly into an NSString.

Currently, I am getting results with Â for two-byte characters.

The XML I'm using (not created by me) starts with <?xml version="1.0" encoding="ISO-8859-1"?>

Here are the basic calls I'm using (omitted the NSThread calls).

NSString *xmlFilePath = [[NSBundle mainBundle] pathForResource:sampleFileName ofType:@"xml"];

NSString *xmlFileContents = [NSString stringWithContentsOfFile:xmlFilePath encoding:NSUTF8StringEncoding error:nil];

NSData *data = [xmlFileContents dataUsingEncoding:NSUTF8StringEncoding];

NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];

[parser setDelegate:self];

[parser parse];

Solution

  • The XML specification recommends an explicit character encoding declaration in the document prolog. Your input document likely has one; that will tell you the encoding that the parser must use to interpret the character input.

    In the absence of an explicit declaration, the same section says to treat the input as UTF-8 or UTF-16 (and the document is in error if it turns out not to be encoded as either of those).

    So, if your XML parser is either ignoring the explicit encoding declaration, or using the wrong encoding if there's no explicit declaration, your parser is Doing It Wrong™ and needs to be fixed to conform to the XML specification.
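One way to let the parser apply the declared encoding is to give it the file's raw bytes rather than a string that has already been decoded and re-encoded. The code in the question decodes the file as UTF-8 and then re-encodes it as UTF-8, so the bytes the parser receives may no longer match the ISO-8859-1 declaration. Below is a minimal sketch, assuming the file really is encoded as its prolog claims. `TextCollector` and `ParseText` are hypothetical names used only for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that simply accumulates all character data it is given.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation TextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // The parser has already decoded the bytes into an NSString here.
    [self.text appendString:string];
}
@end

// Hypothetical helper: parses raw XML bytes and returns the collected text,
// or nil if parsing fails. No NSString round trip happens before parsing,
// so the parser sees the same bytes that the encoding declaration describes.
static NSString *ParseText(NSData *data) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    TextCollector *collector = [[TextCollector alloc] init];
    parser.delegate = collector;
    return [parser parse] ? [collector.text copy] : nil;
}
```

With the file from the question, the raw bytes would come from `[NSData dataWithContentsOfFile:xmlFilePath]` and be passed straight to `ParseText`. If the output still contains Â, the file itself is probably not encoded the way its declaration says (for example, UTF-8 bytes labelled as ISO-8859-1), and the fix belongs with whoever produces the file.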