objective-c, xml, ios5, xcode4.3

Parsing XML files with special characters


I am trying to parse a list of persons and populate a UITableView with their names. The names contain special characters (ä, ö, ü). When I parse the name "Gött", I end up with "ött" afterwards. Really strange, any ideas? Thanks a lot!

-(id) loadXMLByURL:(NSString *)urlString
{
    tweets          = [[NSMutableArray alloc] init];
    NSURL *url      = [NSURL URLWithString:urlString];
    NSData  *data   = [[NSData alloc] initWithContentsOfURL:url];
    parser          = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = self;
    [parser parse];
    return self;
}

- (void) parser:(NSXMLParser *)parser didStartElement:(NSString *)elementname namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{
    if ([elementname isEqualToString:@"lehrer"]) 
    {
        currentTweet = [Tweet alloc];
    }
}

- (void) parser:(NSXMLParser *)parser didEndElement:(NSString *)elementname namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementname isEqualToString:@"name"]) 
    {
        currentTweet.content = currentNodeContent;
    }
    if ([elementname isEqualToString:@"vorname"]) 
    {
        currentTweet.vorname = currentNodeContent;
    }
    if ([elementname isEqualToString:@"created_at"]) 
    {
        currentTweet.dateCreated = currentNodeContent;
    }
    if ([elementname isEqualToString:@"lehrer"]) 
    {
        [tweets addObject:currentTweet];
        [currentTweet release];
        currentTweet = nil;
        [currentNodeContent release];
        currentNodeContent = nil;
    }
}

- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    currentNodeContent = (NSMutableString *) [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

- (void) dealloc
{
    [parser release];
    [super dealloc];
}

@end

Solution

  • This is normal behaviour: parser:foundCharacters: can be called multiple times for the text of a single element (and it tends to be for accented characters). The string isn't complete until the end of the element, so accumulate the pieces and use the full string when you reach the end of the element. This is described in the documentation for parser:foundCharacters:.

    Apple developer docs on NSXMLParser

    The parser object may send the delegate several parser:foundCharacters: messages to report the characters of an element. Because string may be only part of the total character content for the current element, you should append it to the current accumulation of characters until the element changes.

    Edit as per question:

    The code in general is fine, but in the foundCharacters callback, do:

    - (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
    {
        if(nil == currentNodeContent)
            currentNodeContent = [[NSMutableString alloc] initWithString:string];
        else
            [currentNodeContent appendString:string];
    }
    

    Then, in both didStartElement and didEndElement, call a method that checks whether the string is non-nil, does whatever you were going to do with it in the first place, and then releases the string and sets it to nil.
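    The accumulate-then-flush pattern described above could look roughly like the following. This is only a minimal sketch, not the asker's actual class: the names NameCollector, flushContentForElement: and names, as well as the sample XML, are made up for illustration, and it assumes ARC. Under manual reference counting, as in the question, the buffer would need a release where the comment says so.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical delegate for illustration; none of these names come from the question.
    @interface NameCollector : NSObject <NSXMLParserDelegate>
    {
        NSMutableString *currentNodeContent; // text accumulated since the last tag
        NSMutableArray  *names;              // completed <name> values
    }
    - (NSArray *)names;
    @end

    @implementation NameCollector

    - (id)init
    {
        if ((self = [super init])) {
            names = [[NSMutableArray alloc] init];
        }
        return self;
    }

    - (NSArray *)names
    {
        return names;
    }

    // Uses the accumulated text (if any) and then resets the buffer.
    // Pass nil as elementName to discard the text, e.g. whitespace before an opening tag.
    - (void)flushContentForElement:(NSString *)elementName
    {
        if (currentNodeContent == nil) {
            return;
        }
        NSString *text = [currentNodeContent stringByTrimmingCharactersInSet:
                          [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        if ([elementName isEqualToString:@"name"] && [text length] > 0) {
            [names addObject:text];
        }
        // Assumes ARC; under manual reference counting, release the buffer first.
        currentNodeContent = nil;
    }

    - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
    {
        // May be called several times per element, so append instead of assigning.
        if (currentNodeContent == nil) {
            currentNodeContent = [[NSMutableString alloc] initWithString:string];
        } else {
            [currentNodeContent appendString:string];
        }
    }

    - (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
      namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
        attributes:(NSDictionary *)attributeDict
    {
        // Text seen before an opening tag belongs to the parent element; drop it here.
        [self flushContentForElement:nil];
    }

    - (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
      namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    {
        // The buffer now holds the complete text of the element that just ended.
        [self flushContentForElement:elementName];
    }

    @end

    int main(void)
    {
        @autoreleasepool {
            // Sample document invented for this sketch.
            NSString *xml = @"<liste><lehrer><name>Gött</name></lehrer></liste>";
            NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
            NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
            NameCollector *collector = [[NameCollector alloc] init];
            parser.delegate = collector;
            [parser parse];
            NSLog(@"%@", [collector names]);
        }
        return 0;
    }
    ```

    Trimming happens once, on the complete string, rather than on each chunk as in the question. Trimming per chunk could strip spaces that belong to the middle of a name if the parser happens to split the text at that point.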

    The string is ended at both the start of a new element (ie, the text before an opening <), and at the end (the bit of text before the