whitespaceAndNewlineCharacterSet seems to be removing white space before special characters


I'm using NSXMLParser to parse an RSS feed, but I'm getting some strange behavior that I believe I've narrowed down to stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet].

If I have a sentence like this:

Hello, my name is "Sonny."

It will end up getting displayed like this:

Hello, my name is"Sonny."

Here is my foundCharacters method:

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string { 
    if(!currentNodeContent) 
        currentNodeContent = [[NSMutableString alloc] initWithString:string];
    else
    {
        [currentNodeContent appendString:string];        
        NSString *trimmedString = currentNodeContent;
        trimmedString = [trimmedString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
        [currentNodeContent setString:trimmedString];
    }
}

I tried changing whitespaceAndNewlineCharacterSet to newlineCharacterSet, which fixed the problem but caused all kinds of unwanted whitespace and carriage returns to show up. Any thoughts on why this is happening and what I can do to fix it?

UPDATE

I updated my code based on Dirk's answer below, and this seems to have done the trick nicely.

- (void) parser:(NSXMLParser *)parser didEndElement:(NSString *)elementname namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{    
    if ([elementname isEqualToString:@"item"]) 
    {
        [comments addObject:currentComment];
        currentComment = nil;
    }

    NSString *trimmedString = [tempString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    [currentNodeContent setString:trimmedString];
    tempString = nil;
    currentNodeContent = nil;
}

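- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string { 
    if(!currentNodeContent) {
        currentNodeContent = [[NSMutableString alloc] initWithString:string];
        // Seed tempString with the first chunk too; otherwise it is lost when
        // didEndElement replaces currentNodeContent with the trimmed tempString.
        tempString = [[NSMutableString alloc] initWithString:string];
    } else {
        // Only accumulate here; trimming happens once, in didEndElement.
        [tempString appendString:string];
    }
}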

Solution

  • In a situation like this:

    <element>Some Content</element>
    

    you should not rely on receiving exactly the following sequence of events:

    • startElement "element"
    • characterData "Some Content"
    • endElement "element"

    It could just as well be (depending on parser internals such as buffer size):

    • startElement "element"
    • characterData "So"
    • characterData "me Cont"
    • characterData "ent"
    • endElement "element"

    To be safe, you should simply store the characters received until the end-of-element event is seen, and only then apply the trimming operation on the result.
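The effect can be reproduced without a parser at all. The following is a minimal sketch, not the asker's code: the chunk boundaries in `main` are an assumption chosen for illustration (the real split points are up to the parser), and the two helper functions are hypothetical names. Because stringByTrimmingCharactersInSet: only removes characters from the two ends of a string, a space that happens to end one chunk is removed if you trim before the next chunk arrives.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: mimics the question's approach of trimming after every chunk.
static NSString *TrimAfterEachChunk(NSArray<NSString *> *chunks) {
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableString *result = [NSMutableString string];
    for (NSString *chunk in chunks) {
        [result appendString:chunk];
        // A space at the end of this chunk is dropped here, before the next chunk arrives.
        [result setString:[result stringByTrimmingCharactersInSet:ws]];
    }
    return result;
}

// Hypothetical helper: accumulates every chunk first, then trims exactly once.
static NSString *TrimOnceAtEnd(NSArray<NSString *> *chunks) {
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableString *result = [NSMutableString string];
    for (NSString *chunk in chunks) {
        [result appendString:chunk];
    }
    return [result stringByTrimmingCharactersInSet:ws];
}

int main(void) {
    @autoreleasepool {
        // Assumed chunking, for illustration only.
        NSArray<NSString *> *chunks = @[@"\n  Hello, my name is ", @"\"", @"Sonny.", @"\"", @"\n"];
        NSLog(@"%@", TrimAfterEachChunk(chunks)); // Hello, my name is"Sonny."
        NSLog(@"%@", TrimOnceAtEnd(chunks));      // Hello, my name is "Sonny."
    }
    return 0;
}
```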

    From the NSXMLParser documentation:

    The parser object may send the delegate several parser:foundCharacters: messages to report the characters of an element. Because string may be only part of the total character content for the current element, you should append it to the current accumulation of characters until the element changes.
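Put together, the advice above might look like the following sketch of a delegate. It assumes ARC and a feed where the text of interest lives in `title` elements; the `TitleCollector` class, its `buffer` and `titles` properties, and the `ParseTitles` helper are hypothetical names for illustration, not part of the original post. Only the NSXMLParser delegate methods themselves come from Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects the trimmed text of every <title> element.
@interface TitleCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *buffer;            // text of the current element
@property (nonatomic, strong) NSMutableArray<NSString *> *titles; // trimmed results
@end

@implementation TitleCollector

- (instancetype)init {
    if ((self = [super init])) {
        _titles = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    // Start a fresh buffer whenever an element opens.
    self.buffer = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // Only accumulate; this chunk may be just part of the element's text.
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"title"]) {
        // All chunks have arrived, so trimming is safe now.
        NSString *trimmed = [self.buffer stringByTrimmingCharactersInSet:
                             [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        [self.titles addObject:trimmed];
    }
    self.buffer = nil;
}

@end

// Hypothetical convenience wrapper around the delegate.
static NSArray<NSString *> *ParseTitles(NSString *xml) {
    TitleCollector *collector = [[TitleCollector alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    parser.delegate = collector;
    [parser parse];
    return collector.titles;
}

int main(void) {
    @autoreleasepool {
        NSString *xml = @"<item><title>\n  Hello, my name is &quot;Sonny.&quot;\n</title></item>";
        NSLog(@"%@", ParseTitles(xml).firstObject);
    }
    return 0;
}
```

With this input the logged title should keep the space before the opening quote, since the surrounding newlines and indentation are removed only once the whole element has been read.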