NSXMLParser parsing numeric and Chinese characters in Objective-C


This is my XML:

<?xml version="1.0" encoding="UTF-8"?>
<Tests>
    <Test case="1">
        <str>200000</str>
    </Test>
    <Test case="2">
        <str>200thousand</str>
    </Test>
    <Test case="3">
        <str>共20萬</str>
    </Test>
    <Test case="4">
        <str>20萬</str>
    </Test>
</Tests>

This is part of my parser delegate. It is fairly standard; I found it in most tutorials:

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string { 
    if(!currentElementValue)
        currentElementValue = [[NSMutableString alloc] initWithString:string];
    else
        currentElementValue = 
            (NSMutableString *) [string stringByTrimmingCharactersInSet:
                [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

I then use this line to assign each currentElementValue to the matching property of testObj:

[testObj setValue:currentElementValue forKey:elementName];

The code successfully parses the XML into testObj. However, in case 4 the "20" disappears. That is, when an element's text starts with digits followed by Chinese characters, the digits are lost.

Besides, if I use:

        [currentElementValue appendString:string];

instead of:

        currentElementValue = 
            (NSMutableString *) [string stringByTrimmingCharactersInSet:
                [NSCharacterSet whitespaceAndNewlineCharacterSet]];

the elements show all the characters, but each value starts with a lot of white space.

I would like to understand why the digits disappear, and I am looking for a way to get all the characters without the leading white space.

Thanks in advance for any help you are kind enough to provide!


Solution

  • See the documentation of the parser:foundCharacters: delegate method:

    The parser object may send the delegate several parser:foundCharacters: messages to report the characters of an element. Because string may be only part of the total character content for the current element, you should append it to the current accumulation of characters until the element changes.

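    So you have to use appendString: in parser:foundCharacters:. In your version, every call after the first replaces the accumulated string with only the latest chunk, so if the parser reports "20" and "萬" in two separate calls, only "萬" survives, which would explain case 4. The additional white space is probably caused by not resetting the current string in didStartElement and didEndElement; without a reset, the newlines and indentation between the tags are accumulated too: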

    - (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
    {
         currentElementValue = nil;
    }
    
    - (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    {
         if (currentElementValue) {
             NSLog(@"%@ = %@", elementName, currentElementValue);
             currentElementValue = nil;
         }
    }
    

    If necessary, you can remove unwanted white space in didEndElement.
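
Putting the pieces together, here is a minimal sketch of a delegate that appends in parser:foundCharacters:, resets the buffer when an element starts, and trims only once in didEndElement. The class name TestParserDelegate and the values array are hypothetical, introduced just to keep the example self-contained; in the question's code the trimmed string would instead be passed to [testObj setValue:forKey:].

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate for illustration; the class name and the `values`
// array are assumptions, not part of the original question.
@interface TestParserDelegate : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *currentElementValue;
@property (nonatomic, strong) NSMutableArray<NSString *> *values;
@end

@implementation TestParserDelegate

- (instancetype)init {
    if ((self = [super init])) {
        _values = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    // Start a fresh buffer so text belonging to the parent element
    // (newlines, indentation) is not carried into this element.
    self.currentElementValue = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times for one element, so always append.
    [self.currentElementValue appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"str"]) {
        // Trim once, after the complete text has been accumulated.
        NSString *trimmed = [self.currentElementValue
            stringByTrimmingCharactersInSet:
                [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        [self.values addObject:trimmed];
    }
    // Text found after this point (before the next start tag) is ignored,
    // because messages sent to nil do nothing.
    self.currentElementValue = nil;
}

@end

int main(void) {
    @autoreleasepool {
        NSString *xml = @"<Tests>\n"
                         "    <Test case=\"4\">\n"
                         "        <str>20萬</str>\n"
                         "    </Test>\n"
                         "</Tests>";
        NSXMLParser *parser = [[NSXMLParser alloc]
            initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
        TestParserDelegate *delegate = [TestParserDelegate new];
        parser.delegate = delegate;
        [parser parse];
        NSLog(@"%@", delegate.values);  // logs the collected, trimmed values
    }
    return 0;
}
```

Trimming in didEndElement rather than in foundCharacters matters because white space inside a value can arrive as its own chunk; trimming each chunk separately could remove spaces that belong in the middle of the text.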