I've recently been playing with code for an iPhone app to parse XML. Sticking to Cocoa, I decided to go with the NSXMLParser class. The app will be responsible for parsing 10,000+ "computers", each of which contains 6 other strings of information. For my test, I've verified that the XML is around 900KB-1MB in size.
My data model is to keep each computer in an NSDictionary hashed by a unique identifier. Each computer is also represented by an NSDictionary with the information. So at the end of the day, I end up with an NSDictionary containing 10k other NSDictionaries.
The problem I'm running into isn't about leaking memory or efficient data structure storage. When my parser is done, the total amount of allocated objects only goes up by about 1MB. The problem is that while the NSXMLParser is running, my object allocation jumps by as much as 13MB. I could understand 2MB (one for the object I'm creating and one for the raw NSData) plus a little room to work, but 13MB seems a bit high. I can't imagine that NSXMLParser is that inefficient. Thoughts?
Code...
The code to start parsing...
NSXMLParser *parser = [[NSXMLParser alloc] initWithData: data];
[parser setDelegate:dictParser];
[parser parse];
output = [[dictParser returnDictionary] retain];
[parser release];
[dictParser release];
And the parser's delegate code...
-(void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict {
    if(mutableString)
    {
        [mutableString release];
        mutableString = nil;
    }
    mutableString = [[NSMutableString alloc] init];
}
-(void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if(self.mutableString)
    {
        [self.mutableString appendString:string];
    }
}
-(void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if([elementName isEqualToString:@"size"]){
        //The initial key, tells me how many computers
        returnDictionary = [[NSMutableDictionary alloc] initWithCapacity:[mutableString intValue]];
    }
    if([elementName isEqualToString:hashBy]){
        //The unique identifier (assumed to come before the computer's other fields)
        if(mutableDictionary){
            [mutableDictionary release];
            mutableDictionary = nil;
        }
        mutableDictionary = [[NSMutableDictionary alloc] initWithCapacity:6];
        //Store the mutable dictionary itself so the fields added below end up in returnDictionary;
        //copying it at this point would only snapshot an empty dictionary.
        //setObject:forKey: copies the key, so no extra string copy is needed.
        [returnDictionary setObject:mutableDictionary forKey:mutableString];
    }
    if([fields containsObject:elementName]){
        //Any of the elements from a single computer that I am looking for
        [mutableDictionary setObject:mutableString forKey:elementName];
    }
}
Everything is initialized and released correctly. Again, I'm not getting errors or leaks. The parsing is just memory-inefficient.
Thanks for any thoughts!
I can't say anything specific about your code, but take a look at Apple's XMLPerformance sample. It compares NSXMLParser and libxml performance, and the results are definitely in favour of the latter. In one of my projects, switching from NSXMLParser to libxml gave a great performance boost, so I'd suggest using it.