I am parsing a simple XML file, but sometimes an element's text contains an ampersand (&). I've done some research here and here, but the problem persists: the parser simply stops when it encounters the offending XML element. The XML looks like this:
<video>
    <video_id>42</video_id>
    <video_header>Six & Eight</video_header>
    <video_subheader>So Long</video_subheader>
</video>
The parser is updating an object, called DisStep, that has a parsedVideoArray attribute. The attribute is just an array of Parsed_Video objects. The problem is that when the parser gets to foundCharacters for the element video_header, it never continues to didEndElement. In fact, an NSLog of currentNodeContent in the foundCharacters method prints just "Six ".
And here is the code for the parser. All it does is look for videos and gather info about them.
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"video"])
    {
        videoBeingParsed = [[Parsed_Video alloc] init];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    string = [string stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    currentNodeContent = (NSMutableString *) string;
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"video_id"])
    {
        videoBeingParsed.Video_ID = currentNodeContent;
        currentNodeContent = nil;
    }
    else if ([elementName isEqualToString:@"video_header"])
    {
        videoBeingParsed.Video_Header = currentNodeContent;
        currentNodeContent = nil;
    }
    else if ([elementName isEqualToString:@"video_subheader"])
    {
        videoBeingParsed.Video_SubHeader = currentNodeContent;
        currentNodeContent = nil;
    }
    else if ([elementName isEqualToString:@"video"])
    {
        [DisStep.parsedVideoArray addObject:videoBeingParsed];
        currentNodeContent = nil;
        videoBeingParsed = nil;
    }
}
@end
I tried stringByReplacingOccurrencesOfString:withString: in foundCharacters, but the parser still stops working. Is there a way around this other than changing the XML?
The issue is that what you have been given is not well-formed XML, and the parser legitimately fails when it sees data that is not legal. The XML specification says:
The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.
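Thus you have to alter the XML and replace & with &amp; before it reaches the parser. Replacing it inside foundCharacters cannot help, because the parser has already failed on the malformed input by the time that method would see the ampersand.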
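If you cannot get the producer of the file to fix it, one possible workaround is to repair the raw data yourself before handing it to NSXMLParser. The following is only a minimal sketch under several assumptions: the data is UTF-8, ARC is enabled, and the names EscapeBareAmpersands and HeaderCollector are hypothetical helpers invented for this example, not part of your code. The regular expression escapes every & that does not already begin an entity or character reference. It would also rewrite an & inside a CDATA section or a comment, where a literal ampersand is legal, so treat it as a stopgap rather than a general fix.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: escape each '&' that does not already start an
// entity ("&name;") or a character reference ("&#38;", "&#x26;").
// Assumes the input is UTF-8; anything else is returned unchanged.
static NSData *EscapeBareAmpersands(NSData *rawData)
{
    NSString *text = [[NSString alloc] initWithData:rawData
                                           encoding:NSUTF8StringEncoding];
    if (text == nil) {
        return rawData;
    }
    NSString *pattern =
        @"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];
    NSString *fixed =
        [regex stringByReplacingMatchesInString:text
                                        options:0
                                          range:NSMakeRange(0, text.length)
                                   withTemplate:@"&amp;"];
    return [fixed dataUsingEncoding:NSUTF8StringEncoding];
}

// Hypothetical delegate that collects the text of every <video_header>.
@interface HeaderCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *currentText;
@property (nonatomic, strong) NSMutableArray<NSString *> *headers;
@end

@implementation HeaderCollector

- (instancetype)init
{
    if ((self = [super init])) {
        _headers = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict
{
    // Start a fresh buffer for each element.
    self.currentText = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // Append rather than assign: the parser may report one element's
    // text in several pieces, for example around an entity.
    [self.currentText appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"video_header"]) {
        [self.headers addObject:[self.currentText copy]];
    }
    self.currentText = nil;
}

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError
{
    // Makes a fatal parse error visible instead of a silent stop.
    NSLog(@"Parse error: %@", parseError);
}

@end

int main(void)
{
    @autoreleasepool {
        NSString *xml = @"<video><video_id>42</video_id>"
                        @"<video_header>Six & Eight</video_header>"
                        @"<video_subheader>So Long</video_subheader></video>";
        NSData *raw = [xml dataUsingEncoding:NSUTF8StringEncoding];

        NSXMLParser *parser =
            [[NSXMLParser alloc] initWithData:EscapeBareAmpersands(raw)];
        HeaderCollector *collector = [[HeaderCollector alloc] init];
        parser.delegate = collector;

        if ([parser parse]) {
            NSLog(@"Headers: %@", collector.headers);
        }
    }
    return 0;
}
```

Two details in the sketch matter even once the XML is well formed. The foundCharacters method appends to a buffer instead of assigning, because the NSXMLParser documentation notes that the text of a single element may arrive across several callbacks. Implementing parseErrorOccurred is optional, but it reports the reason the parser gave up, which is easier to diagnose than a parse that simply stops.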