
XMLParser splits elements on “ß” character


I have this code in an iOS Playground (Swift 3, Xcode 8.2.1):

import UIKit
import PlaygroundSupport

PlaygroundPage.current.needsIndefiniteExecution = true

class ParserDelegate: NSObject, XMLParserDelegate {

    @objc func parser(_ parser: XMLParser, foundCharacters string: String) {
        print("found string:", string)
    }

    func parser(_ parser: XMLParser, parseErrorOccurred parseError: Error) {
        print("error:", parseError)
    }

    func parserDidEndDocument(_ parser: XMLParser) {
        PlaygroundPage.current.finishExecution()
    }

}

let string = "<xml>straße</xml>"
let parser = XMLParser(data: string.data(using: .utf8)!)
let delegate = ParserDelegate()
parser.delegate = delegate
parser.parse()

// prints this:
// found string: stra
// found string: ße

Why does XMLParser split straße into stra and ße, instead of parsing it all as one string? Is there an easy way around this, other than to concatenate all strings found by parser(_:foundCharacters:) until I get a call to parser(_:didEndElement:namespaceURI:qualifiedName:)?


Solution

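  • It is not your business to care how the parser breaks up a run of text. The XMLParserDelegate documentation says the parser may send several parser(_:foundCharacters:) messages for a single element, so each call may carry only part of the text. It is your business to implement parser(_:foundCharacters:) so that it accumulates the text, however many times it is called, until parser(_:didEndElement:namespaceURI:qualifiedName:) arrives. A typical implementation looks like this: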

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        self.text += string
    }

    ...where self.text is a String property that you reset in didStartElement and read in didEndElement.

    “Is there an easy way around this”

    That's a very silly way to look at it. It isn't something you need a "way around". There's a right way to implement foundCharacters. Do it and get on with life.
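For completeness, here is a minimal sketch of how the three delegate methods might fit together. The class name AccumulatingDelegate and its results property are hypothetical names chosen for illustration, not part of Foundation, and the sketch assumes each element holds plain text only (no child elements mixed with text).

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

// Hypothetical delegate: buffers the text of the element being read.
final class AccumulatingDelegate: NSObject, XMLParserDelegate {
    // Finished (element name, text) pairs, in document order.
    private(set) var results: [(name: String, text: String)] = []
    // Text collected so far for the current element.
    private var text = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        text = ""  // a new element starts with an empty buffer
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        text += string  // may arrive in several pieces per element
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        results.append((name: elementName, text: text))  // buffer is complete here
        text = ""
    }
}

let xml = "<xml>straße</xml>"
let accumulatingDelegate = AccumulatingDelegate()
let xmlParser = XMLParser(data: xml.data(using: .utf8)!)
xmlParser.delegate = accumulatingDelegate  // the parser does not retain its delegate
let succeeded = xmlParser.parse()

for (name, text) in accumulatingDelegate.results {
    print("\(name): \(text)")
}
```

With this in place the pieces "stra" and "ße" are joined before anything reads them, so the way the parser chunks the text no longer matters. For documents where elements contain both text and child elements, a single buffer like this is too simple; a stack of buffers, one per open element, would be one way to extend it.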