XMLStreamReader in Scala - Proper Handling of END_DOCUMENT


I am restructuring a large existing codebase and switching from XMLEventReader to XMLStreamReader for memory efficiency.

Consider the following function definition, which currently just reads XML events and prints them:

import javax.xml.stream.XMLInputFactory
import javax.xml.transform.stream.StreamSource

def evalStreamReader(source: StreamSource): Unit = {
    val streamReader = XMLInputFactory.newInstance().createXMLStreamReader(source)
    while (streamReader.hasNext) {
      // event type at the cursor's current position
      val eventType = streamReader.getEventType
      eventType match {
        case 1 => println("Start Element " + eventType + " : " + streamReader.getLocalName)
        case 2 => println("End Element " + eventType + " : " + streamReader.getLocalName)
        case 4 => println("Characters " + eventType + " : " + streamReader.getText)
        case 7 => println("Start Document " + eventType)
        case 8 => println("End Document: " + eventType)
      }
      // advance the cursor to the next event
      streamReader.next()
    }
}

and this simple XML file:

<a>
  <c></c>
</a>

The output is:

Start Document 7
Start Element 1 : a
Characters 4 : 
  
Start Element 1 : c
End Element 2 : c
Characters 4 : 

End Element 2 : a

Is there any way to handle the END_DOCUMENT event correctly inside the while loop and its case matching, rather than outside the loop? I tried many conditions and a do-while loop, without success.

My understanding is that the last call to next() moves the cursor to the END_DOCUMENT event, after which hasNext returns false, per its definition in the API:

boolean hasNext()
         throws XMLStreamException
Returns true if there are more parsing events and false if there are no more events. 
This method will return false if the current state of the XMLStreamReader is END_DOCUMENT
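
The quoted behaviour can be checked with a small sketch. The object name `HasNextDemo` and the helper `drain` are hypothetical names chosen for illustration, and the sketch assumes the JDK's default StAX implementation: it advances with the same `while (hasNext)` pattern and then reports where the cursor ended up.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object HasNextDemo {
  // Advances the cursor until hasNext is false, then returns
  // (event type the cursor rests on, value of hasNext at that point).
  def drain(xml: String): (Int, Boolean) = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
    try {
      while (reader.hasNext) reader.next()
      (reader.getEventType, reader.hasNext)
    } finally reader.close()
  }
}
```

The cursor rests on END_DOCUMENT while hasNext is already false, which is why a loop body guarded by hasNext never gets to see that event.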

Solution

  • A naive, non-idiomatic approach:

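    import javax.xml.stream.{XMLInputFactory, XMLStreamReader}
    import javax.xml.transform.stream.StreamSource

    def evalStreamReader(source: StreamSource): Unit = {
      val streamReader: XMLStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(source)
      var finished: Boolean = false

      // The loop ends on an explicit flag, not on hasNext.
      // Note: do-while is Scala 2 syntax; Scala 3 dropped it.
      do {
        val eventType = streamReader.getEventType

        if (eventType != 8) { // 8 == XMLStreamConstants.END_DOCUMENT
          eventType match {
            case 1 => println("Start Element " + eventType + " : " + streamReader.getLocalName)
            case 2 => println("End Element " + eventType + " : " + streamReader.getLocalName)
            case 4 => println("Characters " + eventType + " : " + streamReader.getText)
            case 7 => println("Start Document " + eventType)
            case _ => () // skip comments, processing instructions, etc. instead of throwing MatchError
          }
          streamReader.next()
        } else {
          println("End Document: " + eventType)
          finished = true
        }
      } while (!finished)
      streamReader.close()
    }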
    

    It would output:

    Start Document 7
    Start Element 1 : a
    Characters 4 : 
      
    Start Element 1 : c
    End Element 2 : c
    Characters 4 : 
    
    End Element 2 : a
    End Document: 8
    

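    Basically, you need to separate flow control from the reader: handle the event at the current cursor position first, and stop once you have handled END_DOCUMENT instead of relying on hasNext as the loop condition.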
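
A more idiomatic variant of the same idea could look like the sketch below. It is not the original answer's code: the names `StreamReaderEvents`, `describe` and `collectEvents` are hypothetical, and returning the messages as a list instead of printing them is an assumption made so the result is easy to inspect. It handles the current event first and only then asks hasNext whether to advance, so END_DOCUMENT goes through the same match as every other event.

```scala
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants, XMLStreamReader}
import javax.xml.transform.stream.StreamSource
import scala.annotation.tailrec

object StreamReaderEvents {
  // Describes the event the cursor is currently positioned on.
  def describe(reader: XMLStreamReader): String = reader.getEventType match {
    case XMLStreamConstants.START_DOCUMENT => "Start Document"
    case XMLStreamConstants.START_ELEMENT  => "Start Element: " + reader.getLocalName
    case XMLStreamConstants.END_ELEMENT    => "End Element: " + reader.getLocalName
    case XMLStreamConstants.CHARACTERS     => "Characters: " + reader.getText
    case XMLStreamConstants.END_DOCUMENT   => "End Document"
    case other                             => "Other event: " + other
  }

  // Handles the current event first, then advances only while hasNext is true.
  def collectEvents(source: StreamSource): List[String] = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(source)

    @tailrec
    def loop(acc: List[String]): List[String] = {
      val seen = describe(reader) :: acc
      if (reader.hasNext) {
        reader.next()
        loop(seen)
      } else seen.reverse // cursor is on END_DOCUMENT, which was already described
    }

    try loop(Nil) finally reader.close()
  }
}
```

Calling `StreamReaderEvents.collectEvents(source).foreach(println)` prints one line per event, starting with Start Document and ending with End Document. Using the named `XMLStreamConstants` values instead of the literals 1, 2, 4, 7 and 8 keeps the match readable, and the catch-all case avoids a MatchError on event types the match does not list.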