Tags: scala, scala-xml

scala-xml: child method of an XML node returns whitespace text nodes


I am working on Windows and need to parse XML from a file.

When I parse the root element and get its children via the child method, I get empty (whitespace-only) children:

XML.load("my_path\\sof.xml").child

res0: Seq[scala.xml.Node] = List(
    , <b/>, 
)

This is my XML file:

sof.xml
<a>
    <b></b>
</a>

But when I remove every \n and \r from the file, like this:

sof.xml
<a><b></b></a>

I get the expected result:

res0: Seq[scala.xml.Node] = List(<b/>)

Is there an option to read the file correctly in its indented form?


Solution

  • The newlines and indentation are parsed as Text nodes. The scala.xml.Utility.trim(x: Node) method removes the whitespace-only ones (and trims the whitespace in any remaining text):

    scala> val a = XML.loadString("""<a>
         |     <b></b>
         | </a>""")
    a: scala.xml.Elem =
    <a>
        <b/>
    </a>
    
    scala> scala.xml.Utility.trim(a)
    res0: scala.xml.Node = <a><b/></a>
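The same fix can be sketched end to end with a file instead of a string literal. This is a minimal sketch assuming the scala-xml module is on the classpath; `TrimChildrenDemo` and `loadFromTempFile` are hypothetical names, the temporary file is written with CRLF line endings to mimic a Windows file, and `XML.loadFile` stands in for the `XML.load("my_path\\sof.xml")` call from the question.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.xml.{Elem, Node, Utility, XML}

object TrimChildrenDemo {
  // Hypothetical helper: writes `content` to a temporary file and parses it,
  // mimicking loading sof.xml from disk.
  def loadFromTempFile(content: String): Elem = {
    val file = Files.createTempFile("sof", ".xml").toFile
    file.deleteOnExit() // clean up when the JVM exits
    Files.write(file.toPath, content.getBytes(StandardCharsets.UTF_8))
    XML.loadFile(file)
  }

  def main(args: Array[String]): Unit = {
    // The indented file from the question, with Windows (CRLF) line endings.
    val root = loadFromTempFile("<a>\r\n    <b></b>\r\n</a>")

    // Raw children: the line breaks and indentation appear as Text nodes.
    val raw: Seq[Node] = root.child
    println(raw.length) // 3

    // Utility.trim drops the whitespace-only Text nodes, leaving only <b/>.
    val trimmed: Seq[Node] = Utility.trim(root).child
    println(trimmed.length)     // 1
    println(trimmed.head.label) // b
  }
}
```

XML parsers normalize CRLF line endings to LF, so these whitespace nodes appear on any platform; Windows is not the cause.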
    

    Note that this differs from keeping only the Elem children with .collect when there are real Text nodes in between elements, e.g.:

    scala> val a = XML.loadString("""<a>
         |    <b>Test </b>   Foo    
         |    </a>""")
    a: scala.xml.Elem =
    <a>
       <b>Test </b>   Foo
    </a>
    
    scala> scala.xml.Utility.trim(a).child
    res0: Seq[scala.xml.Node] = List(<b>Test</b>, Foo)
    
    scala> a.child.collect { case e: scala.xml.Elem => e }
    res1: Seq[scala.xml.Elem] = List(<b>Test </b>)
    

    With .collect, the "Foo" text node is excluded from the children list, whereas Utility.trim keeps it (with its surrounding whitespace trimmed).
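If the meaningful text should be kept exactly as written, one alternative (not part of the answer above) is to drop only the Text nodes that contain nothing but whitespace. A minimal sketch, again assuming scala-xml on the classpath; `DropBlankText` and `nonBlankChildren` are hypothetical names:

```scala
import scala.xml.{Node, Text, XML}

object DropBlankText {
  // Hypothetical helper: keeps every child except Text nodes that are whitespace-only.
  def nonBlankChildren(node: Node): Seq[Node] =
    node.child.filterNot {
      case t: Text => t.text.trim.isEmpty // indentation and line breaks
      case _       => false               // elements and other nodes are kept
    }

  def main(args: Array[String]): Unit = {
    val a = XML.loadString("""<a>
      |   <b>Test </b>   Foo
      |   </a>""".stripMargin)

    val kept = nonBlankChildren(a)
    println(kept.length) // 2: the <b> element and the text node containing "Foo"
    println(kept.head)   // <b>Test </b>
  }
}
```

Unlike Utility.trim, this leaves the text inside and around the elements untouched ("Test " keeps its trailing space); unlike .collect, it keeps the "Foo" text node.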