
Scala convert XML to key value map



The problem is as follows: imagine an XML document without any particular schema:

<persons>
  <total>2</total>
  <someguy>
     <firstname>john</firstname>
     <name>Snow</name>
  </someguy>
  <otherperson>
     <sex>female</sex>
  </otherperson>
</persons>

For processing, I want to have this in a key-value map:

"persons/total" -> 2
"persons/someguy/firstname" -> john
"persons/someguy/name" -> Snow
"persons/otherperson/sex" -> female

Ideally I would have a nice recursive function that traverses the XML depth-first and simply stacks the labels until it finds a value, then returns that value together with the stack of labels. Unfortunately, I am struggling to reconcile the return type with the input type, because each step yields a sequence of results rather than a single one. Here is what I have so far. The foreach is clearly a problem because it returns Unit, but map would not work either, as it returns a Seq.

def dfs(n: NodeSeq, keyStack: String, map: Map[String,String])
 :(NodeSeq, String, Map[String,String]) = {
  n.foreach(x => {
    if (x.child.isEmpty) {
      dfs(x.child, keyStack, map + (keyStack+ x.label + " " -> x.text))
    }
    else {
      dfs(x.child, keyStack+ x.label + "/", map)
    }
  }
  )
}

Would greatly appreciate the help!


Solution

  • After some experimenting, this is the most elegant version I could come up with. What I don't like about it:

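    • It goes depth-first for every child and produces one partial result per node, so you need to flatten the result afterwards. The root node label is also missing from the keys, because only the labels of child elements are pushed onto the stack.
    • It drags a lot of XML along the way (every visited node is kept in the result list), so it might be too memory-intensive for large documents.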

    Please improve if you have ideas!

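    import scala.xml._
    
    val xml = "<persons><total>2</total><someguy><firstname>john</firstname><name>Snow</name></someguy><otherperson><sex>female</sex></otherperson></persons>"
    val result: Elem = scala.xml.XML.loadString(xml)
    
    // Walks the tree depth-first and emits one (node, path, map) triple per element.
    // `stack` is the "/"-joined path of ancestor labels below the root.
    def linearize(node: Node, stack: String, map: Map[String, String])
    : List[(Node, String, Map[String, String])] = {
      (node, stack, map) :: node.child.flatMap {
        case e: Elem =>
          // An element whose only descendant is a text node is a leaf: record its value.
          // Any other element is descended into, with its label appended to the path.
          if (e.descendant.size == 1) linearize(e, stack, map + (stack + "/" + e.label -> e.text))
          else linearize(e, stack + "/" + e.label, map)
        case _ => Nil // skip text and other non-element nodes
      }.toList
    }
    
    // Merge the per-node maps into one Map; keys look like "/someguy/firstname".
    linearize(result, "", Map[String, String]()).flatMap(_._3).toMap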
    

    The result still has to be flattened into a single Map afterwards, but at least the recursive part is rather short. The code above should work in a Scala worksheet.
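
A possible variation on the answer above, offered as a sketch rather than a definitive implementation: the recursion can return the Map directly, so no intermediate list of nodes is carried around, and the root label ends up in the keys. The helper name `toPathMap` and its `prefix` parameter are made up for this example. It assumes that leaf elements contain only text, that text mixed in between child elements can be ignored, and that sibling labels are unique under a given parent (with duplicates, the last value wins).

```scala
import scala.xml._

// Hypothetical helper (not part of the original answer): builds the
// "label/label/..." -> text map in a single recursive pass.
def toPathMap(node: Node, prefix: String = ""): Map[String, String] = {
  // Path of this node: parent path plus own label.
  val path = if (prefix.isEmpty) node.label else prefix + "/" + node.label
  // Keep only element children; whitespace and other text nodes are dropped.
  val children = node.child.collect { case e: Elem => e }
  if (children.isEmpty) Map(path -> node.text.trim) // leaf: emit one entry
  else children.flatMap(toPathMap(_, path)).toMap   // branch: merge child maps
}

val doc = XML.loadString(
  """<persons>
    |  <total>2</total>
    |  <someguy>
    |    <firstname>john</firstname>
    |    <name>Snow</name>
    |  </someguy>
    |  <otherperson>
    |    <sex>female</sex>
    |  </otherperson>
    |</persons>""".stripMargin)

val pathMap = toPathMap(doc)
// pathMap contains e.g. "persons/someguy/firstname" -> "john"
```

Because only `Elem` children are followed, the pretty-printed input with whitespace between tags gives the same keys as the compact one-line string. An empty element such as `<b/>` is treated as a leaf with an empty string value.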