
Using Scala pattern matching to extract XML elements with a certain name, regardless of content


Given the following XML elements --

val nodes = List(
    <foo/>,
    <bar/>,
    <baz/>,
    <bar>qux</bar>,
    <bar quux="corge"/>,
    <bar quux="grauply">waldo</bar>,
    <bar quux="fred"></bar>
)

-- how do I construct a pattern that matches every <bar> element, whatever its content? I've tried, for instance:

nodes flatMap (_ match {
  case b @ <bar/> => Some(b)
  case _ => None
})

but this matches only the empty ones.

res17: List[scala.xml.Elem] = List(<bar/>, <bar quux="corge"/>, <bar quux="fred"></bar>)

And if I allow a placeholder for content:

nodes flatMap (_ match {
  case b @ <bar>{content}</bar> => Some(b)
  case _ => None
})

this matches only the non-empty ones (more precisely, only those with exactly one child).

res20: List[scala.xml.Elem] = List(<bar>qux</bar>, <bar quux="grauply">waldo</bar>)

I could of course give up on XML literals and just write

import scala.xml.Elem

nodes flatMap (_ match {
  case e: Elem if e.label == "bar" => Some(e)
  case _ => None
})

but it seems like there must be a more clever way.


Solution

  • You can use the Elem object to match:

    nodes collect { case b @ Elem(_, "bar", _, _, _*) => b }
    

    The source for Elem shows the definition of unapplySeq, and it even includes this comment:

    It is possible to deconstruct any Node instance (that is not a SpecialNode or a Group) using the syntax case Elem(prefix, label, attribs, scope, child @ _*) => ...
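    As a rough sketch of that full form, assuming Scala 2.13 with the scala-xml module on the classpath (the object name ElemExtractorDemo is hypothetical), the extractor can filter on the label and bind each <bar>'s attributes and children in the same pattern:

    ```scala
    import scala.xml.Elem

    object ElemExtractorDemo {
      val nodes: List[Elem] = List(
        <foo/>, <bar/>, <baz/>, <bar>qux</bar>,
        <bar quux="corge"/>, <bar quux="grauply">waldo</bar>, <bar quux="fred"></bar>
      )

      // Elem.unapplySeq yields (prefix, label, attributes, scope, children*).
      // Only the label is fixed to "bar"; the rest are bound or ignored.
      val summaries: List[(Option[String], String)] = nodes.collect {
        case Elem(_, "bar", attribs, _, children @ _*) =>
          // attribs.get returns Option[Seq[Node]]; flatten each value to its text.
          (attribs.get("quux").map(_.map(_.text).mkString),
           children.map(_.text).mkString)
      }
    }
    ```

    Each result pairs the optional quux value with the concatenated child text, so the empty and non-empty <bar>s come out of one pattern.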

    Another option is to use pattern alternatives:

     nodes collect { case b @ (<bar/> | <bar>{_}</bar>) => b }
    

    Note that variables cannot be bound inside pattern alternatives; only wildcards such as _ are allowed there, which is why the binder b sits outside the parentheses.
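    If you want a single XML-literal pattern, a sequence wildcard in the child position may be closer to what you were after. This is a sketch assuming Scala 2.13 with scala-xml, where {_*} stands for zero or more children; the object name and the extra <bar>two<baz/></bar> node (two children) are hypothetical, added to show where the alternative pattern stops matching:

    ```scala
    import scala.xml.Elem

    object SeqWildcardDemo {
      val nodes: List[Elem] = List(
        <foo/>, <bar/>, <baz/>, <bar>qux</bar>,
        <bar quux="corge"/>, <bar quux="grauply">waldo</bar>, <bar quux="fred"></bar>,
        // Hypothetical extra node with two children.
        <bar>two<baz/></bar>
      )

      // {_*} matches any number of children (including none), so one literal
      // pattern covers empty and non-empty <bar>s. Attributes are not matched.
      val bars: List[Elem] = nodes.collect { case b @ <bar>{_*}</bar> => b }

      // The alternative pattern only covers zero or exactly one child.
      val viaAlternatives: List[Elem] =
        nodes.collect { case b @ (<bar/> | <bar>{_}</bar>) => b }
    }
    ```

    Here bars should contain all six <bar> elements, while viaAlternatives misses the two-child one.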

    If this is a common operation for you, you might consider writing your own extractor object (see the Scala documentation on extractors). For example:

    import scala.xml.Elem

    object ElemLabel {
        def unapply(elem: Elem): Option[String] = Some(elem.label)
    }
    

    And then:

    nodes collect { case b @ ElemLabel("bar") => b }
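    A self-contained version of the extractor above, again assuming Scala 2.13 with scala-xml (ElemLabelDemo is a hypothetical name). Because unapply simply returns the label, the same extractor can also bind it instead of comparing it against a fixed string:

    ```scala
    import scala.xml.Elem

    // Extractor from the answer: exposes an element's label to patterns.
    object ElemLabel {
      def unapply(elem: Elem): Option[String] = Some(elem.label)
    }

    object ElemLabelDemo {
      val nodes: List[Elem] = List(
        <foo/>, <bar/>, <baz/>, <bar>qux</bar>,
        <bar quux="corge"/>, <bar quux="grauply">waldo</bar>, <bar quux="fred"></bar>
      )

      // A literal inside the extractor pattern is compared with ==.
      val bars: List[Elem] = nodes.collect { case b @ ElemLabel("bar") => b }

      // A variable inside the extractor pattern binds the label instead.
      val labels: List[String] = nodes.collect { case ElemLabel(label) => label }
    }
    ```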
    

    Of course, in the examples you've provided, you're only filtering, in which case:

    nodes filter { _.label == "bar" }
    

    would suffice and might be your best bet. Even if you plan further operations after the filter and are concerned about the cost of building intermediate collections, you can call .view first so that the steps run lazily without materializing those collections.
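    A minimal sketch of the view approach, assuming Scala 2.13 with scala-xml (ViewDemo is a hypothetical name). The filter and map are chained on a view, and toList forces them in one pass, so no intermediate list of <bar>s is built:

    ```scala
    import scala.xml.Elem

    object ViewDemo {
      val nodes: List[Elem] = List(
        <foo/>, <bar/>, <baz/>, <bar>qux</bar>,
        <bar quux="corge"/>, <bar quux="grauply">waldo</bar>, <bar quux="fred"></bar>
      )

      // .view makes filter and map lazy; .toList evaluates them together,
      // producing only the final List[String] of each <bar>'s text content.
      val barTexts: List[String] =
        nodes.view.filter(_.label == "bar").map(_.text).toList
    }
    ```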

    Also note the use of collect throughout: it is a more idiomatic way to combine the filtering, mapping, and matching that you're doing with flatMap, match, and Option.
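    For comparison, here is a sketch of the question's flatMap/Option shape next to the equivalent collect, assuming Scala 2.13 with scala-xml (CollectDemo is a hypothetical name). With collect, the cases that match define the result and everything else is dropped, so the Some/None wrapping disappears:

    ```scala
    import scala.xml.Elem

    object CollectDemo {
      val nodes: List[Elem] = List(
        <foo/>, <bar/>, <baz/>, <bar>qux</bar>,
        <bar quux="corge"/>, <bar quux="grauply">waldo</bar>, <bar quux="fred"></bar>
      )

      // Question's style: wrap hits in Some, misses in None, then flatten.
      val viaFlatMap: List[Elem] = nodes flatMap (_ match {
        case e if e.label == "bar" => Some(e)
        case _                     => None
      })

      // collect takes a partial function; unmatched elements are skipped.
      val viaCollect: List[Elem] = nodes collect {
        case e if e.label == "bar" => e
      }
    }
    ```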