
Subsequent RewriteRules don't transform elements added in previous transform


import scala.xml._
import scala.xml.transform.{RewriteRule, RuleTransformer}

object TransformIssue {
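  // Adds a <new> child to every empty <element>.
  def addNewElement(): RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case <element></element> => <element><new></new></element>
      case other => other // leave any other node untouched instead of throwing a MatchError
    }
  }

  // Adds a <thing> child to every empty <new>.
  def addThingElement(): RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case <element>{ children @ _* }</element> => <element>{ children }</element>
      case <new></new> => <new><thing></thing></new>
      case other => other // leave any other node untouched instead of throwing a MatchError
    }
  }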

  def change(node: Node): Node =
    new RuleTransformer(
      addNewElement(),
      addThingElement()
    ).transform(node).head

  def changeWorkaround(node: Node): Node = {
    val out1 = new RuleTransformer(
      addNewElement()
    ).transform(node).head

    new RuleTransformer(
      addThingElement()
    ).transform(out1).head
  }

}

--

import org.scalatest.{FlatSpec, Matchers}

class TransformIssueSpec extends FlatSpec with Matchers {

  it should "apply transform to created elements" in {
    val output = TransformIssue.change(<element></element>)
    output should be(<element><new><thing></thing></new></element>)
  } // fails

  it should "work the same as the workaround imo" in {
    TransformIssue.change(<element></element>) should equal(TransformIssue.changeWorkaround(<element></element>))
  } // fails

}

When we apply a transform with two rewrite rules, the first adding a new element and the second adding children to that new element, the second rewrite rule does not match the elements added by the first rule.

When we apply the same RewriteRules in two separate RuleTransformers, the second rule does add the children to the elements added in the first step. We would expect the change and changeWorkaround functions to produce the same output.
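
For reference, the workaround generalizes to any number of rules by giving each rule its own RuleTransformer pass. The following is only a minimal sketch: applyInSequence and the SequentialRules object are hypothetical names introduced here, not part of scala-xml, and the two rules are simplified copies of the ones above.

```scala
import scala.xml.Node
import scala.xml.transform.{RewriteRule, RuleTransformer}

object SequentialRules {
  // Simplified copy of addNewElement from the question.
  val addNew: RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case <element></element> => <element><new></new></element>
      case other => other
    }
  }

  // Simplified copy of addThingElement from the question.
  val addThing: RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case <new></new> => <new><thing></thing></new>
      case other => other
    }
  }

  // Hypothetical helper (not part of scala-xml): runs every rule in its own
  // RuleTransformer and feeds the output of one full pass into the next.
  def applyInSequence(node: Node, rules: RewriteRule*): Node =
    rules.foldLeft(node) { (current, rule) =>
      new RuleTransformer(rule).transform(current).head
    }
}
```

Calling SequentialRules.applyInSequence(<element></element>, SequentialRules.addNew, SequentialRules.addThing) behaves like changeWorkaround: each rule walks the complete tree produced by the previous one.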

This has also been raised as an issue with scala-xml.


Solution

  • You are not applying the transform to the children.

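      def addThingElement(): RewriteRule = new RewriteRule {
        override def transform(n: Node): Seq[Node] = n match {
          // Recurse into the children so that a freshly added <new> is seen by this rule.
          case <element>{ children @ _* }</element> => <element>{ transform(children) }</element>
          case <new></new> => <new><thing></thing></new>
          case other => other // leave any other node untouched instead of throwing a MatchError
        }
      }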
    

    That works.

    So here's the deal: on BasicTransformer, def transform(n: Node): Seq[Node] applies def transform(ns: Seq[Node]): Seq[Node] to all children of n, and the latter method applies the former to every node.

    RuleTransformer overrides the former method: it first calls the inherited version and then applies the RewriteRules to the result, so it works recursively.

    This is so confusing that it took me a while to retrace the code and match it to what I recalled of it. Here it is:

    1. RuleTransformer Node => Seq[Node] calls super (BasicTransformer)
    2. BasicTransformer Node => Seq[Node] calls Seq[Node] => Seq[Node] on the children
    3. BasicTransformer Seq[Node] => Seq[Node] calls Node => Seq[Node] on each
    4. Since Node => Seq[Node] is overridden, if the Seq isn't empty it recurses back to 1
    5. RuleTransformer now applies each RewriteRule in sequence.

    In the broken case, then, it will go like this:

    RuleTransformer.transform(<element/>)
    BasicTransformer.transform(<element/>)
    BasicTransformer.transform(Seq.empty)
    addNewElement(<element/>)
    addThingElement(<element><new/></element>)
    

    In the working case (the workaround with two separate RuleTransformers) it will go like this:

    RuleTransformer.transform(<element/>)
    BasicTransformer.transform(<element/>)
    BasicTransformer.transform(Seq.empty)
    addNewElement(<element/>)
    RuleTransformer.transform(<element><new/></element>)
    BasicTransformer.transform(<element><new/></element>)
    BasicTransformer.transform(Seq(<new/>))
    RuleTransformer.transform(<new/>)
    BasicTransformer.transform(<new/>)
    BasicTransformer.transform(Seq.empty)
    addThingElement(<new/>)
    addThingElement(<element><new><thing/></new></element>)
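
    To make the two traces easier to check, here is a toy model of the dispatch described above, written without scala-xml. It is only a sketch of the call order as explained in this answer, not the library's source: Tree, DispatchModel, basic and ruleTransform are hypothetical names invented for the illustration.

```scala
// Toy stand-in for an XML node: a label plus child nodes.
final case class Tree(label: String, children: List[Tree] = Nil)

object DispatchModel {
  type Rule = Tree => Tree

  // Models steps 2 and 3: rebuild a node from its transformed children.
  def basic(t: Tree, onNode: Tree => Tree): Tree =
    t.copy(children = t.children.map(onNode))

  // Models steps 1, 4 and 5: recurse into the children first, then apply
  // every rule exactly once to the rebuilt node (no second descent).
  def ruleTransform(rules: List[Rule])(t: Tree): Tree = {
    val rebuilt = basic(t, child => ruleTransform(rules)(child))
    rules.foldLeft(rebuilt)((acc, rule) => rule(acc))
  }

  val addNew: Rule = {
    case Tree("element", Nil) => Tree("element", List(Tree("new")))
    case other => other
  }

  val addThing: Rule = {
    case Tree("new", Nil) => Tree("new", List(Tree("thing")))
    case other => other
  }

  val input: Tree = Tree("element")

  // Broken case: both rules in one transformer. addThing only sees the
  // top-level node, so the new child stays empty.
  val combined: Tree = ruleTransform(List(addNew, addThing))(input)

  // Working case: one transformer per rule. The second pass descends into
  // the tree produced by the first and reaches the new child.
  val chained: Tree =
    ruleTransform(List(addThing))(ruleTransform(List(addNew))(input))
}
```

    In this model, combined stops at element containing an empty new, while chained ends up with thing nested inside new, which mirrors the difference between the two traces.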