r xml xml2

Using xml_replace leaves behind some formatting


I am trying to replace some nodes of an XML document with text using the xml2 package in R. In the example below I try to turn every "name" node into plain text, but the result still has "<" and "/>" around the text.

library(xml2)
x <- read_xml(
  "<scenario>
  <event>
  <dataProbeEvent>
  <name>LogSurvResHigh</name>
  </dataProbeEvent>
  </event>
  <event>
  <accumulateEvent>
  <name>SetSurvOut</name>
  </accumulateEvent>
  </event>
  </scenario>")
x
> {xml_document}
<scenario>
[1] <event>\n  <dataProbeEvent>\n    <name>LogSurvResHigh</name>\n  </dataProbeEvent>\n ...
[2] <event>\n  <accumulateEvent>\n    <name>SetSurvOut</name>\n  </accumulateEvent>\n</ ...
namerefs <- xml_find_all(x, ".//name")
replacements <- xml_text(namerefs)
xml_replace(namerefs, replacements)
> {xml_document}
<scenario>
[1] <event>\n  <dataProbeEvent>\n    <LogSurvResHigh/>\n  </dataProbeEvent>\n</event>
[2] <event>\n  <accumulateEvent>\n    <SetSurvOut/>\n  </accumulateEvent>\n</event>

What I want it to look like is:

> {xml_document}
<scenario>
[1] <event>\n  <dataProbeEvent>\n    LogSurvResHigh\n  </dataProbeEvent>\n</event>
[2] <event>\n  <accumulateEvent>\n    SetSurvOut\n  </accumulateEvent>\n</event>

Solution

  • One option is to convert the document to a string, strip the opening and closing name tags with gsub, and parse the result again:

    x <- as.character(x)
    x_sub <- gsub("<name[^>]*>|<\\/name>","",x)
    x <- read_xml(x_sub)
    x
    
    {xml_document}
    <scenario>
    [1] <event>\n  <dataProbeEvent>\n      LogSurvResHigh\n  </dataProbeEvent>\n</event>
    [2] <event>\n  <accumulateEvent>\n      SetSurvOut\n    </accumulateEvent>\n</event>
    

    The pattern <name[^>]*> also matches opening name tags that carry attributes (for example ref-type="bibr" rid="CR8"), so the attributes are removed together with the tag.
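One caveat with the pattern in the answer above: because [^>]* accepts any characters after "name", it also matches opening tags whose names merely start with "name", such as <names>. The sketch below illustrates this on a plain string using base R only. The sample string and the stricter pattern are illustrative assumptions, not part of the original answer.

```r
# Hypothetical sample string: a <name> tag with attributes plus a <names> tag.
s <- '<p><name ref-type="bibr" rid="CR8">Smith</name> and <names>x</names></p>'

# Pattern from the answer: strips <name ...> and </name>, but also the
# opening <names> tag, leaving an unmatched </names> behind.
loose <- gsub("<name[^>]*>|<\\/name>", "", s)

# Stricter variant (an assumption, adjust to your data): after "name" allow
# only ">" or whitespace followed by attributes, so <names> is left alone.
strict <- gsub("<name([[:space:]][^>]*)?>|</name>", "", s)

print(loose)
print(strict)
```

If the document has no other element whose name begins with "name", both patterns give the same result and the shorter one from the answer is enough.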