r xml rvest

When extracting data from an XML file, return a missing value if a node is absent


I have the following problem:

library(xml2)
library(magrittr)  # provides the %>% pipe used below

test_xml <- "
<test>
<Trial>
<ID> 1 </ID>
<plouf> plouf </plouf>
</Trial>
<Trial>
<ID> 2 </ID>
</Trial>
</test>"

test <- as_xml_document(test_xml)

When I run:

test %>%
  xml_find_all("//plouf") %>%
  xml_text()

I only get

[1] " plouf "

How can I get a vector with one element per <Trial> (here, length 2), with a missing value wherever the <plouf> node is absent from a <Trial>?


Solution

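  • First select the <Trial> nodes, then call xml_find_first() on that node set to look up <plouf> inside each one. Unlike xml_find_all(), which returns only the nodes that match, xml_find_first() returns one result per input node and substitutes a missing node when there is no match, so xml_text() gives NA for every <Trial> without a <plouf>. Note that the XPath here is the relative "plouf", not "//plouf", which would search from the document root.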

    test %>%
      xml_find_all("//Trial") %>%
      xml_find_first("plouf") %>%
      xml_text()
    # [1] " plouf " NA
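
The same pattern extends to several child nodes at once, which is often what you want when turning the XML into a table. The following is a minimal sketch under a few assumptions: child_text() is a hypothetical helper written for this example (it is not part of xml2), trim = TRUE is used to strip the padding around the values, and the IDs are kept as character strings.

```r
library(xml2)

# Same document as in the question, parsed from a string.
test <- read_xml("<test>
  <Trial><ID> 1 </ID><plouf> plouf </plouf></Trial>
  <Trial><ID> 2 </ID></Trial>
</test>")

# One node per <Trial>; every per-trial lookup below keeps this length.
trials <- xml_find_all(test, "//Trial")

# Hypothetical helper (not part of xml2): text of the first child called
# `name` in each node, or NA where that child is absent.
child_text <- function(nodes, name) {
  xml_text(xml_find_first(nodes, name), trim = TRUE)
}

# Columns stay aligned because each one has exactly one entry per <Trial>.
result <- data.frame(
  ID    = child_text(trials, "ID"),
  plouf = child_text(trials, "plouf"),
  stringsAsFactors = FALSE
)
result
```

Because each column is built from the same trials node set, the row for the second <Trial> should hold its ID alongside an NA in the plouf column, rather than being dropped.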