r, xml, xml2

Get value from XML with R by attribute


I'm trying to get values from XML that looks like this:

<data>
    <result name="r">
        <item>
            <str name="id">123</str>
            <str name="xxx">aaa</str>
        </item>
        <item>
            <str name="id">456</str>
            <str name="xxx">aaa</str>
        </item>
    </result>
</data>

So far, I can get the id value in the following way:

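library(XML)

xmlfile <- xmlParse(url)
data <- xmlRoot(xmlfile)
# select the <result> child of the root node
result <- data[["result"]]
# loop over every <item> and print the value of its first child
for (i in seq_len(xmlSize(result))) {
  print(xmlValue(result[[i]][[1]]))
}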

This seems highly inefficient, and it only works if "id" is stored in the first child element. Is there a way to get the value of an element (123, 456) by searching for an attribute (name) with a given value (id)?


Solution

  • The xml2 package is well suited to this type of problem.

    library(xml2)
    page <- read_xml('<data>
        <result name="r">
            <item>
                <str name="id">123</str>
                <str name="xxx">aaa</str>
            </item>
            <item>
                <str name="id">456</str>
                <str name="xxx">aaa</str>
            </item>
        </result>
    </data>')
    
    # find all str nodes
    nodes <- xml_find_all(page, ".//str")
    # keep only the nodes whose "name" attribute equals "id"
    nodes <- nodes[xml_attr(nodes, "name") == "id"]
    # get the values (as character strings)
    xml_text(nodes)
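
    If more than one field is needed per item, one possible extension (a sketch that goes beyond the original answer) is to select the item nodes first and then look up each field relative to its item. The column names and the data frame layout below are illustrative assumptions; xml_find_first() returns one result per input node, so the columns stay aligned row by row.

    ```r
    library(xml2)

    page <- read_xml('<data>
      <result name="r">
        <item><str name="id">123</str><str name="xxx">aaa</str></item>
        <item><str name="id">456</str><str name="xxx">aaa</str></item>
      </result>
    </data>')

    # one node per <item>
    items <- xml_find_all(page, ".//item")

    # look up each field relative to its own <item>;
    # a missing field would show up as NA instead of shifting the rows
    df <- data.frame(
      id  = xml_text(xml_find_first(items, "./str[@name='id']")),
      xxx = xml_text(xml_find_first(items, "./str[@name='xxx']")),
      stringsAsFactors = FALSE
    )
    df
    ```

    Searching relative to each item (the leading "./") avoids pairing an id from one item with a value from another.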
    

    Update

    Using an XPath predicate, everything can be accomplished in one line:

    # requires R >= 4.1 for the native pipe |>
    xml_find_all(page, ".//str[@name='id']") |> xml_text()
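
    The question uses the XML package rather than xml2, and the same XPath predicate should work there too. A minimal sketch, assuming the document is supplied as a string (with a URL, the asText argument would be dropped); the variable names doc and ids are illustrative:

    ```r
    library(XML)

    doc <- xmlParse('<data>
      <result name="r">
        <item><str name="id">123</str><str name="xxx">aaa</str></item>
        <item><str name="id">456</str><str name="xxx">aaa</str></item>
      </result>
    </data>', asText = TRUE)

    # apply xmlValue to every <str> node whose name attribute is "id"
    ids <- xpathSApply(doc, "//str[@name='id']", xmlValue)
    ids
    ```

    For the sample document this yields the two ids, 123 and 456, regardless of where the matching str element sits inside each item.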
    

    Here is a link to a handy XPath cheat sheet: https://www.red-gate.com/simple-talk/development/dotnet-development/xpath-css-dom-and-selenium-the-rosetta-stone/