xml to R Attributes


I have a problem extracting attributes from XML in R. My XML file looks like this:

- <export>
  + <ExportRef>
  - <BookNodes>
      - <Book label="romance">
        + <Showing>
        - <Data>
             + <Char1 label="Char1">
             - <Char2 label="Char2">
                   + <SubChar21>
                   - <SubChar22>
                        <Range unit="nm">4</Range>
                        <Range unit="nm">8</Range>
                     </SubChar22>
             - <Char3 label="Char3">
                   + <SubChar31>
                   - <SubChar32>
                        <Range Id="1">voc</Range>
                        <Range Id="2">buc</Range>
                     </SubChar32>
          </Data>
      </Book>
      - <Book label="horror">
        + <Showing>
        - <Data>
             + <Char1 label="Char1">
             - <Char2 label="Char2">
                   + <SubChar21>
                   - <SubChar22>
                        <Range unit="nm">4</Range>
                        <Range unit="nm">8</Range>
                     </SubChar22>
             - <Char3 label="Char3">
                   + <SubChar31>
                   - <SubChar32>
                        <Range Id="1">voc</Range>
                        <Range Id="2">buc</Range>
                     </SubChar32>
          </Data>
      </Book>
    </BookNodes>
 </export>

I would like a list of only the Range Id values for each book category. For example:

romance:

id id
1  2

horror:

id id
1  2

When I run something like this:

RangeID_1<-xpathSApply(AC_Node[[1]][[2]], ".//Range", xmlAttrs)

I get:

unit unit  id  id
"nm"  "nm" "1"  "2"

How do I tell R that I only want the Range Id and not the Range unit?

Thank you very much!!


Solution

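  • My two cents with rvest and xml2. Selecting only the Range nodes that have an Id attribute avoids NA entries for the unit-only ones:

    library(rvest)  # used here for the %>% pipe
    library(xml2)   # read_xml(), xml_find_all(), xml_attr()
    read_xml("your_xml_file.xml") %>%
      xml_find_all("//Range[@Id]") %>%  # only <Range> nodes that carry an Id attribute
      xml_attr("Id")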
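
The question also asks for the Ids grouped by book category, which the snippet above does not do. A minimal sketch of one way to get that with xml2, assuming the file has the nesting shown in the question; the inline XML is a trimmed stand-in for the real file, and the names doc, books and range_ids are only illustrative:

```r
library(xml2)

# Trimmed stand-in for the question's file (assumption: same nesting as the real one).
doc <- read_xml('<export>
  <BookNodes>
    <Book label="romance">
      <Data>
        <Char2 label="Char2">
          <SubChar22>
            <Range unit="nm">4</Range>
            <Range unit="nm">8</Range>
          </SubChar22>
        </Char2>
        <Char3 label="Char3">
          <SubChar32>
            <Range Id="1">voc</Range>
            <Range Id="2">buc</Range>
          </SubChar32>
        </Char3>
      </Data>
    </Book>
    <Book label="horror">
      <Data>
        <Char2 label="Char2">
          <SubChar22>
            <Range unit="nm">4</Range>
            <Range unit="nm">8</Range>
          </SubChar22>
        </Char2>
        <Char3 label="Char3">
          <SubChar32>
            <Range Id="1">voc</Range>
            <Range Id="2">buc</Range>
          </SubChar32>
        </Char3>
      </Data>
    </Book>
  </BookNodes>
</export>')

# One node per <Book> element.
books <- xml_find_all(doc, "//Book")

# Within each book, keep only the <Range> descendants that have an Id
# attribute (the [@Id] predicate), then read that attribute.
range_ids <- lapply(books, function(book) {
  xml_attr(xml_find_all(book, ".//Range[@Id]"), "Id")
})

# Name each list element after the book's label attribute.
names(range_ids) <- xml_attr(books, "label")

range_ids
```

The result is a named list with one character vector of Ids per book label. With the XML package used in the question, the same [@Id] predicate should carry over to the original call, along the lines of xpathSApply(AC_Node[[1]][[2]], ".//Range[@Id]", xmlGetAttr, "Id"), although that variant has not been checked against the full file.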