
R - How to rename xml parent node based on child element (or associate child elements)?


I'm trying to extract a couple of elements from XML files into an R dataframe but the parent nodes are all named the same, so I don't know how to associate child elements. I'm very new to xml (about 3 hours) so apologies if I use the wrong terminology. I did not find any R-based solutions.

This is the general structure of the xml files:

<Annotations>
    <Version>1.0.0.0</Version>
    <Annotation>
        <MicronLength>14.1593438418</MicronLength>
        <MicronHeight>0.0000000000</MicronHeight>
        <ObjIndex>1</ObjIndex>
    </Annotation>
    <Annotation>
        <MicronLength>5.7578076896</MicronLength>
        <MicronHeight>0.0000000000</MicronHeight>
        <ObjIndex>2</ObjIndex>
    </Annotation>
</Annotations>

There are many "Annotation" nodes. Each one also contains several other child nodes, but they don't matter as I'm just trying to extract MicronLength and ObjIndex into a dataframe. So I need to either:

  1. Associate and get both elements from within each "Annotation" node

OR

  2. Rename each "Annotation" based on the ObjIndex within (e.g. "Annotation 1", "Annotation 2", etc.) and then get parent name and child element into the df.

I also have several xml files so I want to iterate over each one to eventually create a DF like the example below.

| filename           | ObjIndex | MicronLength  |
| ------------------ | -------- | ------------- |
| examplefile1(.xml) | 1        | 14.1593438418 |
| examplefile1       | 2        | 5.7578076896  |
| examplefile2       | 1        | 12.6345661343 |

The filenames (with or without extension) will then be str_split into some more columns but I can do that myself.

Much appreciated!


Solution

  • I have previously used xml_find_all() for this kind of simple conversion. The two result vectors are paired purely by position, so this works only as long as each Annotation node always has exactly one ObjIndex and one MicronLength child node:

    library(xml2)
    
    xml <- read_xml("
    <Annotations>
        <Version>1.0.0.0</Version>
        <Annotation>
            <MicronLength>14.1593438418</MicronLength>
            <MicronHeight>0.0000000000</MicronHeight>
            <ObjIndex>1</ObjIndex>
        </Annotation>
        <Annotation>
            <MicronLength>5.7578076896</MicronLength>
            <MicronHeight>0.0000000000</MicronHeight>
            <ObjIndex>2</ObjIndex>
        </Annotation>
    </Annotations>
    ")
    
    data.frame(
      ObjIndex = xml_integer(xml_find_all(xml, "Annotation/ObjIndex")),
      MicronLength = xml_double(xml_find_all(xml, "Annotation/MicronLength"))
    )
    #>   ObjIndex MicronLength
    #> 1        1    14.159344
    #> 2        2     5.757808
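
If some Annotation nodes might lack one of the children, or you want the filename column across several files, a per-node variant may be safer. The following is only a sketch under some assumptions: `read_annotations()` is a hypothetical helper name, the demo files are written to a temporary directory purely so the example runs on its own, and the second Annotation in examplefile2 (with no MicronLength) is invented to show the NA behaviour. It selects the Annotation nodes first and then calls xml_find_first() on that nodeset, which returns exactly one result per node, so the columns stay aligned.

```r
library(xml2)

# Hypothetical helper (not part of xml2): one data frame row per <Annotation>.
read_annotations <- function(path) {
  doc   <- read_xml(path)
  nodes <- xml_find_all(doc, ".//Annotation")
  data.frame(
    filename     = rep(tools::file_path_sans_ext(basename(path)), length(nodes)),
    # xml_find_first() yields one result per Annotation node; a missing
    # child gives NA text, which as.integer()/as.numeric() keep as NA.
    ObjIndex     = as.integer(xml_text(xml_find_first(nodes, "./ObjIndex"))),
    MicronLength = as.numeric(xml_text(xml_find_first(nodes, "./MicronLength")))
  )
}

# Demo setup only: write two small files so the example is self-contained.
demo_dir <- file.path(tempdir(), "annotations_demo")
dir.create(demo_dir, showWarnings = FALSE)

writeLines("<Annotations>
  <Version>1.0.0.0</Version>
  <Annotation>
    <MicronLength>14.1593438418</MicronLength>
    <ObjIndex>1</ObjIndex>
  </Annotation>
  <Annotation>
    <MicronLength>5.7578076896</MicronLength>
    <ObjIndex>2</ObjIndex>
  </Annotation>
</Annotations>", file.path(demo_dir, "examplefile1.xml"))

# The second Annotation here is made up and deliberately has no MicronLength.
writeLines("<Annotations>
  <Annotation>
    <MicronLength>12.6345661343</MicronLength>
    <ObjIndex>1</ObjIndex>
  </Annotation>
  <Annotation>
    <ObjIndex>2</ObjIndex>
  </Annotation>
</Annotations>", file.path(demo_dir, "examplefile2.xml"))

# Iterate over every .xml file in the folder and stack the results.
files  <- list.files(demo_dir, pattern = "\\.xml$", full.names = TRUE)
result <- do.call(rbind, lapply(files, read_annotations))
result
```

For real data, point list.files() at your own folder instead of `demo_dir`. If the files use XML namespaces, the XPath expressions would need adjusting, which this sketch does not cover.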