I'm trying to extract a couple of elements from XML files into an R data frame, but the parent nodes are all named the same, so I don't know how to associate the child elements with each other. I'm very new to XML (about 3 hours), so apologies if I use the wrong terminology. I did not find any R-based solutions.
This is the general structure of the xml files:
<Annotations>
  <Version>1.0.0.0</Version>
  <Annotation>
    <MicronLength>14.1593438418</MicronLength>
    <MicronHeight>0.0000000000</MicronHeight>
    <ObjIndex>1</ObjIndex>
  </Annotation>
  <Annotation>
    <MicronLength>5.7578076896</MicronLength>
    <MicronHeight>0.0000000000</MicronHeight>
    <ObjIndex>2</ObjIndex>
  </Annotation>
</Annotations>
There are many "Annotation" nodes. Each one also contains several other child nodes, but they don't matter, as I'm just trying to extract MicronLength and ObjIndex into a data frame. So I need to either:
OR
I also have several xml files so I want to iterate over each one to eventually create a DF like the example below.
| filename | ObjIndex | MicronLength |
| ------------------ | -------- | ------------- |
| examplefile1(.xml) | 1 | 14.1593438418 |
| examplefile1 | 2 | 5.7578076896 |
| examplefile2 | 1 | 12.6345661343 |
The filenames (with or without extension) will then be str_split into some more columns but I can do that myself.
Much appreciated!
I have previously used xml_find_all() for this kind of simple conversion. This works as long as each Annotation node always has exactly one ObjIndex and one MicronLength child node:
library(xml2)
xml <- read_xml("
  <Annotations>
    <Version>1.0.0.0</Version>
    <Annotation>
      <MicronLength>14.1593438418</MicronLength>
      <MicronHeight>0.0000000000</MicronHeight>
      <ObjIndex>1</ObjIndex>
    </Annotation>
    <Annotation>
      <MicronLength>5.7578076896</MicronLength>
      <MicronHeight>0.0000000000</MicronHeight>
      <ObjIndex>2</ObjIndex>
    </Annotation>
  </Annotations>
")

data.frame(
  ObjIndex = xml_integer(xml_find_all(xml, "Annotation/ObjIndex")),
  MicronLength = xml_double(xml_find_all(xml, "Annotation/MicronLength"))
)
#>   ObjIndex MicronLength
#> 1        1    14.159344
#> 2        2     5.757808
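For the multi-file part of the question, one possible approach is to wrap the extraction in a small function and row-bind the results. The sketch below is only an illustration under a few assumptions: read_annotations() is a hypothetical helper name, the files are assumed to sit together in one directory, and the two temporary files exist only to make the example self-contained (their values are taken from the table in the question). It also swaps in xml_find_first() applied to each Annotation node, which should yield NA for a missing child rather than shifting the columns out of alignment, so it does not rely on every Annotation having both children.

```r
library(xml2)

# Hypothetical helper: returns one row per <Annotation> node in a single file.
read_annotations <- function(path) {
  doc <- read_xml(path)
  nodes <- xml_find_all(doc, "Annotation")
  data.frame(
    # File name without directory or extension, repeated once per node
    filename = rep(tools::file_path_sans_ext(basename(path)), length(nodes)),
    # xml_find_first() returns one result per node, so rows stay paired
    ObjIndex = xml_integer(xml_find_first(nodes, "ObjIndex")),
    MicronLength = xml_double(xml_find_first(nodes, "MicronLength")),
    stringsAsFactors = FALSE
  )
}

# Illustration only: write two small files into a temporary directory.
xml_dir <- tempfile("annotations")
dir.create(xml_dir)
writeLines(
  "<Annotations>
     <Annotation><MicronLength>14.1593438418</MicronLength><ObjIndex>1</ObjIndex></Annotation>
     <Annotation><MicronLength>5.7578076896</MicronLength><ObjIndex>2</ObjIndex></Annotation>
   </Annotations>",
  file.path(xml_dir, "examplefile1.xml")
)
writeLines(
  "<Annotations>
     <Annotation><MicronLength>12.6345661343</MicronLength><ObjIndex>1</ObjIndex></Annotation>
   </Annotations>",
  file.path(xml_dir, "examplefile2.xml")
)

# Collect every .xml file in the directory and stack the per-file data frames.
files <- list.files(xml_dir, pattern = "\\.xml$", full.names = TRUE)
result <- do.call(rbind, lapply(files, read_annotations))
result
```

In real use, xml_dir would point at the folder holding your own files, and the filename column can then be split into further columns as described in the question.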