Through a POST request using the httr
package, I get back XML in the following format:-
<ReportDelivery responsecode="0" responsetext="descriptive text">
<Terminal isn="DCC000000001" imo="111111111" name="MV Vessel A">
<Report>
<DateTime>01/10/2014 15:30:45</DateTime>
<Lat>99.9999999</Lat>
<Lon>999.9999999</Lon>
<Cog>999</Cog>
<Sog>999</Sog>
<Voltage>99</Voltage>
<Status>Description of status</Status>
</Report>
<Report>
..
</Report>
</Terminal>
<Terminal isn="DCC000000002" imo="222222222" name="MV Vessel B">
..
</Terminal>
</ReportDelivery>
I am able to get the "Report" part in a data frame using two functions available here:-
#Using functions from https://rud.is/rpubs/xml2power/
xtrct <- function(doc, target) { xml_find_all(doc, target) %>% xml_text() %>% trimws() }
xtrct_df <- function(doc, top) {
xml_find_first(doc, sprintf(".//%s", top)) %>%
xml_children() %>%
xml_name() %>%
map(~{
xtrct(doc, sprintf(".//%s/%s", top, .x)) %>%
list() %>%
set_names(tolower(.x))
}) %>%
flatten_df() %>%
readr::type_convert()
}
x <- xtrct_df(doc, "Report")
Within each Terminal node, there are multiple reports pertaining to a particular ship whose attributes are given in the Terminal node.
Currently, the columns in x
are:
names(x)
[1] "datetime" "lat" "lon" "cog" "sog" "voltage" "status"
How can I add the name of the ship as a column to this dataframe? I can extract the name attribute using :-
xattrs <- xpathSApply(z, "//*/Terminal/@name")
BBut have no clue on how to include this as a variable in the dataframe. Would appreciate some help please.
Taking a somewhat different route from @hrbrmstr, we can map_df
on each element, while also finding the parent and extracting the appropriate attr
:
library(xml2)
library(purrr)
col_names <- read_xml(x) %>%
xml_find_first('.//Report') %>%
xml_children() %>%
xml_name()
read_xml(x) %>%
xml_find_all(".//Report") %>%
map_df(~{
parent_name <- xml_parent(.x) %>%
xml_attr('name') %>%
as.character()
xml_children(.x) %>%
as_list() %>%
data.frame(stringsAsFactors = FALSE) %>%
set_names(col_names) %>%
cbind(VesselName=parent_name)
})
#> DateTime Lat Lon Cog Sog Voltage
#> 1 01/10/2014 15:30:45 99.9999999 999.9999999 999 999 99
#> 2 01/10/2014 15:30:45 99.9999999 999.9999999 999 999 99
#> 3 01/10/2014 15:30:45 99.9999999 999.9999999 999 999 99
#> Status VesselName
#> 1 Description of status MV Vessel A
#> 2 Description of status MV Vessel A
#> 3 Description of status MV Vessel B
Created on 2018-05-07 by the reprex package (v0.2.0).
Data:
x <- '<ReportDelivery responsecode="0" responsetext="descriptive text">
<Terminal isn="DCC000000001" imo="111111111" name="MV Vessel A">
<Report>
<DateTime>01/10/2014 15:30:45</DateTime>
<Lat>99.9999999</Lat>
<Lon>999.9999999</Lon>
<Cog>999</Cog>
<Sog>999</Sog>
<Voltage>99</Voltage>
<Status>Description of status</Status>
</Report>
<Report>
<DateTime>01/10/2014 15:30:45</DateTime>
<Lat>99.9999999</Lat>
<Lon>999.9999999</Lon>
<Cog>999</Cog>
<Sog>999</Sog>
<Voltage>99</Voltage>
<Status>Description of status</Status>
</Report>
</Terminal>
<Terminal isn="DCC000000002" imo="222222222" name="MV Vessel B">
<Report>
<DateTime>01/10/2014 15:30:45</DateTime>
<Lat>99.9999999</Lat>
<Lon>999.9999999</Lon>
<Cog>999</Cog>
<Sog>999</Sog>
<Voltage>99</Voltage>
<Status>Description of status</Status>
</Report>
</Terminal>
</ReportDelivery>'