Through a POST request using the httr package, I get back XML in the following format:
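<ReportDelivery responsecode="0" responsetext="descriptive text">
  <Terminal isn="DCC000000001" imo="111111111" name="MV Vessel A">
    <Report>
      <DateTime>01/10/2014 15:30:45</DateTime>
      <Lat>99.9999999</Lat>
      <Lon>999.9999999</Lon>
      <Cog>999</Cog>
      <Sog>999</Sog>
      <Voltage>99</Voltage>
      <Status>Description of status</Status>
    </Report>
    ...
  </Terminal>
  <Terminal isn="DCC000000002" imo="222222222" name="MV Vessel B">
    ...
  </Terminal>
</ReportDelivery>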
I am able to get the "Report" part into a data frame using two functions available here:
#Using functions from
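library(xml2)
library(purrr)

# Return the trimmed text of every node matching an XPath expression
xtrct <- function(doc, target) { xml_find_all(doc, target) %>% xml_text() %>% trimws() }

# Build one column per child element of the first <top> node
xtrct_df <- function(doc, top) {
  xml_find_first(doc, sprintf(".//%s", top)) %>%
    xml_children() %>%
    xml_name() %>%
    map(~{
      xtrct(doc, sprintf(".//%s/%s", top, .x)) %>%
        list() %>%
        set_names(tolower(.x))
    }) %>%
    flatten_df()
}

x <- xtrct_df(doc, "Report")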
Within each Terminal node, there are multiple reports pertaining to a particular ship whose attributes are given in the Terminal node.
Currently, the columns in x are:
[1] "datetime" "lat" "lon" "cog" "sog" "voltage" "status"
How can I add the name of the ship as a column to this data frame? I can extract the name attribute using:
xattrs <- xpathSApply(z, "//*/Terminal/@name")
But I have no clue how to include this as a variable in the data frame. I would appreciate some help.
Taking a somewhat different route from @hrbrmstr, we can use map_df over each Report node, finding its parent Terminal and extracting the name attribute along the way (here x is the XML text):
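library(xml2)
library(purrr)
library(dplyr)

# Column names come from the children of the first Report node
col_names <- read_xml(x) %>%
  xml_find_first('.//Report') %>%
  xml_children() %>%
  xml_name()

read_xml(x) %>%
  xml_find_all(".//Report") %>%
  map_df(~{
    # The name attribute lives on the enclosing Terminal node
    parent_name <- xml_parent(.x) %>%
      xml_attr('name')

    xml_children(.x) %>%
      as_list() %>%
      data.frame(stringsAsFactors = FALSE) %>%
      set_names(col_names) %>%
      mutate(VesselName = parent_name)
  })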
#> DateTime Lat Lon Cog Sog Voltage
#> 1 01/10/2014 15:30:45 99.9999999 999.9999999 999 999 99
#> 2 01/10/2014 15:30:45 99.9999999 999.9999999 999 999 99
#> 3 01/10/2014 15:30:45 99.9999999 999.9999999 999 999 99
#> Status VesselName
#> 1 Description of status MV Vessel A
#> 2 Description of status MV Vessel A
#> 3 Description of status MV Vessel B
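
If only a few fields are needed, the same parent lookup can be done column by column without map_df. The following is a minimal sketch, not a drop-in replacement: xml_txt is a trimmed, hypothetical sample that mirrors the structure in the question, and the names reports, vessel, and df are placeholders chosen for illustration.

```r
library(xml2)

# Hypothetical, trimmed sample with the same nesting as the question's XML
xml_txt <- '<ReportDelivery responsecode="0" responsetext="descriptive text">
  <Terminal isn="DCC000000001" imo="111111111" name="MV Vessel A">
    <Report><DateTime>01/10/2014 15:30:45</DateTime><Status>Description of status</Status></Report>
    <Report><DateTime>01/10/2014 15:30:45</DateTime><Status>Description of status</Status></Report>
  </Terminal>
  <Terminal isn="DCC000000002" imo="222222222" name="MV Vessel B">
    <Report><DateTime>01/10/2014 15:30:45</DateTime><Status>Description of status</Status></Report>
  </Terminal>
</ReportDelivery>'

doc <- read_xml(xml_txt)
reports <- xml_find_all(doc, ".//Report")

# Look up the enclosing Terminal of each Report one node at a time,
# so every report gets its own vessel name
vessel <- vapply(
  reports,
  function(r) xml_attr(xml_parent(r), "name"),
  character(1)
)

# xml_find_first() on a node set returns one result per input node,
# which keeps every column aligned with `vessel`
df <- data.frame(
  datetime    = xml_text(xml_find_first(reports, "./DateTime")),
  status      = xml_text(xml_find_first(reports, "./Status")),
  vessel_name = vessel,
  stringsAsFactors = FALSE
)
```

The remaining fields (Lat, Lon, and so on) could be added with the same xml_find_first() pattern. Going through each Report's own parent keeps the vessel names the same length as the report columns, which a plain //Terminal/@name query does not, since that returns one value per Terminal.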
