XML: Extracting all time series from the xml query


I am looking for an efficient way to extract all the time series returned by an XML query. My code is:

library(xml2)

# URL of the data provider
url.iscb <- "http://www.sedlabanki.is/xmltimeseries/"

# The data frame to store all the time series
iscb.rates <- data.frame()

# Dates defining the time range
d.all <- as.Date("1990-01-01")
d.now <- Sys.Date()

# XML
u <- paste0(url.iscb,"Default.aspx?DagsFra=",d.all,"T00%3a00%3a00&DagsTil=",
        d.now,"T23%3a59%3a59&GroupID=1&Type=xml")

# Obtaining the data from the web site...
f <- xml2::read_xml(u)
doc <- xml2::as_list(f)

So far, I cannot extract all the time series that are in f. The variable doc seems to store just one time series.


Solution

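  • Try this: find every TimeSeries node with XPath, then pull the Date and Value out of each Entry, one series at a time: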

    library(xml2)
    library(magrittr)
    
    # URL of the data provider
    url.iscb <- "http://www.sedlabanki.is/xmltimeseries/"
    
    # Dates defining the time range
    d.all <- as.Date("1990-01-01")
    d.now <- Sys.Date()
    
    # XML
    u <- paste0(url.iscb,"Default.aspx?DagsFra=",d.all,"T00%3a00%3a00&DagsTil=",
                d.now,"T23%3a59%3a59&GroupID=1&Type=xml")
    
    # Obtaining the data from the web site...
    f <- xml2::read_xml(u)
    
    # Find every TimeSeries node and its ID attribute
    timeseries <- xml_find_all(f, ".//TimeSeries")
    timeseriesID <- timeseries %>% xml_attr("ID")
    # timeseries %>% xml_find_all(".//Name") %>% xml_text()
    
    # Step through each time series and extract its data
    dfs <- lapply(seq_along(timeseries), function(index) {
       
       currentNode <- timeseries[index]
       # Find all of the Entry nodes
       entries <- xml_find_all(currentNode, ".//Entry")
       
       # Extract the Date and Value from each Entry
       dates <- xml_find_first(entries, ".//Date") %>% xml_text()
       values <- xml_find_first(entries, ".//Value") %>% xml_double()
       
       # One data frame per time series; rep() keeps an empty series from erroring
       data.frame(id = rep(timeseriesID[index], length(dates)),
                  dates, values, stringsAsFactors = FALSE)
    })
    
    #dfs is a list of dataframes
    #combine into 1 dataframe
    dplyr::bind_rows(dfs)
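
If the per-series loop turns out to be slow, the same result can probably be obtained in one pass: select every Entry node at once and look up its enclosing TimeSeries through the XPath ancestor axis. The sketch below runs offline on a small made-up document. The element names TimeSeries, Name, Entry, Date and Value come from the queries above, but the root element Group, the TimeSeriesData wrapper, the date format and all values are assumptions, so check them against the real feed before relying on this.

```r
library(xml2)

# Hypothetical sample: the nesting, root name and values are assumed, not taken
# from the real feed.
sample_xml <- '<Group>
  <TimeSeries ID="1">
    <Name>Series A</Name>
    <TimeSeriesData>
      <Entry><Date>1990-01-02</Date><Value>1.5</Value></Entry>
      <Entry><Date>1990-01-03</Date><Value>1.6</Value></Entry>
    </TimeSeriesData>
  </TimeSeries>
  <TimeSeries ID="2">
    <Name>Series B</Name>
    <TimeSeriesData>
      <Entry><Date>1990-01-02</Date><Value>7.25</Value></Entry>
    </TimeSeriesData>
  </TimeSeries>
</Group>'

doc <- read_xml(sample_xml)

# Select every Entry under any TimeSeries in a single query
entries <- xml_find_all(doc, ".//TimeSeries//Entry")

# xml_find_first() returns one result per Entry, so the columns stay aligned
rates <- data.frame(
  id    = xml_attr(xml_find_first(entries, "ancestor::TimeSeries"), "ID"),
  date  = xml_text(xml_find_first(entries, "./Date")),
  value = xml_double(xml_find_first(entries, "./Value")),
  stringsAsFactors = FALSE
)
```

The date column is left as text here because the feed's date format is not shown in the question; convert it with as.Date() and a matching format string once you have inspected a real entry.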