Search code examples
rxmlkml

Extract specific data from a KML attribute table using R


I am automating a process to download and extract data from a KML from the internet. The values I am trying to extract are in a large string and I cannot figure out how to get the single needed values out.

basins <- c('CARSON - CRSN CITY L (STWN2LLF)',
            'CARSON - CRSN CITY L (STWN2LUF)',
            'EF CARSON - GRDNVL L (GRDN2LLF)',
            'EF CARSON - GRDNVL L (GRDN2LUF)',
            'EF CARSON-MRKLEEVLLE (CEMC1HLF)',
            'EF CARSON-MRKLEEVLLE (CEMC1HUF)',
            'WF CARSON - WOODFRDS (WOOC1HOF)')

date <- c('Mar_04_2023')
url_upper <- paste('https://www.cnrfc.noaa.gov/archive/sweBasins/SWEbasinsVal_', date, ".kml", sep = "")
kml_upper <- st_read(url_upper)
kml_upper <- subset(kml_upper, kml_upper$Name %in% basins)
kml_upper$geometry <- NULL
head(kml_upper,3)

The data looks like this:
enter image description here

I need to extract out the 43.81 in, which comes after the "...Simulated Basin Snow Water Equivalent for Mar 04, 2023". I am not able to add the specific text, because SO recognizes it as formatting.

What is the best way to extract the data?


Solution

  • It looks like the information you are requesting is stored as a HTML table inside the xml/xml code. I am sure there is a way for sf to extract out the information.
    This is not that way. I am extracting the HTML out of the XML using the xml2 library. Then I am convert the text into HTML with rvest to extract the table out into a usable form.

    Out of your basin list, I was only able to find 3 in the downloaded file.

    library(dplyr)
    library(rvest)
    library(xml2)
    
    page <- xml2::read_xml(url_upper)
    #xml2::xml_ns(page)
    xml_ns_strip(page)  #strip the name space
    
    #find all of the placemarks and extract out
    places <- page %>% xml_find_all(".//Placemark")
    #get the names 
    namesOfPlaces <- places %>% xml_find_first(".//name") %>% xml_text()
    #find the places from the names o which are in basin list (3 in this case)
    placesOfInterest <- which(namesOfPlaces %in% basins)
    descriptionsInPlaces <- places %>% xml_find_all(".//description") %>% xml_text()
    #reduce descriptions down to ones of interest
    descriptionsInPlaces <- descriptionsInPlaces[placesOfInterest]
    
    #loop through the list extracting the desired information
    answer <- lapply(descriptionsInPlaces, function(node){
       convertHTML <- read_html(node)
       output <- convertHTML %>% html_elements("table") %>% html_table()
    })
    names(answer) <- namesOfPlaces[placesOfInterest]
    #a list of table with the requested information
    

    Here is the resultant data. I leave it to the reader to extract the first row for each data frame in the list.

        answer
    $`EF CARSON-MRKLEEVLLE (CEMC1HUF)`
    $`EF CARSON-MRKLEEVLLE (CEMC1HUF)`[[1]]
    # A tibble: 3 × 2
      X1                                                      X2       
      <chr>                                                   <chr>    
    1 Simulated Basin Snow Water Equivalent for Mar 04, 2023: 43.81 in.
    2 SWE Percent of Normal:                                  207%     
    3 Average month-to-date SWE through Mar 04, 2023:         21.12 in.
    
    
    $`EF CARSON - GRDNVL L (GRDN2LUF)`
    $`EF CARSON - GRDNVL L (GRDN2LUF)`[[1]]
    # A tibble: 3 × 2
      X1                                                      X2       
      <chr>                                                   <chr>    
    1 Simulated Basin Snow Water Equivalent for Mar 04, 2023: 16.07 in.
    2 SWE Percent of Normal:                                  429%     
    3 Average month-to-date SWE through Mar 04, 2023:         3.75 in. 
    
    
    $`CARSON - CRSN CITY L (STWN2LUF)`
    $`CARSON - CRSN CITY L (STWN2LUF)`[[1]]
    # A tibble: 3 × 2
      X1                                                      X2       
      <chr>                                                   <chr>    
    1 Simulated Basin Snow Water Equivalent for Mar 04, 2023: 15.26 in.
    2 SWE Percent of Normal:                                  307%     
    3 Average month-to-date SWE through Mar 04, 2023:         4.97 in.