
RNOAA R package data access


I've been trying to use the R package rnoaa to download climate data from the weather stations closest to my study sites (almost every state or national park in Florida) over the course of two decades.

I have not found any vignettes or tutorials that make sense to me, especially considering the number of parks I'm working with. Does anyone here have experience with this package, and could you show an example of how to do this with a few parks from my list?

I also have the park longitudes and latitudes:

df<-structure(list(ParkName = structure(c(2L, 6L, 4L, 7L, 5L, 6L, 
3L, 3L, 1L), .Label = c("Big Talbot Island State Park", "Fakahatchee Strand Preserve State Park", 
"Jonathan Dickinson State Park", "Key Largo Hammocks", "Myakka River State Park", 
"Paynes Prairie Preserve State Park", "Sebastian Inlet State Park"
), class = "factor"), ParkLatitude = c(26.02109, 29.57728, 25.25342, 
27.86018, 27.2263, 29.57728, 27.00857, 27.00857, 30.47957), ParkLongitude = c(-81.42208, 
-82.30675, -80.31574, -80.45221, -82.26661, -82.30675, -80.13897, 
-80.13897, -81.43955), Year = c(2004L, 2000L, 1996L, 1997L, 2008L, 
2002L, 2004L, 2002L, 1995L)), .Names = c("ParkName", "ParkLatitude", 
"ParkLongitude", "Year"), class = "data.frame", row.names = c(NA, 
-9L))

The end goal from this example data would be to have annual temperatures, humidity and other environmental variables from weather stations closest to these parks (or park coordinates) for the years listed in the data. I know that there might be missing data for those years depending on the weather station.


Solution

  • This should get you started (using df from your question):

    library(rnoaa)
    
    # load station data - takes some minutes
    
    station_data <- ghcnd_stations()
    
    # add id column for each location (necessary for next function)
    
    df$id <- 1:nrow(df)
    
    # retrieve all stations in radius (e.g. 20km) using lapply
    
    stations <- lapply(1:nrow(df), function(i) {
      meteo_nearby_stations(df[i, ],
                            lat_colname = 'ParkLatitude',
                            lon_colname = 'ParkLongitude',
                            radius = 20,
                            station_data = station_data)[[1]]
    })
    
    # pull data for nearest stations -  x$id[1] selects ID of closest station
    
    stations_data <- lapply(stations,function(x)  meteo_pull_monitors(x$id[1]))
    

    This will give you all variables available at the nearest station. You can restrict the result to the variables you need with the var argument of meteo_pull_monitors.
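
    As a rough sketch of narrowing the pull, meteo_pull_monitors also accepts date_min and date_max (as "yyyy-mm-dd" strings) alongside var. The helpers year_window and pull_nearest below are hypothetical names made up for illustration, and the choice of PRCP, TMAX and TMIN is only an assumption about which variables you want. The actual download needs network access, so the call is left commented out.

```r
# Hypothetical helper: turn the study years into a "yyyy-mm-dd" date window
year_window <- function(years) {
  c(date_min = paste0(min(years), "-01-01"),
    date_max = paste0(max(years), "-12-31"))
}

# Hypothetical wrapper: pull only selected variables for the closest station
# of each park, limited to the window spanned by `years`.
# `stations` is the list returned by the meteo_nearby_stations() step.
pull_nearest <- function(stations, years, vars = c("PRCP", "TMAX", "TMIN")) {
  win <- year_window(years)
  lapply(stations, function(x) {
    rnoaa::meteo_pull_monitors(x$id[1],
                               var = vars,
                               date_min = win[["date_min"]],
                               date_max = win[["date_max"]])
  })
}

# Usage (requires network access and the `stations` list from above):
# stations_data <- pull_nearest(stations, df$Year)
year_window(c(2004L, 1995L, 2008L))
```

    Limiting the date range up front keeps the returned tables small, which helps when you repeat this for many parks.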

    Your next step would be to check if the variables you want are available for these stations within your desired time frame. If not, you could use the next closest one.
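
    One way to make that check concrete is sketched below. It assumes each station table looks like the output of meteo_pull_monitors (a date column of class Date plus one column per variable). The function names yearly_coverage and first_usable, and the 80% threshold, are illustrative choices, not part of rnoaa. Coverage here is the share of rows in a given year that hold a non-missing value, so a year with only a few rows present can still score high.

```r
# Share of non-missing daily values per year for one variable.
# Returns an empty vector if the station does not report `var` at all.
yearly_coverage <- function(station_df, var) {
  if (!var %in% names(station_df)) return(numeric(0))
  yr <- format(station_df$date, "%Y")
  tapply(!is.na(station_df[[var]]), yr, mean)
}

# Index of the first candidate (ordered nearest first) whose coverage for
# `var` in `year` reaches `min_frac`; NA if no candidate qualifies.
first_usable <- function(candidates, var, year, min_frac = 0.8) {
  for (i in seq_along(candidates)) {
    frac <- yearly_coverage(candidates[[i]], var)[as.character(year)]
    if (length(frac) == 1 && !is.na(frac) && frac >= min_frac) return(i)
  }
  NA_integer_
}

# Toy data standing in for two pulled stations (values are made up)
dates <- seq(as.Date("2004-01-01"), as.Date("2004-01-10"), by = "day")
near <- data.frame(date = dates, tmax = NA_real_)
far  <- data.frame(date = dates,
                   tmax = c(250, 261, NA, 244, 239, 255, 260, 248, 251, 247))

# The nearest station has no usable tmax, so the second one is chosen
first_usable(list(near, far), "tmax", 2004)
```

    With real data, the candidates list could be built by pulling several of the IDs in stations[[i]]$id one at a time, keeping them in order of distance.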

    E.g.

    The closest station to your first park only has precipitation, min and max temperature:

    stations_data[[1]]
    
    # # A tibble: 4,077 x 5
    # id       date  prcp  tmax  tmin
    # <chr>     <date> <dbl> <dbl> <dbl>
    # 1 USW00092826 2007-02-01    NA    NA    NA
    # 2 USW00092826 2007-02-02    NA    NA    NA
    # 3 USW00092826 2007-02-03    NA    NA    NA
    # 4 USW00092826 2007-02-04    NA    NA    NA
    # 5 USW00092826 2007-02-05    NA    NA    NA
    # 6 USW00092826 2007-02-06    NA    NA    NA
    # 7 USW00092826 2007-02-07    NA    NA    NA
    # 8 USW00092826 2007-02-08    NA    NA    NA
    # 9 USW00092826 2007-02-09    NA    NA    NA
    #10 USW00092826 2007-02-10    NA    NA    NA
    # # ... with 4,067 more rows
    

    And you can see that there are missing measurements, which you'll need to handle.
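
    A minimal sketch of one way to handle the gaps when building annual values: average each year with na.rm = TRUE and keep a count of the days that actually contributed, so sparse years can be filtered out afterwards. The function name annual_summary is hypothetical, and the toy values are made up. GHCN-Daily reports temperature in tenths of a degree Celsius (and precipitation in tenths of a millimetre), hence the default scale of 10; other variables may need a different scale.

```r
# Annual mean and number of valid days per variable.
# `station_df` is assumed to have a Date column `date` plus variable columns.
annual_summary <- function(station_df, vars = c("tmax", "tmin"), scale = 10) {
  vars <- intersect(vars, names(station_df))
  yr <- as.integer(format(station_df$date, "%Y"))
  out <- data.frame(year = sort(unique(yr)))
  for (v in vars) {
    vals <- station_df[[v]] / scale  # tenths of a unit -> whole units
    out[[paste0(v, "_mean")]]  <- as.vector(tapply(vals, yr, mean, na.rm = TRUE))
    out[[paste0(v, "_ndays")]] <- as.vector(tapply(!is.na(vals), yr, sum))
  }
  out
}

# Toy data with gaps (made-up values, in tenths of a degree Celsius)
toy <- data.frame(
  date = as.Date(c("2004-06-01", "2004-06-02", "2004-06-03", "2005-06-01")),
  tmax = c(300, NA, 320, 310),
  tmin = c(200, 210, NA, NA)
)
res <- annual_summary(toy)
res
```

    A year with no valid observations gets a mean of NaN and a day count of 0, so the count column is the safer thing to filter on before using the means.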