r  for-loop  uniqueidentifier  data-retrieval

R - loop a script that pulls data from an online database and give each iteration's result a unique name


I'm fairly new to R and have been struggling to simplify some code with a for loop. I am pulling water quality data from an online database using the dataRetrieval package. At the moment I have duplicated the code for each site, changing only the site number and the output name. I have been trying to simplify this by putting the script in a for loop, but I am having trouble creating separate data tables with unique identifiers.

Original code that creates a data table for each site. The only things that change are siteNumbers and the data table name, "x"_dataTable:

#BW00A
siteNumbers = c("383652091125002")
parameterCode = c("00010","00095", "00300", "00400", "34475", "34485", "45617")
startDate = "1900-01-01"
endDate = "2020-12-01"

BW00A_dataTable <- readNWISqw(siteNumbers, parameterCode,
                             startDate, endDate)
#BW01
siteNumbers = c("383648091124501")
parameterCode = c("00010","00095", "00300", "00400", "34475", "34485", "45617")
startDate = "1900-01-01"
endDate = "2020-12-01"

BW01_dataTable <- readNWISqw(siteNumbers, parameterCode,
                             startDate, endDate)
#BW01A
siteNumbers = c("383648091124502")
parameterCode = c("00010","00095", "00300", "00400", "34475", "34485", "45617")
startDate = "1900-01-01"
endDate = "2020-12-01"

BW01A_dataTable <- readNWISqw(siteNumbers, parameterCode,
                             startDate, endDate)

New code that I can't get to work. I've placed the site numbers and site names in a data frame. I want the for loop to iterate through siteNumbers, pull the data for each site, and assign the newly created data table to a name taken from the corresponding siteName (unique_siteName below). I'm not sure if this is even possible.

df <- data.frame(
  siteNumbers = c("383652091125001",    "383652091125002",  "383648091124501",  "383648091124502",  "383506091132201",  "383508091132002",  "383508091132004",  "383519091133701",  "383544091132601",  "383544091132502",  "383628091124801",  "383639091125902",  "383639091125901",  "383638091125001",  "383638091125002",  "383631091124803",  "383631091124804",  "383631091124801",  "383631091124802",  "383636091123801",  "383636091123811",  "383616091125701",  "383640091130701",  "383640091130702",  "383621091130701",  "383621091130703",  "383621091130702",  "383624091130501",  "383624091130502",  "383616091130801",  "383616091130802",  "383644091131601",  "383627091130201",  "383622091130604",  "383622091130605",  "383557091132001",  "383614091132801"),
  siteName = c("BW-00", "BW-00A",   "BW-01",    "BW-01A",   "MW-04",    "MW-04A",   "MW-04B",   "MW-11",    "BW-21",    "BW-21A",   "210TB-C6", "Bates Spring", "Bates Spring below dam",   "BW-02",    "BW-02A",   "BW-04A-D", "BW-04A-S", "BW-04D",   "BW-04S",   "BW-05",    "BW-05A",   "BW-07",    "BW-08",    "BW-08A",   "BW-11",    "BW-11A-D", "BW-11A-S", "BW-13",    "BW-13A",   "BW-14",    "BW-14A",   "BW4-15",   "BW4-16",   "BW4-17",   "BW4-18",   "W3",   "W4")
)

parameterCode = c("00010","00095", "00300", "00400", "34475", "34485", "45617")
startDate = "1900-01-01"
endDate = "2020-12-01"

for (row in df)
{
 unique_siteName <- readNWISqw(siteNumbers, parameterCode,
                             startDate, endDate)  
  
}

Thanks for your help!


Solution

  • for (row in df) iterates over the columns of df, not its rows. You need to loop over the row indices instead, index into the data frame by row number inside the loop, and accumulate the results in a list:

    # Accumulate one data table per site in a list
    results <- list()
    for (i in seq_len(nrow(df))) {
      results[[i]] <- readNWISqw(df$siteNumbers[i], parameterCode,
                                 startDate, endDate)
    }
    # Label each list element with its site name
    names(results) <- df$siteName
    

    R also offers lapply as a way to simplify this common pattern. The above loop is equivalent to this:

    results <- lapply(df$siteNumbers, FUN = readNWISqw, parameterCode, startDate, endDate)
    names(results) <- df$siteName
    

    I'd suggest reading my answer to "How to make a list of data frames?" for more discussion and explanation, both of why we do it this way and of what good next steps are (for example, combining the results list into a single data frame).
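
As a rough, offline illustration of the list pattern above and of the "combine into a single data frame" next step, here is a minimal sketch. fetch_site is a hypothetical stand-in for readNWISqw: it returns a small made-up data frame so the example runs without network access. The two-row sites data frame, the shortened parameterCode vector, and the site_no, parm_cd and value columns are likewise assumptions made only for illustration.

```r
# Hypothetical stand-in for readNWISqw(): returns a tiny made-up data frame
# so the pattern can be tried without contacting the online database.
fetch_site <- function(siteNumber, parameterCode, startDate, endDate) {
  data.frame(
    site_no = siteNumber,
    parm_cd = parameterCode,
    value   = seq_along(parameterCode),  # placeholder values, not real data
    stringsAsFactors = FALSE
  )
}

# Two of the sites from the question, for brevity
sites <- data.frame(
  siteNumbers = c("383652091125002", "383648091124501"),
  siteName    = c("BW-00A", "BW-01"),
  stringsAsFactors = FALSE
)
parameterCode <- c("00010", "00095")
startDate <- "1900-01-01"
endDate   <- "2020-12-01"

# One list element per site, named by site name
results <- lapply(sites$siteNumbers, fetch_site,
                  parameterCode, startDate, endDate)
names(results) <- sites$siteName

# Pull out a single site's table by name
bw01 <- results[["BW-01"]]

# Add the site name as a column to each table, then stack the tables
labelled <- Map(function(d, nm) {
  data.frame(siteName = nm, d, stringsAsFactors = FALSE)
}, results, names(results))
combined <- do.call(rbind, labelled)
rownames(combined) <- NULL
```

Keeping the tables in a named list means each site is still available individually (results[["BW-01"]]), while combined holds everything in one table with a siteName column to tell the rows apart. If dplyr is already in use, dplyr::bind_rows(results, .id = "siteName") should be a shorter way to do the final stacking step.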