
How to loop through a list and create separate dataframes in R


I am trying to pull in Census Bureau migration data for every county in the US. Because of the size of the data, Census requires that you specify a "regionin" (i.e., a state or county) for the data import, so I need to run through a list of all the states (by FIPS code) to get all of the data imported. The output I need is a separate dataframe for each state that I can then work with and combine into one large dataframe. Here is an example of the code I have written:

library(censusapi)

states <- c("01","02")
for(i in 1:length(states)) {
   region = str_glue("state:{states[i]}")
   migr = str_glue("migr2010_{states[i]}")
   migr <- getCensus(name = "acs/flows", vintage = 2010,
                     key = "*myAPIkey*",
                     vars = c("MOVEDNET", "MOVEDIN", "MOVEDOUT", "AGE"),
                     region = "county:*", regionin = region)
}

What I want to get out are separate dataframes for each state named "migr2010_01", "migr2010_02", etc. What I am actually getting out is one dataframe named "migr" with only the data from the last state on the list. I know there is something wrong in my loop, but I am not sure where I need to make the change as I am new to R loops. Thanks for any ideas.


Solution

  • Your existing code creates an object called migr and assigns it a string holding the name of the data.frame you want to create. You then overwrite the migr object with the data.frame that you pull from Census. Every iteration of the loop overwrites migr again, which is why only the data from the last iteration is saved, and then only as a data.frame named migr.

    Instead, you need to use the assign() function to bind the data you pull from Census to the name stored in migr, as follows:

    library(censusapi)
    library(stringr)  # provides str_glue()
    
    states <- c("01", "02")
    for (i in seq_along(states)) {
       region <- str_glue("state:{states[i]}")
       migr <- str_glue("migr2010_{states[i]}")  # name of the object to create
       # assign() binds the downloaded data to the name held in migr
       assign(
         x = migr,
         value = getCensus(name = "acs/flows", vintage = 2010,
                           key = "*myAPIkey*",
                           vars = c("MOVEDNET", "MOVEDIN", "MOVEDOUT", "AGE"),
                           region = "county:*", regionin = region)
       )
    }
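
    Once the objects exist, you can gather them back up by name with mget(). Here is a minimal sketch of the assign()/mget() round trip. It assumes a made-up stub, fake_get_census(), standing in for getCensus() so that it runs without an API key; the stub and its columns are purely illustrative.

    ```r
    # Hypothetical stand-in for getCensus(); returns a tiny data.frame per state.
    fake_get_census <- function(regionin) {
      data.frame(state = sub("state:", "", regionin, fixed = TRUE),
                 MOVEDIN = c(10L, 20L),
                 stringsAsFactors = FALSE)
    }

    states <- c("01", "02")
    for (s in states) {
      # paste0() builds the object name; assign() binds the data to that name
      assign(paste0("migr2010_", s), fake_get_census(paste0("state:", s)))
    }

    # mget() looks up several objects by name and returns them as a named list
    migr_list <- mget(paste0("migr2010_", states))
    ```

    Having to go through mget() to get the objects back is a hint that a list may be the more natural container in the first place.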
    

    Edit

    As others have mentioned, it may be easier to work with a list of data.frames, rather than creating several in the global environment. The easiest way to create that is using lapply, as follows:

     migr2010 <- lapply(
       paste0("state:", c("01", "02")),  # each value is passed on as regionin (the region variable in the loop)
       getCensus,
       name = "acs/flows",
       vintage = 2010,
       key = "*myAPIkey*",
       vars = c("MOVEDNET", "MOVEDIN", "MOVEDOUT", "AGE"),
       region = "county:*"
       )
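
    The lapply() call above relies on each value landing in the right argument by position, and the resulting list is unnamed. If you would rather be explicit, something like the following sketch may be easier to read. It again uses a hypothetical stub (fake_get_census(), with a made-up, simplified signature) instead of the real getCensus(), so treat it as an illustration of the pattern only.

    ```r
    # Hypothetical stub with a simplified signature; not the real getCensus().
    fake_get_census <- function(name, vintage, regionin) {
      data.frame(dataset = name, vintage = vintage, regionin = regionin,
                 stringsAsFactors = FALSE)
    }

    states <- c("01", "02")

    # Pass regionin by name inside an anonymous function
    migr2010 <- lapply(states, function(s) {
      fake_get_census(name = "acs/flows", vintage = 2010,
                      regionin = paste0("state:", s))
    })

    # Name the elements so each state's data can be looked up directly
    names(migr2010) <- paste0("migr2010_", states)
    ```

    With names in place, migr2010[["migr2010_02"]] gives you the second state's data.frame without having to remember its position.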
    

    Then, if you want to create a single data.frame out of those, you could use dplyr::bind_rows(migr2010), data.table::rbindlist(migr2010), or do.call(rbind, migr2010) (although do.call(rbind, ...) is typically much slower than the other two on long lists).
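
    When you stack the pieces, you will probably want a column recording which state each row came from. A rough base R sketch, using small made-up data.frames (the county codes and counts are invented for illustration):

    ```r
    # Made-up per-state results, named by state FIPS code
    migr2010 <- list(
      "01" = data.frame(county = c("001", "003"), MOVEDIN = c(10L, 20L)),
      "02" = data.frame(county = "013", MOVEDIN = 5L)
    )

    # Tag each data.frame with its state code before stacking
    tagged <- Map(function(df, st) cbind(state = st, df),
                  migr2010, names(migr2010))

    # Stack everything into one data.frame and drop the generated row names
    migr_all <- do.call(rbind, tagged)
    rownames(migr_all) <- NULL
    ```

    If you are already using dplyr, bind_rows(migr2010, .id = "state") should do the tagging and the stacking in one step for a named list.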