Search code examples
jsonrlistdataframecensus

R: Converting a list of lists into a data frame (Census data)


I am trying to use the Census API to download specific tables and save them in data frames. I have been successful in downloading the data. I assemble the appropriate URL for the call and then have used the package 'rjson' to read the URL into a list. For example:

library(rjson)    

get <- c("B19081_002M")                                      # create vector of vars
datafile <- "http://api.census.gov/data/2009/acs5?"          # ACS 05-09
get <- paste0("get=NAME,", paste(get, collapse = ','))       # variables
geo <- "for=county:*"                                        # all counties
api_key <- "key=KEYHERE"                                     # API key
url <- paste0(datafile, paste(get, geo, api_key, sep = "&")) # creates url
data <- fromJSON(file = url)                                 # read into R

# To see an example of a problematic observation
# (this should return "Hinsdale County, Colorado")

data[[273]]

However, I am having difficulty converting this into a data frame. The fromJSON() function creates a list object. In most cases, the elements of the list object are chr vectors for each spatial unit (e.g. county in the above example) and the vector contains the table information and associated metadeta. In this case, I use the approach in the below working example to convert the list into a data frame where each row is a different spatial unit and each column is a different variable.

# Create fake data
x1 <- seq(1:5)
x2 <- rep(5,5)
l1 <- list(x1,x2)

# Convert to df
cols_per_row <- length(unlist(l1[1]))
test1 <- data.frame(matrix(unlist(l1), byrow = TRUE, ncol = cols_per_row))

print(test1) # success!

X1 X2 X3 X4 X5
1  1  2  3  4  5
2  5  5  5  5  5

But when I use that same approach with the list-in-list object (which arises because I am including different tables from the API), I receive an error:

# Create fake data
x1 <- seq(1:5)
x2 <- rep(5,5)
x3 <- list(1,2,3,4,NULL)
l2 <- list(x1,x2,x3)

# Produces an error
cols_per_row <- length(unlist(l2[1]))
test2 <- data.frame(matrix(unlist(l2), byrow = TRUE, ncol = cols_per_row))

Warning message:
In matrix(unlist(l2), byrow = TRUE, ncol = cols_per_row) :
data length [14] is not a sub-multiple or multiple of the number of columns [5]

Does anyone have a solution for this?

  • I've noticed that the sub-lists are only appearing in cases where one of the variables has a NULL value.
  • In the cases where the element of the main list is also a list, the sub-list is of length equal to the length of the vectors for the element of the main lists that are vectors.

Notes

  • I do not need to use fromJSON and welcome alternatives that might make this easier.
  • I do not want to use the 'acs' package to accomplish this, so please do not suggest its use. I am trying to learn how to deal with this problem.

Solution

  • Maybe this is what you are after:

    simplify2array(l2)
    

    EDIT:

    The above solution did not work. As an alternative I would replace NULL values with NA:

    # Function to replace NULL values to NA values inside a list
    listNull2Na <- function(l) sapply(l, function(x) ifelse(is.null(x), NA, x))
    
    # Substitute NULL values in your list and get matrix:
    l2 <- sapply(l2, listNull2Na)