Tags: r, list, api, missing-data, rjson

Filling in missing values from API imported data


I'm using an API to get data from the Census Bureau. The good news is that I'm able to retrieve the data. The bad news is that I can't get it into a format that is usable for analysis and mapping.

My question: Is there a way to modify the API call or a standard way of dealing with missing values when the data is in a list?

Here's what I'm doing with the actual data. A toy example is below because the census data requires a personal API token.

library(rjson)  # provides fromJSON()
library(plyr)   # provides ldply()

# Pull data from Census Bureau
mydata <- fromJSON(file = url(paste("http://api.census.gov/data/2010/acs5?key=", token, "&get=B25077_001E&for=block+group:*&in=state:47+county:037", sep = "")))
# Create a data frame
mydata.df <- ldply(mydata)
# Rename columns using the first row
names(mydata.df) <- ldply(mydata)[1, ]

Here's some of my data. I've tried mydata[mydata == NULL] = 9999 but it didn't help.

   list(c("94400", "47", "037", "019200", "4"), c("350000", "47", "037", "019300", "1"), list(NULL, "47", "037", "019300", "2"), list(NULL, "47", "037", "019300", "3"), c("198200", "47", "037", "019400", "1"), c("176900", "47", "037", "019400", "2"), c("250000", "47", "037", "019400", "3"), c("166200", "47", "037", "019500", "1"), c("227200", "47", "037", "019500", "2"), c("210500", "47", "037", "019500", "3"), c("187500", "47", "037", "019500", "4"), c("140000", "47", "037", "019600", "1"), c("131300", "47", "037", "019600", "2"), list(NULL, "47", "037", "980100", "1"), list(NULL, "47", "037", "980200", "1"))

This is how I know that there are missing values; some have 5 values, some have 4.

unlist(lapply(mydata, function(x) length(unlist(x))))

In the event that this isn't an issue with fromJSON(), here's an example of what I'd like the data to do once it's in R.

mylist = list(a = c(1:4), b = c(1:3), c = c(1:4))

Gives this:

$a
[1] 1 2 3 4
$b
[1] 1 2 3
$c 
[1] 1 2 3 4

But I would like this:

$a
[1] 1 2 3 4
$b
[1] 1 2 3 NA
$c 
[1] 1 2 3 4

Or something similar where an NA acts as a placeholder for missing values. If a 2 were missing, for example, the entry in the list would look like 1 NA 3 4.


Solution

    mylist = list(a = 1:4, b = 1:3, c = c(1,3,4))
    Un <- unique(unlist(mylist))
    lapply(mylist, function(x) x[match(Un,x)])
    # $a
    # [1] 1 2 3 4
    
    # $b
    # [1]  1  2  3 NA
    
    # $c
    #[1]  1 NA  3  4
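
If the goal is only to pad the shorter vectors with trailing NA (as in the $b example from the question) rather than to align entries by value, a minimal sketch using the `length<-` replacement function might look like the following. The names n and padded are illustrative. Note that this does not put the NA where a value is missing in the middle; for that, the match() approach above is needed.

```r
# Toy list with vectors of unequal length
mylist <- list(a = 1:4, b = 1:3, c = c(1, 3, 4))

# Length of the longest vector
n <- max(lengths(mylist))

# Extending a vector's length fills the new positions with NA
padded <- lapply(mylist, function(x) {
  length(x) <- n
  x
})

padded$b
```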
    

    Update

    Using the dput() data

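    # Replace each NULL element with NA, then flatten each row to a vector
    # (mydata is the list shown in the question's dput() output)
    lst1 <- lapply(mydata, function(x) do.call(c, lapply(x,
                          function(y) {y[is.null(y)] <- NA; y})))

    head(lst1, 3)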
      #[[1]]
      #[1] "94400"  "47"     "037"    "019200" "4"     
    
      #[[2]]
      #[1] "350000" "47"     "037"    "019300" "1"     
    
      #[[3]]
      #[1] NA       "47"     "037"    "019300" "2"
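
Once every row has the same length, the list can be bound into a data frame for analysis. The following is a rough sketch on three rows copied from the question's dput() output. The column names (value, state, county, tract, block_group) are assumptions made for illustration and do not come from the API response.

```r
# Three rows taken from the dput() output in the question
mydata <- list(
  c("94400", "47", "037", "019200", "4"),
  list(NULL, "47", "037", "019300", "2"),
  c("176900", "47", "037", "019400", "2")
)

# Replace each NULL with NA and flatten every row to a character vector
rows <- lapply(mydata, function(x) {
  vapply(x, function(y) if (is.null(y)) NA_character_ else y, character(1))
})

# Stack the rows into a data frame, one row per block group
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# Hypothetical column names, chosen only for this example
names(df) <- c("value", "state", "county", "tract", "block_group")

# Convert the first column to numeric; NA stays NA
df$value <- as.numeric(df$value)

df
```

Keeping the geographic identifiers as character preserves leading zeros such as "037".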