Tags: r, plyr, tapply, split-apply-combine

Processing lists of lists by group


I would like to process a list of lists. Specifically, I want to extract the data frame that is the third member of each list, group those data frames by a grouping variable (the first member of each list), and then apply several functions such as mean(), median(), sd() and length() to the data in each group. The output should be returned as a data frame and would look something like:

Grp   mean sd  ... 
 a    5.26 ... ...
 b    6.25 ... ...

#fake data
test<-list(
         #member 1=grouping var, 2=identity, 3=dataframe
         list("a", 54, data.frame(x=c(1,2)  ,y=c(3,4))),
         list("b", 55, data.frame(x=c(5,6)  ,y=c(7,8))),
         list("a", 56, data.frame(x=c(9 ,10),y=c(11,12))),
         list("b", 57, data.frame(x=c(13,14),y=c(15,NA)))
         )

#what I thought could work but kicks out a strange error

library(plyr)  # provides ldply()
test2 <- ldply(test, .fun=unlist)
#note limited to just mean for now
tapply(test, factor(test$V1), FUN=function(x){mean(as.numeric(x[3:6]), na.rm=TRUE)}, simplify=TRUE)

So my questions are: 1. Why doesn't the above work? 2. This feels very clunky; is there a more efficient way to do this?
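On the first question, one likely cause is visible in the tapply() call itself: it indexes test, which is an unnamed list, so test$V1 is NULL and the grouping factor is empty. The sketch below reproduces only that part, in base R; the names err and grp are illustrative and not from the original post. Passing test2 instead would probably still fail, since tapply() expects one group label per element of its first argument, and unlist() has already turned every value into a character string.

```r
# Same structure as the fake data in the question: an unnamed list of lists.
test <- list(
  list("a", 54, data.frame(x = c(1, 2),   y = c(3, 4))),
  list("b", 55, data.frame(x = c(5, 6),   y = c(7, 8))),
  list("a", 56, data.frame(x = c(9, 10),  y = c(11, 12))),
  list("b", 57, data.frame(x = c(13, 14), y = c(15, NA)))
)

# `test` has no element named V1, so `$` returns NULL ...
is.null(test$V1)

# ... and the factor built from NULL has no elements at all.
length(factor(test$V1))

# tapply() needs one group label per element of X, so the call stops.
# The error message is captured as a string instead of aborting the script.
err <- tryCatch(
  tapply(test, factor(test$V1), FUN = length),
  error = function(e) conditionMessage(e)
)
err

# A grouping vector of the right length can be read from the list itself.
grp <- vapply(test, function(el) el[[1]], character(1))
grp
```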


Solution

  • In base R, you can do:

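    # Split the lists by their first member, then row-bind each group's
    # data frames (the third member) into a single data frame.
    df_list <- tapply(test, 
                      sapply(test, `[[`, 1), 
                      FUN = function(x) do.call(rbind, lapply(x, `[[`, 3)))
    # Pool every value in each group's data frame and summarise it.
    t(sapply(df_list, function(x) {
      list("mean"   = mean(unlist(x), na.rm = TRUE),
           "sd"     = sd(unlist(x), na.rm = TRUE),
           "median" = median(unlist(x), na.rm = TRUE))}))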
    
      mean     sd       median
    a 6.5      4.440077 6.5   
    b 9.714286 4.151879 8
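
The result above is a matrix whose cells are list elements, whereas the question asks for a data frame with a Grp column and a length() statistic. A minimal base R sketch of one way to get that shape follows. The helper summarise_values, the column name n, and the choice to count only non-missing values are assumptions made for illustration, not part of the original answer.

```r
# Fake data from the question: member 1 = group, 2 = identity, 3 = data frame.
test <- list(
  list("a", 54, data.frame(x = c(1, 2),   y = c(3, 4))),
  list("b", 55, data.frame(x = c(5, 6),   y = c(7, 8))),
  list("a", 56, data.frame(x = c(9, 10),  y = c(11, 12))),
  list("b", 57, data.frame(x = c(13, 14), y = c(15, NA)))
)

# Group label for each inner list (its first member).
grp <- vapply(test, function(el) el[[1]], character(1))

# Pool all values from the data frames (third member) within each group.
values_by_group <- lapply(
  split(test, grp),
  function(members) {
    unlist(lapply(members, function(el) el[[3]]), use.names = FALSE)
  }
)

# Hypothetical helper: one row of summary statistics for a numeric vector.
summarise_values <- function(v) {
  data.frame(
    mean   = mean(v, na.rm = TRUE),
    sd     = sd(v, na.rm = TRUE),
    median = median(v, na.rm = TRUE),
    n      = sum(!is.na(v))  # assumption: count non-missing values only
  )
}

# Stack the per-group rows and add the group label as an ordinary column.
stats  <- do.call(rbind, lapply(values_by_group, summarise_values))
result <- data.frame(Grp = names(values_by_group), stats)
rownames(result) <- NULL
result
```

Here the columns are plain numeric vectors, so result can be sorted, filtered, or merged like any other data frame. The mean, sd and median columns should agree with the values printed above.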