Tags: r, list, dataframe, aggregate, group-summaries

Taking column mean over a list of data frames in R


Here's what I'm trying to do. My data frame has a factor variable, "country", and I want to split the data frame based on country. Then, I want to take the column mean over every variable for every country's data frame.

Data here: https://github.com/pourque/country-data

I've done this so far...

myList <- split(df1, df1$country)
for(i in 1:length(myList)) {
  aggregate <- mapply(myList[[i]][,-c(38:39)], colMeans)
}

(I'm not including the 38th and 39th columns because those are factors.)

I've read this (function over more than one list), which makes me think mapply is the answer here... but I'm getting this error:

Error in match.fun(FUN) : 
'myList[[i]][, -c(38:39)]' is not a function, character or symbol 

Maybe I'm formatting it incorrectly?


Solution

  • A data.table answer:

    library(data.table)
    
    setDT(df1)[, lapply(.SD, mean), by = country, .SDcols = -c('age', 'gender')]
    

    The syntax is now tidier, using deselection in .SDcols, thanks to user Arun.

    To explain what's happening here:

    • setDT(df1): converts the data.frame to a data.table by reference
    • lapply(.SD, mean): for each column in the subset of data (.SD), take the mean
    • by = country: do this separately for each group, split according to country
    • .SDcols = -c('age', 'gender'): omit the age and gender columns from the subset of data
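
The steps above can be tried on a small made-up data frame. This is only a sketch: the toy values and the numeric columns x and y are hypothetical stand-ins for the linked data, while age and gender are the column names the answer assumes. It uses as.data.table() on a copy rather than setDT(), so df1 stays a plain data.frame, and it adds a base R version of the split-then-average idea from the question for comparison.

```r
library(data.table)

# Hypothetical toy data standing in for df1: two numeric columns (x, y)
# plus the two factor columns (age, gender) that the answer excludes.
df1 <- data.frame(
  country = c("A", "A", "B", "B"),
  x       = c(1, 3, 10, 20),
  y       = c(2, 4, 30, 50),
  age     = factor(c("young", "old", "young", "old")),
  gender  = factor(c("f", "m", "f", "m"))
)

# data.table version, following the answer: per-country mean of every
# column in .SD, with age and gender deselected via .SDcols.
dt <- as.data.table(df1)
means_dt <- dt[, lapply(.SD, mean), by = country, .SDcols = -c("age", "gender")]

# Base R version of the question's approach: keep the numeric columns,
# split them by country, and apply colMeans to each piece.
# Note that the function comes second in sapply(X, FUN); in mapply(FUN, ...)
# it comes first, which is why the original call failed in match.fun().
num_cols   <- names(df1)[sapply(df1, is.numeric)]
pieces     <- split(df1[num_cols], df1$country)
means_base <- t(sapply(pieces, colMeans))  # one row per country

print(means_dt)
print(means_base)
```

Both results should agree on the per-country means; the data.table result is a data.table with a country column, while the base R result is a numeric matrix with countries as row names.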