Here's what I'm trying to do. My data frame has a factor variable, "country", and I want to split the data frame based on country. Then, I want to take the column mean over every variable for every country's data frame.
Data here: https://github.com/pourque/country-data
I've done this so far...
myList <- split(df1, df1$country)
for(i in 1:length(myList)) {
aggregate <- mapply(myList[[i]][,-c(38:39)], colMeans)
}
(I'm not including the 38th and 39th columns because those are factors.)
I've read this (function over more than one list), which makes me think mapply is the answer here, but I'm getting this error:
Error in match.fun(FUN) :
'myList[[i]][, -c(38:39)]' is not a function, character or symbol
Maybe I'm formatting it incorrectly?
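On the error itself: mapply expects the function as its first argument (mapply(FUN, ...)), so passing the data frame first makes match.fun try to treat the data frame as a function. A minimal base R sketch of the split-then-colMeans idea follows. The toy data frame and its column names (x, y, gender) are made up for illustration and only stand in for df1; numeric columns are picked with is.numeric instead of by position.

```r
# Hypothetical toy data standing in for df1 (column names are assumptions)
df1 <- data.frame(
  country = factor(c("A", "A", "B", "B")),
  x = c(1, 3, 5, 7),
  y = c(2, 4, 6, 8),
  gender = factor(c("f", "m", "f", "m"))
)

# One data frame per country
myList <- split(df1, df1$country)

# Flag the numeric columns so factors are skipped
num_cols <- sapply(df1, is.numeric)

# colMeans per country; sapply returns one column per country,
# so transpose to get one row per country
means <- t(sapply(myList, function(d) colMeans(d[, num_cols, drop = FALSE])))
means
```

The loop in the question also overwrites its result on every iteration, which is why this sketch collects all per-country means in a single sapply call.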
A data.table answer:
library(data.table)
setDT(df1)[, lapply(.SD, mean), by = country, .SDcols = -c('age', 'gender')]
This now uses the tidier deselection syntax in .SDcols, thanks to user Arun.
To explain what's happening here:
setDT(df1): make the data.frame a data.table
lapply(.SD, mean): for each column in the subset of data, take the mean
by = country: do this by groups split according to country
.SDcols = -c('age', 'gender'): omit the age and gender columns from the subset of data
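To see these pieces working together, the same call can be tried on a small made-up data frame. This is only a sketch: the toy values and the numeric columns x and y are assumptions, with age and gender included as factor columns to mirror the ones being excluded.

```r
library(data.table)

# Hypothetical toy data standing in for df1 (values and x/y are assumptions)
df1 <- data.frame(
  country = factor(c("A", "A", "B", "B")),
  x = c(1, 3, 5, 7),
  y = c(2, 4, 6, 8),
  age = factor(c("young", "old", "young", "old")),
  gender = factor(c("f", "m", "f", "m"))
)

# Convert by reference, then take the mean of every remaining column per country
res <- setDT(df1)[, lapply(.SD, mean), by = country, .SDcols = -c('age', 'gender')]
res
```

The result has one row per country and one column per averaged variable. Note that setDT converts df1 in place, so df1 is a data.table afterwards.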