I can't find a dataset similar to my problem so I changed the dataset Iris (dataset in R) to look similar - it's close enough!
data = iris
data$type = gl(5,30,150,labels=c("group1","group2","group3","group4","group5"))
data$ID = gl(30,5,150)
Then I used the following code
xtabs(Sepal.Length ~ Species + type, aggregate(Sepal.Length ~ Species + type + ID, data, mean))
which results in
type
Species group1 group2 group3 group4 group5
setosa 30.16 19.90 0.00 0.00 0.00
versicolor 0.00 12.20 35.88 11.28 0.00
virginica 0.00 0.00 0.00 26.24 39.64
My understanding is that what my code is doing is adding together Sepal.Length for each ID then taking the mean of those values by each of Species and type.
Is this correct?
If not, how would I get this?
Additionally, how would I get this if my data is such that each ID has multiple types? (can't figure out how to construct this in R)
Actually, just to be perfectly clear
What I want is a code that sums together the Sepal.Length for each ID AND type then it will take the average of those sums over all IDs and post an average Sepal.Length by type and species/
With data.table
:
library(data.table)
setDT(data)
#sum of Sepal.Length for each ID AND type
data[, id_type_sum := sum(Sepal.Length), by = .(ID, type)]
# mean of this variable by type and species
data[, mean(id_type_sum), by = .(type, Species)]
# type Species V1
# 1: group1 setosa 25.13333
# 2: group2 setosa 24.87500
# 3: group2 versicolor 30.50000
# 4: group3 versicolor 29.90000
# 5: group4 versicolor 28.20000
# 6: group4 virginica 32.80000
# 7: group5 virginica 33.03333
And if you want this in table format, you can use data.table
's dcast
method:
library(magrittr) # for the %>% operator
data[, mean(id_type_sum), by = .(type, Species)] %>%
dcast(Species ~ type)
Result:
Species group1 group2 group3 group4 group5
1: setosa 25.13333 24.875 NA NA NA
2: versicolor NA 30.500 29.9 28.2 NA
3: virginica NA NA NA 32.8 33.03333