I have a dataset in which I have 18 populations. Each population has several individuals in it, each individual has a "Color" call. I would like to only compare two populations at once in a one-way ANOVA with Population as the main factor in order to get a pairwise MS-within and MS-among.
I know how to extract the MS from the omnibus ANOVA using the following code:
mylm <- lm(Color ~ Pop, data=PopColor)
anova(mylm)[["Mean Sq"]]
Which yields the among-subjects MS (PopColor$Pop) first, then the between-subjects MS (residual) respectively:
[1] 3.7079911 0.4536985
Below is a subset of my data with six populations:
dput(dat)
structure(list(Pop = structure(c(6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("pop1001", "pop1026",
"pop252", "pop254a", "pop311", "pop317"), class = "factor"),
Color = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L,
3L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 3L, 3L, 2L, 3L, 2L,
3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 3L, 3L, 2L,
3L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 4L,
2L, 3L, 2L, 4L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 4L, 3L, 2L, 4L,
4L, 1L, 2L, 2L, 2L, 2L, 1L, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L,
2L, 3L, 4L, 2L, 2L, 4L, 3L)), .Names = c("Pop", "Color"), class = "data.frame", row.names = c(NA,
-94L))
Any help would be greatly appreciated! Thanks!
I don't understand you point 2 ( a little bit technical for not statistician like me). For the first point , I understand it as you want to apply lm/Anova over all pairs of your populations. You can use combn
:
combn: Generate all combinations of the elements of x taken m at a time
pops <- unique(PopColor$Pop)
ll <- combn(pops,2,function(x){
dat.pair <- PopColor[PopColor$Pop %in% pops[x],]
mylm <- lm(Color ~ Pop, data=dat.pair)
c(as.character(pops[x]),anova(mylm)[["Mean Sq"]])
},simplify=FALSE)
do.call(rbind,ll)
[,1] [,2] [,3] [,4]
[1,] "pop1026" "pop254a" "0.210291858678956" "0.597865353037767"
[2,] "pop1026" "pop1001" "0.52409988385598" "0.486874236874237"
[3,] "pop1026" "pop317" "15.7296466973886" "0.456486042692939"
[4,] "pop1026" "pop311" "1.34392057218144" "0.631962930099576"
[5,] "pop1026" "pop252" "0.339324116743472" "0.528899835796388"
[6,] "pop254a" "pop1001" "0.0166666666666669" "0.351785714285714"
[7,] "pop254a" "pop317" "14.45" "0.227777777777778"
[8,] "pop254a" "pop311" "1.92898550724637" "0.561430575035063"
[9,] "pop254a" "pop252" "0.8" "0.344444444444444"
[10,] "pop1001" "pop317" "20.4166666666667" "0.205357142857143"
[11,] "pop1001" "pop311" "3.55030333670374" "0.46474019088017"
[12,] "pop1001" "pop252" "1.35" "0.280357142857143"
[13,] "pop317" "pop311" "9.60474308300398" "0.429172510518934"
[14,] "pop317" "pop252" "8.45" "0.116666666666667"
[15,] "pop311" "pop252" "0.110803689064557" "0.496914446002805"
As you can see we have choose(6,2)=15
possibles pairs.