Search code examples
rstatisticsanovado-loopsmean-square-error

How to do iterative ANOVA and extract Mean Square Values from lm object in R


I have a dataset in which I have 18 populations. Each population has several individuals in it, each individual has a "Color" call. I would like to only compare two populations at once in a one-way ANOVA with Population as the main factor in order to get a pairwise MS-within and MS-among.

I know how to extract the MS from the omnibus ANOVA using the following code:

mylm <- lm(Color ~ Pop, data=PopColor)
anova(mylm)[["Mean Sq"]]

Which yields the among-subjects MS (PopColor$Pop) first, then the between-subjects MS (residual) respectively:

[1] 3.7079911 0.4536985
  1. Is there a way in which I can create a do-loop to do all pairwise one-way ANOVA between all populations, then extract the among and within MS?
  2. I would then like to move the two MS values from each comparison to their own symmetrical matrices: one among-subjects MS matrix labeled by population, and one within-subjects MS matrix labeled by population. These would have identical column and row names with the Population name.

Below is a subset of my data with six populations:

dput(dat)
structure(list(Pop = structure(c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("pop1001", "pop1026", 
"pop252", "pop254a", "pop311", "pop317"), class = "factor"), 
    Color = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 
    3L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 3L, 3L, 2L, 3L, 2L, 
    3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 3L, 3L, 2L, 
    3L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 4L, 
    2L, 3L, 2L, 4L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 4L, 3L, 2L, 4L, 
    4L, 1L, 2L, 2L, 2L, 2L, 1L, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 
    2L, 3L, 4L, 2L, 2L, 4L, 3L)), .Names = c("Pop", "Color"), class = "data.frame", row.names = c(NA, 
-94L))

Any help would be greatly appreciated! Thanks!


Solution

  • I don't understand you point 2 ( a little bit technical for not statistician like me). For the first point , I understand it as you want to apply lm/Anova over all pairs of your populations. You can use combn:

    combn: Generate all combinations of the elements of x taken m at a time

    pops <- unique(PopColor$Pop)
    ll <- combn(pops,2,function(x){
                      dat.pair <- PopColor[PopColor$Pop %in% pops[x],]
                      mylm <- lm(Color ~ Pop, data=dat.pair)
                      c(as.character(pops[x]),anova(mylm)[["Mean Sq"]])
    },simplify=FALSE)
    do.call(rbind,ll)
         [,1]      [,2]      [,3]                 [,4]               
     [1,] "pop1026" "pop254a" "0.210291858678956"  "0.597865353037767"
     [2,] "pop1026" "pop1001" "0.52409988385598"   "0.486874236874237"
     [3,] "pop1026" "pop317"  "15.7296466973886"   "0.456486042692939"
     [4,] "pop1026" "pop311"  "1.34392057218144"   "0.631962930099576"
     [5,] "pop1026" "pop252"  "0.339324116743472"  "0.528899835796388"
     [6,] "pop254a" "pop1001" "0.0166666666666669" "0.351785714285714"
     [7,] "pop254a" "pop317"  "14.45"              "0.227777777777778"
     [8,] "pop254a" "pop311"  "1.92898550724637"   "0.561430575035063"
     [9,] "pop254a" "pop252"  "0.8"                "0.344444444444444"
    [10,] "pop1001" "pop317"  "20.4166666666667"   "0.205357142857143"
    [11,] "pop1001" "pop311"  "3.55030333670374"   "0.46474019088017" 
    [12,] "pop1001" "pop252"  "1.35"               "0.280357142857143"
    [13,] "pop317"  "pop311"  "9.60474308300398"   "0.429172510518934"
    [14,] "pop317"  "pop252"  "8.45"               "0.116666666666667"
    [15,] "pop311"  "pop252"  "0.110803689064557"  "0.496914446002805"
    

    As you can see we have choose(6,2)=15 possibles pairs.