Search code examples
rplotmodelanovasurface

Can I plot the ANOVA model for a two or three factor experiment?


I want to fit a model to a three factor factorial experiment. In an attempt to do this with R, I am reproducing examples from a textbook (Montgomery, DC (2013) Design and Analysis of Experiments, 8th ed. John Wiley & Sons ISBN: 9781118097939). The specific example I am attempting is Example 5.5, and although only a two factor example, I am hoping to learn the basics from it. Example 5.5 question statement

I can easily reproduce the ANOVA table in R, and I can retract the coefficients of the model (I think). Considering the model equation given on the image above, I assume that the four coefficients returned by R is β0, β1, β2 and β12. I have no idea how to plot the surface described by the model, which is my first problem. Secondly, the textbook discuss how a better model fit can be attained if the interaction parameters, i.e. β112, β122 and β1122 are included. Is it possible to do this in R as well? The surface fitted to the model including the interaction parameters is attached here. Surface plotted from the model including the interaction parameters

I am relatively comfortable in python, although I have never plotted surfaces using matplotlib. I am very new in R, and have never plotted anything in R. From surfing the web I could not find anything useful for what I am trying to do. My code is attached below.

lewensduur_data <- data.frame(A = rep(c(15, 20, 25), each = 2),
                         B = rep(c(125, 150, 175), each = 6),
                         lewe = c(-2, -1, 0, 2, -1, 0,
                                  -3, 0, 1, 3, 5, 6,
                                  2, 3, 4, 6, 0, -1))

lewensduur_anova <- aov(lewe ~ A * B, data = lewensduur_data)
lewensduur_anova

which yields the ANOVA table

Call:
   aov(formula = lewe ~ A * B, data = lewensduur_data)

Terms:
                       A        B      A:B Residuals
Sum of Squares   8.33333 21.33333  8.00000  86.33333
Deg. of Freedom        1        1        1        14

Residual standard error: 2.483277
Estimated effects may be unbalanced

I retrieved the coefficients as follows

coefficients(lewensduur_anova)

yielding

(Intercept)-34A1.36666666666667B0.213333333333333A:B-0.008

As an after thought, I noticed that aov() returns that the estimated effects may be unbalanced. From what I understand, aov() is best suited for factors having the same amount of levels and replicates. Is there a better ANOVA function to use for cases like my example?


Solution

  • There are many ways you can make this happen. I'm going to start with what you've done so far, though.

    Your aov() reflects that you have 1 degree of freedom for A and 1 for B. That's a sign that something is wrong. The degrees of freedom for a categorical field is going to be the number of unique values minus one. The degrees of freedom for both A and B should be two.

    Let's go over what went wrong...your entries from A and B are numbers, so the aov function did not interpret them correctly. The easiest way to fix this is to make these two columns factor-type.

    av <- aov(lewe ~ A * B, data = mutate(lewensduur_data, A = as.factor(A), B = as.factor(B)))
    summary(av)
    #             Df Sum Sq Mean Sq F value  Pr(>F)   
    # A            2  24.33  12.167   8.423 0.00868 **
    # B            2  25.33  12.667   8.769 0.00770 **
    # A:B          4  61.33  15.333  10.615 0.00184 **
    # Residuals    9  13.00   1.444                   
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    

    These numbers are quite a bit different than your original numbers. You still need to ask yourself if this information is accurate. Well, ANOVA requires that the data meet the assumptions of both normal distribution and homogeneity. Normality and homogeneity where there are only two observations in each category? Your results are not going to be meaningful.

    This is not enough data to evaluate with ANOVA.

    However, for the sake of your original questions, let's look at this data using the jmv package. Note that I didn't make A or B factors for this call. Additionally, I used the sum of squares type 2 method (this is actually irrelevant, though.)

    an <- ANOVA(formula = lewe ~ A * B, 
          data = lewensduur_data, ss = "2", effectSize = "partEta",
          homo = T, norm = T, postHocCorr = 'tukey', postHoc = ~A * B)
    
    
    an # fails for homogeneity; passes for normality
    # 
    #  ANOVA
    # 
    #  ANOVA - lewe                                                                                
    #  ─────────────────────────────────────────────────────────────────────────────────────────── 
    #                 Sum of Squares    df    Mean Square    F            p            η²p         
    #  ─────────────────────────────────────────────────────────────────────────────────────────── 
    #    A                  24.33333     2      12.166667     8.423077    0.0086758    0.6517857   
    #    B                  25.33333     2      12.666667     8.769231    0.0077028    0.6608696   
    #    A:B                61.33333     4      15.333333    10.615385    0.0018438    0.8251121   
    #    Residuals          13.00000     9       1.444444                                          
    #  ─────────────────────────────────────────────────────────────────────────────────────────── 
    # 
    # 
    #  ASSUMPTION CHECKS
    # 
    #  Homogeneity of Variances Test (Levene's)     
    #  ──────────────────────────────────────────── 
    #    F               df1    df2    p            
    #  ──────────────────────────────────────────── 
    #    1.163368e+31      8      9    < .0000001   
    #  ──────────────────────────────────────────── 
    # 
    # 
    #  Normality Test (Shapiro-Wilk) 
    #  ───────────────────────────── 
    #    Statistic    p           
    #  ───────────────────────────── 
    #    0.9209304    0.1342718   
    #  ───────────────────────────── 
    

    If we ignored the insufficient data sample size, we still can't trust this information. ANOVA is very robust against deviations from normal distribution, but it's very sensitive to homogeneity. (Your homogeneity was never going to pass due to the sample size when compared to the number of groups.)

    To show you how you can plot this, I used the plotly package. To use this package for a surface plot, you need a matrix that essentially looks like the table in your book. (Where the speeds are the column names and the angles are the row names. You can only have one entry for each row/column, as well. I chose to use the average of the values by group. To collect the averages I used lapply.

    # get the average by speed and angle
    new_lewe <- lapply(seq(1, (nrow(lewensduur_data) - 1), by = 2),
                       function(i) {
                         with(lewensduur_data[i:(i + 1), ], mean(lewe))
                        }) %>% unlist()
    

    I placed these averages into a matrix and gave it row and column names.

    ld <- matrix(data = new_lewe,
                 nrow = 3, ncol = 3,
                 dimnames = with(lewensduur_data, list(unique(A), unique(B))))
    

    You can see a basic surface plot now.

    plot_ly(x = colnames(ld), y = rownames(ld),
            z = ld) %>% add_surface()
    

    enter image description here

    You can dress this surface plot up to look more like your book's surface plot, as well.

    plot_ly(x = colnames(ld), y = rownames(ld),
            z = ld) %>% 
      add_surface(contours = list(z = list(usecolormap = T, show = T,
                                           project = list(z = T)))) %>% 
      layout(scene = list(aspectmode = "manual", 
                          aspectratio = list(x = 2, y = 1.5, z = 1),
                          xaxis = list(title = "Cutting Speed"), 
                          yaxis = list(title = "Tool Angle"),
                          zaxis = list(title = "Life"),
                          camera = list(
                            center = list(x = .5, y = .5, z = .4),
                            eye = list(x = 1.5, y = 2, z = 2))))
    

    enter image description here

    You can project this data to a basic contour plot, as well.

    plot_ly(x = colnames(ld), y = rownames(ld),
            z = ld, type = "contour", line = list(width = 0)) 
    

    enter image description here

    In that basic contour plot (above) the x & y are opposite of those in your book.

    In the next contour plot, I've flipped the axes, so they match. You can dress it up, as well.

    plot_ly(y = colnames(ld), x = rownames(ld),
            z = t(ld), type = "contour", ncontours = 8,
            line = list(width = 0, smoothing = 1.3),
            contours = list(showlabels = T)) %>% 
      layout(xaxis = list(title = "Angle"), 
             yaxis = list(title = "Speed"))
    

    enter image description here

    If you have any questions, let me know!