How to properly set contrasts in R


I have been asked to see if there is a linear trend in 3 groups of data (5 points each) by using ANOVA and linear contrasts. The 3 groups represent data collected in 2010, 2011 and 2012. I want to use R for this procedure and I have tried both of the following:

contrasts(data$groups, how.many=1) <- contr.poly(3)
contrasts(data$groups)  <- contr.poly(3)

Both ways seem to work fine but give slightly different p-values. I have no idea which is correct, and it is really tricky to find help for this on the web. I would like to understand the reasoning behind the different answers. I'm not sure whether it has something to do with partitioning the sums of squares or something else.


Solution

  • The two approaches differ in whether the quadratic term is included.

    For illustration, consider the following example, in which both x and y are factors with three levels.

    x <- y <- gl(3, 2)
    x
    # [1] 1 1 2 2 3 3
    # Levels: 1 2 3
    

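    The approach without how.many (the second line in the question) creates the full set of polynomial contrasts for a three-level factor, i.e., a linear (.L) and a quadratic (.Q) trend. The argument 3 is the number of levels: contr.poly(n) returns the n - 1 orthogonal polynomial contrasts, here of degree 1 and 2.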

    contrasts(x) <- contr.poly(3)
    x
    # [1] 1 1 2 2 3 3
    # attr(,"contrasts")
    #              .L         .Q
    # 1 -7.071068e-01  0.4082483
    # 2 -7.850462e-17 -0.8164966
    # 3  7.071068e-01  0.4082483
    # Levels: 1 2 3
    

    By contrast, the approach with how.many = 1 (the first line in the question) keeps only the first column of contr.poly(3), which gives a first-order polynomial (i.e., a linear trend only). Hence, only one contrast is created.

    contrasts(y, how.many = 1) <- contr.poly(3)
    y
    # [1] 1 1 2 2 3 3
    # attr(,"contrasts")
    #              .L
    # 1 -7.071068e-01
    # 2 -7.850462e-17
    # 3  7.071068e-01
    # Levels: 1 2 3
    

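    This is also why the p-values differ. With only the linear contrast, the factor uses 1 degree of freedom instead of 2, and the quadratic part of the between-group variation is pooled into the residual sum of squares. The residual mean square and residual degrees of freedom therefore change (13 instead of 12 for 3 groups of 5 points), so the test of the linear trend gives a slightly different p-value. If you're interested in the linear trend only, the option with how.many = 1 seems more appropriate for you.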
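The difference between the two contrast settings can be reproduced with aov. This is a minimal sketch, assuming simulated data: the values and the names dat, value, full, lin, fit_full and fit_lin are hypothetical and not taken from the question. It shows that contr.poly(3) has orthonormal columns, that the two settings produce design matrices with 3 versus 2 columns, and that the residual degrees of freedom are 12 versus 13.

```r
# Hypothetical data: 5 observations per year, simulated with a linear trend.
# The numbers are made up for illustration and are not from the question.
set.seed(1)
dat <- data.frame(
  groups = factor(rep(c("2010", "2011", "2012"), each = 5)),
  value  = rep(c(10, 12, 14), each = 5) + rnorm(15)
)

# contr.poly(3) has orthonormal columns: t(cp) %*% cp is the 2 x 2 identity
cp <- contr.poly(3)
print(round(crossprod(cp), 10))

# Full set of polynomial contrasts: the factor uses 2 df (.L and .Q)
full <- dat
contrasts(full$groups) <- contr.poly(3)
fit_full <- aov(value ~ groups, data = full)

# Linear contrast only: the factor uses 1 df (.L)
lin <- dat
contrasts(lin$groups, how.many = 1) <- contr.poly(3)
fit_lin <- aov(value ~ groups, data = lin)

# Design matrices: intercept + .L + .Q versus intercept + .L
print(colnames(model.matrix(fit_full)))
print(colnames(model.matrix(fit_lin)))

# Residual df: 15 - 3 = 12 versus 15 - 2 = 13
print(c(full = df.residual(fit_full), linear = df.residual(fit_lin)))

# Split the 2-df factor into linear and quadratic rows; compare with the
# linear-only fit, whose residual row also absorbs the quadratic sum of squares
print(summary(fit_full, split = list(groups = list(linear = 1, quadratic = 2))))
print(summary(fit_lin))
```

Because the design is balanced and the contrast columns are orthogonal, the estimated linear coefficient is the same in both fits. Only the error term it is tested against changes, which is where the slightly different p-values come from.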