Search code examples
rlmlevelsr-factor

Calculation of intercepts & slopes in a linear model with two factors containing 3 levels and interaction in r


I'm trying to calculate the slopes of an linear model in R. I have aggregated my dataset with
agg_df <- aggregate(cbind(rate.output, crit.intercept) ~ lvl + treatment, data = d, FUN = mean)
This was recommended to do by Ben Bolker in this thread

Which makes this reproducible example of my data:

lvl <- as.factor(rep(c(1, 2, 3), 3)) 
treatment <- as.factor(c(rep(c("green"), 3), rep(c("purple"), 3)), 
rep(c("red"), 3)) 
o2 <- c(0.035941608, 0.042206981, 0.023556132, 
     0.016169792, 0.041431159, 0.054221145, 0.007571207, 0.008033468, 0.012353746)  
df <- as.data.frame(cbind(lvl, treatment, o2))

Then I run the linear model with the interaction term:

o2 <- lm(rate.output ~ treatment*lvl, data = agg_df) |>
   summary()

This returns (I have added what I think is the correct Treatment and Level after the 'Estimate') :

Coefficients:
                      Estimate <br>
(Intercept)           0.035942  Green Level 1 
treatmentpurple      -0.019772  Purple Level 1
treatmentRed         -0.028370  Red Level 1
lvl2                  0.006265  Green Level 2
lvl3                 -0.012385  Green Level 3
treatmentpurple:lvl2  0.018996  Purple Level 2
treatmentRed:lvl2    -0.005803  Red Level 2
treatmentpurple:lvl3  0.050437  Purple Level 3
treatmentRed:lvl3     0.017168  Red Level 3

I then want to calculate the intercept of my different treatments and their slopes. An example of how I thought this was done, but gives me the wrong result:
To caluclate the slope of Purple treatment lvl2:
0.035942+(-0.019772)+0.018996 == 0.035166

I was told that I can control my calculations by doing this:

o2_purp <- agg_df[agg_df$treatment=="purple",]  
fit_p <- lm(rate.output ~ lvl, data = o2_purp) |> summary() 

Output from fit_p model: 
Coefficients: 
            Estimate Std. Error t value Pr(>|t|) 
(Intercept)  0.035942        NaN     NaN      NaN 
lvl2         0.006265        NaN     NaN      NaN 
lvl3        -0.012385        NaN     NaN      NaN 

Which tells me that the slope of Purple Treatment Level 2 is 0.02526. And not similar to the equation above. Where am I going wrong here? How do I calculate the slope of Level 2 and Level 3 of Treatment Purple and Treatment Red?
After I have calculated the slopes I want to test for differences between the Treatment level means with e.g., tukeys post hoc analysis.

Thank you for taking the time to answer my questions.

Edit: My reason for doing this is so I can compare the normoxic O2-uptake of invertebrates I exposed to different test media. In the test the treatment is added as an gradient, therefore I want to test if there can be differences between the Levels and that's why I'm trying to calculate the slopes for the different levels. Hope this clarified it a bit.


Solution

  • lvl <- as.factor(rep(c(1, 2, 3), 3))
    treatment <- as.factor(rep(c("green", "purple","red"), each=3))
    o2 <- c(0.035941608, 0.042206981, 0.023556132, 0.016169792, 0.041431159, 0.054221145, 0.007571207, 0.008033468, 0.012353746)
    df <- data.frame(lvl, treatment, o2)
    
    mod <- lm(o2 ~ treatment*lvl, data = df)
    
    summary(mod)
    Call:
    lm(formula = o2 ~ treatment * lvl, data = df)
    
    Residuals:
    ALL 9 residuals are 0: no residual degrees of freedom!
    
    Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
    (Intercept)           0.035942        NaN     NaN      NaN
    treatmentpurple      -0.019772        NaN     NaN      NaN
    treatmentred         -0.028370        NaN     NaN      NaN
    lvl2                  0.006265        NaN     NaN      NaN
    lvl3                 -0.012385        NaN     NaN      NaN
    treatmentpurple:lvl2  0.018996        NaN     NaN      NaN
    treatmentred:lvl2    -0.005803        NaN     NaN      NaN
    treatmentpurple:lvl3  0.050437        NaN     NaN      NaN
    treatmentred:lvl3     0.017168        NaN     NaN      NaN
    
    Residual standard error: NaN on 0 degrees of freedom
    Multiple R-squared:      1, Adjusted R-squared:    NaN 
    F-statistic:   NaN on 8 and 0 DF,  p-value: NA
    
    # To caluclate the slope of Purple treatment lvl2
    mod$coefficients["(Intercept)"] + mod$coefficients["treatmentpurple"] + mod$coefficients["treatmentpurple:lvl2"]
    0.03516579 
    mod2 = lm(o2 ~ treatment*lvl, data = df, subset = df$treatment == 'purple')
    
    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
    contrasts can be applied only to factors with 2 or more level
    

    As you can see, you can't even run the second regression because you need two or more levels to use a factor variable in your regression. You should not expect to get the same result in both cases because the sample sizes as well as the specifications are different.