
Do both levels of one predictor have to be observed at every level of the second predictor to include both in a two-way ANOVA?

I am trying to find out whether two types of fish differ in the date on which they exit a commonly occupied area, using data collected over several years.

I believe year has some influence on the exit date; however, I do not have observations of both types of fish in every monitored year. To include year as a second predictor in a two-way ANOVA, do I have to filter the data to include only the years in which both types of fish were observed?

Here is my data.

df<-data.frame(type = c(rep('C',42),rep('S',19)),
Year = c(2012, 2008, 2008, 2012, 2010, 2010, 2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009, 2009, 2009, 2014, 2014, 2014, 2016, 2015, 2015, 2015, 2015, 2015, 2016, 2017, 2018, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2015, 2016, 2016, 2016, 2016, 2018, 2015, 2013, 2013, 2013, 2013, 2013, 2014,2015),
exit = c(195, 204, 216, 183, 194, 192, 195, 194, 190, 181, 191, 191, 196, 210, 216, 209, 193, 188, 194, 187, 186, 186, 149, 182, 197, 173,185, 182, 198, 189, 183, 177, 190, 198, 208, 190, 204, 185, 188, 189, 205, 179, 175, 180, 188, 191, 173, 186, 191, 196, 196, 192, 207, 192, 185, 176, 190, 192, 175, 196, 200))
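To see exactly which Year × type combinations are missing, a cross-tabulation is a quick first check. This is a minimal base-R sketch (`table()`, `rowSums()`); the names `cells` and `both` are illustrative only, and the `df` definition is repeated so the block runs on its own.

```r
# Same data as in the question
df <- data.frame(
  type = c(rep('C', 42), rep('S', 19)),
  Year = c(2012, 2008, 2008, 2012, 2010, 2010, 2010, 2010, 2010, 2010,
           2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2008, 2008,
           2008, 2008, 2009, 2009, 2009, 2009, 2009, 2009, 2014, 2014,
           2014, 2016, 2015, 2015, 2015, 2015, 2015, 2016, 2017, 2018,
           2016, 2016, 2016, 2016, 2016, 2016, 2016, 2015, 2016, 2016,
           2016, 2016, 2018, 2015, 2013, 2013, 2013, 2013, 2013, 2014,
           2015),
  exit = c(195, 204, 216, 183, 194, 192, 195, 194, 190, 181,
           191, 191, 196, 210, 216, 209, 193, 188, 194, 187,
           186, 186, 149, 182, 197, 173, 185, 182, 198, 189,
           183, 177, 190, 198, 208, 190, 204, 185, 188, 189,
           205, 179, 175, 180, 188, 191, 173, 186, 191, 196,
           196, 192, 207, 192, 185, 176, 190, 192, 175, 196,
           200))

# Number of observations in each Year x type cell; a 0 marks an empty cell
cells <- table(df$Year, df$type)
print(cells)

# Years in which both types of fish were observed (both columns non-zero)
both <- as.numeric(rownames(cells)[rowSums(cells > 0) == 2])
print(both)

# How many observations would remain if the data were filtered to those years
print(sum(df$Year %in% both))
```

With this data, only 2014, 2015, 2016 and 2018 contain both types, and those years hold 27 of the 61 observations, so filtering to them would discard more than half of the data.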

I have read about the requirements for an ANOVA but can't find a case that matches mine. My searches suggested imputation, but I am not sure whether that fits my situation or whether I need a non-parametric test instead. I have read a lot of material, feel overwhelmed, and would really appreciate it if someone could point me in the right direction or suggest an appropriate forum or resource. I am working in R.



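  • Sort of: the model can be fitted to all the data, but the empty Year × type cells limit which tests you can run.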

    If you are going to make inferences about main effects in the presence of interactions, best practice is to switch to sum-to-zero contrasts:

    options(contrasts = c("contr.sum", "contr.poly"))

    Even with the empty cells, you can get away with type-II comparisons:

    car::Anova(lm(exit ~ factor(Year) * type, data = df), type = "2")
    Note: model has aliased coefficients
          sums of squares computed by model comparison
    Anova Table (Type II tests)
    Response: exit
                      Sum Sq Df F value  Pr(>F)  
    factor(Year)      2471.7 10  2.3725 0.02325 *
    type                 3.6  1  0.0341 0.85428  
    factor(Year):type  239.2  3  0.7653 0.51934  
    Residuals         4792.3 46                  

    Type-III comparisons, however, fail on this model (or at least take more work):

    car::Anova(lm(exit~factor(Year)*type, data = df), type = "3")
    Error in Anova.III.lm(mod, error, singular.ok = singular.ok, ...) : 
      there are aliased coefficients in the model

    From ?car::Anova:

    The designations "type-II" and "type-III" are borrowed from SAS, but the definitions used here do not correspond precisely to those employed by SAS. Type-II tests are calculated according to the principle of marginality, testing each term after all others, except ignoring the term's higher-order relatives; so-called type-III tests violate marginality, testing each term in the model after all of the others. This definition of Type-II tests corresponds to the tests produced by SAS for analysis-of-variance models, where all of the predictors are factors, but not more generally (i.e., when there are quantitative predictors). Be very careful in formulating the model for type-III tests, or the hypotheses tested will not make sense.
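The "aliased coefficients" come from the empty cells: the interaction model has 22 coefficients (intercept, 10 for Year, 1 for type, 10 for the interaction), but only 15 Year × type cells contain data. The base-R sketch below confirms this and spells out the type-II idea from the quote (each term tested after the others, ignoring its higher-order relatives) as explicit nested-model comparisons. The `fit_*` names are illustrative, and it assumes `car::Anova` builds its type-II tests from these same comparisons, as its "sums of squares computed by model comparison" note suggests; if so, the sums of squares and F values should match the table above. These residual sums of squares do not depend on the contrasts setting.

```r
# Same data as in the question
df <- data.frame(
  type = c(rep('C', 42), rep('S', 19)),
  Year = c(2012, 2008, 2008, 2012, 2010, 2010, 2010, 2010, 2010, 2010,
           2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2008, 2008,
           2008, 2008, 2009, 2009, 2009, 2009, 2009, 2009, 2014, 2014,
           2014, 2016, 2015, 2015, 2015, 2015, 2015, 2016, 2017, 2018,
           2016, 2016, 2016, 2016, 2016, 2016, 2016, 2015, 2016, 2016,
           2016, 2016, 2018, 2015, 2013, 2013, 2013, 2013, 2013, 2014,
           2015),
  exit = c(195, 204, 216, 183, 194, 192, 195, 194, 190, 181,
           191, 191, 196, 210, 216, 209, 193, 188, 194, 187,
           186, 186, 149, 182, 197, 173, 185, 182, 198, 189,
           183, 177, 190, 198, 208, 190, 204, 185, 188, 189,
           205, 179, 175, 180, 188, 191, 173, 186, 191, 196,
           196, 192, 207, 192, 185, 176, 190, 192, 175, 196,
           200))

# Nested models used for the comparisons
fit_full <- lm(exit ~ factor(Year) * type, data = df)  # with interaction
fit_add  <- lm(exit ~ factor(Year) + type, data = df)  # main effects only
fit_year <- lm(exit ~ factor(Year), data = df)         # without type
fit_type <- lm(exit ~ type, data = df)                 # without Year

# Rank deficiency: aliased coefficients are reported as NA by lm()
n_coef    <- length(coef(fit_full))
n_aliased <- sum(is.na(coef(fit_full)))
cat("coefficients:", n_coef, " aliased:", n_aliased,
    " rank:", fit_full$rank, "\n")

# Type-II sums of squares as differences in residual sums of squares;
# a main effect is tested in models that exclude the interaction
ss <- c(Year        = deviance(fit_type) - deviance(fit_add),
        type        = deviance(fit_year) - deviance(fit_add),
        `Year:type` = deviance(fit_add)  - deviance(fit_full))
term_df <- c(Year        = df.residual(fit_type) - df.residual(fit_add),
             type        = df.residual(fit_year) - df.residual(fit_add),
             `Year:type` = df.residual(fit_add)  - df.residual(fit_full))

# F tests use the residual mean square of the full (interaction) model
res_df  <- df.residual(fit_full)
res_ms  <- deviance(fit_full) / res_df
f_value <- (ss / term_df) / res_ms
p_value <- pf(f_value, term_df, res_df, lower.tail = FALSE)

print(data.frame(`Sum Sq` = round(ss, 1), Df = term_df,
                 `F value` = round(f_value, 4), `Pr(>F)` = signif(p_value, 4),
                 check.names = FALSE))
cat("Residuals:", round(deviance(fit_full), 1), "on", res_df, "df\n")
```

Computing the comparisons by hand like this makes it explicit that no test requires the missing cells, which is why the type-II table can be produced while the type-III tests, which need every interaction coefficient, cannot.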