r, missing-data, anova

Do both values of one predictor have to be observed for every value of the second predictor to include them in a two-way ANOVA?


I am trying to find out whether two types of fish differ in the date on which they exit a commonly occupied area, using data collected over several years.

I believe year has some influence on the exit date; however, I do not have observations of both types of fish in every year that was monitored. To include year as a second predictor in a two-way ANOVA, do I have to filter the data to only include years where both types of fish were observed?

Here is my data.

df<-data.frame(type = c(rep('C',42),rep('S',19)),
Year = c(2012, 2008, 2008, 2012, 2010, 2010, 2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009, 2009, 2009, 2014, 2014, 2014, 2016, 2015, 2015, 2015, 2015, 2015, 2016, 2017, 2018, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2015, 2016, 2016, 2016, 2016, 2018, 2015, 2013, 2013, 2013, 2013, 2013, 2014,2015),
exit = c(195, 204, 216, 183, 194, 192, 195, 194, 190, 181, 191, 191, 196, 210, 216, 209, 193, 188, 194, 187, 186, 186, 149, 182, 197, 173,185, 182, 198, 189, 183, 177, 190, 198, 208, 190, 204, 185, 188, 189, 205, 179, 175, 180, 188, 191, 173, 186, 191, 196, 196, 192, 207, 192, 185, 176, 190, 192, 175, 196, 200))

I have tried reading the requirements for an ANOVA but can't find a case that matches mine. My searches suggested using imputation, but I am not sure whether that fits my situation or whether I need a non-parametric test. I have read a lot of material and am overwhelmed, and I would really appreciate it if someone could point me in the right direction or suggest an appropriate forum to read. I am working in R.

Thanks


Solution

  • Sort of.

    Best practice, if you want to make inferences about main effects in the presence of interactions, is to use sum-to-zero contrasts:

    options(contrasts = c("contr.sum", "contr.poly"))
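
    As a rough illustration of what that setting changes (a sketch, not part of the original answer): the default treatment coding compares each level with a reference level, whereas `contr.sum` codes the levels so that every column sums to zero. Both functions ship with base R; the three-level factor here is arbitrary.

    ```r
    # Default coding for unordered factors: level 1 is the reference (row of zeros).
    trt <- contr.treatment(3)
    print(trt)

    # Sum-to-zero coding: the last level's row is all -1.
    sumc <- contr.sum(3)
    print(sumc)

    # Column sums: zero for the sum-to-zero coding, non-zero for treatment coding.
    print(colSums(sumc))
    print(colSums(trt))

    # Set the option temporarily and restore the previous value afterwards.
    old <- options(contrasts = c("contr.sum", "contr.poly"))
    print(getOption("contrasts"))
    options(old)
    ```

    Saving and restoring the old value keeps the change from leaking into unrelated models fitted later in the same session.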
    

    You can get away with "type 2" comparisons:

    car::Anova(lm(exit~factor(Year)*type, data = df), type = "2")
    Note: model has aliased coefficients
          sums of squares computed by model comparison
    Anova Table (Type II tests)
    
    Response: exit
                      Sum Sq Df F value  Pr(>F)  
    factor(Year)      2471.7 10  2.3725 0.02325 *
    type                 3.6  1  0.0341 0.85428  
    factor(Year):type  239.2  3  0.7653 0.51934  
    Residuals         4792.3 46                  
    

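    But "type 3" comparisons don't work here (or at least will be harder), because the year-by-type combinations that were never observed leave some coefficients aliased: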

    car::Anova(lm(exit~factor(Year)*type, data = df), type = "3")
    Error in Anova.III.lm(mod, error, singular.ok = singular.ok, ...) : 
      there are aliased coefficients in the model
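
    To see where the aliasing comes from, it may help to tabulate the design. The sketch below (not part of the original answer) copies `type` and `Year` from the question, counts observations per cell, and compares the number of columns in the interaction design matrix with its rank. The helper names `cells`, `both`, `d` and `d_sub` are introduced only for illustration.

    ```r
    # type and Year copied from the question's data (the response is not
    # needed to inspect the design).
    type <- c(rep("C", 42), rep("S", 19))
    Year <- c(2012, 2008, 2008, 2012, 2010, 2010, 2010, 2010, 2010, 2010,
              2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2008, 2008,
              2008, 2008, 2009, 2009, 2009, 2009, 2009, 2009, 2014, 2014,
              2014, 2016, 2015, 2015, 2015, 2015, 2015, 2016, 2017, 2018,
              2016, 2016, 2016, 2016, 2016, 2016, 2016, 2015, 2016, 2016,
              2016, 2016, 2018, 2015, 2013, 2013, 2013, 2013, 2013, 2014,
              2015)
    d <- data.frame(type = type, Year = Year)

    # Observations per type-by-year cell; zeros are combinations never observed.
    cells <- table(d$type, d$Year)
    print(cells)

    # Years in which both types were observed at least once.
    both <- colnames(cells)[colSums(cells > 0) == 2]
    print(both)

    # Interaction design matrix: more columns than its rank means some
    # coefficients cannot be estimated (they are aliased).
    X <- model.matrix(~ factor(Year) * type, data = d)
    n_aliased <- ncol(X) - qr(X)$rank
    print(n_aliased)

    # Same check after keeping only the years with both types.
    d_sub <- d[as.character(d$Year) %in% both, ]
    X_sub <- model.matrix(~ factor(Year) * type, data = d_sub)
    n_aliased_sub <- ncol(X_sub) - qr(X_sub)$rank
    print(n_aliased_sub)
    ```

    For the question's data only four years (2014, 2015, 2016 and 2018) contain both types, which is consistent with the 3 degrees of freedom reported for the interaction in the Type II table above. Restricting to those years removes the aliasing, at the cost of discarding the single-type years.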
    

    From ?car::Anova:

    The designations "type-II" and "type-III" are borrowed from SAS, but the definitions used here do not correspond precisely to those employed by SAS. Type-II tests are calculated according to the principle of marginality, testing each term after all others, except ignoring the term's higher-order relatives; so-called type-III tests violate marginality, testing each term in the model after all of the others. This definition of Type-II tests corresponds to the tests produced by SAS for analysis-of-variance models, where all of the predictors are factors, but not more generally (i.e., when there are quantitative predictors). Be very careful in formulating the model for type-III tests, or the hypotheses tested will not make sense.