I am trying to find out whether two types of fish differ in the date they exit a commonly occupied area, using data collected over several years.
I believe year has some influence on the exit date, but I do not have observations of both types of fish in every year that was monitored. To include year as a second predictor in a two-way ANOVA, do I have to filter the data to only include years where both types of fish were observed?
Here is my data.
df <- data.frame(type = c(rep('C', 42), rep('S', 19)),
Year = c(2012, 2008, 2008, 2012, 2010, 2010, 2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009, 2009, 2009, 2014, 2014, 2014, 2016, 2015, 2015, 2015, 2015, 2015, 2016, 2017, 2018, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2015, 2016, 2016, 2016, 2016, 2018, 2015, 2013, 2013, 2013, 2013, 2013, 2014,2015),
exit = c(195, 204, 216, 183, 194, 192, 195, 194, 190, 181, 191, 191, 196, 210, 216, 209, 193, 188, 194, 187, 186, 186, 149, 182, 197, 173,185, 182, 198, 189, 183, 177, 190, 198, 208, 190, 204, 185, 188, 189, 205, 179, 175, 180, 188, 191, 173, 186, 191, 196, 196, 192, 207, 192, 185, 176, 190, 192, 175, 196, 200))
I have tried reading the requirements for an ANOVA but can't find a case that matches mine. My searches suggested using imputation, but I am not sure whether that fits my situation or whether I need a non-parametric test instead. I have read a lot of material and am overwhelmed, and I would really appreciate it if someone could point me in the right direction or suggest an appropriate forum to read. I am working in R.
Thanks
Sort of.
If you want to make inferences about main effects in the presence of interactions, it is best practice to use sum-to-zero contrasts:
options(contrasts = c("contr.sum", "contr.poly"))
You can get away with "type 2" comparisons:
car::Anova(lm(exit ~ factor(Year) * type, data = df), type = "2")
Note: model has aliased coefficients
sums of squares computed by model comparison
Anova Table (Type II tests)
Response: exit
                  Sum Sq Df F value  Pr(>F)
factor(Year)      2471.7 10  2.3725 0.02325 *
type                 3.6  1  0.0341 0.85428
factor(Year):type  239.2  3  0.7653 0.51934
Residuals         4792.3 46
But "type 3" comparisons don't work as-is (or at least will be harder), because the model has aliased coefficients:
car::Anova(lm(exit~factor(Year)*type, data = df), type = "3")
Error in Anova.III.lm(mod, error, singular.ok = singular.ok, ...) :
there are aliased coefficients in the model
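The aliased coefficients most likely come from Year-by-type combinations that have no observations. Here is a minimal base-R sketch for checking that. The names `design`, `tab`, `both_years` and `X` are illustrative, and the two columns are copied from the question's data (the response is not needed for this check):

```r
# Design columns copied from the question's data (exit is not needed here).
design <- data.frame(
  type = factor(c(rep("C", 42), rep("S", 19))),
  Year = c(2012, 2008, 2008, 2012, 2010, 2010, 2010, 2010, 2010, 2010,
           2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2008, 2008,
           2008, 2008, 2009, 2009, 2009, 2009, 2009, 2009, 2014, 2014,
           2014, 2016, 2015, 2015, 2015, 2015, 2015, 2016, 2017, 2018,
           2016, 2016, 2016, 2016, 2016, 2016, 2016, 2015, 2016, 2016,
           2016, 2016, 2018, 2015, 2013, 2013, 2013, 2013, 2013, 2014,
           2015)
)

# Cross-tabulate: a zero marks a Year x type cell with no observations.
tab <- table(design$Year, design$type)
print(tab)

# Years in which both types were observed.
both_years <- rownames(tab)[rowSums(tab > 0) == 2]
print(both_years)

# Compare the number of columns in the interaction model's design matrix
# with its rank; the difference is the number of aliased coefficients.
X <- model.matrix(~ factor(Year) * type, data = design)
cat("columns:", ncol(X), " rank:", qr(X)$rank, "\n")
```

Each empty cell makes one interaction coefficient impossible to estimate, which is why the interaction term in the Type II table above has only 3 degrees of freedom.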
From ?car::Anova:
The designations "type-II" and "type-III" are borrowed from SAS, but the definitions used here do not correspond precisely to those employed by SAS. Type-II tests are calculated according to the principle of marginality, testing each term after all others, except ignoring the term's higher-order relatives; so-called type-III tests violate marginality, testing each term in the model after all of the others. This definition of Type-II tests corresponds to the tests produced by SAS for analysis-of-variance models, where all of the predictors are factors, but not more generally (i.e., when there are quantitative predictors). Be very careful in formulating the model for type-III tests, or the hypotheses tested will not make sense.
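On the original question of filtering: one option, sketched here rather than recommended, is to keep only the years in which both types were observed. That removes the empty cells, so no coefficients should be aliased. The names `tab`, `both_years`, `df_both` and `fit_both` are illustrative, the contrasts are set per model instead of through `options()`, and the `car` call is guarded in case the package is not installed:

```r
# Data copied from the question.
df <- data.frame(
  type = c(rep("C", 42), rep("S", 19)),
  Year = c(2012, 2008, 2008, 2012, 2010, 2010, 2010, 2010, 2010, 2010,
           2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2008, 2008,
           2008, 2008, 2009, 2009, 2009, 2009, 2009, 2009, 2014, 2014,
           2014, 2016, 2015, 2015, 2015, 2015, 2015, 2016, 2017, 2018,
           2016, 2016, 2016, 2016, 2016, 2016, 2016, 2015, 2016, 2016,
           2016, 2016, 2018, 2015, 2013, 2013, 2013, 2013, 2013, 2014,
           2015),
  exit = c(195, 204, 216, 183, 194, 192, 195, 194, 190, 181,
           191, 191, 196, 210, 216, 209, 193, 188, 194, 187,
           186, 186, 149, 182, 197, 173, 185, 182, 198, 189,
           183, 177, 190, 198, 208, 190, 204, 185, 188, 189,
           205, 179, 175, 180, 188, 191, 173, 186, 191, 196,
           196, 192, 207, 192, 185, 176, 190, 192, 175, 196,
           200)
)

# Keep only the years in which both types were observed.
tab <- table(df$Year, df$type)
both_years <- as.numeric(rownames(tab)[rowSums(tab > 0) == 2])
df_both <- subset(df, Year %in% both_years)

# Convert to factors after subsetting so unused year levels are dropped.
df_both$Year <- factor(df_both$Year)
df_both$type <- factor(df_both$type)

# Full interaction model with sum-to-zero contrasts for both factors.
fit_both <- lm(exit ~ Year * type, data = df_both,
               contrasts = list(Year = "contr.sum", type = "contr.sum"))

# With every Year x type cell occupied, no coefficient should be NA.
print(anyNA(coef(fit_both)))

# Type III tests, only attempted if the car package is available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit_both, type = "3"))
}
```

The cost is that every observation from a year with only one type is discarded, so the Type II analysis on the full data may still be preferable.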