Search code examples
rdata-analysis

Grouping by sample for statistical test


I'm attempting to do a one-way anova test to see if different readers have an affect on measured age from a fish ear bone. Every reader measured the same samples, and I'm trying to group the samples and then run an anova over whether reader affects age within each sample.

Here's a sample of the data:

Sample <- c("A", "B", "C", "A", "B", "C", "A", "B", "C" )
Age <- c(4, 5, 6, 5, 5, 5, 7, 2, 4)
reader_ID <- rep(1:3, each = 3)

df <- data.frame(Sample, Age, reader_ID)
one_way <- aov(Age ~ reader_ID, data = df)
summary(one_way)

This is just comparing reader to age and not taking sample into consideration.

How would I group the samples to analyse age and reader?


Solution

  • Given your nested data structure, you’ll likely want to use a mixed model. e.g., you can fit a mixed linear model with random intercepts for sample using lme4::lmer():

    library(lme4)
    library(broom.mixed)
    
    mdl <- lmer(Age ~ reader_ID + (1 | Sample), data = df) 
    
    # model summary
    glance(mdl)
    
    # A tibble: 1 × 7
       nobs sigma logLik   AIC   BIC REMLcrit df.residual
      <int> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int>
    1   300  2.32  -684. 1375. 1390.    1367.         296
    
    # coefficients
    tidy(mdl, conf.int = TRUE)
    
    # A tibble: 4 × 8
      effect   group    term            estimate std.error statis…¹ conf.…² conf.h…³
      <chr>    <chr>    <chr>              <dbl>     <dbl>    <dbl>   <dbl>    <dbl>
    1 fixed    <NA>     (Intercept)      4.98      0.291      17.1   4.41    5.55   
    2 fixed    <NA>     reader_ID       -0.00776   0.00465    -1.67 -0.0169  0.00134
    3 ran_pars Sample   sd__(Intercept)  0.187    NA          NA    NA      NA      
    4 ran_pars Residual sd__Observation  2.32     NA          NA    NA      NA      
    # … with abbreviated variable names ¹​statistic, ²​conf.low, ³​conf.high
    

    Expanded example data:

    set.seed(13)
    Sample <- rep(c("A", "B", "C"), 100)
    Age <- sample(1:8, 300, replace = TRUE)
    reader_ID <- rep(1:100, each = 3)
    df <- data.frame(Sample, Age, reader_ID)