r, dplyr, model, statistics, broom

Linear models to test one factor effect on multiple categories


I want to test whether one variable has an effect on the catch rates of different species that are grouped, but I am having trouble working out how to do this in a tidy and succinct manner. I have a dataset with about 400 catch rates, and there is a LOT of variability among species' catch rates. It looks like this:

    set.seed(42)
    n <- 100

    df <- data.frame(organization = rep(LETTERS[1:4], n / 2),
                     species = rep(c("shark", "whale", "fish", "ray", "turtle"), each = 20),
                     gear = rep(c("l", "p", "l", "p", "l", "p", "l", "p", "l", "p"), each = 10),
                     rate = rnorm(n))

What I have tried so far is:

    library(dplyr)  # provides %>%, group_by(), do(), mutate(), filter()
    library(broom)

    df %>%
      group_by(species, gear) %>%
      do(tidy(lm(rate ~ organization, data = .))) %>%
      mutate(p.value = round(p.value, 3)) %>%
      filter(p.value < 0.05)  # keep only significant p-values

What I want to know is whether there is a simpler, more elegant way to test the effect of ONLY organization while still grouping by species and gear. Species and gear both have a big effect, and different species can't really be compared against one another. So I want to know whether, WITHIN the same species and gear, organization makes a difference.

Any help will be so appreciated!!


Solution

  • This is a start, not the complete solution, because here we group only by species. You can group by species first, then by gear, and then combine both with group_by(species, gear):

    library(tidyverse)
    library(broom)
    
    df %>% 
      mutate(species = as_factor(species)) %>% 
      group_by(species) %>% 
      group_split() %>% 
      map_dfr(.f = function(df) {
        lm(rate ~ organization, data = df) %>% 
          glance() %>% 
          add_column(species = unique(df$species), .before = 1)
      })
    
      species r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <fct>       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
    1 shark      0.229         0.165  1.18      3.57   0.0234     3  -61.4 133.   141.     50.5          36    40
    2 whale      0.192         0.124  1.03      2.84   0.0513     3  -55.6 121.   130.     37.8          36    40
    3 fish       0.0980        0.0229 0.999     1.30   0.288      3  -54.6 119.   128.     35.9          36    40
    4 ray        0.121         0.0481 0.783     1.66   0.194      3  -44.9  99.7  108.     22.1          36    40
    5 turtle     0.0448       -0.0348 0.922     0.563  0.643      3  -51.4 113.   121.     30.6          36    40
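
One way to extend this to both grouping variables is sketched below. It is only a sketch: it assumes dplyr 0.8.1 or later (for group_modify()) and rebuilds the df from the question so that it runs on its own; the name results is purely illustrative. Because each model has organization as its only predictor, the overall F-test that glance() reports (statistic and p.value) tests organization as a whole within each species and gear combination, whereas tidy() returns one row per coefficient, including the intercept.

```r
library(dplyr)
library(broom)

# Rebuild the example data from the question
set.seed(42)
n <- 100
df <- data.frame(organization = rep(LETTERS[1:4], n / 2),
                 species = rep(c("shark", "whale", "fish", "ray", "turtle"), each = 20),
                 gear = rep(c("l", "p", "l", "p", "l", "p", "l", "p", "l", "p"), each = 10),
                 rate = rnorm(n))

# Fit one model per species x gear combination and keep its overall F-test
results <- df %>%
  group_by(species, gear) %>%
  group_modify(~ glance(lm(rate ~ organization, data = .x))) %>%
  ungroup() %>%
  select(species, gear, statistic, p.value, df, df.residual, nobs)

results
```

Each row of results is one species and gear combination, so filtering on p.value here keeps whole groups in which organization appears to matter, rather than individual coefficients.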