
Apply Bins to Data Frame Groups without making subset Data Frames


I have a data frame containing fish population sampling data. I would like to bin the fish by length and count how many fall into each length group for each species. The code below does this for two species, but repeating it by hand for every species in the data frame is not an elegant way to get there.

I would also like to apply this code to other lakes with different species, so it would be great to find an "automated" way to apply these bins to each species group in the data frame.

The data frame looks like:

Species TL   WT
BLG     75    6
BLG    118   27
LMB    200   98
LMB    315  369
RBS    112   23
RES    165   73
SPB    376  725
YEP    155   33


library(dplyr)  # provides %>%, group_by(), summarise(), mutate() and n()
ss = read.csv("SS_West Point.csv" , na.strings="." , header=TRUE)
blg = ss %>% subset(Species == "BLG")
lmb = ss %>% subset(Species == "LMB") 
blgn = blg %>% summarise(n = n())
lmbn = lmb %>% summarise(n = n())

###  20mm Length Groups - BLG  ###
blg20 = blg %>% group_by(gr=cut(TL , breaks = seq(0 , 1000 , by = 20))) %>% 
            summarise(n = n()) %>% mutate(freq = n , percent = ((n/blgn$n)*100) , 
                                   cumfreq = cumsum(freq) , cumpercent = cumsum(percent))
###  20mm Length Groups - LMB  ###
lmb20 = lmb %>% group_by(gr=cut(TL , breaks = seq(0 , 1000 , by = 20))) %>%
            summarise(n = n()) %>% mutate(freq = n , percent = ((n/lmbn$n)*100) , 
                            cumfreq = cumsum(freq) , cumpercent = cumsum(percent))

I've successfully used do() to run linear models on this data frame but can't seem to get it to work on cut(). Here is how I used do() on lm():

ssl = ss %>% mutate(lTL = log10(TL) , lWT = log10(WT)) %>% group_by(Species)
m = ssl %>% do(broom::augment(lm(lWT~lTL , data =.))) %>% mutate(wp = 10^(.fitted))

Solution

  • Does this do what you expect?

    ss20 <- ss %>%
      add_count(Species) %>%
      rename(Species_count = n) %>%
      # I added Species_count to the grouping so it goes along for the ride in summarization
      group_by(Species, Species_count, gr=cut(TL , breaks = seq(0 , 1000 , by = 20))) %>%
      summarise(n = n()) %>%
      mutate(freq = n, percent = ((n/Species_count)*100), 
             cumfreq = cumsum(freq) , cumpercent = cumsum(percent)) %>%
      ungroup()
    
    
    > ss20
    # A tibble: 8 x 8
      Species Species_count gr            n  freq percent cumfreq cumpercent
      <chr>           <int> <fct>     <int> <int>   <dbl>   <int>      <dbl>
    1 BLG                 2 (60,80]       1     1      50       1         50
    2 BLG                 2 (100,120]     1     1      50       2        100
    3 LMB                 2 (180,200]     1     1      50       1         50
    4 LMB                 2 (300,320]     1     1      50       2        100
    5 RBS                 1 (100,120]     1     1     100       1        100
    6 RES                 1 (160,180]     1     1     100       1        100
    7 SPB                 1 (360,380]     1     1     100       1        100
    8 YEP                 1 (140,160]     1     1     100       1        100
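
Since the question also asks about reusing this for other lakes, here is a minimal sketch of the same idea wrapped in a function. This is not part of the answer above: `length_freq`, `width` and `max_len` are hypothetical names chosen for illustration, and the sketch assumes every lake's data frame has `Species` and `TL` columns like the sample in the question. It uses `count()` and a per-species `sum(freq)` in place of `add_count()`, which should give the same percentages.

```r
library(dplyr)

# Sample rows copied from the question
ss <- data.frame(
  Species = c("BLG", "BLG", "LMB", "LMB", "RBS", "RES", "SPB", "YEP"),
  TL = c(75, 118, 200, 315, 112, 165, 376, 155),
  WT = c(6, 27, 98, 369, 23, 73, 725, 33)
)

# Hypothetical helper: length-frequency table per species for any
# data frame that has Species and TL columns.
# Lengths that are NA or fall outside (0, max_len] end up in an NA bin.
length_freq <- function(data, width = 20, max_len = 1000) {
  data %>%
    # assign each fish to a length bin of the requested width
    mutate(gr = cut(TL, breaks = seq(0, max_len, by = width))) %>%
    # one row per observed Species/bin combination
    count(Species, gr, name = "freq") %>%
    # percentages and running totals are computed within each species
    group_by(Species) %>%
    mutate(percent = freq / sum(freq) * 100,
           cumfreq = cumsum(freq),
           cumpercent = cumsum(percent)) %>%
    ungroup()
}

ss20 <- length_freq(ss)              # 20 mm bins, as in the question
ss50 <- length_freq(ss, width = 50)  # same data with 50 mm bins
```

Calling `length_freq()` on another lake's data frame would then only require reading that file in; no per-species subsets are needed because the grouping handles whatever species are present.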