r vegan

Shannon Index for multiple groups in one dataframe


I need to calculate the Shannon Index for multiple samples at multiple sites and I have no idea how to go about it. I'm using R and the data looks something like this.

Sample Species Count
17a Shark 17
17a Dolphin 25
17a Sting Ray 1
17a Badger 234
17b Shark 4
17b Dolphin 6
17b Sting Ray 19
17b Badger 25
18a Shark 45
18a Dolphin 4
18a Sting Ray 4
18a Badger 3

I feel like I need to split by sample somehow, but after that I am completely stuck. I don't use R very often.

Thanks for any help!


Solution

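  • You can get the Shannon index per Sample by grouping your data frame by Sample and applying vegan::diversity() to the Count column. With mutate(), the per-sample value is repeated on every row of that sample. The .by argument requires dplyr 1.1.0 or later.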

    df |>
        dplyr::mutate(shannon_index = vegan::diversity(Count), .by = Sample)
    
       Sample   Species Count shannon_index
    1     17a     Shark    17     0.5511595
    2     17a   Dolphin    25     0.5511595
    3     17a Sting Ray     1     0.5511595
    4     17a    Badger   234     0.5511595
    5     17b     Shark     4     1.1609846
    6     17b   Dolphin     6     1.1609846
    7     17b Sting Ray    19     1.1609846
    8     17b    Badger    25     1.1609846
    9     18a     Shark    45     0.7095302
    10    18a   Dolphin     4     0.7095302
    11    18a Sting Ray     4     0.7095302
    12    18a    Badger     3     0.7095302
    

    Used data:

    > dput(df)
    
    structure(list(Sample = c("17a", "17a", "17a", "17a", "17b", 
    "17b", "17b", "17b", "18a", "18a", "18a", "18a"), Species = c("Shark", 
    "Dolphin", "Sting Ray", "Badger", "Shark", "Dolphin", "Sting Ray", 
    "Badger", "Shark", "Dolphin", "Sting Ray", "Badger"), Count = c(17L, 
    25L, 1L, 234L, 4L, 6L, 19L, 25L, 45L, 4L, 4L, 3L)), row.names = c(NA, 
    -12L), class = "data.frame")
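
If you want to see what the index is actually computing, or you need one value per sample without extra packages, here is a minimal base R sketch. It assumes the usual definition H = -sum(p * log(p)) with the natural logarithm, which is the default of vegan::diversity(). The helper name `shannon` is made up for illustration, and the data frame is rebuilt from the counts in the question so the snippet runs on its own.

```r
# Counts from the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  Sample  = rep(c("17a", "17b", "18a"), each = 4),
  Species = rep(c("Shark", "Dolphin", "Sting Ray", "Badger"), times = 3),
  Count   = c(17L, 25L, 1L, 234L, 4L, 6L, 19L, 25L, 45L, 4L, 4L, 3L)
)

# Hypothetical helper (not part of vegan): Shannon index with natural log
shannon <- function(counts) {
  # Drop zero counts first, because 0 * log(0) evaluates to NaN in R
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Split Count by Sample and apply the helper: one value per sample
per_sample <- tapply(df$Count, df$Sample, shannon)
print(per_sample)
```

The three values should agree with the shannon_index column above. If you would rather stay with dplyr and vegan, replacing mutate() with summarise() in the answer's pipeline should likewise collapse the result to one row per Sample.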