Search code examples
rdataframeaggregate

R: Sum each column, average cover value and relative average cover value


I have a dataframe that contains percent cover data of many plant species in several plots:

Example

        sp1 sp2 sp3 sp4
PlotA   1   12  0   0
PlotB   0   3   5   0
PlotC   3   0   0   1
PlotD   0   6   0   0
PlotE   1   1   0   0

I would like to calculate a few things for each individual species/column:

How many plots does each species occur? (ex. sp1 = 3, sp2 = 4, sp3 = 1)

What is the average pct. cover per species per plot? (total cover/total plots, ex. sp1 = 1.0, sp2 = 4.4)

What is the relative avg. pct. cover per species per plot? (avg. pct. cover/sum of all avg. pct cover)

Will this be more easily achieved by making the species the rows and the plots the columns?


Solution

  • We can use dplyr and tidyr packages to summarise and reshape the data. Look below;

    library(dplyr)
    library(tidyr) # version 1.0.0
    
    df1 %>% 
      summarise_all(list(count = ~sum(`>`(.,0)), 
                         avg.cover=~mean(.))) %>% 
      gather() %>% 
      separate("key", into = c("SP", "var"), sep = "_") %>% 
      pivot_wider(names_from = "var", values_from = "value") %>% 
      mutate(rel.avg.cover = avg.cover/sum(avg.cover))
    #> # A tibble: 4 x 4
    #>   SP    count avg.cover rel.avg.cover
    #>   <chr> <dbl>     <dbl>         <dbl>
    #> 1 sp1       3       1          0.152 
    #> 2 sp2       4       4.4        0.667 
    #> 3 sp3       1       1          0.152 
    #> 4 sp4       1       0.2        0.0303
    

    Data:

    read.table(text = "row  sp1 sp2 sp3 sp4
    PlotA   1   12  0   0
    PlotB   0   3   5   0
    PlotC   3   0   0   1
    PlotD   0   6   0   0
    PlotE   1   1   0   0", header=T, row.names=1) -> df1