I have a dataframe that contains percent cover data of many plant species in several plots:
Example
sp1 sp2 sp3 sp4
PlotA 1 12 0 0
PlotB 0 3 5 0
PlotC 3 0 0 1
PlotD 0 6 0 0
PlotE 1 1 0 0
I would like to calculate a few things for each individual species/column:
How many plots does each species occur? (ex. sp1 = 3, sp2 = 4, sp3 = 1)
What is the average pct. cover per species per plot? (total cover/total plots, ex. sp1 = 1.0, sp2 = 4.4)
What is the relative avg. pct. cover per species per plot? (avg. pct. cover/sum of all avg. pct cover)
Will this be more easily achieved by making the species the rows and the plots the columns?
We can use dplyr
and tidyr
packages to summarise and reshape the data. Look below;
library(dplyr)
library(tidyr) # version 1.0.0
df1 %>%
summarise_all(list(count = ~sum(`>`(.,0)),
avg.cover=~mean(.))) %>%
gather() %>%
separate("key", into = c("SP", "var"), sep = "_") %>%
pivot_wider(names_from = "var", values_from = "value") %>%
mutate(rel.avg.cover = avg.cover/sum(avg.cover))
#> # A tibble: 4 x 4
#> SP count avg.cover rel.avg.cover
#> <chr> <dbl> <dbl> <dbl>
#> 1 sp1 3 1 0.152
#> 2 sp2 4 4.4 0.667
#> 3 sp3 1 1 0.152
#> 4 sp4 1 0.2 0.0303
Data:
read.table(text = "row sp1 sp2 sp3 sp4
PlotA 1 12 0 0
PlotB 0 3 5 0
PlotC 3 0 0 1
PlotD 0 6 0 0
PlotE 1 1 0 0", header=T, row.names=1) -> df1