I have a dataframe bwsp that contains abundance data for many species at two locations. It looks something like this:
        Location sp1 sp2 sp3 sp4
sample1    SiteA   0  12   0   0
sample2    SiteA   0   3   0   0
sample3    SiteA   1   0   0   0
sample4    SiteB   0   0   6   0
sample5    SiteB   2   1   1   0
sample6    SiteB   0   1   0  80
sample7    SiteB   2   1   1   0
sample8    SiteB   0   0   0   0
I calculate the total abundance of all species in each sample using:
bwsp$N <- rowSums(bwsp)
I now want to count how many samples (= rows) have zero abundance (i.e., N = 0) at each location. I started with:
library(tidyverse)
sum(bwsp$N == "0")
and found no rows summed to zero. But I know this is wrong! (I handled the samples, and I know that there are several that were "empty".) So I checked it with:
> summary(bwsp$N)
I was really surprised to see that the minimum N was 1.0. I double-checked the other summary statistics in Excel and they don't quite match either.
Are these just rounding errors? What am I doing wrong?
NB: I just checked this with the dummy data that I provided above and it worked just fine. This makes me think that I'm doing something wrong with the way I'm getting the data into R, i.e. bwsp <- read.csv("dummybwsp.csv", row.names = 1).
Once I pared down the question, I was able to look back at my original script and see my error. Earlier in that script, before calculating N, I had calculated some diversity indices using:
bwsp$shann <- diversity(bwsp)
bwsp$simp <- diversity(bwsp, "simpson")
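Of course, these become columns of bwsp, so rowSums(bwsp) counted them as well and inflated N for every row. For the empty samples they added one, which is why the minimum N was 1.0 rather than 0. There was no issue with the code in my question, but there was an issue with me not thinking carefully about the way I was manipulating the data.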
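To make the pitfall concrete, here is a minimal sketch using the dummy data from the question (species counts only, with the Location column left out for simplicity). The column name index is hypothetical and simply stands in for a derived diversity column; no diversity index is actually computed here.

```r
# Dummy species counts from the question (Location omitted for simplicity)
bwsp <- data.frame(
  sp1 = c(0, 0, 1, 0, 2, 0, 2, 0),
  sp2 = c(12, 3, 0, 0, 1, 1, 1, 0),
  sp3 = c(0, 0, 0, 6, 1, 0, 1, 0),
  sp4 = c(0, 0, 0, 0, 0, 80, 0, 0),
  row.names = paste0("sample", 1:8)
)

# Totals computed from the species columns only
n_before <- rowSums(bwsp)

# Hypothetical derived column standing in for a diversity index
bwsp$index <- 1

# rowSums() now silently includes the derived column as well
n_after <- rowSums(bwsp)

sum(n_before == 0)  # 1: sample8 is empty
sum(n_after == 0)   # 0: the empty sample no longer sums to zero
```

Any column added to the dataframe before rowSums() is called ends up in the total, so the order of operations matters.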
I was able to repair this issue by specifying the columns of data used in the calculations:
bwsp$shann <- diversity(bwsp[,1:64])
bwsp$simp <- diversity(bwsp[,1:64], "simpson")
bwsp$N <- rowSums(bwsp[,1:64])
Phew! This was a good reminder to really think about my data!
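For anyone who lands here with the same problem, a slightly more defensive variant might look like the sketch below. It assumes the only non-species column is Location (as in the dummy data; my real data has 64 species columns), records the species column names once before any derived columns are added, and then counts the empty samples per location, which was the original goal. The names sp_cols and empty_by_site are just illustrative.

```r
# Dummy data from the question, including the Location column
bwsp <- data.frame(
  Location = c(rep("SiteA", 3), rep("SiteB", 5)),
  sp1 = c(0, 0, 1, 0, 2, 0, 2, 0),
  sp2 = c(12, 3, 0, 0, 1, 1, 1, 0),
  sp3 = c(0, 0, 0, 6, 1, 0, 1, 0),
  sp4 = c(0, 0, 0, 0, 0, 80, 0, 0),
  row.names = paste0("sample", 1:8)
)

# Record the species columns once, before adding any derived columns
# (assumption: everything except Location is a species count)
sp_cols <- setdiff(names(bwsp), "Location")

# Total abundance per sample, using the species columns only
bwsp$N <- rowSums(bwsp[, sp_cols])

# Number of empty samples (N == 0) at each location
empty_by_site <- tapply(bwsp$N == 0, bwsp$Location, sum)
empty_by_site
```

Selecting columns by name rather than by position (1:64) means the calculation keeps working if columns are added or reordered later.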