
Calculating how many rows sum to zero: problems importing data?


I have a dataframe bwsp that contains abundance data of many species at two locations that looks something like this:

        Location sp1 sp2 sp3 sp4
sample1    SiteA   0  12   0   0
sample2    SiteA   0   3   0   0
sample3    SiteA   1   0   0   0
sample4    SiteB   0   0   6   0
sample5    SiteB   2   1   1   0
sample6    SiteB   0   1   0  80
sample7    SiteB   2   1   1   0
sample8    SiteB   0   0   0   0

I calculate the total abundance of all species in each sample using:

bwsp$N <- rowSums(bwsp)

I now want to calculate how many samples (= rows) have zero abundance (i.e., N = 0) at each location. I started with:

library(tidyverse)
sum(bwsp$N == 0)

and found no rows summed to zero. But I know this is wrong! (I handled the samples, and I know that there are several that were "empty".) So I checked it with:

summary(bwsp$N)

I was really surprised to see that the minimum N was 1.0. I double-checked the other summary statistics in Excel and they don't quite match either.

Are these just rounding errors? What am I doing wrong?

NB: I just checked this with the dummy data that I provided above and it worked just fine. This makes me think that I'm doing something wrong with the way I'm getting the data into R, i.e. bwsp <- read.csv("dummybwsp.csv", row.names = 1).


Solution

  • Once I pared down the question, I was able to look back at my original script and see my error. Earlier in that script, I had calculated some diversity indices using:

    bwsp$shann <- diversity(bwsp)
    bwsp$simp <- diversity(bwsp, "simpson")
    

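    Each of these calls appends a new column to bwsp, so the later rowSums(bwsp) summed the index values along with the species counts. For an empty sample the two indices add to one, which is why the minimum N came out as 1 instead of 0 and why the other summary statistics did not match Excel. The code in my question was fine; the problem was that I had not thought carefully about how I was manipulating the data beforehand.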
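The effect can be reproduced on a toy data frame with a rough sketch like the one below. It does not call diversity(); instead it uses a hand-written Simpson-style index, `simpson_by_hand`, as a stand-in. That helper and the names `species` and `N_wrong` are hypothetical and exist only for illustration.

```r
# Toy species counts modelled on the question; sample8 is an empty sample.
bwsp <- data.frame(
  sp1 = c(0, 1, 0),
  sp2 = c(12, 0, 0),
  row.names = c("sample1", "sample3", "sample8")
)
species <- names(bwsp)  # record the species columns before adding anything

# Hypothetical stand-in for a diversity index: 1 - sum(p^2) per row.
# An all-zero row gives NaN proportions, which na.rm = TRUE drops,
# so this helper returns 1 for an empty sample.
simpson_by_hand <- function(counts) {
  m <- as.matrix(counts)
  p <- m / rowSums(m)
  1 - rowSums(p^2, na.rm = TRUE)
}

bwsp$simp <- simpson_by_hand(bwsp[species])

# Summing every column now includes the index column ...
bwsp$N_wrong <- rowSums(bwsp[c(species, "simp")])
# ... whereas summing only the species columns gives the true abundance.
bwsp$N <- rowSums(bwsp[species])

print(bwsp)
```

In this sketch the empty sample has `N_wrong` equal to 1 but `N` equal to 0, which matches the symptom in the question.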

    I was able to repair this issue by specifying the columns of data used in the calculations:

    bwsp$shann <- diversity(bwsp[,1:64])
    bwsp$simp <- diversity(bwsp[,1:64], "simpson")
    bwsp$N <- rowSums(bwsp[,1:64])
    

    Phew! This was a good reminder to really think about my data!
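For the original goal of counting empty samples at each location, one possible approach in base R is sketched below, using the dummy data from the question. Selecting the species columns by the `sp` name prefix is an assumption that suits the dummy data; the real column names may need a different rule. The names `species` and `empty_by_site` are illustrative.

```r
# Dummy data from the question.
bwsp <- data.frame(
  Location = c("SiteA", "SiteA", "SiteA", "SiteB", "SiteB", "SiteB", "SiteB", "SiteB"),
  sp1 = c(0, 0, 1, 0, 2, 0, 2, 0),
  sp2 = c(12, 3, 0, 0, 1, 1, 1, 0),
  sp3 = c(0, 0, 0, 6, 1, 0, 1, 0),
  sp4 = c(0, 0, 0, 0, 0, 80, 0, 0),
  row.names = paste0("sample", 1:8)
)

# Pick the species columns by name (assumed "sp" prefix) rather than by
# position, so columns added later cannot leak into the sum.
species <- grep("^sp", names(bwsp), value = TRUE)
bwsp$N <- rowSums(bwsp[species])

# Count the rows with N == 0 within each location.
empty_by_site <- tapply(bwsp$N == 0, bwsp$Location, sum)
print(empty_by_site)
```

Selecting columns by name instead of by a fixed range such as 1:64 keeps the calculation correct even if derived columns are added in a different order.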