
Convert tibble with species by site data into numeric matrix for vegan::diversity()


I have data in tibble format which looks like this (with 22 more rows and 7 more columns):

reprex[1:10,1:7]
# A tibble: 10 x 7
# Groups:   Point, Layer [10]
   Point Layer Lari_deci Quer_rope Pinu_sylv Betu_pend Sorb_aucu
   <chr> <chr> <chr>     <chr>     <chr>     <chr>     <chr>    
 1 P03   C     21        17        5         1         0        
 2 P03   U     0         0         0         0         3        
 3 P06   C     3         28        28        0         0        
 4 P07   C     0         3         20        1         1        
 5 P07   U     0         0         0         0         0        
 6 P08   C     0         16        21        0         0        
 7 P08   U     0         0         0         0         0        
 8 P10   C     0         17        44        1         0        
 9 P10   U     0         50        0         0         0        
10 P11   C     0         36        1         0         0  
> dput(reprex[1:10,1:7])
structure(list(Point = c("P03", "P03", "P06", "P07", "P07", "P08", 
"P08", "P10", "P10", "P11"), Layer = c("C", "U", "C", "C", "U", 
"C", "U", "C", "U", "C"), Lari_deci = c("21", "0", "3", "0", 
"0", "0", "0", "0", "0", "0"), Quer_rope = c("17", "0", "28", 
"3", "0", "16", "0", "17", "50", "36"), Pinu_sylv = c("5", "0", 
"28", "20", "0", "21", "0", "44", "0", "1"), Betu_pend = c("1", 
"0", "0", "1", "0", "0", "0", "1", "0", "0"), Sorb_aucu = c("0", 
"3", "0", "1", "0", "0", "0", "0", "0", "0")), row.names = c(NA, 
-10L), groups = structure(list(Point = c("P03", "P03", "P06", 
"P07", "P07", "P08", "P08", "P10", "P10", "P11"), Layer = c("C", 
"U", "C", "C", "U", "C", "U", "C", "U", "C"), .rows = structure(list(
    1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, 10L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

I want to calculate Simpson's diversity index for each Point, treating the two Layer levels separately. Since my initial attempts failed, I decided to split the above data in two by the Layer levels C and U, then remove the Layer column and convert Point to row names.

As a result, I obtained data that should in theory be entirely numeric (all remaining columns hold counts of the corresponding species). In practice this does not seem to be the case, and this is where my problem lies. I then converted the data.frame using as.matrix, but I still get the following error: Error in diversity(., index = "simpson") : input data must be numeric

reprex_C <- reprex %>% filter(Layer == "C") %>% ungroup %>% select(-2) %>% 
  column_to_rownames(var="Point") %>% as.matrix %>% 
  diversity(index = "simpson")
# I would have a similar 'reprex_U' object for Layer == "U".

I tried searching for ways to fix this by somehow converting the column values from character to numeric:

as.numeric(reprex_C[,1:14])

but this loses the row numbers and hence the Point identity. And although diversity() now works, it considers all the values as one and calculates just one diversity index for the whole data (as opposed to one value for each row in my original data format).

  1. Why is diversity() not working with such data? What can I do to resolve this?
  2. Is there any way to perform diversity() without having to split the original data with two Layer levels into two separate matrices?

Solution

  • It looks to me as though the count columns in your original data frame are stored as chr rather than as numeric. If you coerce them to numeric before you do your split, it should work fine:

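    library(dplyr)   # %>%, mutate(), across(), filter(), select()
    library(tibble)  # column_to_rownames()
    reprex_C <- reprex %>% 
      # species counts are stored as <chr>; coerce them to numeric first
      mutate(across(Lari_deci:Sorb_aucu, .fns = as.numeric)) %>%
      filter(Layer == "C") %>% ungroup() %>% select(-Layer) %>% 
      column_to_rownames(var = "Point") %>% as.matrix() %>%
      vegan::diversity(index = "simpson")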
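
On why calling as.numeric() on the matrix did not help: as.numeric() returns a plain vector and drops the dim and dimnames attributes, so diversity() sees a single sample. A minimal base R sketch, using a small character matrix built from four of the values in the question (the object names m and flat are only illustrative):

```r
# Character matrix with two Points (rows) and two species (columns),
# using counts taken from the question's Layer "C" rows.
m <- matrix(c("21", "3", "17", "28"), nrow = 2,
            dimnames = list(c("P03", "P06"), c("Lari_deci", "Quer_rope")))

# as.numeric() gives a plain numeric vector: dim and dimnames are gone.
flat <- as.numeric(m)

# Changing the storage mode converts the values but keeps the attributes,
# so rows are still identified by Point.
storage.mode(m) <- "double"
m
```

After the storage.mode assignment, m is still a 2 x 2 matrix with the Point names as row names, which is the shape diversity() needs to return one value per row.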
    

    I'm afraid I'm not familiar enough with diversity to answer your second question.
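
For what it's worth, one possible approach to the second question, offered as a sketch rather than a tested answer on the full data: diversity() takes a MARGIN argument that defaults to 1, so it returns one index per row. That suggests the split by Layer may not be needed, provided the matrix contains only the numeric species columns. The example below assumes the vegan package is installed and uses a four-row stand-in for the question's data; the names species and counts and the new simpson column are introduced here only for illustration.

```r
# Four-row stand-in for the question's data: counts stored as character.
reprex <- data.frame(
  Point     = c("P03", "P03", "P06", "P07"),
  Layer     = c("C", "U", "C", "C"),
  Lari_deci = c("21", "0", "3", "0"),
  Quer_rope = c("17", "0", "28", "3"),
  Pinu_sylv = c("5", "0", "28", "20"),
  Betu_pend = c("1", "0", "0", "1"),
  Sorb_aucu = c("0", "3", "0", "1"),
  stringsAsFactors = FALSE
)

# Every column except the two identifiers is treated as a species column.
species <- setdiff(names(reprex), c("Point", "Layer"))

# Convert each species column to numeric; sapply() returns a numeric matrix.
counts <- sapply(reprex[species], as.numeric)
rownames(counts) <- paste(reprex$Point, reprex$Layer, sep = "_")

# One Simpson index per row, i.e. per Point/Layer combination.
reprex$simpson <- vegan::diversity(counts, index = "simpson")
reprex[, c("Point", "Layer", "simpson")]
```

Keeping Point and Layer as ordinary columns next to the result means the values can be filtered or compared by Layer afterwards without maintaining two separate matrices.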