Tags: r, dataframe, vegan

How can I calculate the number of uniques in a row within a species matrix?


I am trying to identify which rows have uniques (a species that was only observed in that row and not in any other row of my species matrix). I have my data matrix set up with columns as individual species and rows as individual sampling units (in our case, transects).

Say, for example, species 1 was only found in row 8 and nowhere else in the dataset, then I would like to know that row 8 contains 1 unique. If species 4 was also only found in row 8, then the number of uniques would be 2 etc. Note: some of the abundances of the uniques I have found in the dataset have been greater than 1, meaning they were found on the transect more than once, but they were still only found in that one transect (still considered a unique).

Here is some example data where row 3 has two uniques and row 5 has 1 unique:

    example_data <- data.frame(Species1 = c(1, 2, 3, 4, 5),
                               Species2 = c(6, 7, 8, 9, 10),
                               Species3 = c(0, 0, 13, 0, 0),
                               Species4 = c(0, 0, 0, 0, 20),
                               Species5 = c(0, 0, 23, 0, 0))

I tried ChatGPT and got nowhere. Using Excel, I was able to confirm that the dataset does contain uniques. I also figured out how to get R to tell me which species were found only once, but I am more interested in which rows (transects) have uniques, and how many, than in which species are uniques.

Additionally, is there a way to list all the rows together with their number of uniques? For the example data it would be 0, 0, 2, 0, 1. I am also interested in which rows have no uniques (0 uniques).


Solution

  • Since you are interested in which rows have uniques and how many, not which species were uniques:

    You can find both the row and column indices of the unique species by first finding which species are unique (non-zero in exactly one row), then using which(..., arr.ind = TRUE):

    uniques <- vapply(example_data, \(x) sum(x != 0) == 1, logical(1L))
    
    # Species1 Species2 Species3 Species4 Species5 
    #    FALSE    FALSE     TRUE     TRUE     TRUE 
    
    rowcol_uniques <- which(example_data[uniques] != 0, arr.ind = TRUE)
    rownames(rowcol_uniques) <- names(uniques[uniques])
    
    #          row col
    # Species3   3   1
    # Species4   5   2
    # Species5   3   3
    

    You can count how many uniques each of those rows has with table():

    table(rowcol_uniques[,1])
    
    # 3 5 
    # 2 1 
    

    In your edit, you mentioned that you are also interested in rows that don't have any uniques. For this you could use merge, which gives one line per row of the data with a second column, Freq, holding the number of uniques in that row:

    anyuniques <- merge(data.frame(row = seq_len(nrow(example_data))), 
                        as.data.frame(table(row = rowcol_uniques[,1])), 
                        all.x = TRUE)
    anyuniques[is.na(anyuniques)] <- 0
    
    #   row Freq
    # 1   1    0
    # 2   2    0
    # 3   3    2
    # 4   4    0
    # 5   5    1
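
If only the per-row counts are needed (the 0, 0, 2, 0, 1 vector from the question), a shorter route may be to skip the index table and sum a presence/absence matrix directly. This is a minimal sketch in base R, not the method used above; the names present, is_unique and n_uniques are made up for illustration, and it assumes that any non-zero abundance means the species was observed.

```r
# Example data from the question: rows are transects, columns are species.
example_data <- data.frame(Species1 = c(1, 2, 3, 4, 5),
                           Species2 = c(6, 7, 8, 9, 10),
                           Species3 = c(0, 0, 13, 0, 0),
                           Species4 = c(0, 0, 0, 0, 20),
                           Species5 = c(0, 0, 23, 0, 0))

# Presence/absence matrix: TRUE where a species was observed on a transect.
present <- example_data != 0

# A species counts as a unique if it is present in exactly one row.
is_unique <- colSums(present) == 1

# For each row, count how many of the unique species occur there.
# drop = FALSE keeps a matrix even when only one (or no) species is unique.
n_uniques <- unname(rowSums(present[, is_unique, drop = FALSE]))
n_uniques
# [1] 0 0 2 0 1
```

Rows without uniques come out as 0 without any merging, and which(n_uniques > 0) gives the row numbers that have at least one unique (3 and 5 for the example data).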