r, matrix, chemistry

Calculate diversity index (dissimilarity index) for a set of compounds in R


I want to calculate the diversity index for a given matrix.

I have a dataset matrix (xmatrix.RData). It is a 986 × 881 matrix: 986 compounds (rows) described by 881 fingerprint descriptors (columns).

The formula to calculate the diversity index is explained in:
http://r.789695.n4.nabble.com/file/n4677766/Diversity_Index_Formula.pdf


Solution

  • I would do something like this:

    # Pairwise dissimilarity matrix diss(i, j) from the paper.
    # method = "binary" gives the Jaccard distance, which suits 0/1 fingerprints;
    # see ?dist for other methods such as "euclidean" or "maximum".
    diss <- as.matrix(dist(xmatrix, method = "binary"))
    n <- nrow(xmatrix)

    # Overall dissimilarity: mean of diss(i, j) over all ordered pairs i != j
    # (the diagonal is zero, so it adds nothing to the sum).
    sum(diss) / (n * (n - 1))
    

    hope this helps...
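
    To check the calculation on something small, here is a minimal sketch using a made-up 4 x 5 fingerprint matrix (fp is hypothetical data, not the real xmatrix). It assumes the formula in the linked PDF is the average pairwise dissimilarity, as the answer above implies. Because the dissimilarity matrix is symmetric, averaging over ordered pairs i != j should give the same value as averaging over the unordered pairs stored in the dist object:

```r
# Toy fingerprint matrix: 4 hypothetical compounds x 5 bits (made-up data)
fp <- matrix(c(1, 1, 0, 0, 1,
               1, 0, 0, 1, 1,
               0, 1, 1, 0, 0,
               1, 1, 0, 0, 1),
             nrow = 4, byrow = TRUE)

d <- dist(fp, method = "binary")  # Jaccard distance for each unordered pair
n <- nrow(fp)

# Mean over ordered pairs i != j, as in the answer above
div_full <- sum(as.matrix(d)) / (n * (n - 1))

# Mean over unordered pairs i < j (the lower triangle held in the dist object)
div_pairs <- mean(as.vector(d))

# TRUE if both routes agree
all.equal(div_full, div_pairs)
```

    For the full 986-compound matrix the second form avoids building the 986 x 986 matrix, though at this size either approach should be cheap.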