Search code examples
rvegan

Bray-Curtis Pairwise Analysis in R


I am trying to calculate and visualize the Bray-Curtis dissimilarity between communities at paired/pooled sites using the Vegan package in R.

Below is a simplified example dataframe:

Site = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J") 

PoolNumber = c(1, 3, 4, 2, 4, 1, 2, 3, 4, 4) 

Sp1 = c(3, 10, 7, 0, 12, 9, 4, 0, 4, 3) 

Sp2 = c(2, 1, 17, 1, 2, 9, 3, 1, 6, 7)

Sp3 = c(5, 12, 6, 10, 2, 4, 0, 1, 3, 3)

Sp4 = c(9, 6, 4, 8, 13, 5, 2, 20, 13, 3)

df = data.frame(Site, PoolNumber, Sp1, Sp2, Sp3, Sp4)

"Site" is a variable indicating the location where each sample was taken The "Sp" columns indicate abundance values of species at each site. I want to compare pairs of sites that have the same "PoolNumber" and get a dissimilarity value for each comparison.

Most examples suggest I should create a matrix with only the "Sp" columns and use this code:

matrix <- df[,3:6]

braycurtis = vegdist(matrix, "bray")

hist(braycurtis)

However, I'm not sure how to tell R which rows to compare if I eliminate the columns with "PoolNumber" and "Site". Would this involve organizing by "PoolNumber", using this as a row name and then writing a loop to compare every 2 rows? I am also finding the output difficult to interpret. Lower Bray-Curtis values indicate more similar communities (closer to a value of 0), while higher values (closer to 1) indicate more dissimilar communities, but is there a way to tell directionality, which one of the pair is more diverse?

I am a beginner R user, so I apologize for any misuse of terminology/formatting. All suggestions are appreciated.

Thank you


Solution

  • Do you mean that you want to get a subset of dissimilarities with equal PoolNumber? The vegdist function will get you all dissimilarities, and you can pick your pairs from those. This is easiest when you first transform dissimilarities into a symmetric matrix and then pick your subset from that symmetric matrix:

    braycurtis <- vegdist(df[,3:6])
    as.matrix(braycurtis)[df$PoolNumber==4,df$PoolNumber==4]
    as.dist(as.matrix(braycurtis)[df$PoolNumber==4,df$PoolNumber==4])
    

    If you only want to have averages, vegan::meandist function will give you those:

    meandist(braycurtis, df$PoolNumber)
    

    Here diagonal values will be mean dissimilarities within PoolNumber and off-diagonal mean dissimilarities between different PoolNumbers. Looking at the code of vegan::meandist you can see how this is done.

    Bray-Curtis dissimilarities (like all normal dissimilarities) are a symmetric measure and it has no idea on the concept of being diverse. You can assess the degree of being diverse for each site, but then you need to first tell us what do you mean with "diverse" (diversity or something else?). Then you just need to use those values in your calculations.

    If you just want to look at number of items (species), the following function will give you the differences in the lower triangle (and the upper triangle values will be the same with a switch of a sign):

    designdist(df[,3:6], "A-B", "binary")
    

    Alternatively you can work with row-wise statistics and see their differences. This is an example with Shannon-Weaver diversity index:

    H <- diversity(df[,3:6])
    outer(H, H, "-")
    

    To get the subsets, work similarly as with the Bray-Curtis index.