I am using a simper analysis from the Vegan package in order to determine which amino acid are responsible for the variation in protein composition between different samples. As I understood from this discussion, The simper() function uses Bray-Curtis. I would need to use another dissimilarity index, typically euclidean. How could I modify it within the function? Thanks.
Changing only the dissimilarity is trivial, but the dissimilarity you use should be such that you add and analyse terms by species. Terms of squared Euclidean distance are such. However, simper()
makes all kind of strange tricks with dissimilarities, and I am not at all sure these tricks are valid for squared Euclidean distance (I am not even sure that they are valid for Bray-Curtis that we used, but at least they accord with the published method). NB, we do warn against the use of simper
. This is an extract from the help page -- I hope you have already read this:
The results of ‘simper’ can be very difficult to interpret. The method very badly confounds the mean between group differences and within group variation, and seems to single out variable species instead of distinctive species (Warton et al. 2012). Even if you make groups that are copies of each other, the method will single out species with high contribution, but these are not contributions to non-existing between-group differences but to within-group variation in species abundance.
That said, here are the lines you should change to switch from Bray-Curtis to squared Euclidean. However, I suggest you don't use this function:
diff --git a/R/simper.R b/R/simper.R
index 35fa189..f60c57f 100644
--- a/R/simper.R
+++ b/R/simper.R
@@ -13,9 +13,8 @@
n.b <- nrow(gb)
for(j in seq_len(n.b)) {
for(k in seq_len(n.a)) {
- mdp <- abs(ga[k, , drop = FALSE] - gb[j, , drop = FALSE])
- mep <- ga[k, , drop = FALSE] + gb[j, , drop = FALSE]
- contrp[(j-1)*n.a+k, ] <- mdp / sum(mep)
+ mdp <- (ga[k,, drop=FALSE] - gb[j,, drop = FALSE])^2
+ contrp[(j-1)*n.a+k, ] <- mdp
}
}
colMeans(contrp)
@@ -53,9 +52,8 @@
contr <- matrix(ncol = P, nrow = n.a * n.b)
for (j in seq_len(n.b)) {
for (k in seq_len(n.a)) {
- md <- abs(group.a[k, , drop = FALSE] - group.b[j, , drop = FALSE])
- me <- group.a[k, , drop = FALSE] + group.b[j, , drop = FALSE]
- contr[(j-1)*n.a+k, ] <- md / sum(me)
+ md <- (group.a[k,,drop=FALSE] - group.b[j,,drop=FALSE])^2
+ contr[(j-1)*n.a+k, ] <- md
}
}
average <- colMeans(contr)