R: extract distances between centroids into a data frame using vegan

I have a biological data set in which each centroid represents a given year, and I want to calculate the distance between consecutive centroids (2017 to 2018, 2018 to 2019, and so on). I'm exploring usedist::dist_between_centroids() to calculate the distance in high-dimensional space, but it seems quite arduous because the function requires the members of each group (here, each year) to be passed as vectors. I've also tried vegan::adonis() as an alternative, but I can't figure out how to extract the distances from it. Below is some sample data built from the dune data set, with one of the factors (Use) recoded as 'year'. My actual data set covers ~20 years, so calculating each distance manually, as I've done below, is not practical. I think a loop around dist_between_centroids() might accomplish this, but I'm not sure how to specify the grouping vectors inside the loop.


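# Species and environmental data
library(vegan)    # vegdist(), dune and dune.env
library(usedist)  # dist_between_centroids()
library(dplyr)    # %>%, arrange(), recode_factor()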

dune <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.spe.txt', row.names = 1)

dune.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.env.txt', row.names = 1)

# Note: data() loads vegan's bundled copies, which overwrite the objects read above
data(dune)
data(dune.env)

all_data <- cbind(dune.env, dune) %>%
              arrange(Use)

all_data$Use <- recode_factor(all_data$Use, "Hayfield"="2017")
all_data$Use <- recode_factor(all_data$Use, "Haypastu"="2018")
all_data$Use <- recode_factor(all_data$Use, "Pasture"="2019")


# Columns 1-5 are the environmental variables; 6-35 are the species abundances
bio_data <- all_data[, 6:35]

bio_distmat <- vegdist(bio_data, method = "bray", na.rm = TRUE)


# Store the distances in a data frame
dist_between_mat <- as.data.frame(matrix(ncol = 3, nrow = 2))
colnames(dist_between_mat) <- c("start_centroid", "end_centroid", "distance")

dist_between_mat[1, 1] <- "2017"
dist_between_mat[1, 2] <- "2018"
dist_between_mat[1, 3] <- dist_between_centroids(bio_distmat, 1:7, 8:15) # distance between 2017 and 2018

dist_between_mat[2, 1] <- "2018"
dist_between_mat[2, 2] <- "2019"
dist_between_mat[2, 3] <- dist_between_centroids(bio_distmat, 8:15, 16:20) # distance between 2018 and 2019

Solution

You can do this with a simple for-loop. But why write simple code when we can use "tidy" principles instead?

Here is a solution that iterates over the start years and the end years in parallel with purrr::map2_dfr(), generates a one-row tibble for each pair, and then concatenates the rows into a final tibble.

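Note that in your reproducible example the levels are in reverse chronological order, most likely because each successive recode_factor() call moves the newly recoded level to the front. I use the level order as it is, without converting the levels to numeric years, so make sure that this is the order you intend.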

library(purrr)   # map2_dfr()
library(tibble)  # tibble()

levels(all_data$Use)
#> [1] "2019" "2018" "2017"

n <- nlevels(all_data$Use)

start <- levels(all_data$Use)[1:(n - 1)]
start
#> [1] "2019" "2018"
end <- levels(all_data$Use)[2:n]
end
#> [1] "2018" "2017"

map2_dfr(start, end, ~ {
  idx1 <- which(all_data$Use == .x)
  idx2 <- which(all_data$Use == .y)
  tibble(
    start_centroid = .x,
    end_centroid = .y,
    distance = dist_between_centroids(bio_distmat, idx1, idx2)
  )
})
#> # A tibble: 2 × 3
#>   start_centroid end_centroid distance
#>   <chr>          <chr>           <dbl>
#> 1 2019           2018            0.210
#> 2 2018           2017            0.327
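
If you would rather have the pairs run forward in time, one option is to rebuild the factor with its levels sorted numerically before deriving the start and end vectors. The following is only a minimal base-R sketch on a toy factor; `use`, `use_chrono`, `start_chrono` and `end_chrono` are illustrative names, not objects from the code above, and it assumes every level label can be read as an integer year.

```r
# Toy factor whose levels are in reverse chronological order,
# mirroring the level order shown for all_data$Use (assumed stand-in data).
use <- factor(c("2017", "2018", "2018", "2019"),
              levels = c("2019", "2018", "2017"))

# Sort the level labels numerically and rebuild the factor with that order.
# The values themselves are unchanged; only the level order differs.
chrono_levels <- as.character(sort(as.integer(levels(use))))
use_chrono <- factor(use, levels = chrono_levels)

levels(use_chrono)
#> [1] "2017" "2018" "2019"

# Consecutive (start, end) pairs now run forward in time.
n <- nlevels(use_chrono)
start_chrono <- levels(use_chrono)[1:(n - 1)]
end_chrono   <- levels(use_chrono)[2:n]
```

Because the distance between two centroids does not depend on which one is called the start, reordering only changes the labelling and row order of the result, not the distances.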

Created on 2022-07-27 by the reprex package (v2.0.1)
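
For completeness, the plain for-loop mentioned at the start could look roughly like the sketch below. It is self-contained and uses base R for the recoding rather than dplyr, so the `year` factor (built here with chronological levels) and the `dist_between` data frame are assumptions for illustration, not objects from the question. It relies on `dist_between_centroids()` accepting integer row indices, as in the question's manual calls.

```r
library(vegan)    # vegdist(), dune and dune.env
library(usedist)  # dist_between_centroids()

data(dune)
data(dune.env)

# Assumed stand-in for the question's recoding: map the three Use
# categories to years, with levels in chronological order.
lookup <- c(Hayfield = "2017", Haypastu = "2018", Pasture = "2019")
year <- factor(unname(lookup[as.character(dune.env$Use)]),
               levels = c("2017", "2018", "2019"))

bio_distmat <- vegdist(dune, method = "bray")

yrs <- levels(year)

# Pre-allocate one row per consecutive pair of years.
dist_between <- data.frame(
  start_centroid = yrs[-length(yrs)],
  end_centroid   = yrs[-1],
  distance       = NA_real_,
  stringsAsFactors = FALSE
)

# Fill in the centroid distance for each pair.
for (i in seq_len(nrow(dist_between))) {
  idx1 <- which(year == dist_between$start_centroid[i])
  idx2 <- which(year == dist_between$end_centroid[i])
  dist_between$distance[i] <- dist_between_centroids(bio_distmat, idx1, idx2)
}

dist_between
```

Looking up the rows with `which()` inside the loop means the data do not need to be sorted by year first, and the same code should scale to ~20 years without any hard-coded index ranges.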