I have a biological data set in which each centroid represents a given year, and I want to calculate the distance between consecutive centroids (so distances are calculated sequentially). I'm exploring usedist::dist_between_centroids() to calculate the distance in high-dimensional space, but it seems quite arduous since the function requires vector inputs for the grouping variable (in this case, year). I've explored vegan::adonis() as an alternative, but I can't figure out how to extract the distances from it.
I've attached some sample data using dune and recoded one of the factors as 'year'. My actual dataset consists of ~20 years' worth of data, so manually calculating distances as I've done below is not practical. I think a loop with dist_between_centroids() might accomplish this task, but I'm not sure how to specify the grouping vectors in the loop.
# Species and environmental data
library(vegan)    # vegdist(), dune data
library(usedist)  # dist_between_centroids()
library(dplyr)    # %>%, arrange(), recode_factor()
dune <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.spe.txt', row.names = 1)
dune.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.env.txt', row.names = 1)
# Note: data() loads vegan's built-in copies, which overwrite the objects read above
data(dune)
data(dune.env)
all_data <- cbind(dune.env, dune) %>%
  arrange(Use)
all_data$Use <- recode_factor(all_data$Use, "Hayfield"="2017")
all_data$Use <- recode_factor(all_data$Use, "Haypastu"="2018")
all_data$Use <- recode_factor(all_data$Use, "Pasture"="2019")
bio_data <- all_data[,6:35]
bio_distmat <- vegdist(bio_data, method = "bray", na.rm = TRUE)
# Store the distances in a data frame
dist_between_mat <- as.data.frame(matrix(ncol=3, nrow=2))
colnames(dist_between_mat) <- c("start_centroid","end_centroid","distance")
dist_between_mat[1,1] = "2017"
dist_between_mat[1,2] = "2018"
dist_between_mat[1,3] = dist_between_centroids(bio_distmat, 1:7,8:15) #distance between 2017 and 2018
dist_between_mat[2,1] = "2018"
dist_between_mat[2,2] = "2019"
dist_between_mat[2,3] = dist_between_centroids(bio_distmat, 8:15,16:20) #distance between 2018 and 2019
You can do this with a simple for-loop. But why write simple code when we can use "tidy" principles instead?
Here is a solution that iterates over the start years and the end years, generates a one-row tibble and then concatenates the rows into a final tibble.
Note that in your reproducible example the years/levels are in reverse chronological order. I use the levels ordering, without casting the levels to years, so make sure that this is the order you intend.
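If chronological order is what you want, one option is to rebuild the factor with its levels sorted before pairing them up. This is only a minimal base-R sketch on a toy factor; `yr`, `yr_sorted`, `start_chrono` and `end_chrono` are hypothetical names that do not appear in the question.

```r
# Toy factor whose levels are in reverse chronological order (hypothetical data)
yr <- factor(c("2019", "2018", "2017", "2018"),
             levels = c("2019", "2018", "2017"))

# Rebuild the factor with levels sorted numerically; the values are unchanged
chrono_levels <- as.character(sort(as.numeric(levels(yr))))
yr_sorted <- factor(yr, levels = chrono_levels)

levels(yr_sorted)
#> [1] "2017" "2018" "2019"

# Consecutive pairs now run forward in time
start_chrono <- head(levels(yr_sorted), -1)
end_chrono <- tail(levels(yr_sorted), -1)
start_chrono
#> [1] "2017" "2018"
end_chrono
#> [1] "2018" "2019"
```

Applying the same idea to all_data$Use would pair 2017 with 2018 and 2018 with 2019. The distance between two centroids is symmetric, so only the row order and labels would change, not the values.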
levels(all_data$Use)
#> [1] "2019" "2018" "2017"
n <- nlevels(all_data$Use)
start <- levels(all_data$Use)[1:(n - 1)]
start
#> [1] "2019" "2018"
end <- levels(all_data$Use)[2:n]
end
#> [1] "2018" "2017"
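library(purrr)   # map2_dfr()
library(tibble)  # tibble()
map2_dfr(start, end, ~ {
  # Row indices of the samples belonging to each year
  idx1 <- which(all_data$Use == .x)
  idx2 <- which(all_data$Use == .y)
  tibble(
    start_centroid = .x,
    end_centroid = .y,
    distance = dist_between_centroids(bio_distmat, idx1, idx2)
  )
})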
#> # A tibble: 2 × 3
#>   start_centroid end_centroid distance
#>   <chr>          <chr>           <dbl>
#> 1 2019           2018            0.210
#> 2 2018           2017            0.327
Created on 2022-07-27 by the reprex package (v2.0.1)
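For comparison, the plain for-loop version might look roughly like the sketch below. To keep it self-contained it runs on made-up data with a Euclidean distance from base R; `toy`, `toy_year`, `toy_dist` and `res` are hypothetical names, and with the real data you would swap in bio_distmat and all_data$Use.

```r
library(usedist)

# Hypothetical toy data: 9 samples x 2 variables, three samples per year
set.seed(1)
toy <- matrix(runif(18), nrow = 9)
rownames(toy) <- paste0("s", 1:9)
toy_year <- factor(rep(c("2017", "2018", "2019"), each = 3))
toy_dist <- dist(toy)  # Euclidean distance, base R

yrs <- levels(toy_year)

# One row per pair of consecutive years, distance filled in by the loop
res <- data.frame(
  start_centroid = yrs[-length(yrs)],
  end_centroid = yrs[-1],
  distance = NA_real_
)

for (i in seq_len(nrow(res))) {
  # The grouping vectors are just the row indices of each year's samples
  idx1 <- which(toy_year == res$start_centroid[i])
  idx2 <- which(toy_year == res$end_centroid[i])
  res$distance[i] <- dist_between_centroids(toy_dist, idx1, idx2)
}

res
```

The key step is which(): it turns the year labels into the index vectors that dist_between_centroids() expects, so nothing has to be typed by hand however many years there are.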