I want to create a new distance matrix from an array of hundreds of distance matrices. This new matrix will be the average distances given a set of pairwise distances. The values to be averaged is based matching name prefixes. Easier to illustrate as my language might be incorrect: Here's one sample distance matrix:
SpeciesA_1 SpeciesA_2 SpeciesC_1. SpeciesC_2
SpeciesA_1 0 0.2 0.3 0.4
SpeciesA_2 0.2 0 0.4 0.5
SpeciesC_1 0.3 0.4 0 0.1
SpeciesC_2 0.4 0.5 0.1 0
I want the mean of all possible pairwise combinations given two prefix. For example:
SpeciesA_1 & SpeciesC_1 (0.3)
SpeciesA_1 & SpeciesC_2 (0.4)
SpeciesA_2 & SpeciesC_1 (0.4)
SpeciesA_2 & SpeciesC_2 (0.5)
Therefore, my resulting new distance matrix in this example should be:
SpeciesA SpeciesC
SpeciesA 0 0.4
SpeciesC 0.4 0
My actual matrices have about 140 columns and rows. I have no idea where to even get started with this. These dists are an array:
dists[,,i]
I can't use a loop (which I would normally start with by selecting the first name) because it is an array which I never work with.
dim(dists)
[1] 135 135 138
Here's an example of how I initially made the array of matrices and then filled them in:
labs <- c("A_1","A_2", "B_1", "B_2", "C_1", "C_3", "D_1", "D_3")
dists <- array(0, dim= c(length(labs),length(labs), length(tr)),dimnames=list(labs, labs))```
For testing, I've reduced my dataset to a smaller version for reproducibility and providing it here via dput: ```structure(c(0, 0.0636415002701785, 0.145334597498582, 0.163803183404822,
0.163803183404822, 0.163803183404822, 0.181754323491637, 2, 2,
2, 2, 0.0636415002701785, 0, 0.145334597498582, 0.163803183404822,
0.163803183404822, 0.163803183404822, 0.181754323491637, 2, 2,
2, 2, 0.145334597498582, 0.145334597498582, 0, 0.163803183404822,
0.163803183404822, 0.163803183404822, 0.181754323491637, 2, 2,
2, 2, 0.163803183404822, 0.163803183404822, 0.163803183404822,
0, 0.054577752568928, 0.109103869804459, 0.181754323491637, 2,
2, 2, 2, 0.163803183404822, 0.163803183404822, 0.163803183404822,
0.054577752568928, 0, 0.109103869804459, 0.181754323491637, 2,
2, 2, 2, 0.163803183404822, 0.163803183404822, 0.163803183404822,
0.109103869804459, 0.109103869804459, 0, 0.181754323491637, 2,
2, 2, 2, 0.181754323491637, 0.181754323491637, 0.181754323491637,
0.181754323491637, 0.181754323491637, 0.181754323491637, 0, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 1.33332929092241, 1.33332929092241,
1.33332929092241, 2, 2, 2, 2, 2, 2, 2, 1.33332929092241, 0, 0.222274233863968,
0.444424503431041, 2, 2, 2, 2, 2, 2, 2, 1.33332929092241, 0.222274233863968,
0, 0.444424503431041, 2, 2, 2, 2, 2, 2, 2, 1.33332929092241,
0.444424503431041, 0.444424503431041, 0, 0, 0.0938002031509043,
0.0938002031509043, 0, 0.0938002031509043, 0.167066002969562,
0.167066002969562, 0.445943812443056, 2, 2, 2, 0.0938002031509043,
0, 0.0317248785850623, 0, 0.0663652495301367, 0.167066002969562,
0.167066002969562, 0.445943812443056, 2, 2, 2, 0.0938002031509043,
0.0317248785850623, 0, 0, 0.0663652495301367, 0.167066002969562,
0.167066002969562, 0.445943812443056, 2, 2, 2, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0.0938002031509043, 0.0663652495301367, 0.0663652495301367,
0, 0, 0.167066002969562, 0.167066002969562, 0.445943812443056,
2, 2, 2, 0.167066002969562, 0.167066002969562, 0.167066002969562,
0, 0.167066002969562, 0, 0.0896758431063197, 0.445943812443056,
2, 2, 2, 0.167066002969562, 0.167066002969562, 0.167066002969562,
0, 0.167066002969562, 0.0896758431063197, 0, 0.445943812443056,
2, 2, 2, 0.445943812443056, 0.445943812443056, 0.445943812443056,
0, 0.445943812443056, 0.445943812443056, 0.445943812443056, 0,
2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 0, 2, 2, 2,
2, 2, 0, 1.49478622213262, 2, 2, 2, 0, 2, 2, 2, 2, 2, 1.49478622213262,
0, 0, 0.16633562218756, 0.739785630571588, 0.739785630571588,
0.739785630571589, 0, 0.739785630571589, 0, 0.739785630571589,
1.33468159813059, 1.33468159813059, 0.16633562218756, 0, 0.739785630571588,
0.739785630571588, 0.739785630571589, 0, 0.739785630571589, 0,
0.739785630571589, 1.33468159813059, 1.33468159813059, 0.739785630571588,
0.739785630571588, 0, 0.246730997741163, 0.493610100538917, 0,
0.493610100538917, 0, 0.493610100538917, 1.33468159813059, 1.33468159813059,
0.739785630571588, 0.739785630571588, 0.246730997741163, 0, 0.493610100538917,
0, 0.493610100538917, 0, 0.493610100538917, 1.33468159813059,
1.33468159813059, 0.739785630571589, 0.739785630571589, 0.493610100538917,
0.493610100538917, 0, 0, 0.165591599845893, 0, 0.329233335343189,
1.33468159813059, 1.33468159813059, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0.739785630571589, 0.739785630571589, 0.493610100538917,
0.493610100538917, 0.165591599845893, 0, 0, 0, 0.329233335343189,
1.33468159813059, 1.33468159813059, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0.739785630571589, 0.739785630571589, 0.493610100538917,
0.493610100538917, 0.329233335343189, 0, 0.329233335343189, 0,
0, 1.33468159813059, 1.33468159813059, 1.33468159813059, 1.33468159813059,
1.33468159813059, 1.33468159813059, 1.33468159813059, 0, 1.33468159813059,
0, 1.33468159813059, 0, 0.999452440298663, 1.33468159813059,
1.33468159813059, 1.33468159813059, 1.33468159813059, 1.33468159813059,
0, 1.33468159813059, 0, 1.33468159813059, 0.999452440298663,
0), .Dim = c(11L, 11L, 3L), .Dimnames = list(c("Az14", "Az14_3",
"Az201", "Az201_3", "Az92", "Az92_3", "Am", "Am_3",
"Ap02", "Ap02_3", "og"), c("Az14", "Az14_3",
"Az201", "Az201_3", "Az92", "Az92_3", "Am", "Am_3",
"Ap02", "Ap02_3", "og"), NULL))```
Using the array in the updated question (here it is x
).
Get the row/column indices of each unique prefix (assumes row names = column names):
idx <- split(1:nrow(x), sapply(strsplit(rownames(x), "_"), "[[", 1))
Initialize the final distance matrix:
m <- array(0, c(rep(length(idx), 2), dim(x)[3]), rep(list(names(idx)), 2))
Calculate the mean distances for each combination of prefixes and put the values into m
:
for (i in 1:(length(idx) - 1)) {
for (j in (i + 1):length(idx)) {
m[i, j,] <- m[j, i,] <-
colMeans(x[idx[[i]], idx[[j]],,drop = FALSE], dims = 2)
}
}
Result:
m[,,1]
#> Am Ap02 Az14 Az201 Az92 og
#> Am 0.000000 1.6666646 1.0908772 1.0908772 1.0908772 1.6666646
#> Ap02 1.666665 0.0000000 2.0000000 2.0000000 2.0000000 0.4444245
#> Az14 1.090877 2.0000000 0.0000000 0.1545689 0.1638032 2.0000000
#> Az201 1.090877 2.0000000 0.1545689 0.0000000 0.1228220 2.0000000
#> Az92 1.090877 2.0000000 0.1638032 0.1228220 0.0000000 2.0000000
#> og 1.666665 0.4444245 2.0000000 2.0000000 2.0000000 0.0000000
m[,,3]
#> Am Ap02 Az14 Az201 Az92 og
#> Am 0.0000000 0.4159787 0.3698928 0.2468051 0.0413979 0.6673408
#> Ap02 0.4159787 0.0000000 1.0372336 0.9141458 0.4159787 1.1670670
#> Az14 0.3698928 1.0372336 0.0000000 0.7397856 0.3698928 1.3346816
#> Az201 0.2468051 0.9141458 0.7397856 0.0000000 0.2468051 1.3346816
#> Az92 0.0413979 0.4159787 0.3698928 0.2468051 0.0000000 0.6673408
#> og 0.6673408 1.1670670 1.3346816 1.3346816 0.6673408 0.0000000
rowMeans(m, dims = 2)
#> Am Ap02 Az14 Az201 Az92 og
#> Am 0.0000000 1.360881 0.5890916 0.4969782 0.4731441 1.444668
#> Ap02 1.3608811 0.000000 1.6790779 1.3047153 1.4719929 1.119628
#> Az14 0.5890916 1.679078 0.0000000 0.3085786 0.2190901 1.778227
#> Az201 0.4969782 1.304715 0.3085786 0.0000000 0.1426616 1.444894
#> Az92 0.4731441 1.471993 0.2190901 0.1426616 0.0000000 1.555780
#> og 1.4446685 1.119628 1.7782272 1.4448939 1.5557803 0.000000
identical(rowMeans(m, dims = 2), apply(m, 1:2, mean))
#> [1] TRUE
As a function:
groupDist <- function(x) {
idx <- split(1:nrow(x), sapply(strsplit(rownames(x), "_"), "[[", 1))
m <- array(0, c(rep(length(idx), 2), dim(x)[3]), rep(list(names(idx)), 2))
for (i in 1:(length(idx) - 1)) {
for (j in (i + 1):length(idx)) {
m[i, j,] <- m[j, i,] <-
colMeans(x[idx[[i]], idx[[j]],,drop = FALSE], dims = 2)
}
}
m
}