I am hoping to build a matrix of 'fake' coordinates (i.e., they could be anywhere on Earth) based on their distances (in km or m) from one another.
I have a data frame containing distances between locations.
dist_df<- data.frame(site1=c("a","b","c","d"),
site2=c("b","c","d","a"),
distance = c(222.1, 672.4, 45.2, 65.4))
However, the actual coordinates have obstructions between them. Thus, I have circumvented the obstructions with another bit of code to obtain a least-cost distance. To run a series of later functions I require lat/long. I figured the easiest way to do this was to generate 'fake' coordinates based on their relative least-cost distances.
I am hoping to obtain something like the following:
coordinates_sites<-data.frame(site=c("a","b","c","d"),
lat=c(34.5332,32.1232,30.4232,30.4232),
long=c(-120.2222,-125.4422,-123.3512,-122.4232))
Any thoughts on the best way to do this? Many thanks in advance!
Here is a solution using cmdscale
. As cmdscale
needs a complete distance matrix, I had to fill in the missing distances in the dataframe you provided:
dist_df<- data.frame(site1=c("a","b","c","d", "d", "c"),
site2=c("b","c","d","a", "b", "a"),
distance = c(222.1, 672.4, 45.2, 65.4, 10.3, 110.1))
dst <- dist_df %>%
{
names <- union(.$site1, .$site2)
m <- matrix(0,
nrow = length(names),
ncol = length(names),
dimnames = list(names, names))
m[cbind(.$site1, .$site2)] <- pull(., distance)
m[cbind(.$site2, .$site1)] <- pull(., distance)
diag(m) <- 0
as.dist(m)
}
dst
##> a b c
##> b 222.1
##> c 110.1 672.4
##> d 65.4 10.3 45.2
The fake
coordinates can be obtained by the following:
cmdscale(dst)
##> [,1] [,2]
##>a -19.88532 39.4181776
##>b 341.11969 -0.9109572
##>c -331.30946 -4.3427637
##>d 10.07509 -34.1644567
Note that the solution returned by cmdscale
is an optimization and does not preserve faithfully input distances:
dist(cmdscale(dst))
##> a b c
##> b 363.25068
##> c 314.48372 672.43790
##> d 79.44829 332.71057 342.68461
However, if you start from a matrix of actual distances, cmdscale
can reconstruct accurately the coordinates. It is shown in the cmdscale
example:
loc <- cmdscale(eurodist)
x <- loc[, 1]
y <- -loc[, 2] # reflect so North is at the top
## note asp = 1, to ensure Euclidean distances are represented correctly
plot(x, y, type = "n", xlab = "", ylab = "", asp = 1, axes = FALSE,
main = "cmdscale(eurodist)")
text(x, y, rownames(loc), cex = 0.6)