
Calculate Euclidean distance between multiple pairs of points in dataframe in R


I'm trying to calculate the Euclidean distance between pairs of points in a dataframe in R, and there's an ID for each pair:

ID <- sample(1:10, 10, replace=FALSE)
P <- runif(10, min=1, max=3)
S <- runif(10, min=1, max=3)

testdf <- data.frame(ID, P, S)

I found several ways to calculate the Euclidean distance in R, but each one either throws an error, returns only one value (so it's computing the distance between the entire vectors), or gives me a matrix, when all I need is a fourth column with the distance for each pair (columns 'P' and 'S'). I'm a bit confused by matrices, so I'm not sure how to work with that result.

I tried writing a function and applying it to the two columns, but I get an error:

testdf$V <- apply(testdf[ , c('P', 'S')], 1, function(P, S) sqrt(sum(c(P^2, S^2))))

# Error in FUN(newX[, i], ...) : argument "S" is missing, with no default
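The error happens because apply() with MARGIN = 1 calls the function once per row and passes the whole row as a single vector, so the second argument S is never supplied. A minimal sketch of a working row-wise version, assuming the intended per-row formula is the one written above, sqrt(P^2 + S^2); the argument name `row` and the set.seed() call (added only for reproducibility) are illustrative choices:

```r
set.seed(1)  # added only so the random data is reproducible
ID <- sample(1:10, 10, replace = FALSE)
P <- runif(10, min = 1, max = 3)
S <- runif(10, min = 1, max = 3)
testdf <- data.frame(ID, P, S)

# apply() hands each row to FUN as ONE named vector, c(P = ..., S = ...),
# so the function takes a single argument instead of (P, S)
testdf$V <- apply(testdf[, c("P", "S")], 1, function(row) {
  sqrt(sum(row^2))  # sqrt(P^2 + S^2) for this row
})

print(testdf)
```

This works, but apply() first converts the selected columns to a matrix, which is exactly the detour the question is trying to avoid; plain vectorized arithmetic on the columns gives the same column without it.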

Then I tried the dist() function from the stats package, but it returns only one value (same problem if I follow the method here: https://www.statology.org/euclidean-distance-in-r/):

P <- testdf$P
S <- testdf$S
testProbMatrix <- rbind(P, S)
stats::dist(testProbMatrix, method = "euclidean")
# returns only 1 distance 
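The single value comes from how dist() reads its input: it computes distances between the rows of a matrix. rbind(P, S) builds a matrix with just two rows (P and S) and ten columns, so dist() treats P and S as two points in 10-dimensional space and returns the one distance between them. A small sketch checking this against the hand-computed formula (set.seed() is added only for reproducibility):

```r
set.seed(1)  # added only so the random data is reproducible
P <- runif(10, min = 1, max = 3)
S <- runif(10, min = 1, max = 3)

m <- rbind(P, S)
print(dim(m))  # 2 rows (P and S) by 10 columns

# dist() measures distances between ROWS, so two rows give one distance
d_rows <- as.numeric(stats::dist(m, method = "euclidean"))

# the same number by hand: sqrt of the summed squared differences of P and S
d_manual <- sqrt(sum((P - S)^2))

print(c(d_rows, d_manual))
```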

This version returns a matrix instead (here's a nice explanation of why: Calculate the distances between pairs of points in r):

stats::dist(cbind(P, S), method = "euclidean")

But I'm confused about how to pull the distances out of the matrix and attach them to the correct ID for each pair of points. I don't understand why I have to make a matrix instead of just applying the function to the data frame; matrices have always confused me. I think this is the same question as Finding euclidean distance between all pair of points, but for R instead of Python.
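With cbind(P, S) each row is one point (P, S), and dist() returns the distance between every pair of rows (stored as a lower triangle). A minimal sketch of getting those values back out by ID, assuming that row-to-row distance is what is wanted: as.matrix() expands the result to a full square matrix, the IDs become its row and column names, and which(..., arr.ind = TRUE) turns the upper triangle into a long table with one row per pair. The IDs 3 and 7 are arbitrary examples, and set.seed() is added only for reproducibility.

```r
set.seed(1)  # added only so the random data is reproducible
ID <- sample(1:10, 10, replace = FALSE)
P <- runif(10, min = 1, max = 3)
S <- runif(10, min = 1, max = 3)
testdf <- data.frame(ID, P, S)

# each row of testdf[, c("P", "S")] is one point; dist() compares rows
d <- as.matrix(dist(testdf[, c("P", "S")], method = "euclidean"))

# label rows and columns with the IDs so entries can be looked up by ID
dimnames(d) <- list(testdf$ID, testdf$ID)

# distance between the point with ID 3 and the point with ID 7
print(d["3", "7"])

# long format: one row per unordered pair of points (upper triangle only)
idx <- which(upper.tri(d), arr.ind = TRUE)
pairs <- data.frame(ID1 = testdf$ID[idx[, "row"]],
                    ID2 = testdf$ID[idx[, "col"]],
                    distance = d[idx])
print(head(pairs))
```

Note that this is a different quantity from a single value per row: ten points give 45 pairwise distances, which is why the result is a matrix rather than one extra column.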

Thanks for the help!


Solution

  • Try this if you just want to add another column to your data frame. Arithmetic in R is vectorized, so this computes one value per row without apply() or a matrix:

    testdf$distance <- sqrt(testdf$P^2 + testdf$S^2)
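A slightly fuller sketch of the same vectorized approach, written against the data frame's own columns so it does not depend on separate P and S vectors existing in the workspace. sqrt(P^2 + S^2) is the distance from each point (P, S) to the origin (0, 0); the second column shows the same pattern for a different reference point, where (x0, y0) = (2, 2) is a hypothetical value chosen only for illustration. set.seed() is added only for reproducibility.

```r
set.seed(1)  # added only so the random data is reproducible
ID <- sample(1:10, 10, replace = FALSE)
P <- runif(10, min = 1, max = 3)
S <- runif(10, min = 1, max = 3)
testdf <- data.frame(ID, P, S)

# ^, + and sqrt() work element-wise: one distance per row, kept next to its ID.
# This is the distance from each point (P, S) to the origin (0, 0).
testdf$distance <- sqrt(testdf$P^2 + testdf$S^2)

# Distance from each point to another reference point instead.
# (x0, y0) = (2, 2) is a hypothetical value used only for illustration.
x0 <- 2
y0 <- 2
testdf$dist_to_ref <- sqrt((testdf$P - x0)^2 + (testdf$S - y0)^2)

print(testdf)
```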