I am trying to compute the distance between the first observation and all other observations within a group. The first observation in group A is a1, and in group B it is b1.
I want a new column in df, "Euclidean", that holds the distance of each observation from the first observation of its group.
library(data.table)
df <- data.table(Section = rep(c('A', 'B'), each = 4),
                 ID = c('a1','a2','a3','a4','b1','b2','b3','b4'),
                 x = c(5,10,15,15,10,15,30,25),
                 y = c(12,10,8,4,6,8,16,24))
The distance would be calculated as euclidean[a1, a2] = sqrt((x1 - x2)^2 + (y1 - y2)^2), so the first value in each group is 0. I am hoping to accomplish this using dplyr or data.table. Thanks so much.
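As a quick sanity check of the formula, here is the a1 to a2 distance computed by hand. The coordinates are taken from df above; the variable names are only for illustration.

```r
# a1 = (5, 12) and a2 = (10, 10), taken from the example data
a1 <- c(x = 5, y = 12)
a2 <- c(x = 10, y = 10)

# Euclidean distance: square root of the summed squared coordinate differences
d <- sqrt(sum((a1 - a2)^2))
d
#> [1] 5.385165
```

This matches the 5.39 shown for a2 in the output below.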
Two solutions with dplyr:
(1) Using the Euclidean distance formula directly
library(dplyr)
df %>% group_by(Section) %>%
  mutate(Euclidean = sqrt((x - x[1])^2 + (y - y[1])^2))
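Since the question also asks about data.table, here is a sketch of the same formula in data.table syntax. Like the dplyr version, it assumes each group's rows are already ordered so that the reference observation (a1, b1) comes first.

```r
library(data.table)

df <- data.table(Section = rep(c('A', 'B'), each = 4),
                 ID = c('a1','a2','a3','a4','b1','b2','b3','b4'),
                 x = c(5,10,15,15,10,15,30,25),
                 y = c(12,10,8,4,6,8,16,24))

# := adds the column by reference; by = Section evaluates the expression
# separately for each group, so x[1] and y[1] are that group's first row
df[, Euclidean := sqrt((x - x[1])^2 + (y - y[1])^2), by = Section]

# df[] forces printing after an assignment with :=
df[]
```

Because := modifies df in place, no reassignment (df <- ...) is needed.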
(2) Using the built-in function dist() (from the stats package, attached by default)
df %>% group_by(Section) %>%
  mutate(Euclidean = as.matrix(dist(cbind(x, y)))[1, ])
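Note: The second approach is more flexible because dist() supports other metrics through its method argument, including method = "minkowski" with a power p. To get distances from a different observation, change the row index in the square brackets (e.g. [2, ] for the second observation in each group).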
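A sketch of the Minkowski variant follows. The column name Minkowski3 and the choice p = 3 are only illustrative. With p = 2 the result is the Euclidean distance, and with p = 1 it is the Manhattan distance.

```r
library(dplyr)
library(data.table)  # only used to build the example data from the question

df <- data.table(Section = rep(c('A', 'B'), each = 4),
                 ID = c('a1','a2','a3','a4','b1','b2','b3','b4'),
                 x = c(5,10,15,15,10,15,30,25),
                 y = c(12,10,8,4,6,8,16,24))

res <- df %>%
  group_by(Section) %>%
  # row 1 of the per-group distance matrix = distances from the first observation;
  # unname() drops the row names ("1", "2", ...) that as.matrix() carries over
  mutate(Minkowski3 = unname(as.matrix(dist(cbind(x, y),
                                            method = "minkowski", p = 3))[1, ]))
res
```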
Output:
# Section ID x y Euclidean
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 A a1 5 12 0
# 2 A a2 10 10 5.39
# 3 A a3 15 8 10.8
# 4 A a4 15 4 12.8
# 5 B b1 10 6 0
# 6 B b2 15 8 5.39
# 7 B b3 30 16 22.4
# 8 B b4 25 24 23.4