Calculate difference between one observation and all other observations by group


I am trying to calculate the distance between the first observation and every other observation within a group. The first observation in group A is a1, and in group B it is b1.

I want a new column in df, "Euclidean", that holds each observation's distance from the first observation of its group.

 library(data.table)

 df <- data.table(Section = rep(c('A', 'B'), each = 4),
                  ID = c('a1','a2','a3','a4','b1','b2','b3','b4'),
                  x = c(5,10,15,15,10,15,30,25),
                  y = c(12,10,8,4,6,8,16,24))

The distance is the Euclidean distance, e.g. euclidean[a1, a2] = sqrt((x1 - x2)^2 + (y1 - y2)^2), where (x1, y1) and (x2, y2) are the coordinates of a1 and a2. The first value in each group would therefore be 0. I am hoping to accomplish this using dplyr or data.table. Thanks so much.
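As a worked check of the formula, using the coordinates of a1 = (5, 12) and a2 = (10, 10) from df:

$$
d(a_1, a_2) = \sqrt{(5 - 10)^2 + (12 - 10)^2} = \sqrt{25 + 4} = \sqrt{29} \approx 5.39
$$

The same formula applied to a1 and itself gives 0, which is why the first value in each group is 0.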


Solution

  • Two solutions with dplyr:

    (1) Using the Euclidean distance formula directly

    library(dplyr)

    df %>% group_by(Section) %>%
      mutate(Euclidean = sqrt((x - x[1])^2 + (y - y[1])^2))
    

    (2) Using the base R function dist()

    df %>% group_by(Section) %>%
      mutate(Euclidean = as.matrix(dist(cbind(x, y)))[1, ])
    

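    Note: The second approach is more flexible because dist() supports other metrics through its method argument; for example, method = "minkowski" with a different p changes the power of the Minkowski distance. To get distances from a different observation, change the row index in the square brackets (e.g. [2, ] for the second observation in each group).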


    Output:

    #   Section ID        x     y Euclidean
    #   <chr>   <chr> <dbl> <dbl>     <dbl>
    # 1 A       a1        5    12      0   
    # 2 A       a2       10    10      5.39
    # 3 A       a3       15     8     10.8 
    # 4 A       a4       15     4     12.8 
    # 5 B       b1       10     6      0   
    # 6 B       b2       15     8      5.39
    # 7 B       b3       30    16     22.4 
    # 8 B       b4       25    24     23.4
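Since the question also asked about data.table, here is a minimal sketch of both approaches in that package. It assumes the rows are already in the intended order within each Section, so the "first" observation is simply the first row stored for that group. The column name Minkowski is only illustrative; with p = 2 it should match Euclidean.

```r
library(data.table)

# Same example data as in the question
df <- data.table(Section = rep(c('A', 'B'), each = 4),
                 ID = c('a1', 'a2', 'a3', 'a4', 'b1', 'b2', 'b3', 'b4'),
                 x = c(5, 10, 15, 15, 10, 15, 30, 25),
                 y = c(12, 10, 8, 4, 6, 8, 16, 24))

# Within each Section, x[1] and y[1] refer to that group's first row,
# so the first observation of every group gets a distance of 0.
# := adds the column by reference (no copy of df is made).
df[, Euclidean := sqrt((x - x[1])^2 + (y - y[1])^2), by = Section]

# Same idea via dist(): row 1 of the per-group distance matrix holds the
# distances from the group's first observation. method = "minkowski" with
# p = 2 reproduces the Euclidean distance; change p for other powers.
df[, Minkowski := as.matrix(dist(cbind(x, y), method = "minkowski", p = 2))[1, ],
   by = Section]

print(df)
```

The := form modifies df in place, which avoids copying the whole table when it is large.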