Tags: r, coordinates, distance, spatial

Remove locations >8km from the original location for each ID in a df


I have a data frame with ~50 IDs. Each ID has multiple sets of coordinates, and I want to remove any set of coordinates that is more than 8 km from that ID's first location. The data looks like this:

ID Easting  Northing   Date
1  593853   5255971    1/24/2008
1  593660   5253841    1/28/2008
1  594513   5253841    2/3/2008
2  583242   5258672    1/21/2008
2  583436   5258031    1/22/2008
3  593983   5258470    1/21/2008
3  591849   5258471    1/24/2008
3  591784   5258974    1/26/2008
3  591984   5258093    1/29/2008
4  591948   5259012    2/4/2009
4  591947   5259016    2/15/2008
4  578452   5261983    2/17/2008

So I want to compare all of ID 1's coordinates to ID 1's first coordinate and remove the row if the distance is > 8 km, and do the same for every other ID.

The coordinates are UTM zone 15 (WGS84), in meters:

proj4string <- CRS("+proj=utm +zone=15 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")

Any help would be greatly appreciated.
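For a reproducible example, the table above can be read into a data frame. This is a sketch: the name quux is borrowed from the answer's code, and keeping Date as plain strings (converted later with as.Date()) is an assumption about how the original data is stored.

```r
# Rebuild the question's sample data; `quux` is the name the answer's code uses.
# Dates are kept as character strings here and converted with as.Date() later.
quux <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID Easting  Northing   Date
1  593853   5255971    1/24/2008
1  593660   5253841    1/28/2008
1  594513   5253841    2/3/2008
2  583242   5258672    1/21/2008
2  583436   5258031    1/22/2008
3  593983   5258470    1/21/2008
3  591849   5258471    1/24/2008
3  591784   5258974    1/26/2008
3  591984   5258093    1/29/2008
4  591948   5259012    2/4/2009
4  591947   5259016    2/15/2008
4  578452   5261983    2/17/2008
")
str(quux)
```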


Solution

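  • Assuming the coordinates are UTM in meters (as +units=m in the projection string indicates), this starts with a group-wise calculation of each row's distance from its ID's earliest location; you can then filter/subset in whichever dialect you prefer.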
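Because the coordinates are projected and in meters, the distance computed below is plain Euclidean (straight-line) distance on the UTM grid, divided by 1000 to get km; within one zone and over a few kilometres this should be a close approximation of ground distance. Writing $(E_0, N_0)$ for an ID's earliest location, a worked check for ID 1's second row:

```latex
d_i = \frac{\sqrt{(E_i - E_0)^2 + (N_i - N_0)^2}}{1000}\ \text{km}

% ID 1, row 2 (1/28/2008) vs. row 1 (1/24/2008):
E_i - E_0 = 593660 - 593853 = -193, \qquad N_i - N_0 = 5253841 - 5255971 = -2130
d = \frac{\sqrt{193^2 + 2130^2}}{1000} = \frac{\sqrt{37249 + 4536900}}{1000} = \frac{\sqrt{4574149}}{1000} \approx 2.139\ \text{km}
```

This matches the 2.1387 km shown for that row in the output that follows.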

    dplyr

    library(dplyr) # >= 1.1.0 for .by; with older versions use group_by(ID) instead
    quux %>%
      mutate(
        Date = as.Date(Date, format = "%m/%d/%Y"),
        # distance in km from each ID's earliest location, picked via which.min(Date)
        dist_km = sqrt((Easting - Easting[which.min(Date)])^2 +
                       (Northing - Northing[which.min(Date)])^2)/1000,
        .by = ID
      )
    #    ID Easting Northing       Date      dist_km
    # 1   1  593853  5255971 2008-01-24  0.000000000
    # 2   1  593660  5253841 2008-01-28  2.138726023
    # 3   1  594513  5253841 2008-02-03  2.229910312
    # 4   2  583242  5258672 2008-01-21  0.000000000
    # 5   2  583436  5258031 2008-01-22  0.669714118
    # 6   3  593983  5258470 2008-01-21  0.000000000
    # 7   3  591849  5258471 2008-01-24  2.134000234
    # 8   3  591784  5258974 2008-01-26  2.256017952
    # 9   3  591984  5258093 2008-01-29  2.034239416
    # 10  4  591948  5259012 2009-02-04  0.004123106
    # 11  4  591947  5259016 2008-02-15  0.000000000
    # 12  4  578452  5261983 2008-02-17 13.817312112
    

    You can then filter(dist_km <= 8), or apply the filter directly to the sqrt(...) expression without storing it as a column.
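Both filtering variants, sketched with dplyr. This assumes dplyr >= 1.1.0 (for .by in both mutate() and filter()); the data is rebuilt inline so the block runs on its own, and kept / kept2 are illustrative names.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the `.by` argument

# Question's sample data, rebuilt inline so this sketch is self-contained
quux <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
  Easting  = c(593853, 593660, 594513, 583242, 583436, 593983,
               591849, 591784, 591984, 591948, 591947, 578452),
  Northing = c(5255971, 5253841, 5253841, 5258672, 5258031, 5258470,
               5258471, 5258974, 5258093, 5259012, 5259016, 5261983),
  Date     = c("1/24/2008", "1/28/2008", "2/3/2008", "1/21/2008",
               "1/22/2008", "1/21/2008", "1/24/2008", "1/26/2008",
               "1/29/2008", "2/4/2009", "2/15/2008", "2/17/2008"),
  stringsAsFactors = FALSE
)

# Variant 1: store the distance as a column, then filter on it
kept <- quux %>%
  mutate(
    Date = as.Date(Date, format = "%m/%d/%Y"),
    dist_km = sqrt((Easting - Easting[which.min(Date)])^2 +
                   (Northing - Northing[which.min(Date)])^2) / 1000,
    .by = ID
  ) %>%
  filter(dist_km <= 8)

# Variant 2: filter directly on the distance expression, no helper column
kept2 <- quux %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
  filter(
    sqrt((Easting - Easting[which.min(Date)])^2 +
         (Northing - Northing[which.min(Date)])^2) / 1000 <= 8,
    .by = ID
  )

print(kept)
```

With the sample data, both variants drop only ID 4's 2/17/2008 row (about 13.8 km away) and keep the other 11 rows.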

    I'm using which.min(Date) (after converting Date to a real Date class), but if you know the rows are always sorted chronologically within each ID, you can use Easting[1] and Northing[1] instead.
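If you would rather rely on row order, one base-R sketch is to sort by ID and date first, so each ID's earliest location becomes its first row, and then look that row up with match(). This swaps in match() for the answer's which.min(); the names sorted, first and result are illustrative.

```r
# Question's sample data, rebuilt inline so this sketch is self-contained
quux <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
  Easting  = c(593853, 593660, 594513, 583242, 583436, 593983,
               591849, 591784, 591984, 591948, 591947, 578452),
  Northing = c(5255971, 5253841, 5253841, 5258672, 5258031, 5258470,
               5258471, 5258974, 5258093, 5259012, 5259016, 5261983),
  Date     = c("1/24/2008", "1/28/2008", "2/3/2008", "1/21/2008",
               "1/22/2008", "1/21/2008", "1/24/2008", "1/26/2008",
               "1/29/2008", "2/4/2009", "2/15/2008", "2/17/2008"),
  stringsAsFactors = FALSE
)
quux$Date <- as.Date(quux$Date, format = "%m/%d/%Y")

# Sort by ID, then chronologically, so each ID's earliest fix is its first row
sorted <- quux[order(quux$ID, quux$Date), ]

# match() gives the position of the first occurrence of each ID,
# which after sorting is that ID's earliest row
first <- match(sorted$ID, sorted$ID)

sorted$dist_km <- sqrt((sorted$Easting - sorted$Easting[first])^2 +
                       (sorted$Northing - sorted$Northing[first])^2) / 1000

# Keep rows within 8 km of their ID's earliest location
result <- sorted[sorted$dist_km <= 8, ]
print(result)
```

In the sample data, ID 4's first listed row is dated 2/4/2009, so for that ID the earliest location is the 2/15/2008 row rather than the first one listed. With this data the 8 km decision comes out the same either way, but sorting (or which.min(Date)) avoids depending on that.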

    base R

    quux |>  # the native pipe |> needs R >= 4.1
      transform(Date = as.Date(Date, format="%m/%d/%Y")) |>
      # ave() calls FUN once per ID, passing that group's row indices
      transform(dist_km = ave(seq_along(ID), ID, FUN = function(ind) {
        sqrt((Easting[ind] - Easting[ind][which.min(Date[ind])])^2 +
               (Northing[ind] - Northing[ind][which.min(Date[ind])])^2)
      }) / 1000)
    #    ID Easting Northing       Date      dist_km
    # 1   1  593853  5255971 2008-01-24  0.000000000
    # 2   1  593660  5253841 2008-01-28  2.138726023
    # 3   1  594513  5253841 2008-02-03  2.229910312
    # 4   2  583242  5258672 2008-01-21  0.000000000
    # 5   2  583436  5258031 2008-01-22  0.669714118
    # 6   3  593983  5258470 2008-01-21  0.000000000
    # 7   3  591849  5258471 2008-01-24  2.134000234
    # 8   3  591784  5258974 2008-01-26  2.256017952
    # 9   3  591984  5258093 2008-01-29  2.034239416
    # 10  4  591948  5259012 2009-02-04  0.004123106
    # 11  4  591947  5259016 2008-02-15  0.000000000
    # 12  4  578452  5261983 2008-02-17 13.817312112
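To finish the base R version, the same pipeline can end in subset(). A minimal sketch, assuming R >= 4.1 for |>, with the data rebuilt inline and kept as an illustrative name:

```r
# Question's sample data, rebuilt inline so this sketch is self-contained
quux <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4),
  Easting  = c(593853, 593660, 594513, 583242, 583436, 593983,
               591849, 591784, 591984, 591948, 591947, 578452),
  Northing = c(5255971, 5253841, 5253841, 5258672, 5258031, 5258470,
               5258471, 5258974, 5258093, 5259012, 5259016, 5261983),
  Date     = c("1/24/2008", "1/28/2008", "2/3/2008", "1/21/2008",
               "1/22/2008", "1/21/2008", "1/24/2008", "1/26/2008",
               "1/29/2008", "2/4/2009", "2/15/2008", "2/17/2008"),
  stringsAsFactors = FALSE
)

kept <- quux |>
  transform(Date = as.Date(Date, format = "%m/%d/%Y")) |>
  # per-ID distance (km) from the earliest-dated row, as in the answer
  transform(dist_km = ave(seq_along(ID), ID, FUN = function(ind) {
    sqrt((Easting[ind] - Easting[ind][which.min(Date[ind])])^2 +
           (Northing[ind] - Northing[ind][which.min(Date[ind])])^2)
  }) / 1000) |>
  # drop rows more than 8 km from their ID's earliest location
  subset(dist_km <= 8)

print(kept)
```

If you do not need the distances afterwards, kept$dist_km <- NULL removes the helper column.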