I have a data frame with ~50 IDs. Each ID has multiple sets of coordinates, and I want to remove any set of coordinates that is more than 8 km from that ID's first location. The data look like this:
ID Easting Northing Date
1 593853 5255971 1/24/2008
1 593660 5253841 1/28/2008
1 594513 5253841 2/3/2008
2 583242 5258672 1/21/2008
2 583436 5258031 1/22/2008
3 593983 5258470 1/21/2008
3 591849 5258471 1/24/2008
3 591784 5258974 1/26/2008
3 591984 5258093 1/29/2008
4 591948 5259012 2/4/2009
4 591947 5259016 2/15/2008
4 578452 5261983 2/17/2008
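To run the code in the answer, the table above can be entered as an R data frame. The sketch below names it quux only because that is the name the answer's code uses; any name works. Date is kept as an "m/d/Y" string, exactly as in the table.

```r
# Sample data from the question (Date left as a character string, as shown)
quux <- data.frame(
  ID       = rep(1:4, times = c(3, 2, 4, 3)),
  Easting  = c(593853, 593660, 594513, 583242, 583436, 593983,
               591849, 591784, 591984, 591948, 591947, 578452),
  Northing = c(5255971, 5253841, 5253841, 5258672, 5258031, 5258470,
               5258471, 5258974, 5258093, 5259012, 5259016, 5261983),
  Date     = c("1/24/2008", "1/28/2008", "2/3/2008", "1/21/2008",
               "1/22/2008", "1/21/2008", "1/24/2008", "1/26/2008",
               "1/29/2008", "2/4/2009", "2/15/2008", "2/17/2008"),
  stringsAsFactors = FALSE
)
str(quux)
```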
So I want to compare all of ID 1's coordinates to ID 1's first coordinate and remove the row if the distance is > 8 km (and likewise for every other ID).
The coordinates are UTM zone 15, in meters:
proj4string <- CRS("+proj=utm +zone=15 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
Any help would be greatly appreciated.
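Assuming the coordinates are UTM in meters (as your proj4string says), plain Euclidean distance is enough. Start with a group-wise calculation of each row's distance from its ID's first location; then you can filter/subset in whichever dialect you prefer.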
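Because UTM eastings and northings are planar coordinates in meters, the distance between two points in the same zone is the Pythagorean distance of their differences, divided by 1000 for km. A minimal check on ID 1's first two rows (values copied from the question; p1 and p2 are illustrative names):

```r
# ID 1: first location (1/24/2008) and second location (1/28/2008)
p1 <- c(easting = 593853, northing = 5255971)
p2 <- c(easting = 593660, northing = 5253841)

# Euclidean distance in meters, converted to km
dist_km <- sqrt(sum((p2 - p1)^2)) / 1000
dist_km
#> [1] 2.138726
```

This is the same formula the grouped code applies to every row, with each ID's earliest row playing the role of p1.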
library(dplyr) # dplyr >= 1.1.0 for .by; on older versions, use group_by(ID) and drop .by
quux %>%
  mutate(
    Date = as.Date(Date, format = "%m/%d/%Y"),
    dist_km = sqrt((Easting - Easting[which.min(Date)])^2 +
                   (Northing - Northing[which.min(Date)])^2) / 1000,
    .by = ID
  )
# ID Easting Northing Date dist_km
# 1 1 593853 5255971 2008-01-24 0.000000000
# 2 1 593660 5253841 2008-01-28 2.138726023
# 3 1 594513 5253841 2008-02-03 2.229910312
# 4 2 583242 5258672 2008-01-21 0.000000000
# 5 2 583436 5258031 2008-01-22 0.669714118
# 6 3 593983 5258470 2008-01-21 0.000000000
# 7 3 591849 5258471 2008-01-24 2.134000234
# 8 3 591784 5258974 2008-01-26 2.256017952
# 9 3 591984 5258093 2008-01-29 2.034239416
# 10 4 591948 5259012 2009-02-04 0.004123106
# 11 4 591947 5259016 2008-02-15 0.000000000
# 12 4 578452 5261983 2008-02-17 13.817312112
You can then filter(dist_km <= 8), or do the filter directly on the sqrt(...) calculation without storing it as a column.
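A base-R sketch of filtering directly on the distance without storing a dist_km column. It looks up, for every row, the row index of its ID's earliest date and keeps rows within 8000 m of that row. The names first_idx, keep and within_8km are illustrative, not from the answer.

```r
# Sample data from the question
quux <- data.frame(
  ID       = rep(1:4, times = c(3, 2, 4, 3)),
  Easting  = c(593853, 593660, 594513, 583242, 583436, 593983,
               591849, 591784, 591984, 591948, 591947, 578452),
  Northing = c(5255971, 5253841, 5253841, 5258672, 5258031, 5258470,
               5258471, 5258974, 5258093, 5259012, 5259016, 5261983),
  Date     = c("1/24/2008", "1/28/2008", "2/3/2008", "1/21/2008",
               "1/22/2008", "1/21/2008", "1/24/2008", "1/26/2008",
               "1/29/2008", "2/4/2009", "2/15/2008", "2/17/2008")
)
quux$Date <- as.Date(quux$Date, format = "%m/%d/%Y")

# For every row, the row index of its ID's earliest-dated location
first_idx <- ave(seq_len(nrow(quux)), quux$ID,
                 FUN = function(ind) ind[which.min(quux$Date[ind])])

# Keep rows within 8 km (8000 m) of that first location; no distance column is stored
keep <- sqrt((quux$Easting  - quux$Easting[first_idx])^2 +
             (quux$Northing - quux$Northing[first_idx])^2) <= 8000
within_8km <- quux[keep, ]
within_8km
```

With the sample data, only ID 4's 2/17/2008 row (about 13.8 km away) is dropped.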
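I'm using which.min(Date) (after converting Date to a real Date class), but if you know the rows are always sorted chronologically within each ID, you can use Easting[1] and Northing[1] instead. It matters for ID 4 here: its first row is dated 2/4/2009, a year after the others, so which.min(Date) treats the 2008-02-15 row as the first location.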
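If you would rather rely on Easting[1], one sketch (base R, reusing the ave() pattern from the answer) is to sort by ID and Date first. With the question's data this moves ID 4's 2/4/2009 row to the end of its group, so the distances match the which.min(Date) version.

```r
# Sample data from the question
quux <- data.frame(
  ID       = rep(1:4, times = c(3, 2, 4, 3)),
  Easting  = c(593853, 593660, 594513, 583242, 583436, 593983,
               591849, 591784, 591984, 591948, 591947, 578452),
  Northing = c(5255971, 5253841, 5253841, 5258672, 5258031, 5258470,
               5258471, 5258974, 5258093, 5259012, 5259016, 5261983),
  Date     = c("1/24/2008", "1/28/2008", "2/3/2008", "1/21/2008",
               "1/22/2008", "1/21/2008", "1/24/2008", "1/26/2008",
               "1/29/2008", "2/4/2009", "2/15/2008", "2/17/2008")
)
quux$Date <- as.Date(quux$Date, format = "%m/%d/%Y")

# Sort chronologically within each ID so that [1] is the earliest location
quux <- quux[order(quux$ID, quux$Date), ]

# Distance (km) from each ID's first row; seq_len() just supplies row positions
quux$dist_km <- ave(seq_len(nrow(quux)), quux$ID, FUN = function(ind) {
  sqrt((quux$Easting[ind]  - quux$Easting[ind][1])^2 +
       (quux$Northing[ind] - quux$Northing[ind][1])^2)
}) / 1000
quux
```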
The same calculation in base R (R >= 4.1.0 for the native |> pipe):
quux |>
  transform(Date = as.Date(Date, format = "%m/%d/%Y")) |>
  transform(dist_km = ave(seq_along(ID), ID, FUN = function(ind) {
    sqrt((Easting[ind] - Easting[ind][which.min(Date[ind])])^2 +
         (Northing[ind] - Northing[ind][which.min(Date[ind])])^2)
  }) / 1000)
# ID Easting Northing Date dist_km
# 1 1 593853 5255971 2008-01-24 0.000000000
# 2 1 593660 5253841 2008-01-28 2.138726023
# 3 1 594513 5253841 2008-02-03 2.229910312
# 4 2 583242 5258672 2008-01-21 0.000000000
# 5 2 583436 5258031 2008-01-22 0.669714118
# 6 3 593983 5258470 2008-01-21 0.000000000
# 7 3 591849 5258471 2008-01-24 2.134000234
# 8 3 591784 5258974 2008-01-26 2.256017952
# 9 3 591984 5258093 2008-01-29 2.034239416
# 10 4 591948 5259012 2009-02-04 0.004123106
# 11 4 591947 5259016 2008-02-15 0.000000000
# 12 4 578452 5261983 2008-02-17 13.817312112
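To finish the job in base R, the pipeline can end with subset(), which keeps dist_km as a column and drops rows more than 8 km from the ID's first location. A sketch assuming R >= 4.1.0 for |>; within_8km is an illustrative name.

```r
# Sample data from the question
quux <- data.frame(
  ID       = rep(1:4, times = c(3, 2, 4, 3)),
  Easting  = c(593853, 593660, 594513, 583242, 583436, 593983,
               591849, 591784, 591984, 591948, 591947, 578452),
  Northing = c(5255971, 5253841, 5253841, 5258672, 5258031, 5258470,
               5258471, 5258974, 5258093, 5259012, 5259016, 5261983),
  Date     = c("1/24/2008", "1/28/2008", "2/3/2008", "1/21/2008",
               "1/22/2008", "1/21/2008", "1/24/2008", "1/26/2008",
               "1/29/2008", "2/4/2009", "2/15/2008", "2/17/2008")
)

# The base-R pipeline from the answer, followed by subset() to drop rows beyond 8 km
within_8km <- quux |>
  transform(Date = as.Date(Date, format = "%m/%d/%Y")) |>
  transform(dist_km = ave(seq_along(ID), ID, FUN = function(ind) {
    sqrt((Easting[ind] - Easting[ind][which.min(Date[ind])])^2 +
         (Northing[ind] - Northing[ind][which.min(Date[ind])])^2)
  }) / 1000) |>
  subset(dist_km <= 8)
within_8km
```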