I have two separate data tables that I want to find the closet zip code from one data table to the other. From the SiteZip table, I want the zip codes to loop through the ConsumerZip table to obtain the shortest distance between zip codes. I used the zipcodeR package to assign lat and long to it. The two tables are below. The first table are the locations of stores. The other table are customer locations. I want to be able to create a single table that shows the closest store for each customer. I have researched for a couple days and haven't found many requests that match what I'm looking for. Ultimately I would map the store locations with a density circle by customer location. Thank you in advance for any help you can provide.
head(StoreZip)
Zip Store Address1 City State lat lng
1: 01026 11111 151 South Street Cummington MA 42.48 -72.93
2: 01040 11112 303 Beech St. Holyoke MA 42.22 -72.64
3: 01104 11113 417 Liberty St. Springfield MA 42.13 -72.57
4: 01104 11114 2155 Main St. Springfield MA 42.13 -72.57
5: 01301 11115 55 Federal St. Greenfield MA 42.63 -72.59
6: 01301 11116 1 Arch Place Greenfield MA 42.63 -72.59```
```head(CustomersZip)
Zip ID Address1 City State lat lng
1: 01001 65484654805 1548 MAIN STREET AGAWAM MA 42.07 -72.63
2: 01001 64846124846 569 MAPLE ST AGAWAM MA 42.07 -72.63
3: 01001 68421548945 68 PLANTATION DR AGAWAM MA 42.07 -72.63
4: 01001 84051545484 154 South DR AGAWAM MA 42.07 -72.63
5: 01001 97545154848 985 Main St AGAWAM MA 42.07 -72.63
6: 01002 64841515484 54 KING ST PELHAM MA 42.38 -72.52
Here's one solution for mapping each CustomersZip$ID
with the closest StoreZip$Store
:
library(data.table)
# library(geosphere) # distHaversine
CustomersZip[
, Store := StoreZip$Store[
which.min(geosphere::distHaversine(
cbind(first(lng), first(lat)),
StoreZip[, cbind("lng", "lat"), with = FALSE])) ]
, by = ID ]
CustomersZip
# Zip ID lat lng Store
# <int> <char> <num> <num> <int>
# 1: 1001 65484654805 42.07 -72.63 11113
# 2: 1001 64846124846 42.07 -72.63 11113
# 3: 1001 68421548945 42.07 -72.63 11113
# 4: 1001 84051545484 42.07 -72.63 11113
# 5: 1001 97545154848 42.07 -72.63 11113
# 6: 1002 64841515484 42.38 -72.52 11112
Walk-through:
distHaversine
operates on two arguments, typically matrices (or frames) with two columns each; how it calculates distances depends on the number of points in each of its arguments p1
and p2
:
p1
has one point, then it calculates all p2
points to the individual p1
point; similarly if p2
has one point;p1
and p2
have the same number of points, it calculates the distance point-wise, so that row1 with row1, row2 with row2, etc; it does not do a cartesian expansion of "row1 with row1,row2,row3,...", "row2 with row1,row2,row3,...", so that has to happen externallywhich.min
) and store its Store
idCustomersZip$ID
and use just the first
lat/lng found for that customer; if it's possible that a customer will have different points, then remove the first
calls, and group with by = seq_len(nrow(CustomersZip))
instead; if this step is unnecessary, it will still calculate the same results, the only loss will be some efficiency by calculating the same distances multiple timesReusable data (a subset of what is in the question):
StoreZip <- fread("
Zip Store lat lng
01026 11111 42.48 -72.93
01040 11112 42.22 -72.64
01104 11113 42.13 -72.57
01104 11114 42.13 -72.57
01301 11115 42.63 -72.59
01301 11116 42.63 -72.59")
CustomersZip <- fread("
Zip ID lat lng
01001 65484654805 42.07 -72.63
01001 64846124846 42.07 -72.63
01001 68421548945 42.07 -72.63
01001 84051545484 42.07 -72.63
01001 97545154848 42.07 -72.63
01002 64841515484 42.38 -72.52", colClasses = list(character="ID"))