
R - Merging city name to approximate lat-long coordinates


I want to merge city names to approximate coordinates.

I have two datasets.

  1. lat-long for cities, called cities.
  2. lat-long for observed events, called events.

Most of the events occur just outside the lat-long of a city.

I want to merge in the city from cities if its lat and lon each differ by at most 1 degree from those listed in events.

The nearest function in data.table seems to be too crude.

What would you do? Use maptools?

Example:

cities <- data.table(city = c("A", "B", "C"),
                 lat = c(23.4, 43.5, 21.3),
                 lon = c(100, 98.4, -78.2))

events <- data.table(event = c("X1", "Y1", "B1"),
                 lat = c(24.4, 42.5, 23.3),
                 lon = c(101, 100.4, -78.2))

result <- data.table(event = c("X1", "Y1", "B1"),
                 lat = c(23.4, 43.5, 21.3),
                 lon = c(100, 98.4, -78.2),
                 city = c("A", NA, NA))

> result
   event  lat   lon city
1:    X1 23.4 100.0    A
2:    Y1 43.5  98.4 <NA>
3:    B1 21.3 -78.2 <NA>

Solution

  • method 1: non-equi join

    This non-equi update join does the trick. But it only works because you set a hard 1-degree limit. The problem is that the distance one degree spans varies around the globe.

    events[ cities[, `:=`(lat_min = lat - 1, lat_max = lat+1,
                          lon_min = lon - 1, lon_max = lon + 1) ], 
            city := i.city, 
            on = .(lat >= lat_min, lat <= lat_max, lon >= lon_min, lon <= lon_max ) ][]
    
    #    event  lat   lon city
    # 1:    X1 24.4 101.0    A
    # 2:    Y1 42.5 100.4 <NA>
    # 3:    B1 23.3 -78.2 <NA>
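
    To make the caveat about degrees concrete, here is a minimal base R sketch that measures one degree of longitude at three latitudes. The helper haversine_m() is a hypothetical name introduced for illustration, and it assumes a spherical Earth with a mean radius of 6371 km, so treat the numbers as approximate.

    ```r
    # Great-circle (haversine) distance in metres between lon/lat points.
    # Assumption: spherical Earth with mean radius r = 6371 km.
    haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
      to_rad <- pi / 180
      dlat <- (lat2 - lat1) * to_rad
      dlon <- (lon2 - lon1) * to_rad
      a <- sin(dlat / 2)^2 +
        cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
      2 * r * asin(pmin(1, sqrt(a)))
    }

    # length of one degree of longitude (in km) at latitudes 0, 30 and 60
    lats <- c(0, 30, 60)
    deg_lon_km <- haversine_m(0, lats, 1, lats) / 1000
    round(deg_lon_km)
    # [1] 111  96  56
    ```

    So a 1-degree box is roughly twice as wide at the equator as it is at 60 degrees latitude, which is why a fixed degree tolerance does not correspond to a fixed distance.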
    

  • method 2: based on absolute distance

    If you want to set a maximum distance between events and cities, you'll need a spatial solution like this:

    #maximum distance between event and city (in metres)
    max_dist = 180000
    
    library( sf )
    #create simple (point) features of events and cities
    cities.sf <- st_as_sf( cities, coords = c("lon", "lat"), crs = 4326 )
    events.sf <- st_as_sf( events, coords = c("lon", "lat"), crs = 4326 )
    
    #spatial join
    st_join( events.sf, cities.sf, join = st_is_within_distance, dist = max_dist )
    
    # Simple feature collection with 3 features and 2 fields
    # geometry type:  POINT
    # dimension:      XY
    # bbox:           xmin: -78.2 ymin: 23.3 xmax: 101 ymax: 42.5
    # CRS:            EPSG:4326
    #   event city           geometry
    # 1    X1    A   POINT (101 24.4)
    # 2    Y1 <NA> POINT (100.4 42.5)
    # 3    B1 <NA> POINT (-78.2 23.3)
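
    One thing to keep in mind: st_join() returns one row per matching pair, so an event that lies within max_dist of several cities would appear more than once. If you want at most one city per event, a possible variation (a sketch, not part of the approach above) is to look up the nearest city first and then apply the distance cut-off. It uses st_nearest_feature() and st_distance() from sf, rebuilds the example data as plain data.frames so it runs on its own, and the names nearest and dist_m are introduced here for illustration.

    ```r
    library(sf)

    # example data from the question (plain data.frames, an assumption
    # made here so the sketch is self-contained)
    cities <- data.frame(city = c("A", "B", "C"),
                         lat = c(23.4, 43.5, 21.3),
                         lon = c(100, 98.4, -78.2))
    events <- data.frame(event = c("X1", "Y1", "B1"),
                         lat = c(24.4, 42.5, 23.3),
                         lon = c(101, 100.4, -78.2))
    max_dist <- 180000  # metres

    cities.sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)
    events.sf <- st_as_sf(events, coords = c("lon", "lat"), crs = 4326)

    # row index (into cities.sf) of the nearest city for every event
    nearest <- st_nearest_feature(events.sf, cities.sf)

    # distance in metres from each event to that nearest city
    dist_m <- as.numeric(
      st_distance(events.sf, cities.sf[nearest, ], by_element = TRUE)
    )

    # keep the city name only when it lies within max_dist
    events$city <- ifelse(dist_m <= max_dist, cities$city[nearest], NA_character_)
    events
    ```

    With the example data, only X1 lies within 180 km of its nearest city (A), so the other two events get NA.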