Search code examples
rgeospatialrastergeosphere

Find the Number of Points Within a Certain Radius of a Data Point in R


I have 2 datasets one for hospitals and another for procedures. Each dataset has latitude and longitude coordinates. Procedures are either given in or out of the hospital, though the coordinates are not necessarily precise if given in the hospitals. Im trying to form a radius of a certain size around each of the hospitals and determine how many procedure points fall within that radius on average. So if for example I have 100 hospitals and 3000 procedures, I want to form a radius around all hospitals and see on average how many hospitals fall into that specified radius. My initial code is below, but I know this can be done faster. coded in R. Thanks!

for(i in 1:NROW(hospitals)){
  hospital <- hospitals[i,]
  radius <- .016

  # find all the procedures that lie in the .016 sized radius from this hospital

  hospital$latitude_low <- hospital$lat - radius
  hospital$longitude_low <- hospital$long - radius
  hospital$latitude_high <- hospital$lat + radius
  hospital$longitude_high <- hospital$long + radius

  in_rad <- procedures[(procedures$long >= hospital$longitude_low & procedures$long <= 
  hospital$longitude_high & procedures$lat <= hospital$latitude_high & procedures$lat >= 
  hospital$latitude_low),]

  num <- NROW(in_rad)
  hospitals[i,]$number_of_procedures <- num
}

Solution

  • When you ask a question, you should always include some example data. Like this

    lat <- c(-23.8, -25.8)
    lon <- c(-49.6, -44.6)
    hosp <- cbind(lon, lat)
    
    
    lat <- c(-22.8, -24.8, -29.1, -28, -20)
    lon <- c(-46.4, -46.3, -45.3, -40, -30)
    procedures <- cbind(lon, lat)
    

    Are your data in longitude/latitude? If so, you need to use a proper method to compute distances. For example

     library(geosphere)
     dm <- distm(procedures, hosp)
    

    Or

     library(raster)
     d <- pointDistance(procedures, hosp, lonlat=TRUE)
    

    Both compute the distance from all procedures to all hospitals. This will fail with very large datasets, but from what you describe it should work fine. Now you can use a threshold (here 400,000 m) to find out which procedures are within that distance of each hospital

    apply(d < 400000, 2, which)
    #[[1]]
    #[1] 1 2
    
    #[[2]]
    #[1] 1 2 3
    

    So procedure 1, 2 and 3 are within that distance to hospital 2

    If your data are not longitude/latitude, you can use

     d <- pointDistance(procedures, hosp, lonlat=FALSE)