I have 2 datasets one for hospitals and another for procedures. Each dataset has latitude and longitude coordinates. Procedures are either given in or out of the hospital, though the coordinates are not necessarily precise if given in the hospitals. Im trying to form a radius of a certain size around each of the hospitals and determine how many procedure points fall within that radius on average. So if for example I have 100 hospitals and 3000 procedures, I want to form a radius around all hospitals and see on average how many hospitals fall into that specified radius. My initial code is below, but I know this can be done faster. coded in R. Thanks!
for(i in 1:NROW(hospitals)){
hospital <- hospitals[i,]
radius <- .016
# find all the procedures that lie in the .016 sized radius from this hospital
hospital$latitude_low <- hospital$lat - radius
hospital$longitude_low <- hospital$long - radius
hospital$latitude_high <- hospital$lat + radius
hospital$longitude_high <- hospital$long + radius
in_rad <- procedures[(procedures$long >= hospital$longitude_low & procedures$long <=
hospital$longitude_high & procedures$lat <= hospital$latitude_high & procedures$lat >=
hospital$latitude_low),]
num <- NROW(in_rad)
hospitals[i,]$number_of_procedures <- num
}
When you ask a question, you should always include some example data. Like this
lat <- c(-23.8, -25.8)
lon <- c(-49.6, -44.6)
hosp <- cbind(lon, lat)
lat <- c(-22.8, -24.8, -29.1, -28, -20)
lon <- c(-46.4, -46.3, -45.3, -40, -30)
procedures <- cbind(lon, lat)
Are your data in longitude/latitude? If so, you need to use a proper method to compute distances. For example
library(geosphere)
dm <- distm(procedures, hosp)
Or
library(raster)
d <- pointDistance(procedures, hosp, lonlat=TRUE)
Both compute the distance from all procedures to all hospitals. This will fail with very large datasets, but from what you describe it should work fine. Now you can use a threshold (here 400,000 m) to find out which procedures are within that distance of each hospital
apply(d < 400000, 2, which)
#[[1]]
#[1] 1 2
#[[2]]
#[1] 1 2 3
So procedure 1, 2 and 3 are within that distance to hospital 2
If your data are not longitude/latitude, you can use
d <- pointDistance(procedures, hosp, lonlat=FALSE)