Search code examples
rnested-loops

Calculation with Nested loops in R


I have a Geocode dataset with three columns: Latitude, Longitude and Cluster. I calculated the average center of the clusters and store the results in two lists Center_lat and Center_lon.

Now I wanna calculate the distance from each observation(3000+) to each cluster center(30) with the Haversine formula. To get a 3000 by 30 matrix.

I tried to use a nested for loop, but I got the same distance for all clusters. Here's the code.

for (i in 1:n){
    for (k in 1:c){
    lat1=radians(Geocode[i,1])
    lon1=radians(Geocode[i,2])
    lat2=radians(Center_lat[k,2])
    lon2=radians(Center_lon[k,2])
    }
    R <- 3958.756 # Earth mean radius [miles]
    dist_mat[i,] <- acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(lon2-lon1)) * R
  }

I'm also thinking using a lapply to substitute the nested loop. But I'm not sure how to use the function... Any help is appreciated.

# Convert to radian
radians = function(theta=0){return(theta * pi / 180)}

# Calculates the geodesic distance from each property to the center of it's current cluster using the
# Spherical Law of Cosines (slc)
get_dist <- function(lat1, lon1, lat2, lon2) {
  R <- 3958.756 # Earth mean radius [miles]
  d <- acos(sin(radians(lat1))*sin(radians(lat2)) + 
              cos(radians(lat1))*cos(radians(lat2)) * cos(radians(lon2)-radians(lon1))) * R
  return(d) # Distance in miles
}

dist_mat<-lapply()

Solution

  • This is the type of computation you want to vectorize in R. Here we use outer to generate all the possible combinations of row indices from your Geocode data and Center_x data, and then apply the distance function in one fell swoop.

    First, get data in easier to use form (one matrix for locations, another for centers, first column lats, second lons):

    # See Data section below for actual data used
    
    # G <- radians(Geocode)
    # C <- radians(cbind(Center_lat[, 2], Center_lon[, 2]))    
    R <- 3958.756 # Earth mean radius [miles]
    

    Define the function, notice how we use the indices to look up the actual coordinates in G and C, and how the function is vectorized (i.e. we only need to call it once with all the data):

    my_dist <- function(xind, yind)
      acos(
        sin(G[xind, 1]) * sin(C[yind, 1]) + 
        cos(G[xind, 1]) * cos(C[yind, 1]) * cos(C[yind, 2] - G[xind, 2])
      ) * R
    

    And apply it with outer:

    DISTS <- outer(seq.int(nrow(G)), seq.int(nrow(C)), my_dist)
    
    str(DISTS)
    # num [1:3000, 1:30] 4208 6500 8623 7303 3864 ...
    
    quantile(DISTS) # to make sure stuff is reasonable:
    #     0%       25%       50%       75%      100% 
    #  0.000  4107.574  6204.799  8333.155 12422.059     
    

    This runs in about 30ms on my system.


    Data:

    set.seed(1)
    lats <- runif(10000, -60, 60) * pi / 180
    lons <- runif(10000, -179, 180) * pi / 180
    
    G.ind <- sample(10000, 3000)
    C.ind <- sample(10000, 30)
    
    G <- cbind(lats[G.ind], lons[G.ind])
    C <- cbind(lats[C.ind], lons[C.ind])