Tags: r, ggplot2, geospatial, subset, ggmap

Mapping nearest neighbours of a long-lat data set using ggmap, geom_point and a loop


My ultimate goal is to connect each building in a set to its nearest neighbour (based on Euclidean distance) on a ggmap, using geom_path from the ggplot2 package. I need help with a loop that will let me plot all the neighbour pairs as easily as possible.

I have created a distance matrix (called 'kmnew') in kilometres between 3 types of building in Beijing: B (x2), D (x2) and L (x1):

   B        B        D        D        L
B NA 6.599014 5.758531 6.285787 3.770175
B NA       NA 7.141096 3.873296 5.092667
D NA       NA       NA 3.690725 2.563017
D NA       NA       NA       NA 2.832083
L NA       NA       NA       NA       NA

I try to find the nearest neighbour of each building (one per row) by declaring a matrix and looping over the rows of kmnew:

nn <- matrix(NA,nrow=5,ncol=1)


for (i in 1:nrow(kmnew)){
  nn[i,] <- which.min(kmnew[i,]) 
}

This returns the following error (not sure why):

Error in nn[i, ] <- which.min(kmnew[i, ]) : replacement has length zero

but nn nevertheless seems to contain the correct answer:

     [,1]
[1,]    5
[2,]    4
[3,]    5
[4,]    5
[5,]   NA
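The error comes from the last row: row 5 of kmnew is entirely NA, and which.min() ignores NA values, so it returns a zero-length integer that cannot be assigned into nn[5, ]. By then rows 1 to 4 have already been filled, and nn was initialised with NA, which is why the result still looks right. A minimal sketch of the cause and of a guarded loop, using a small stand-in matrix m (hypothetical, not the question's data):

```r
# Stand-in matrix whose last row is entirely NA, like the upper-triangular kmnew
m <- matrix(c(NA, 1.5, 2.0,
              NA, NA,  0.7,
              NA, NA,  NA), nrow = 3, byrow = TRUE)

# which.min() drops NAs, so an all-NA row gives a zero-length result
idx <- which.min(m[3, ])
length(idx)  # 0

# Guarded version of the loop: leave NA when a row has no candidates
nn <- matrix(NA_integer_, nrow = nrow(m), ncol = 1)
for (i in seq_len(nrow(m))) {
  idx <- which.min(m[i, ])
  if (length(idx) > 0) nn[i, ] <- idx
}
nn[, 1]  # 2 3 NA
```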

I append this to my original data frame, newbjdata:

colbj <- cbind(newbjdata,nn)

which returns:

  Name Store sqft     long      lat nn
1    B     1 1200 116.4579 39.93921  5
2    B     2  750 116.3811 39.93312  4
3    D     1  550 116.4417 39.88882  5
4    D     2  600 116.4022 39.90222  5
5    L     1 1000 116.4333 39.91100 NA
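The NA in the nn column for L (row 5) does not mean L has no neighbour: kmnew only stores each pair once, in its upper triangle, so row 5 has nothing to compare. If L should get a neighbour too, one option is to mirror the upper triangle into the lower one before taking the minimum per row. The sketch below rebuilds kmnew from the printed values and uses apply() in place of the explicit loop; the name full is introduced here for illustration.

```r
# Rebuild the upper-triangular distance matrix printed in the question (km)
labs <- c("B", "B", "D", "D", "L")
kmnew <- matrix(NA_real_, nrow = 5, ncol = 5, dimnames = list(labs, labs))
# upper.tri() positions are filled column by column: (1,2), (1,3), (2,3), (1,4), ...
kmnew[upper.tri(kmnew)] <- c(6.599014,
                             5.758531, 7.141096,
                             6.285787, 3.873296, 3.690725,
                             3.770175, 5.092667, 2.563017, 2.832083)

# Mirror the upper triangle into the lower one; the diagonal stays NA,
# so a building is never chosen as its own neighbour
full <- kmnew
full[lower.tri(full)] <- t(kmnew)[lower.tri(full)]

# Index of the nearest neighbour for every row (which.min skips NA)
nn_full <- apply(full, 1, which.min)
unname(nn_full)  # 5 4 5 5 3
```

With the full matrix, L's nearest neighbour is row 3 (D, about 2.56 km), and row 3's nearest neighbour is L, so that pair would be drawn twice, once from each end.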

I then retrieve my map via ggmap:

bjgmap <- get_map(location = c(lon = 116.407395,lat = 39.904211),
                  zoom = 13, scale = "auto",
                  maptype = "roadmap",
                  messaging = FALSE, urlonly = FALSE,
                  filename = "ggmaptemp", crop = TRUE,
                  color = "bw",
                  source = "google", api_key)

Again, the goal is to draw lines between the nearest neighbours on this map using geom_path from the ggplot2 package.

For example, the nearest neighbour of the first B building (row 1) is the single L building (row 5). I can draw this one line by subsetting those two rows of the data frame:

ggmap(bjgmap) +
geom_point(data = colbj, aes(x = long,y = lat, fill = factor(Name)),
           size =10, pch = 21, col = "white") +
geom_path(data = colbj[c(1, 5), ], aes(x = long, y = lat), col = "black")

However, I need a solution that works like a loop over all rows, and I can't figure out how to achieve this: I need to read each row's nn value and use it to look up the corresponding long/lat, n times. I can well believe that I am not using the most efficient method, so I am open to alternatives. Any help is much appreciated.
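One loop-free way to build all the lines at once, sketched here as an alternative rather than the approach used in the answer below: use nn to index the neighbour's coordinates, then stack start and end points into a long data frame with an id column (pair, a name introduced here) that geom_path() can use as its group aesthetic. Only the coordinates and nn column from the question are used.

```r
# Coordinates and nn column from the question
colbj <- data.frame(
  Name = factor(c("B", "B", "D", "D", "L")),
  long = c(116.4579, 116.3811, 116.4417, 116.4022, 116.4333),
  lat  = c(39.93921, 39.93312, 39.88882, 39.90222, 39.911),
  nn   = c(5L, 4L, 5L, 5L, NA)
)

# Rows that have a neighbour, and the row index of that neighbour
from <- which(!is.na(colbj$nn))
to   <- colbj$nn[from]

# Two rows per line (start point, then neighbour); 'pair' identifies each line.
# rbind() makes a 2 x n matrix and as.vector() reads it column by column,
# giving start1, end1, start2, end2, ...
paths <- data.frame(
  pair = rep(seq_along(from), each = 2),
  long = as.vector(rbind(colbj$long[from], colbj$long[to])),
  lat  = as.vector(rbind(colbj$lat[from],  colbj$lat[to]))
)
```

The result can then be added to the map with geom_path(data = paths, aes(x = long, y = lat, group = pair)). Straight lines are adequate at city scale; geom_segment() with x, y, xend and yend aesthetics is another option that avoids reshaping to long format.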


Solution

  • Here is my attempt. I used gcIntermediate() from the geosphere package to create the lines. First, I needed to rearrange your data: gcIntermediate() needs departure and arrival long/lat, that is, four columns. To arrange your data this way, I used the dplyr package. mutate_each(colbj, funs(.[nn]), vars = long:lat) picks up the arrival long/lat you want: . stands for each of long and lat, and [nn] indexes that vector by the nn column, so .[nn] is the coordinate of each row's nearest neighbour. Then I used gcIntermediate(), which (with sp = TRUE) creates a SpatialLines object. You need to convert this object to a SpatialLinesDataFrame and then to a "normal" data.frame; this step is essential so that ggplot can read your data. fortify() does this last conversion.

    library(ggmap)
    library(geosphere)
    library(dplyr)
    library(ggplot2)
    
    ### Arrange the data: set up departure and arrival long/lat
    
    mutate_each(colbj, funs(.[nn]), vars = long:lat) %>%
    rename(arr_long = vars1, arr_lat = vars2) %>%
    filter(complete.cases(nn)) -> mydf
    
    ### Get line information
    
    rts <- gcIntermediate(mydf[,c("long", "lat")],
                          mydf[,c("arr_long", "arr_lat")],
                          50,
                          breakAtDateLine = FALSE,
                          addStartEnd = TRUE,
                          sp = TRUE)
    
    ### Convert the routes to a data frame for ggplot use
    
    rts <- as(rts, "SpatialLinesDataFrame")
    rts.df <- fortify(rts)
    
    
    ### Get a map (borrowing the OP's code)                   
    bjgmap <- get_map(location = c(lon = 116.407395,lat = 39.904211),
                      zoom = 13, scale = "auto",
                      maptype = "roadmap",
                      messaging = FALSE, urlonly = FALSE,
                      filename = "ggmaptemp", crop = TRUE,
                      color = "bw",
                      source = "google", api_key)
    
    # Draw the map
    ggmap(bjgmap) +
    geom_point(data = colbj,aes(x = long, y = lat, fill = factor(Name)),
               size = 10,pch = 21, col = "white") +
    geom_path(data = rts.df, aes(x = long, y = lat, group = group),
              col = "black")
    


    EDIT

    If you want to do all the data manipulation in a single pipeline, the following is one way to go. The result, foo, is identical to rts.df above.

    mutate_each(colbj, funs(.[nn]), vars = long:lat) %>%
    rename(arr_long = vars1, arr_lat = vars2) %>%
    filter(complete.cases(nn)) %>%
    do(fortify(as(gcIntermediate(.[,c("long", "lat")],
                              .[,c("arr_long", "arr_lat")],
                              50,
                              breakAtDateLine = FALSE,
                              addStartEnd = TRUE,
                              sp = TRUE), "SpatialLinesDataFrame"))) -> foo
    
    identical(rts.df, foo)
    #[1] TRUE
    

    DATA

    colbj <- structure(list(Name = structure(c(1L, 1L, 2L, 2L, 3L), .Label = c("B", 
    "D", "L"), class = "factor"), Store = c(1L, 2L, 1L, 2L, 1L), 
    sqft = c(1200L, 750L, 550L, 600L, 1000L), long = c(116.4579, 
    116.3811, 116.4417, 116.4022, 116.4333), lat = c(39.93921, 
    39.93312, 39.88882, 39.90222, 39.911), nn = c(5L, 4L, 5L, 
    5L, NA)), .Names = c("Name", "Store", "sqft", "long", "lat", 
    "nn"), class = "data.frame", row.names = c("1", "2", "3", "4", 
    "5"))
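mutate_each() and funs(), used in the answer, have since been deprecated in dplyr. The same departure/arrival table can be built in base R, because .[nn] is ordinary vector indexing. A sketch using only the columns needed, taken from the dput() above:

```r
# Columns from the dput() above that the line-building step needs
colbj <- data.frame(
  Name = factor(c("B", "B", "D", "D", "L")),
  long = c(116.4579, 116.3811, 116.4417, 116.4022, 116.4333),
  lat  = c(39.93921, 39.93312, 39.88882, 39.90222, 39.911),
  nn   = c(5L, 4L, 5L, 5L, NA)
)

# long[nn] is the longitude of each row's nearest neighbour (NA where nn is NA),
# which is what funs(.[nn]) computes for long and lat
mydf <- transform(colbj, arr_long = long[nn], arr_lat = lat[nn])

# Drop the row without a neighbour, as filter(complete.cases(nn)) does
mydf <- mydf[!is.na(mydf$nn), ]
```

mydf[, c("long", "lat")] and mydf[, c("arr_long", "arr_lat")] can then be passed to gcIntermediate() in the same way as in the answer.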