r geocoding ggmap google-geocoding-api

Flag geocoding errors in R using the ggmap package


I have a dataset with two columns, on_road and at_road, which I combine into a string called geocode_string. I want to geocode these intersections with this string, using my Google API key. For example, on_road = Silverdale and at_road = W 28th St combine to form geocode_string = Silverdale and W 28th St, Cleveland, OH.

However, when I try to use the geocode function from ggmap, I get this message: "SILVERDALE and W ..." not uniquely geocoded, using "silverdale ave, cleveland, oh 44109, usa".

It seems that in this case R just falls back to a default location, here Silverdale Ave. I would like R not to do this, and instead perhaps leave blank the locations for which a unique geocode cannot be found. I can then go through and find the coordinates for those cases manually. I just want to flag those observations in some way.

I'd also like to point out that the second row of the dataset contains S MARGINAL RD and W 93RD ST , CLEVELAND , OH, an intersection that does not exist in Cleveland. When I paste that string into Google Maps, it seems to search for a partial match and gives me the coordinates for S Marginal Rd. Any thoughts on why an intersection that does not exist would generate coordinates in this case, but not in the Silverdale case described above? Is there any way to prevent this from happening?

I would greatly appreciate any help!

library(ggmap)  # assumes register_google(key = ...) has already been called
geocode(df$geocode_string)

structure(list(on_road = c("EDDY RD", "S MARGINAL RD", "MLK", 
"MLK", "IMPERIAL AVE", "HARVARD", "E 55TH", "W 41ST", "SILVERDALE", 
"ONTARIO", "MLK", "CEDAR", "DENNISON AVE", "QUIGLEY RD", "AEROSPACE PKWY", 
"CEDAR", "MLK DR", "LEE RD", "E 93RD", "W QUIGOLY", "W 14TH", 
"W 25TH", "W MALL DR", "E 185TH", "FARRINGTON", "APPLE AVE", 
"FAIRHILL RD", "ST CLAIR", "E 93RD", "FAIRHILL", "E 123RD", "DETROIT RD", 
"CEDAR HILL", "MARTIN LUTHER KING BLVD", "E 109TH", "W 105TH", 
"W WOODLAND AVE", "LAKEWOOD HTS BLVD", "E 56TH", "MARTIN LUTHER KING BLVD", 
"OVINGTON", "MADISON AVE", "QUIGLEY", "DILLE RD", "QUINCY", "MLK", 
"CORONADO AVE", "DETROIT", "MT SINAI DR", "LAKESIDE AVE"), at_road = c("PAXTON RD", 
"W 93RD ST", "ANSEL", "PARKVIEW", "LUKE AVE", "E 163RD", "SCOVAL", 
"BAILY", "W 28TH ST", "E 6TH", "SUPERIOR", "AMBLESIDE", "W 53RD ST", 
"STEELYARD DR", "E SHAFFORD RD", "AMBLESIDE", "ANSEL RD", "S JUDSON AVE", 
"SOPHIA", "STEEL DR", "QUIGOLY", "DENISON", "PUBLIC SQ", "ST CLAIR", 
"E 127TH", "W 41ST PL", "CEDAR RD", "E 178TH", "LAMONTIER", "AMBLESIDE", 
"GRIFFIN AVE", "W 102ND", "MURRAY HILL", "MT AUBURN", "ST CLAIR", 
"S FRONTAGE", "KEMPER", "ALGERS", "BROADWAY AVE", "CORLETT AVE", 
"UNION", "W 86TH ST", "STEELYARD DR", "ST CLAIR AVE", "E 38TH", 
"BENHAM", "E 126TH", "W 47TH", "MLK JR BLVD", "FRANZ PASTORINA"
), geocode_string = c("EDDY RD and PAXTON RD , CLEVELAND , OH", 
"S MARGINAL RD and W 93RD ST , CLEVELAND , OH", "MLK and ANSEL , CLEVELAND , OH", 
"MLK and PARKVIEW , CLEVELAND , OH", "IMPERIAL AVE and LUKE AVE , CLEVELAND , OH", 
"HARVARD and E 163RD , CLEVELAND , OH", "E 55TH and SCOVAL , CLEVELAND , OH", 
"W 41ST and BAILY , CLEVELAND , OH", "SILVERDALE and W 28TH ST , CLEVELAND , OH", 
"ONTARIO and E 6TH , CLEVELAND , OH", "MLK and SUPERIOR , CLEVELAND , OH", 
"CEDAR and AMBLESIDE , CLEVELAND , OH", "DENNISON AVE and W 53RD ST , CLEVELAND , OH", 
"QUIGLEY RD and STEELYARD DR , CLEVELAND , OH", "AEROSPACE PKWY and E SHAFFORD RD , CLEVELAND , OH", 
"CEDAR and AMBLESIDE , CLEVELAND , OH", "MLK DR and ANSEL RD , CLEVELAND , OH", 
"LEE RD and S JUDSON AVE , CLEVELAND , OH", "E 93RD and SOPHIA , CLEVELAND , OH", 
"W QUIGOLY and STEEL DR , CLEVELAND , OH", "W 14TH and QUIGOLY , CLEVELAND , OH", 
"W 25TH and DENISON , CLEVELAND , OH", "W MALL DR and PUBLIC SQ , CLEVELAND , OH", 
"E 185TH and ST CLAIR , CLEVELAND , OH", "FARRINGTON and E 127TH , CLEVELAND , OH", 
"APPLE AVE and W 41ST PL , CLEVELAND , OH", "FAIRHILL RD and CEDAR RD , CLEVELAND , OH", 
"ST CLAIR and E 178TH , CLEVELAND , OH", "E 93RD and LAMONTIER , CLEVELAND , OH", 
"FAIRHILL and AMBLESIDE , CLEVELAND , OH", "E 123RD and GRIFFIN AVE , CLEVELAND , OH", 
"DETROIT RD and W 102ND , CLEVELAND , OH", "CEDAR HILL and MURRAY HILL , CLEVELAND , OH", 
"MARTIN LUTHER KING BLVD and MT AUBURN , CLEVELAND , OH", "E 109TH and ST CLAIR , CLEVELAND , OH", 
"W 105TH and S FRONTAGE , CLEVELAND , OH", "W WOODLAND AVE and KEMPER , CLEVELAND , OH", 
"LAKEWOOD HTS BLVD and ALGERS , CLEVELAND , OH", "E 56TH and BROADWAY AVE , CLEVELAND , OH", 
"MARTIN LUTHER KING BLVD and CORLETT AVE , CLEVELAND , OH", "OVINGTON and UNION , CLEVELAND , OH", 
"MADISON AVE and W 86TH ST , CLEVELAND , OH", "QUIGLEY and STEELYARD DR , CLEVELAND , OH", 
"DILLE RD and ST CLAIR AVE , CLEVELAND , OH", "QUINCY and E 38TH , CLEVELAND , OH", 
"MLK and BENHAM , CLEVELAND , OH", "CORONADO AVE and E 126TH , CLEVELAND , OH", 
"DETROIT and W 47TH , CLEVELAND , OH", "MT SINAI DR and MLK JR BLVD , CLEVELAND , OH", 
"LAKESIDE AVE and FRANZ PASTORINA , CLEVELAND , OH")), row.names = c(NA, 
-50L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • I faced a similar problem. The best solution I could come up with was to alter ggmap's geocode function, the source of which is available on GitHub.

    I added two extra columns. The status column reports the number of matches returned for each address, so you can easily spot where "not uniquely geocoded, using" occurred. The address2 column reports the second address found (in cases where status > 1).

    I did that by adding the following parts, marked as 'new':

      ## format geocoded data

      # coordinates come from the first (best) match; results[[2]] would fail
      # with "subscript out of bounds" whenever only one match is returned
      gcdf <- with(gc$results[[1]], {
        tibble(
          "lon" = NULLtoNA(geometry$location$lng),
          "lat" = NULLtoNA(geometry$location$lat),
          "type" = tolower(NULLtoNA(types[1])),
          "loctype" = tolower(NULLtoNA(geometry$location_type)),
          "address" = location, # dsk doesn't give the address
          "north" = NULLtoNA(geometry$viewport$northeast$lat),
          "south" = NULLtoNA(geometry$viewport$southwest$lat),
          "east" = NULLtoNA(geometry$viewport$northeast$lng),
          "west" = NULLtoNA(geometry$viewport$southwest$lng),
          "status" = NULLtoNA(length(gc$results)) # new! number of matches
        )
      })

      # new! record the second match when there is more than one
      if (length(gc$results) > 1L) {
        gcdf$address2 <- tolower(NULLtoNA(gc$results[[2]]$formatted_address))
      } else {
        gcdf$address2 <- NA_character_ # a real missing value, not the string "NA"
      }

      # add address
      if (source == "google") gcdf$address <- tolower(NULLtoNA(gc$results[[1]]$formatted_address))
      if (output == "latlon") return(gcdf[,c("lon","lat", "status", "address", "address2")]) # new!
    

    Finally, I ran the new function definition in R and then used the following code to replace the version inside the loaded package (see this question for further information).

    environment(geocode) <- asNamespace('ggmap')
    assignInNamespace("geocode", geocode, ns = "ggmap")
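
A possible alternative that avoids patching the package, sketched here under some assumptions: geocode() accepts output = "all", which returns the raw Geocoding API response for each address, so the number of results can be counted directly. The API also sets a partial_match field when it did not find an exact match for the request, which may be what happens in the S Marginal Rd case. The helper name flag_geocode and the mock responses (with made-up coordinates) are hypothetical and only there so the example runs without an API key.

```r
# Summarise one raw Geocoding API response: a list with $status and $results,
# the shape ggmap::geocode(..., output = "all") is assumed to return for a
# single address. flag_geocode() is a hypothetical helper, not part of ggmap.
flag_geocode <- function(resp) {
  n <- length(resp$results)
  first <- if (n > 0L) resp$results[[1]] else NULL
  partial <- isTRUE(first$partial_match)  # FALSE when the field is absent
  ok <- n == 1L && !partial               # exactly one, exact match
  data.frame(
    lon = if (ok) first$geometry$location$lng else NA_real_,
    lat = if (ok) first$geometry$location$lat else NA_real_,
    n_matches = n,
    partial_match = partial,
    address = if (n > 0L) tolower(first$formatted_address) else NA_character_,
    stringsAsFactors = FALSE
  )
}

# Mock responses with made-up coordinates, so the sketch runs offline.
mock_unique <- list(status = "OK", results = list(
  list(formatted_address = "Eddy Rd & Paxton Rd, Cleveland, OH, USA",
       geometry = list(location = list(lat = 41.5, lng = -81.6)))
))
mock_partial <- list(status = "OK", results = list(
  list(formatted_address = "S Marginal Rd, Cleveland, OH, USA",
       geometry = list(location = list(lat = 41.5, lng = -81.7)),
       partial_match = TRUE)
))
mock_multi <- list(status = "OK", results = list(
  list(formatted_address = "Silverdale Ave, Cleveland, OH 44109, USA",
       geometry = list(location = list(lat = 41.4, lng = -81.7))),
  list(formatted_address = "W 28th St, Cleveland, OH, USA",
       geometry = list(location = list(lat = 41.4, lng = -81.7)))
))

flags <- do.call(
  rbind,
  lapply(list(mock_unique, mock_partial, mock_multi), flag_geocode)
)
print(flags)

# Against the live API (requires ggmap and a registered key):
# raw   <- ggmap::geocode(df$geocode_string, output = "all")
# flags <- do.call(rbind, lapply(raw, flag_geocode))
```

Rows with n_matches > 1 or partial_match = TRUE come back with NA coordinates, so they can be filtered out and looked up by hand, while the address column still shows what the geocoder would have used.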