
How to extract the longitude and latitude information from a list in R


I need help extracting longitude and latitude information from a list. I have a set of addresses, and I use the Census geocoder at https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress to look up the latitude and longitude for each one. Here is my code:

fetch_geocodes <- function(address) {
  # Specify the API endpoint
  base_url <- "https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress"
  
  # Specify the parameters to pass to the API
  params <- list(
    address = address,
    benchmark = "Public_AR_Current",  
    vintage = "Current_Current",
    format = "json"
  )
  
  # Send a GET request to the API
  response <- GET(url = base_url, query = params)
  
  # Check if the request was successful
  if (status_code(response) == 200) {
    # Parse the response to JSON
    data <- content(response, "parsed")
    
    # Print the entire JSON response
    print(data)
    
    # Extract the longitude and latitude
    longitude <- data$result$addressMatches$coordinates$x
    latitude <- data$result$addressMatches$coordinates$y
    
    return(c(longitude, latitude))
  } else {
    stop("Request failed with status ", status_code(response))
  }
}

addresses <- c("Riverside Dr, Apple Valley, CA, 92307",
               "11 Wall Street, New York, NY 10005")
geocodes <- lapply(addresses, fetch_geocodes)

Here is my partial output because the entire one is quite long:

$result
$result$input
$result$input$address
$result$input$address$address
[1] "Riverside Dr, Apple Valley, CA, 92307"


$result$input$vintage
$result$input$vintage$isDefault
[1] TRUE

$result$input$vintage$id
[1] "4"

$result$input$vintage$vintageName
[1] "Current_Current"

$result$input$vintage$vintageDescription
[1] "Current Vintage - Current Benchmark"


$result$input$benchmark
$result$input$benchmark$isDefault
[1] TRUE

$result$input$benchmark$benchmarkDescription
[1] "Public Address Ranges - Current Benchmark"

$result$input$benchmark$id
[1] "4"

$result$input$benchmark$benchmarkName
[1] "Public_AR_Current"



$result$addressMatches
list()

$result
$result$input
$result$input$address
$result$input$address$address
[1] "11 Wall Street, New York, NY 10005"

$result$addressMatches[[1]]$coordinates
$result$addressMatches[[1]]$coordinates$x
[1] -74.01073

$result$addressMatches[[1]]$coordinates$y
[1] 40.70714

For the first address, Riverside Dr, Apple Valley, CA, 92307, the geocoder returns no match, so I need to assign NA to the 'longitude' and 'latitude' columns. For the second address, $result$addressMatches[[1]]$coordinates holds the longitude and latitude. However, I don't know how to extract that information from geocodes, because it contains NULL for both addresses.

print(geocodes)
[[1]]
NULL

[[2]]
NULL

I don't understand what to do with this and would really appreciate your help. My goal is a data frame with three columns: full_address, longitude, and latitude.


Solution

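  • Up front: data$result$addressMatches is an unnamed list of matches, so data$result$addressMatches$coordinates is NULL, and c(NULL, NULL) is NULL as well. That is why every element of geocodes is NULL. Each element of the list may have its own coordinates, so index into the list first, e.g. data$result$addressMatches[[1]]$coordinates$x.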

    If you are guaranteed to always have just one x/y in a return, then you can do:

    unlist(data$result$addressMatches[[1]]$coordinates)
    #         x         y 
    # -74.01073  40.70714 
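
    To see the difference without calling the API, here is a minimal sketch on a hand-built list. The object mock is hypothetical; it only imitates the shape of the parsed response printed in the question, and the names bad and good are illustrative.

    ```r
    # Hypothetical stand-in for the parsed response (same nesting as the printed output)
    mock <- list(result = list(addressMatches = list(
      list(coordinates = list(x = -74.01073, y = 40.70714))
    )))

    # addressMatches has no names, so `$coordinates` finds nothing and returns NULL
    bad <- c(mock$result$addressMatches$coordinates$x,
             mock$result$addressMatches$coordinates$y)
    is.null(bad)
    # [1] TRUE

    # Selecting the first match with [[1]] reaches the coordinates
    good <- unlist(mock$result$addressMatches[[1]]$coordinates)
    good[["x"]]
    # [1] -74.01073
    ```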
    

    However, if it's possible you can get two or more, then you'll need to return a list or data.frame, and you'll need a little more work:

    L <- lapply(data$result$addressMatches, function(z) {
      if ("coordinates" %in% names(z)) unlist(z$coordinates) else c(x=NA_real_,y=NA_real_)
    })
    list(x=sapply(L, `[[`, 1), y=sapply(L, `[[`, 2))
    # $x
    # [1] -74.01073
    # $y
    # [1] 40.70714
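
    As a rough check of the NA fallback, the same lapply can be run on a mock list in which one match lacks coordinates. The list matches and its matchedAddress entry are made up for illustration; stacking the rows with do.call(rbind, ...) is one possible way to get a data.frame out of it.

    ```r
    # Hypothetical matches: the second one has no coordinates element
    matches <- list(
      list(coordinates = list(x = -74.01073, y = 40.70714)),
      list(matchedAddress = "made-up match without coordinates")
    )

    L <- lapply(matches, function(z) {
      if ("coordinates" %in% names(z)) unlist(z$coordinates) else c(x = NA_real_, y = NA_real_)
    })

    # One row per match, columns named x and y
    coords <- as.data.frame(do.call(rbind, L))
    ```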
    

    Using the first assumption, the function becomes:

    library(httr)  # provides GET(), status_code() and content()

    fetch_geocodes <- function(address) {
      # Specify the API endpoint
      base_url <- "https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress"
      
      # Specify the parameters to pass to the API
      params <- list(
        address = address,
        benchmark = "Public_AR_Current",  
        vintage = "Current_Current",
        format = "json"
      )
      
      # Send a GET request to the API
      response <- GET(url = base_url, query = params)
      
      # Check if the request was successful
      if (status_code(response) == 200) {
        # Parse the response to JSON
        data <- content(response, "parsed")
        
        ### Print the entire JSON response
        # print(data)
        
        # Extract the longitude and latitude
        if (length(data$result$addressMatches) > 0) {
          longitude <- data$result$addressMatches[[1]]$coordinates$x
          if (is.null(longitude)) longitude <- NA_real_
          latitude <- data$result$addressMatches[[1]]$coordinates$y
          if (is.null(latitude)) latitude <- NA_real_
        } else {
          longitude <- latitude <- NA_real_
        }
        
        return(c(longitude, latitude))
      } else {
        stop("Request failed with status ", status_code(response))
      }
    }
    lapply(addresses, fetch_geocodes)
    # [[1]]
    # [1] NA NA
    # [[2]]
    # [1] -74.01073  40.70714
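
    To get from that list to the three-column data frame asked for in the question, one option is to stack the length-2 vectors into a matrix. This is a sketch, not the only way to do it; geocodes is hard-coded here to the values shown above so the example runs without network access, and coords and result are illustrative names.

    ```r
    addresses <- c("Riverside Dr, Apple Valley, CA, 92307",
                   "11 Wall Street, New York, NY 10005")

    # Stand-in for lapply(addresses, fetch_geocodes), using the output shown above
    geocodes <- list(c(NA_real_, NA_real_), c(-74.01073, 40.70714))

    # Stack the c(longitude, latitude) vectors: one row per address
    coords <- do.call(rbind, geocodes)

    result <- data.frame(
      full_address = addresses,
      longitude = coords[, 1],
      latitude = coords[, 2],
      stringsAsFactors = FALSE
    )
    ```

    This relies on fetch_geocodes always returning exactly two values, which the NA_real_ fallbacks guarantee.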