I want to get census codes for a city from the addresses in my data. The problem is that I couldn't find workable shapefiles for Gainesville, FL. So I am trying to get the census codes from the addresses of the people who took the survey. Once I have a census code, I would use substr to keep its first 11 digits, so that it matches the GEOIDs from the tidycensus package, which provides census data and shapefiles for a county down to the lowest level of the hierarchy. Since I would have the GEOIDs only for people in the city, I could keep just those shapes and not the whole county. So I did the following just to get the census codes:
library(tigris)
library(readr)
gainsville_df2 <- readr::read_csv("311_Service_Requests__myGNV_.csv")
#gainsville_df2 is the dataframe of the csv file
jio <- apply(gainsville_df2["Address"], 1, function(row) tigris::call_geolocator(row, "Gainesville", "FL", zip = NA))
#It ran for ~1.5 hours, parsing through 1892 addresses, then I got this error out of nowhere:
#Error in tigris::call_geolocator(row, "Gainesville", "FL", zip = NA) :
# Internal Server Error (HTTP 500).
#Called from: httr::stop_for_status(r)
A link to the data is here. I have ~9200 addresses to parse, and this happens at around address 1800. Searching for the error suggests that some timeout setting is needed, but unfortunately I have no clue how to set one.
I need the shapefiles for a crucial part of my personal project.
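To make the truncation step concrete, here is a minimal sketch in base R. The 15-digit block code below is a made-up illustration (only the leading 12001, Florida and Alachua County, is real); it is not taken from the data.

```r
# Hypothetical 15-digit census block code: state (2) + county (3) +
# tract (6) + block (4). The tract and block digits are invented.
block_code <- "120010002001000"

# Keep the first 11 digits (state + county + tract) to get a
# tract-level GEOID that can be matched against tract GEOIDs.
tract_geoid <- substr(block_code, 1, 11)
print(tract_geoid)
```

Applied to a whole vector of codes, substr works element-wise, so no loop is needed.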
All punctuation had to be removed from the gainsville_df2$Address vector. The call_geolocator function does not work, or works only irregularly, on strings containing punctuation, and would often throw an HTTP 500 error on addresses with characters such as # or { }. So it is better practice to remove all punctuation first with as.character(stringr::str_replace_all(gainsville_df2$Address, "[[:punct:]]", " ")). And don't worry, the geolocator still returns the correct census codes without the punctuation. It only looks at street names, numbers, block, city and state.
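A minimal sketch of how this could be combined with error handling, so that one failing address yields NA and does not abort a long run. The helper names clean_address and safe_geocode and the geocode_fun argument are hypothetical, not part of tigris. Base gsub is used here in place of stringr::str_replace_all only to keep the sketch dependency-free.

```r
# Replace punctuation with spaces, then collapse repeated whitespace.
clean_address <- function(x) {
  x <- gsub("[[:punct:]]", " ", as.character(x))
  trimws(gsub("\\s+", " ", x))
}

# Geocode each address in turn. geocode_fun is a hypothetical argument:
# any function that takes one street string and returns one code.
# An error (for example HTTP 500) becomes NA for that address only.
safe_geocode <- function(addresses, geocode_fun) {
  vapply(addresses, function(a) {
    tryCatch(as.character(geocode_fun(a))[1],
             error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
}

# Intended use with the data from the question (needs network access):
# codes <- safe_geocode(
#   clean_address(gainsville_df2$Address),
#   function(a) tigris::call_geolocator(a, "Gainesville", "FL")
# )
```

Addresses that come back as NA can then be inspected and retried separately, without repeating the whole run.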