I want to run a simple search using twitteR but only return tweets located in the U.S. I know twitteR has a geocode argument for lat/long and miles within that lat/long, but this way of locating tweets for an entire country seems hard.
What would I input into the argument to only get US tweets?
Thanks,
I did a brief search around and it looks like twitteR does not have a built-in country argument. But since you have lat/long, it's very straightforward to do a spatial join to a US country shapefile (i.e. point in polygon).
In this example, I'm using the shapefile from Census.gov and the package spatialEco for its point.in.poly() function. It's a very fast spatial-join function compared to what other packages offer, even if you have hundreds of thousands of coordinates and dozens of polygons. If you have millions of tweets, or if you decide later on to join to many more polygons (e.g. all world countries), then it could be a lot slower. But for most purposes, it's very fast.
(Also, I don't have a Twitter API set up, so I'm going to use an example data frame with tweet_ids and lat/long.)
library(maptools) # for readShapePoly()
library(spatialEco)
# First, use setwd() to set working directory to the folder called cb_2015_us_nation_20m
us <- readShapePoly(fn = "cb_2015_us_nation_20m")
# Alternatively, you can use file.choose() and choose the .shp file like so:
us <- readShapePoly(file.choose())
# Create data frame with sample tweets
# Btw, tweet_id 1 is St. Louis, 2 is Toronto, 3 is Houston
tweets <- data.frame(tweet_id = c(1, 2, 3),
                     latitude = c(38.610543, 43.653226, 29.760427),
                     longitude = c(-90.337189, -79.383184, -95.369803))
# Use point.in.poly to keep only tweets that are in the US
coordinates(tweets) <- ~longitude+latitude
tweets_in_us <- point.in.poly(tweets, us)
tweets_in_us <- as.data.frame(tweets_in_us)
Now, if you look at tweets_in_us, you should see only the tweets whose lat/long fall within the area of the US.
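If you want some intuition for what a point-in-polygon test does, here is a minimal base-R sketch using the classic ray-casting approach. To be clear about the assumptions: this is not how spatialEco implements point.in.poly(), the helper name point_in_polygon is made up for illustration, and crude_us is a hypothetical hand-drawn outline that only very roughly resembles the lower 48 states. It is enough to separate the three sample tweets, but for real filtering you would want the Census shapefile.

```r
# Ray casting: count how many polygon edges a horizontal ray from the point
# crosses; an odd number of crossings means the point is inside.
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n  # index of the previous vertex, starting with the closing edge
  for (i in seq_len(n)) {
    # Does the edge between vertex j and vertex i straddle the point's y value?
    straddles <- (poly_y[i] > py) != (poly_y[j] > py)
    if (straddles) {
      # x coordinate where the edge crosses the horizontal line y = py
      x_cross <- poly_x[i] +
        (py - poly_y[i]) * (poly_x[j] - poly_x[i]) / (poly_y[j] - poly_y[i])
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical, very rough outline (illustration only, NOT a real boundary)
crude_us <- data.frame(
  lon = c(-125, -125, -95, -82, -67, -80, -97, -117),
  lat = c(  32,   49,  49,  42,  45,  25,  26,   32)
)

# Same sample tweets as above: St. Louis, Toronto, Houston
tweets <- data.frame(tweet_id = c(1, 2, 3),
                     latitude = c(38.610543, 43.653226, 29.760427),
                     longitude = c(-90.337189, -79.383184, -95.369803))

# Test each tweet against the outline and keep only the ones inside it
in_us <- mapply(point_in_polygon, tweets$longitude, tweets$latitude,
                MoreArgs = list(poly_x = crude_us$lon, poly_y = crude_us$lat))
tweets_in_crude_us <- tweets[in_us, ]
print(tweets_in_crude_us)
```

With this outline, tweet_ids 1 and 3 are kept and Toronto is dropped. The sketch treats lat/long as flat x/y coordinates, which is fine for illustration but is another reason to rely on a proper spatial package for real data.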