
twitteR search geocode argument in R


I want to run a simple search using twitteR but only return tweets located in the U.S. I know twitteR has a geocode argument that takes a lat/long and a radius in miles around that point, but covering an entire country that way seems hard.

What would I input into the argument to only get US tweets?

Thanks,


Solution

  • I did a brief search around and it looks like twitteR does not have a built-in country argument. But since you have lat/long, it's very straightforward to do a spatial join against a shapefile of the US national boundary (i.e. a point-in-polygon test).

    In this example, I'm using the national boundary shapefile from Census.gov (cb_2015_us_nation_20m) and the package spatialEco for its point.in.poly() function. It's a very fast spatial-join function compared to what other packages offer, even with hundreds of thousands of coordinates and dozens of polygons. With millions of tweets, or if you later decide to join to many polygons (e.g. all world countries), it could be a lot slower. But for most purposes, it's very fast.

    (Also, I don't have a Twitter API set up, so I'm going to use an example data frame with tweet_ids and lat/long.)

    library(maptools)   # to read the shapefile (readShapePoly)
    library(spatialEco) # for point.in.poly()
    
    # First, use setwd() to set working directory to the folder called cb_2015_us_nation_20m
    us <- readShapePoly(fn = "cb_2015_us_nation_20m")
    # Alternatively, you can use file.choose() and pick the .shp file interactively
    # (run this instead of the line above, not in addition to it):
    # us <- readShapePoly(file.choose())
    
    # Create data frame with sample tweets
    # Btw, tweet_id 1 is St. Louis, 2 is Toronto, 3 is Houston
    tweets <- data.frame(tweet_id = c(1, 2, 3),
                         latitude = c(38.610543, 43.653226, 29.760427),
                         longitude = c(-90.337189, -79.383184, -95.369803))
    
    # Use point.in.poly to keep only tweets that are in the US
    coordinates(tweets) <- ~longitude+latitude
    tweets_in_us <- point.in.poly(tweets, us)
    tweets_in_us <- as.data.frame(tweets_in_us)
    

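    Now, if you look at tweets_in_us you should see only the tweets whose lat/long fall within the area of the US. With the sample data above, that means the St. Louis and Houston tweets; the Toronto tweet is dropped.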
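
    If maptools is not available (it has since been retired from CRAN), the same point-in-polygon filter can be sketched with the sf package instead. This is only a sketch under assumptions: the helper name filter_points_in_region is made up for illustration, the .shp path assumes the Census folder was unzipped into the working directory, and it has not been benchmarked against point.in.poly().

    ```r
    library(sf)

    # Hypothetical helper (not part of sf or twitteR): keep only the rows of
    # `points_df` whose longitude/latitude fall inside the polygon layer `region`.
    filter_points_in_region <- function(points_df, region) {
      # Treat the coordinates as WGS84 lon/lat (an assumption about the tweet data)
      pts <- st_as_sf(points_df, coords = c("longitude", "latitude"),
                      crs = 4326, remove = FALSE)
      # Match the polygon layer's coordinate reference system before testing
      pts <- st_transform(pts, st_crs(region))
      # st_intersects() returns, for each point, the polygons it touches
      inside <- lengths(st_intersects(pts, region)) > 0
      points_df[inside, , drop = FALSE]
    }

    # Same sample tweets as above: St. Louis, Toronto, Houston
    tweets <- data.frame(tweet_id = c(1, 2, 3),
                         latitude = c(38.610543, 43.653226, 29.760427),
                         longitude = c(-90.337189, -79.383184, -95.369803))

    # Assumed location of the unzipped Census shapefile
    shp <- "cb_2015_us_nation_20m/cb_2015_us_nation_20m.shp"
    if (file.exists(shp)) {
      us <- st_read(shp, quiet = TRUE)
      tweets_in_us <- filter_points_in_region(tweets, us)
      print(tweets_in_us)
    }
    ```

    The helper returns a plain data frame, so the result can be used the same way as tweets_in_us above. Passing a world-countries layer as `region` would work the same way, although that keeps any tweet inside any polygon in the layer unless the layer is subset to one country first.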