Tags: python, python-requests, twitter-streaming-api, twitter-rest-api

Extracting Old Tweets with a Geo-Location Filter Using the Twitter Streaming API


My aim is to extract old tweets for the entire month of January 2017 for New York City ('locations': '-74,40,-73,41') using Python. I am able to get live streaming tweets with the following code:

    from TwitterAPI import TwitterAPI

    # Credentials for the Twitter application (placeholders; never post real keys)
    consumer_key = 'my_consumer_key'
    consumer_secret = 'my_consumer_secret'
    access_token_key = 'my_access_token_key'
    access_token_secret = 'my_access_token_secret'

    # Create the authenticated API client
    api = TwitterAPI(consumer_key, consumer_secret, access_token_key, access_token_secret)

    # Stream live tweets from a bounding box around New York City
    # (west longitude, south latitude, east longitude, north latitude)
    r = api.request('statuses/filter', {'locations': '-74,40,-73,41'})

    # Stop after 10 streamed items (row is 0-based)
    for row, item in enumerate(r):
        # Not every streamed message is a tweet, so guard the 'text' lookup
        print(row, item['text'] if 'text' in item else item)
        if row >= 9:
            break

But this is not what I am looking for. Can someone suggest how to extract old tweets for this location filter using the Twitter streaming API or any other Python package? Thanks!


Solution

  • You can accomplish part of what you are asking with Twitter's REST API. Below is an example that uses the same TwitterAPI package you used for streaming. Searching for old tweets comes with restrictions, however. You can only get about a week's worth of old tweets, so a month as far back as January 2017 is out of reach with this endpoint. You must also supply a search string (the q parameter) regardless of whether you supply a location, and you will only see results that match both the string and the location. Streaming behaves differently: you can supply a filter string, a location, or both, and the results match either the string or the location, not necessarily both.

    This code downloads tweets until it reaches the roughly one-week limit. It does this by making successive requests that are timed so as not to exceed Twitter's rate limit. You might also find the TwitterGeoPics package useful.

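    # Newer TwitterAPI releases expose this pager class as TwitterPager
    from TwitterAPI import TwitterAPI, TwitterRestPager

    SEARCH_TERM = 'pizza'
    # geocode is 'latitude,longitude,radius'; New York City is at a
    # negative (west) longitude, roughly 40.71 N, 74.01 W
    GEOCODE = '40.71,-74.01,10km'

    # Fill in your own application credentials
    CONSUMER_KEY = ''
    CONSUMER_SECRET = ''
    ACCESS_TOKEN_KEY = ''
    ACCESS_TOKEN_SECRET = ''

    api = TwitterAPI(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN_KEY, ACCESS_TOKEN_SECRET)

    # The pager issues successive search requests, pausing between them
    pager = TwitterRestPager(api, 'search/tweets', {'q': SEARCH_TERM, 'geocode': GEOCODE})

    for item in pager.get_iterator():
        # Non-tweet messages (for example errors) have no 'text' key
        print(item['text'] if 'text' in item else item)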
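
The question filters the stream with a bounding box ('west,south,east,north'), whereas search/tweets expects a geocode circle ('latitude,longitude,radius'). One way to reuse the same area is sketched below. The helper names bbox_to_geocode and in_bbox are hypothetical and not part of the TwitterAPI package. The sketch assumes a box that does not cross the 180° meridian and tweets that carry a GeoJSON-style 'coordinates' field with [longitude, latitude] order.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a search radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def parse_bbox(bbox):
    """Split 'west,south,east,north' into four floats."""
    west, south, east, north = (float(v) for v in bbox.split(','))
    return west, south, east, north


def bbox_to_geocode(bbox):
    """Return a 'lat,lon,radiuskm' circle that covers the whole box."""
    west, south, east, north = parse_bbox(bbox)
    center_lat = (south + north) / 2
    center_lon = (west + east) / 2
    corners = [(south, west), (south, east), (north, west), (north, east)]
    # Use the farthest corner so that no part of the box is left out
    radius = max(haversine_km(center_lat, center_lon, lat, lon)
                 for lat, lon in corners)
    return '{},{},{}km'.format(center_lat, center_lon, math.ceil(radius))


def in_bbox(tweet, bbox):
    """True if the tweet has exact coordinates that fall inside the box."""
    west, south, east, north = parse_bbox(bbox)
    point = tweet.get('coordinates')
    if not point:
        return False  # no exact point on this tweet (assumed to be common)
    lon, lat = point['coordinates']  # GeoJSON order: longitude first
    return west <= lon <= east and south <= lat <= north


print(bbox_to_geocode('-74,40,-73,41'))
```

The returned string could be passed as GEOCODE in the search example. Because the circle is larger than the box, in_bbox can then be applied to each item to drop results outside the original rectangle. Tweets without exact coordinates are discarded by this filter.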