Tweepy: Stream data for X minutes?

I'm using tweepy to datamine the public stream of tweets for keywords. This is pretty straightforward and has been described in multiple places:

http://runnable.com/Us9rrMiTWf9bAAW3/how-to-stream-data-from-twitter-with-tweepy-for-python

http://adilmoujahid.com/posts/2014/07/twitter-analytics/

Copying code directly from the second link:

#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

#Variables that contain the user credentials to access Twitter API
access_token = "ENTER YOUR ACCESS TOKEN"
access_token_secret = "ENTER YOUR ACCESS TOKEN SECRET"
consumer_key = "ENTER YOUR API KEY"
consumer_secret = "ENTER YOUR API SECRET"


#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):

    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status):
        print(status)


if __name__ == '__main__':

    #This handles Twitter authentication and the connection to Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)

    #This line filters Twitter Streams to capture data by the keywords: 'python', 'javascript', 'ruby'
    stream.filter(track=['python', 'javascript', 'ruby'])

What I can't figure out is how to stream this data into a Python variable instead of printing it to the screen. I'm working in an IPython notebook and want to capture the stream in some variable, foo, after streaming for a minute or so. Furthermore, how do I get the stream to time out? As written, it runs indefinitely.


Solution

Yes, in the post, @Adil Moujahid mentions that his code ran for 3 days. I adapted the same code and, for initial testing, made the following tweaks:

(a) Added a location filter to get a limited set of tweets instead of every tweet containing the keyword. See "How to add a location filter to tweepy module". From there, you can create an intermediate variable in the above code as follows:

stream_all = Stream(auth, l)

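Suppose we select the San Francisco area. We can add:

stream_all.filter(locations=[-122.75,36.8,-121.75,37.8])

The assumption here is that filtering by location takes less time than filtering by keyword. Note that filter() blocks while the stream is open and returns None, so its result cannot be stored in a variable and filtered a second time.

(b) To filter for the keywords as well, pass them in the same call instead:

stream_all.filter(track=['python', 'javascript', 'ruby'], locations=[-122.75,36.8,-121.75,37.8])

Twitter combines track and locations with OR rather than AND, so a tweet matches if it contains a keyword or falls inside the bounding box. The tweets themselves arrive in the listener's on_data method, so to save them, have the listener collect them in a list (here a hypothetical attribute, l.tweets) and dump that list once the stream has stopped:

import json

with open('file_name.json', 'w') as f:
    json.dump(l.tweets, f, indent=1)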
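
Since Twitter's streaming endpoint treats track and locations as alternatives (a tweet matches if it satisfies either one), requiring both a keyword and a location needs a check on the client side. Below is a minimal sketch of such a check on a decoded tweet. The function names are made up for illustration, and it assumes the tweet carries an exact point in its "coordinates" field (GeoJSON order, longitude first); tweets that only have a place and no exact point are skipped by this sketch.

```python
def in_bbox(tweet, bbox):
    """True if the tweet's exact point lies inside bbox = [west, south, east, north]."""
    point = (tweet.get("coordinates") or {}).get("coordinates")
    if not point:
        return False  # no exact point on this tweet
    lon, lat = point
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def has_keyword(tweet, keywords):
    """True if any keyword occurs in the tweet text (case-insensitive substring match)."""
    text = tweet.get("text", "").lower()
    return any(k.lower() in text for k in keywords)


def matches(tweet, keywords, bbox):
    """Require both a keyword and a location match."""
    return has_keyword(tweet, keywords) and in_bbox(tweet, bbox)
```

A listener could call matches(json.loads(data), ...) inside on_data and keep only the tweets for which it returns True.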

This should take much less time. I coincidentally wanted to address the same question you posted today, so I don't have an execution time to report.
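
For the two things the question asks about (getting the tweets into a variable and stopping after a minute or so), one possible approach is a listener that appends each tweet to a list and returns False from on_data once a time limit has passed. In tweepy 3.x, the version the question's code targets, a False return from on_data disconnects the stream. This is only a sketch: the class name TimedListener and its time_limit parameter are my own, not part of tweepy, and the import falls back to a plain object so the class can be tried without tweepy 3.x installed.

```python
import json
import time

try:
    # tweepy 3.x, the API used in the question
    from tweepy.streaming import StreamListener
except ImportError:
    # Fallback so the class can be exercised without tweepy 3.x
    StreamListener = object


class TimedListener(StreamListener):
    """Collect decoded tweets in a list and stop after time_limit seconds."""

    def __init__(self, time_limit=60):
        super(TimedListener, self).__init__()
        self.start_time = time.time()  # clock starts when the listener is created
        self.time_limit = time_limit
        self.tweets = []

    def on_data(self, data):
        if time.time() - self.start_time > self.time_limit:
            return False  # a False return tells tweepy 3.x to disconnect
        self.tweets.append(json.loads(data))
        return True


# Usage in the notebook, with auth built as in the question:
#   listener = TimedListener(time_limit=60)
#   Stream(auth, listener).filter(track=['python', 'javascript', 'ruby'])
#   foo = listener.tweets
```

The time check only runs when a tweet arrives, so on a quiet stream the call can return somewhat later than the limit.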

Hope this helps.
