Tags: python, rest, twitter

Collecting unique tweets with the REST API


Background: I want to get only unique tweets. According to comments on Stack Overflow, one way to do this is to create a set.

However, when I try the following code, I get a TypeError (unhashable type). I found some info in the question "TypeError : Unhashable type". I also know I can remove duplicates in MongoDB, where I am storing the tweets, but it would be cleaner to do it before storing.

Question: Is there a way I can collect only unique tweets?

results = []
pages = 2 
counts = 100

while True:        
    for tweet in tweepy.Cursor(api.search, q = keywords, since="2017-07-21", until="2017-07-27", count = counts, lang = language,monitor_rate_limit=True, wait_on_rate_limit=True).pages(pages):
        results.extend(tweet)


    results = set(results)

Solution

  • It is difficult to say for sure without a concrete example, but here is how a set removes duplicates when items are added one at a time:

    $ python
    >>> results = ["hi", "hello", "hi", "goodbye"]
    >>> a = set()
    >>> for tweet in results:
    ...     a.add(tweet)
    ...
    >>> print a
    set(['hi', 'hello', 'goodbye'])
    >>>
    

    As you can see above, the set contains only one "hi". Add the items one at a time; you shouldn't try to hash the entire list as a whole.

    OK, as per your comments, I did a little reverse engineering and determined that the tweets have a text field, which is what you need to add to the set.

    So just replace a.add(tweet) with a.add(tweet.text).
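If you want to keep the whole tweet rather than only its text, one possible approach is to track the texts you have already seen in a set and keep a separate list of tweets. This is only a rough sketch: `Tweet` is a hypothetical stand-in for the objects returned by tweepy (only the `id` and `text` fields are modelled), and `unique_by_text` is an illustrative helper name, not part of any library.

```python
from collections import namedtuple

# Hypothetical stand-in for a tweet object; a real one has many more fields.
Tweet = namedtuple("Tweet", ["id", "text"])


def unique_by_text(tweets):
    """Return the tweets whose text has not been seen before, in original order."""
    seen_texts = set()   # strings are hashable, so they can go into a set
    unique = []
    for tweet in tweets:
        if tweet.text not in seen_texts:
            seen_texts.add(tweet.text)
            unique.append(tweet)
    return unique


results = [Tweet(1, "hi"), Tweet(2, "hello"), Tweet(3, "hi"), Tweet(4, "goodbye")]
unique_tweets = unique_by_text(results)
print([t.text for t in unique_tweets])  # ['hi', 'hello', 'goodbye']
```

Only the text string is hashed, never the tweet object itself, so the unhashable type error should not come up. If the same tweet can appear twice in your results (rather than two different tweets with identical text), you could key on `tweet.id` instead of `tweet.text`.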