Background: I want to collect only unique tweets. According to comments on Stack Overflow, one way to do this is to create a set.
However, when I try the following code, I get a TypeError: Unhashable. I found some information under "TypeError : Unhashable type". I also know I can remove duplicates in MongoDB, where I am storing the tweets, but it would be cleaner to do it before storing.
Question: Is there a way to collect only unique tweets?
results = []
pages = 2
counts = 100
while True:
    for tweet in tweepy.Cursor(api.search, q=keywords, since="2017-07-21", until="2017-07-27", count=counts, lang=language, monitor_rate_limit=True, wait_on_rate_limit=True).pages(pages):
        results.extend(tweet)
    results = set(results)
It is difficult to say for sure without a concrete example, but here is how adding items to a set one at a time removes duplicates:
{ ~ } » python
>>> results = ["hi", "hello", "hi", "goodbye"]
>>> a = set()
>>> for tweet in results:
...     a.add(tweet)
...
>>> print a
set(['hi', 'hello', 'goodbye'])
>>>
As you can see above, the set contains only one "hi". You shouldn't try to hash the entire list as a whole; add the items one at a time, and make sure each item you add is itself hashable.
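To illustrate where an "unhashable" error can come from, here is a minimal sketch. It assumes, purely for illustration, that each tweet is held as a plain dict; the `records` list and its `"text"` key are hypothetical and are not taken from tweepy. The real objects in the question may differ, but the principle is the same: a set accepts only hashable items.

```python
# Hypothetical tweet records as plain dicts (an assumption, not real tweepy objects).
records = [{"text": "hi"}, {"text": "hello"}, {"text": "hi"}]

try:
    set(records)  # dicts are mutable, so they cannot be hashed
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Strings are hashable, so building a set from the text values works.
unique_texts = set(record["text"] for record in records)
print(sorted(unique_texts))
```

The first attempt fails because the items themselves cannot be hashed, while the second succeeds because it puts a hashable value (a string) from each item into the set.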
OK, as per your comments I did a little reverse engineering and determined that the tweets have a text field, which is what you need to add to the set,
so just replace a.add(tweet)
with a.add(tweet.text)
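If you want to keep the tweet objects themselves rather than only their text, one possible approach is to use the set just to remember which texts have been seen. The sketch below is an illustration under assumptions: `Tweet` is a hypothetical stand-in with only `id` and `text` fields, and `unique_by_text` is a helper name made up for this example, not part of tweepy.

```python
from collections import namedtuple

# Hypothetical stand-in for a tweet object; only the fields used here.
Tweet = namedtuple("Tweet", ["id", "text"])


def unique_by_text(tweets):
    """Return tweets in their original order, keeping the first one for each distinct text."""
    seen = set()   # texts encountered so far (strings are hashable)
    unique = []    # tweets whose text had not been seen before
    for tweet in tweets:
        if tweet.text not in seen:
            seen.add(tweet.text)
            unique.append(tweet)
    return unique


sample = [Tweet(1, "hi"), Tweet(2, "hello"), Tweet(3, "hi"), Tweet(4, "goodbye")]
unique_tweets = unique_by_text(sample)
print([t.text for t in unique_tweets])  # ['hi', 'hello', 'goodbye']
```

Unlike a bare set, this keeps the original order and the full objects, which may be more convenient if the results are stored in MongoDB afterwards.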