Tags: python, list, oop, twitter

Find all hashtags


I have a class Tweet and several instances of it, collected in a list. Each tweet also has a user, a retweet count and an age, which are not relevant to my question. Only the content matters.

  • tweet1 = Tweet("@realDonaldTrump", "Despite the negative press covfefe #bigsmart", 1249, 54303)

  • tweet2 = Tweet("@elonmusk", "Technically, alcohol is a solution #bigsmart", 366.4, 166500)

  • tweet3 = Tweet("@CIA", "We can neither confirm nor deny that this is our first tweet. #heart", 2192, 284200)

  • tweets = [tweet1, tweet2, tweet3]

I need to get a list of all the hashtags, but my code only returns the ones from the first tweet.

for x in tweets:
    return re.findall(r'#\w+', x.content)

Solution

  • You are returning after the first iteration of the loop, so only the first tweet is ever processed. You need to go through all tweets and add the hashtags to a list:

    import re

    def get_hashtags(tweets):
        result = []
        for x in tweets:
            # extend() adds each match, so hashtags from all tweets accumulate
            result.extend(re.findall(r'#\w+', x.content))
        return result
    

    For sorting, you can use a defaultdict to add up the retweets per hashtag. Then sort by that total.

    import re
    from collections import defaultdict
    
    def get_hashtags_sorted(tweets):
        result = defaultdict(int)
        for x in tweets:
            for hashtag in re.findall(r'#\w+', x.content):
                result[hashtag] += x.retweets
        # Sort the (hashtag, total) pairs from the dict, not the tweets list
        return sorted(result.items(), key=lambda item: item[1])
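
To try both functions end to end, here is a minimal self-contained sketch using the three tweets from the question. The Tweet class is a hypothetical stand-in: the question does not show it, so the field names `user`, `age` and `retweets` and their order are assumptions. Only `content` and `retweets` are actually used.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical stand-in for the asker's Tweet class. The field names and
# their order are assumptions; the question does not show the definition.
@dataclass
class Tweet:
    user: str
    content: str
    age: float
    retweets: int


def get_hashtags(tweets):
    """Collect the hashtags of every tweet, in order of appearance."""
    result = []
    for tweet in tweets:
        result.extend(re.findall(r'#\w+', tweet.content))
    return result


def get_hashtags_sorted(tweets):
    """Sum retweets per hashtag and sort ascending by that total."""
    totals = defaultdict(int)
    for tweet in tweets:
        for hashtag in re.findall(r'#\w+', tweet.content):
            totals[hashtag] += tweet.retweets
    return sorted(totals.items(), key=lambda item: item[1])


tweets = [
    Tweet("@realDonaldTrump", "Despite the negative press covfefe #bigsmart", 1249, 54303),
    Tweet("@elonmusk", "Technically, alcohol is a solution #bigsmart", 366.4, 166500),
    Tweet("@CIA", "We can neither confirm nor deny that this is our first tweet. #heart", 2192, 284200),
]

all_hashtags = get_hashtags(tweets)
hashtag_totals = get_hashtags_sorted(tweets)

print(all_hashtags)    # ['#bigsmart', '#bigsmart', '#heart']
print(hashtag_totals)  # [('#bigsmart', 220803), ('#heart', 284200)]
```

Duplicates are kept in the flat list; wrap the result in `set()` if you only need unique hashtags. To get the most retweeted hashtag first, pass `reverse=True` to `sorted`.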