python twitter tweepy

How do I filter out tweets containing any URL?


I am using tweepy to fetch tweets for one or more hashtags, and I then pass them to a black box for processing. However, tweets containing any URL should not be sent. What is the most appropriate way to filter out such tweets?


Solution

  • To go with @Colin's suggestion, this question covers finding URLs with a regex.

    An example code snippet would be:

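    import re

    # tweet_list is a list of tweet strings that you want to filter
    # raw string, so the regex escapes (\w, \d) are not treated as string escapes
    pattern = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'
    # keep only the tweets in which the pattern finds no URL
    filtered_tweet_list = [tweet for tweet in tweet_list if not re.search(pattern, tweet)]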
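
If the filter is needed in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the names `URL_PATTERN`, `drop_tweets_with_urls` and `sample_tweets` are made up for illustration, and the sample strings stand in for tweet text that would really come from tweepy. It assumes each tweet is already a plain string and that a link always starts with `http://` or `https://`, so a bare `www.example.com` would not be caught.

```python
import re

# Same pattern as in the snippet above, compiled once so it can be reused.
# It only matches links that start with http:// or https://.
URL_PATTERN = re.compile(r"https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+")


def drop_tweets_with_urls(tweets):
    """Return only the tweets whose text contains no http(s) URL."""
    return [text for text in tweets if not URL_PATTERN.search(text)]


# Hypothetical sample data, standing in for text pulled from tweepy.
sample_tweets = [
    "Loving the #python conference today",
    "Read this: https://example.com/post #python",
    "No links here, just #tweepy",
]

clean_tweets = drop_tweets_with_urls(sample_tweets)
print(clean_tweets)
```

`search` stops at the first match, which is enough here because a single URL already disqualifies the tweet.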