python regex reddit

Removing links from Reddit comments using Python and regex


I want to remove links written in the format Reddit uses, [text](url), keeping only the link text:

comment = "Hello this is my [website](https://www.google.com)"

no_links = RemoveLinks(comment)

# no_links == "Hello this is my website"

I found a similar question about the same thing, but I don't know how to translate its answer to Python.

I am not that familiar with regex, so I would appreciate an explanation of what's happening.


Solution

  • You could do the following:

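    import re

    # Raw string (r'...') so the backslashes reach the regex engine unchanged
    pattern = re.compile(r'\[(.*?)\]\(.*?\)')
    comment = "Hello this is my [website](https://www.google.com)"

    # Replace each [text](url) with group 1, the text inside the brackets
    print(pattern.sub(r'\1', comment))  # Hello this is my website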
    

    The line:

    pattern = re.compile(r'\[(.*?)\]\(.*?\)')
    

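    creates a regex pattern that matches any text inside square brackets immediately followed by any text inside parentheses. The brackets and parentheses are escaped with a backslash because they are special characters in regex, and the unescaped parentheses in (.*?) form a capturing group that saves the link text. The '?' after '.*' makes each part non-greedy, so it matches as little text as possible; without it, a comment containing two links would be matched as one long link running from the first '[' to the last ')'.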

    The call pattern.sub(r'\1', comment) replaces every match with the contents of the first capturing group, in this case the text inside the brackets.

    For more information about regex, I suggest you read this.
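To see what each piece of the pattern does, here is a sketch that writes the same regex with Python's re.VERBOSE flag (which ignores whitespace in the pattern and allows # comments) and compares it with a greedy version that drops the '?'. The names VERBOSE_LINK and GREEDY_LINK and the sample text are made up for illustration.

```python
import re

# The same pattern written with re.VERBOSE so each piece can be commented.
# In verbose mode, whitespace in the pattern is ignored and # starts a comment.
VERBOSE_LINK = re.compile(r"""
    \[        # a literal opening square bracket
    (.*?)     # group 1: the link text, as few characters as possible
    \]        # a literal closing square bracket
    \(        # a literal opening parenthesis
    .*?       # the URL, as few characters as possible (not captured)
    \)        # a literal closing parenthesis
""", re.VERBOSE)

# Greedy variant for comparison: .* grabs as much as it can, then backtracks
GREEDY_LINK = re.compile(r'\[(.*)\]\(.*\)')

text = "a [one](https://x.example) b [two](https://y.example) c"

print(VERBOSE_LINK.sub(r'\1', text))  # a one b two c
print(GREEDY_LINK.sub(r'\1', text))   # a one](https://x.example) b [two c
```

With the greedy pattern, the first '.*' backtracks only as far as the last '](' in the string, so both links and the text between them collapse into a single bogus match.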
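To check what the first capturing group actually holds, a sketch like the following prints each match with finditer and then shows that passing a function to sub gives the same result as r'\1'. Note the r prefix on r'\1': in an ordinary string literal, '\1' is the control character chr(1), not a backreference.

```python
import re

pattern = re.compile(r'\[(.*?)\]\(.*?\)')
comment = "Hello this is my [website](https://www.google.com)"

# Inspect each match: group(0) is the whole link, group(1) the bracketed text
for match in pattern.finditer(comment):
    print(match.group(0))  # [website](https://www.google.com)
    print(match.group(1))  # website

# sub also accepts a function; returning group(1) is equivalent to r'\1'
print(pattern.sub(lambda m: m.group(1), comment))  # Hello this is my website
```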
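If you want the RemoveLinks function from the question, one option is to wrap the pattern in a small function. This is a minimal sketch: remove_links and LINK_PATTERN are hypothetical names (snake_case per Python convention rather than RemoveLinks), and it assumes link text and URLs contain no nested square brackets or parentheses.

```python
import re

# Matches [text](url) and captures the link text as group 1
LINK_PATTERN = re.compile(r'\[(.*?)\]\(.*?\)')


def remove_links(comment: str) -> str:
    """Replace every [text](url) link in comment with just its text."""
    return LINK_PATTERN.sub(r'\1', comment)


print(remove_links("Hello this is my [website](https://www.google.com)"))
# Hello this is my website
print(remove_links("See [docs](https://example.com/a) and [code](https://example.com/b)."))
# See docs and code.
```

Because '.*?\)' stops at the first ')', a URL that itself contains parentheses (common on Wikipedia) leaves a stray ')' behind, so treat this as a starting point rather than a full Markdown parser.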