python regex normalization data-cleaning

Using regex to delete a username in Python


I have code that uses a regular expression to delete a "username" from a text:

import re

# remove mentions, hashtags, and links
text = ' '.join(re.sub(r"([@#][A-Za-z0-9]+)|(\w+://\S+)", " ", text).split())

However, it does not work in all cases. For example, the username below is not fully deleted:

@username_user

In fact, it only removes the first part of the username, up to the underscore, leaving me with the below:

_user

How can I adapt my code so that it works on the entire username provided in the example?


Solution

  • If all you need is to adapt your regex to match more username patterns, such as @username_user, you can add the underscore to the character set in your first group, as below:

    text = ' '.join(re.sub(r"([@#][A-Za-z0-9_]+)|(\w+://\S+)", " ", text).split())
    

    The above will work with "@username_user", and you can extend it to as many new characters as you need by adding them after the underscore inside the character set (the square brackets) of the regex.
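To see the difference between the two patterns side by side, here is a minimal sketch. The helper name `strip_tokens`, the constants `OLD_PATTERN` and `NEW_PATTERN`, and the sample sentence are made up for illustration and are not part of the original code.

```python
import re

# Pattern from the question: the character set after @/# has no underscore.
OLD_PATTERN = re.compile(r"([@#][A-Za-z0-9]+)|(\w+://\S+)")
# Adjusted pattern from the answer: underscore added to the character set.
NEW_PATTERN = re.compile(r"([@#][A-Za-z0-9_]+)|(\w+://\S+)")


def strip_tokens(text, pattern):
    """Replace every match with a space, then collapse repeated whitespace."""
    return " ".join(pattern.sub(" ", text).split())


# Hypothetical sample input containing a mention, a link, and a hashtag.
sample = "hello @username_user check https://example.com #tag_1 bye"

old_result = strip_tokens(sample, OLD_PATTERN)
new_result = strip_tokens(sample, NEW_PATTERN)

print(old_result)  # hello _user check _1 bye
print(new_result)  # hello check bye
```

With the old pattern the match stops at the underscore, so the tail of the mention and of the hashtag is left behind. With the underscore in the character set, both are removed in full.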