Regular expression to search for a specific Twitter username


I have a project where I'm trying to analyze a database of tweets. I need to write a Python regex that pulls out tweets mentioning specific Twitter users. Here is an example tweet I'd like to capture.

"That @A_Person is a real jerk."

The regex that I've been trying is

([^.?!]*)(\b([@]A_Person)\b)([^.?!]*)

But it's not working and I've tried lots of variations. Any advice would be appreciated!


Solution

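  • \b matches a word boundary, that is, a position with a word character on one side and a non-word character (or the edge of the string) on the other. @ is not a word character, so when it follows a space or sits at the start of the string there is no boundary in front of it, and the \b before @ makes the match fail. Remove that word boundary, drop the extra groups, and add a character set [.?!] at the end to include the final punctuation, and you get: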

    [^.?!]*@A_Person\b.*?[^.?!]*[.?!]
    

    You might also anchor the pattern at the start of the string or just after the end of the previous sentence. Otherwise the engine takes many unnecessary steps scanning regions that contain no match. Perhaps use

    (?:^|(?<=[.?!])\s*)
    

    which matches either the start of the string, or a position right after [.?!] (checked with a lookbehind) plus any whitespace that follows. Put those together and you get

    (?:^|(?<=[.?!])\s*)([^.?!]*@A_Person\b.*?[^.?!]*[.?!])
    

    where the string you want is in the first group (no leading spaces).

    https://regex101.com/r/447KsF/3
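
For reference, here is a minimal Python sketch of the above. It first shows why the \b in front of @ fails on the example tweet, then wraps the combined pattern in a helper. The names mention_pattern and sentences_mentioning are made up for illustration, and re.escape is added on the assumption that you will want to plug in other usernames. Note that, as written, the pattern only matches sentences that end in ., ? or !, so a tweet with no closing punctuation is not captured.

```python
import re

tweet = "That @A_Person is a real jerk."

# "\b@" needs a word character right before "@". Here a space precedes it,
# and both the space and "@" are non-word characters, so there is no boundary.
with_boundary = re.compile(r"\b@A_Person\b")
without_boundary = re.compile(r"@A_Person\b")

print(with_boundary.search(tweet))               # None
print(without_boundary.search(tweet).group(0))   # @A_Person


def mention_pattern(username):
    """Build the combined pattern from the answer for one username.

    Hypothetical helper: re.escape guards against regex metacharacters
    in the username.
    """
    return re.compile(
        r"(?:^|(?<=[.?!])\s*)"                  # start of string or sentence
        r"([^.?!]*@" + re.escape(username) +    # text before the mention
        r"\b.*?[^.?!]*[.?!])"                   # rest of sentence + punctuation
    )


def sentences_mentioning(username, text):
    """Return every sentence in text that mentions @username (group 1)."""
    return [m.group(1) for m in mention_pattern(username).finditer(text)]


text = "Hello there. That @A_Person is a real jerk. Bye!"
print(sentences_mentioning("A_Person", text))
```

The trailing \b after the username is what keeps a longer handle such as @A_Person2 from matching.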