I have a project where I'm trying to analyze a database of tweets. I need to write a Python regex that pulls tweets mentioning specific Twitter users. Here is an example tweet I'd like to capture:
"That @A_Person is a real jerk."
The regex that I've been trying is
([^.?!]*)(\b([@]A_Person)\b)([^.?!]*)
But it's not working and I've tried lots of variations. Any advice would be appreciated!
\b matches a word boundary, but @ is not a word character, so when @ follows a space there is no boundary between the two and the match fails. Remove the word boundary there, drop the extra groups, and add a character set [.?!] at the end to include the final punctuation, and you get:
[^.?!]*@A_Person\b.*?[^.?!]*[.?!]
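As a quick check of the explanation above, here is a minimal sketch that runs both patterns against the example tweet from the question. The variable names are only illustrative.

```python
import re

tweet = "That @A_Person is a real jerk."

# Original attempt: the \b before @ needs a word character directly in
# front of the @. Here a space precedes it, so there is no boundary.
original = re.compile(r"([^.?!]*)(\b([@]A_Person)\b)([^.?!]*)")
print(original.search(tweet))  # None

# Leading \b removed, extra groups dropped, closing punctuation included.
fixed = re.compile(r"[^.?!]*@A_Person\b.*?[^.?!]*[.?!]")
match = fixed.search(tweet)
print(match.group(0))  # That @A_Person is a real jerk.
```

The \b after A_Person is still useful: it keeps the pattern from matching a longer handle such as @A_Person2.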
You might also anchor the match at the start of the string or just after the end of the previous sentence; otherwise the engine spends many steps scanning stretches of text that cannot match. Perhaps use
(?:^|(?<=[.?!])\s*)
which matches either the start of the string, or a position preceded by [.?!] (checked with a lookbehind) plus any whitespace that follows. Put those together and you get
(?:^|(?<=[.?!])\s*)([^.?!]*@A_Person\b.*?[^.?!]*[.?!])
where the sentence you want is in the first group, without the whitespace that separates it from the previous sentence.
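A rough sketch of how the combined pattern might be applied to a batch of tweets follows. The helper names mention_pattern and sentences_mentioning and the sample tweets are made up for illustration; re.escape is used on the assumption that the username comes from a variable rather than being hard-coded.

```python
import re

def mention_pattern(username):
    """Build the combined pattern for one username (given without the @)."""
    return re.compile(
        r"(?:^|(?<=[.?!])\s*)"
        r"([^.?!]*@" + re.escape(username) + r"\b.*?[^.?!]*[.?!])"
    )

def sentences_mentioning(username, tweets):
    """Collect group 1 of every match across all tweets."""
    pattern = mention_pattern(username)
    found = []
    for tweet in tweets:
        found.extend(m.group(1) for m in pattern.finditer(tweet))
    return found

# Hypothetical sample data.
tweets = [
    "That @A_Person is a real jerk.",
    "Nice day today. Have you met @A_Person? I have not.",
    "Talking about @A_Person2 instead.",
]

print(sentences_mentioning("A_Person", tweets))
# ['That @A_Person is a real jerk.', 'Have you met @A_Person?']
```

Note that the pattern requires a closing ., ? or !, so a tweet that ends without one of those is not matched.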