Could you please help me with the following question?
I have a text:
text = "VLOG - Primer evento de la temporada #ModaenTiktok #eventotiktok #madrid @Tik Tok españa"
And I need to extract only the mention:
@Tik Tok españa
What is the simplest way to do it?
I tried with:
import re
regex = re.compile(r"(^|\W)(?:@)([A-Za-z0-9_](?:(?:[A-Za-z0-9_]|(?:\.(?!\.))){0,28}(?:[A-Za-z0-9_]))?)", re.UNICODE)
mention = regex.findall(text)
But I only got Tik.
Based on the comments on your question, it seems you are trying to collect @ tags that may or may not contain spaces. From the text, we can see that these tags start with @ and never contain #, since that marks a hashtag rather than a tag. We can therefore use these two rules to write a very simple regex solution:
re.compile(r"\@[^\#\@\n]+?(?= *[\#\@\n]|$)")
In more detail: the pattern matches text starting with @, then collects anything except another @, a #, or a newline (excluding newlines is just good practice), and stops when the next character is another @ tag, a #, or the end of the string/line. I added " *" to the beginning of the lookahead so that the regex ignores any trailing spaces before the next delimiter, but this is optional.
You can see it work here: https://regex101.com/r/2T07b0/1
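For completeness, here is a minimal sketch of how the pattern could be used from Python on the text in the question. The names MENTION_RE and extract_mentions are made up for this illustration; only the regex itself comes from the answer above.

```python
import re

# Pattern from the answer: an @, then the shortest run of characters that are
# not '#', '@' or a newline, stopping before optional spaces followed by one
# of those delimiters, or at the end of the string.
MENTION_RE = re.compile(r"\@[^\#\@\n]+?(?= *[\#\@\n]|$)")


def extract_mentions(text):
    """Return every @mention in text, including mentions that contain spaces."""
    # The pattern has no capturing groups, so findall returns the full matches.
    return MENTION_RE.findall(text)


text = "VLOG - Primer evento de la temporada #ModaenTiktok #eventotiktok #madrid @Tik Tok españa"
print(extract_mentions(text))  # ['@Tik Tok españa']
```

Keep in mind that, because spaces are allowed inside a mention, everything up to the next # or @ is treated as part of it. For a caption like "@madrid great event", this sketch would return the whole phrase, so it fits best when mentions come at the end of the text or are followed directly by another tag.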