python, search, text, python-re, findall

How can I extract a multi-word @mention from a string in Python?


Could you help me with the following question?

I have a text:

text = "VLOG - Primer evento de la temporada #ModaenTiktok #eventotiktok #madrid @Tik Tok españa"

And I need to extract only the mention:

@Tik Tok españa

What is the simplest way to do it?

I tried with:

import re

regex = re.compile(r"(^|\W)(?:@)([A-Za-z0-9_](?:(?:[A-Za-z0-9_]|(?:\.(?!\.))){0,28}(?:[A-Za-z0-9_]))?)", re.UNICODE)

mention = regex.findall(text)

But it captures only Tik.


Solution

  • Based on the comments on your question, it seems you are trying to collect @ mentions that may or may not contain spaces. From the text, we can see that a mention starts with @ and never contains #, since # begins a hashtag rather than a mention. We can use these two rules to write a very simple regex:

    re.compile(r"\@[^\#\@\n]+?(?= *[\#\@\n]|$)")

    In more detail: the pattern matches text starting with @, then consumes any characters other than @, #, or a newline (excluding newlines is just good practice), and stops when what follows is another @, a #, a newline, or the end of the string. The " *" (a space followed by *) at the start of the lookahead keeps any spaces before the next @, #, or newline out of the match; it is optional.

    You can see it work here: https://regex101.com/r/2T07b0/1
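
As a minimal sketch of how the pattern above could be used from Python, the snippet below runs it against the text from the question. The names MENTION_RE and extract_mentions are hypothetical, chosen only for this illustration, and the pattern is written without the redundant backslashes before @ and #, which should not change what it matches.

```python
import re

# Same pattern as in the answer, minus the unneeded escapes:
# start at "@", lazily consume anything that is not "#", "@", or a newline,
# and stop before optional spaces followed by one of those characters,
# or at the end of the string.
MENTION_RE = re.compile(r"@[^#@\n]+?(?= *[#@\n]|$)")


def extract_mentions(text):
    """Return every @mention in text, allowing spaces inside a mention.

    Hypothetical helper for illustration only.
    """
    return MENTION_RE.findall(text)


text = "VLOG - Primer evento de la temporada #ModaenTiktok #eventotiktok #madrid @Tik Tok españa"
mentions = extract_mentions(text)
print(mentions)  # ['@Tik Tok españa']
```

Note that this approach treats everything up to the next @, #, newline, or end of the string as part of the mention, so ordinary words that follow a mention are captured too. Spaces at the very end of the string are also kept in the match, so calling strip() on each result may be worthwhile.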