
Regex capture optional groups by delimiters


I need to parse a quote string into the quote text, the author (introduced by @), and the category (introduced by #). The author and category always come in that order, but each one is optional. For example:

"When in doubt - don’t. @Ben Franklin #decisions"

{'text': 'When in doubt - don’t.', 'author': 'Ben Franklin', 'category': 'decisions'}

"When in doubt - don’t. #decisions"

{'text': 'When in doubt - don’t.', 'category': 'decisions'}

"When in doubt - don’t. @Ben Franklin"

{'text': 'When in doubt - don’t.', 'author': 'Ben Franklin'}

It's okay if the delimiters and surrounding whitespace stick to the captured groups; I can strip them later. My current regex:

^(.*?)(@.*)(#.*)$

only works if both @author and #category are present in the input string. Making the latter two groups optional breaks it:

^(.*?)(@.*)?(#.*)?$
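A quick check with Python's `re` module shows what goes wrong with each attempt. The strict pattern cannot match at all when the @author part is missing, and in the optional version the greedy `.*` in the author group runs to the end of the string, so it swallows the `#category` part and leaves group 3 empty. The variable names below are only for illustration.

```python
import re

both_required = re.compile(r'^(.*?)(@.*)(#.*)$')
both_optional = re.compile(r'^(.*?)(@.*)?(#.*)?$')

# Without an @author, the first pattern cannot match at all.
print(both_required.match("When in doubt - don’t. #decisions"))  # None

# With both parts present, the greedy .* in group 2 runs to the end of the
# string, so the author group swallows "#decisions" and group 3 stays empty.
m = both_optional.match("When in doubt - don’t. @Ben Franklin #decisions")
print(m.groups())
# ('When in doubt - don’t. ', '@Ben Franklin #decisions', None)
```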

How do I properly capture them?


Solution

  • Assuming the @ and # only appear at the end of the string, in front of the author or category, you can use

    ^([^@#]*)(@[^#]*)?(#.*)?$
    

    The pattern matches:

    • ^ - start of string
    • ([^@#]*) - Group 1: any zero or more chars other than # and @
    • (@[^#]*)? - Group 2 (optional): @ and then zero or more chars other than #
    • (#.*)? - Group 3 (optional): # and then zero or more chars other than line break chars
    • $ - end of string.
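A minimal sketch of how the pattern above could be wired into Python to produce the dictionaries from the question. The helper name `parse_quote` is hypothetical, and it assumes, as the answer does, that `@` and `#` never appear inside the quote text itself. Missing groups come back as `None`, so those keys are simply left out of the result.

```python
import re

# Pattern from the answer: text, then optional @author, then optional #category.
QUOTE_RE = re.compile(r'^([^@#]*)(@[^#]*)?(#.*)?$')


def parse_quote(s):
    """Split a quote string into text/author/category (hypothetical helper)."""
    m = QUOTE_RE.match(s)
    if m is None:
        return None  # no match: let the caller decide what to do
    text, author, category = m.groups()
    result = {'text': text.strip()}
    if author is not None:
        result['author'] = author[1:].strip()  # drop the leading '@'
    if category is not None:
        result['category'] = category[1:].strip()  # drop the leading '#'
    return result


for q in ("When in doubt - don’t. @Ben Franklin #decisions",
          "When in doubt - don’t. #decisions",
          "When in doubt - don’t. @Ben Franklin"):
    print(parse_quote(q))
```

Because `[^@#]*` stops at the first `@` or `#`, a quote whose text itself contains one of those characters would be split in the wrong place; that is exactly the assumption stated at the start of the answer.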