I need to parse a quote string into its text, an author introduced by @, and a category introduced by #. The author and category always come in that order, but both are optional. Like this:
"When in doubt - don’t. @Ben Franklin #decisions"
{'text': 'When in doubt - don’t.', 'author': 'Ben Franklin', 'category': 'decisions'}
"When in doubt - don’t. #decisions"
{'text': 'When in doubt - don’t.', 'category': 'decisions'}
"When in doubt - don’t. @Ben Franklin"
{'text': 'When in doubt - don’t.', 'author': 'Ben Franklin'}
It's okay if delimiters and whitespace stick to the captured groups; I can strip them later. My current regex:
^(.*?)(@.*)(#.*)$
only does the job if both @author and #category are present in the input string. Trying to make the latter two groups optional messes things up for me:
^(.*?)(@.*)?(#.*)?$
How do I properly capture them?
Assuming the @ and # only appear at the end of the string, in front of the author or category, you can use
^([^@#]*)(@[^#]*)?(#.*)?$
The pattern matches:
^ - start of string
([^@#]*) - Group 1: zero or more chars other than # and @
(@[^#]*)? - Group 2 (optional): @ and then zero or more chars other than #
(#.*)? - Group 3 (optional): # and then zero or more chars other than line break chars, as many as possible
$ - end of string.
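For illustration, here is a minimal Python sketch of how the pattern above might be applied to build the dictionaries shown in the question. The function name parse_quote and the stripping logic are assumptions made for this example, not part of the original question; the regex itself is unchanged.

```python
import re

# The pattern from the answer: text, optional @author, optional #category.
QUOTE_RE = re.compile(r'^([^@#]*)(@[^#]*)?(#.*)?$')


def parse_quote(line):
    """Split a quote line into text, author and category (hypothetical helper).

    Returns None if the line does not match the pattern at all.
    """
    match = QUOTE_RE.match(line)
    if match is None:
        return None
    text, author, category = match.groups()
    result = {'text': text.strip()}
    # Optional groups are None when absent; drop the leading delimiter
    # (@ or #) and the surrounding whitespace from the ones that matched.
    for key, raw in (('author', author), ('category', category)):
        if raw:
            result[key] = raw[1:].strip()
    return result


if __name__ == '__main__':
    for sample in (
        'When in doubt - don’t. @Ben Franklin #decisions',
        'When in doubt - don’t. #decisions',
        'When in doubt - don’t. @Ben Franklin',
    ):
        print(parse_quote(sample))
```

Because the delimiters stay inside the captured groups, the sketch strips them after matching, as the question suggests. It also inherits the assumption stated above: an @ or # inside the quote text itself would end the text group early.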