The | symbol in regular expressions seems to divide the entire pattern, but I need it to divide only a smaller part of the pattern. I want to find a match that starts with either "Q: " or "A: " and ends just before the next "Q: " or "A: ". In between can be anything, including newlines.
My attempt:
import re

string = "Q: This is a question. \nQ: This is a 2nd question \non two lines. \n\nA: This is an answer. \nA: This is a 2nd answer \non two lines.\nQ: Here's another question. \nA: And another answer."
pattern = re.compile(r"(A: |Q: )[\w\W]*(A: |Q: |$)")
matches = pattern.finditer(string)
for match in matches:
    print('-', match.group(0))
The regex I am using is (A: |Q: )[\w\W]*(A: |Q: |$).
Here is the same string over multiple lines, just for reference:
Q: This is a question.
Q: This is a 2nd question
on two lines.
A: This is an answer.
A: This is a 2nd answer
on two lines.
Q: Here's another question.
A: And another answer.
So I was hoping the parentheses would isolate the two possible patterns at the start and the three at the end, but instead it seems to treat them like 4 separate patterns. It would also include the next "A: " or "Q: " at the end of each match, but hopefully you can see what I was going for. I was planning to just not use that group or something.
If it's helpful, this is for a simple study program that grabs the questions and answers from a text file to quiz the user. I was able to make it with the questions and answers being only one line each, but I'm having trouble getting an "A: " or "Q: " that has multiple lines.
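One approach could be to use a negative lookahead, (?!...), so that a newline is only matched when it is not followed by an A: or Q: prefix. With the multiline flag enabled (so that ^ matches at the start of each line), it looks as follows: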
^([AQ]):(?:.|\n(?![AQ]:))+
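As a rough sketch of how this pattern might be used from Python, assuming the sample string from the question: the re.MULTILINE flag is what lets ^ match at the start of every line. The names text and entries are just illustrative, not part of the original code.

```python
import re

# Sample text from the question, split across literals for readability.
text = (
    "Q: This is a question. \n"
    "Q: This is a 2nd question \non two lines. \n\n"
    "A: This is an answer. \n"
    "A: This is a 2nd answer \non two lines.\n"
    "Q: Here's another question. \n"
    "A: And another answer."
)

# re.MULTILINE makes ^ match at the start of each line, not only at the
# start of the whole string.
pattern = re.compile(r"^([AQ]):(?:.|\n(?![AQ]:))+", re.MULTILINE)

# group(1) is the letter (A or Q); group(0) is the whole entry.
# strip() drops trailing spaces and any blank line picked up by the match.
entries = [(m.group(1), m.group(0).strip()) for m in pattern.finditer(text)]

for kind, entry in entries:
    print("-", kind, repr(entry))
```

The strip() call is there because a match can end with a trailing space or a blank line that sits before the next entry.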
You can also try it out here on the Regex Demo.
Here's another approach suggested by @Wiktor that should be a little faster:
^[AQ]:.*(?:\n+(?![AQ]:).+)*
A slight modification where we match \n and .* instead of \n+ and .+ (but note that this also captures blank lines at the end of an entry):
^[AQ]:.*(?:\n(?![AQ]:).*)*
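To make the difference between the last two patterns concrete, here is a small sketch that runs both over the sample string from the question. The names strict and loose are only labels chosen for this example; both patterns again assume re.MULTILINE.

```python
import re

# Sample text from the question; note the blank line after the 2nd question.
text = (
    "Q: This is a question. \n"
    "Q: This is a 2nd question \non two lines. \n\n"
    "A: This is an answer. \n"
    "A: This is a 2nd answer \non two lines.\n"
    "Q: Here's another question. \n"
    "A: And another answer."
)

# \n+ and .+ : every continuation line must contain at least one character,
# so a trailing blank line is left out of the match.
strict = re.compile(r"^[AQ]:.*(?:\n+(?![AQ]:).+)*", re.MULTILINE)

# \n and .* : a continuation line may be empty, so a blank line before the
# next entry is included in the match.
loose = re.compile(r"^[AQ]:.*(?:\n(?![AQ]:).*)*", re.MULTILINE)

# Neither pattern has a capturing group, so findall returns whole matches.
strict_matches = strict.findall(text)
loose_matches = loose.findall(text)

print(repr(strict_matches[1]))
print(repr(loose_matches[1]))
```

On this input the two patterns find the same entries and differ only in the trailing newline of the second one, so calling .strip() on each match should make the results interchangeable.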