I am fairly new to regex and have been working through some examples.
In the example below I am given a test format. My objective is to match each question-and-answer segment, then, for each segment, get the question by itself and all the answers belonging to that question.
The following are the regexes I have been using (for the whole segment, the question, and the answer, respectively):
/^\d*\..*[^].*\?(\n.*){2,5}/gm
/(^\d*\..*\w\?)/gm
/[a-zA-Z]\..*[^].*\n?/gm
1. this is a question?
A. This is an answer
B. This is an answer
2. this is a question?
A. This is an answer
B. This is an answer
3. this is a question
multiline?
A. This is an multiline
answer
B. This is an answer
The output I am ultimately trying to achieve is something like:
[
  {
    "question": "1. this is a question?",
    "answers": ["this is an answer", ...]
  },
  {
    "question": "2. this is a question?",
    "answers": ["this is an answer", ...]
  },
  {
    "question": "3. this is a multiline question?",
    "answers": ["this is a multiline answer", ...]
  }
]
Currently I am using Regex101.com to work on the example.
Below is my screenshot of the matches for the answer regex pattern.
The problem is that the answer pattern matches too many lines: a single match contains several answers. I would like to get all of the answers in a question-and-answer section, but with one answer per match.
Can I please get some help with this? Thanks!
Also, if there is a better way to do this task, please let me know. Any feedback on the most appropriate way to parse this test format would be appreciated.
Match questions using
^\d+\.\s.*(?:\n(?!\s*[A-Z]).*)*
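A minimal JavaScript sketch of applying this pattern to the sample test, assuming the sample is held in a string (sample and questions are illustrative names only). matchAll requires the g flag, and the m flag lets ^ match at the start of each line:

```javascript
// Sample test from the question, joined with LF line breaks.
const sample = [
  "1. this is a question?",
  "A. This is an answer",
  "B. This is an answer",
  "2. this is a question?",
  "A. This is an answer",
  "B. This is an answer",
  "3. this is a question",
  "multiline?",
  "A. This is an multiline",
  "answer",
  "B. This is an answer",
].join("\n");

// g: return every match; m: ^ matches at the start of each line.
const questionRe = /^\d+\.\s.*(?:\n(?!\s*[A-Z]).*)*/gm;

// Each match is one whole question, including its continuation lines.
const questions = [...sample.matchAll(questionRe)].map((m) => m[0]);

// questions.length === 3
// questions[2] === "3. this is a question\nmultiline?"
```

The continuation line "multiline?" starts with a lowercase letter, so the lookahead lets it join the third question, while the "A." line stops each match.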
Answers can be matched with
^[A-Z]\.\s.*(?:\n(?!\s*(?:\d+|[A-Z])\.\s).*)*
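To see why the original answer pattern returns several answers per match, here is a small JavaScript comparison on the same sample (variable names are illustrative). In JavaScript, [^] matches any character including a line break, so in /[a-zA-Z]\..*[^].*\n?/gm the [^] consumes the line break and the following .* takes the whole next line:

```javascript
const sample = [
  "1. this is a question?",
  "A. This is an answer",
  "B. This is an answer",
  "2. this is a question?",
  "A. This is an answer",
  "B. This is an answer",
  "3. this is a question",
  "multiline?",
  "A. This is an multiline",
  "answer",
  "B. This is an answer",
].join("\n");

// Answer pattern from the question: [^] crosses the line break.
const originalAnswerRe = /[a-zA-Z]\..*[^].*\n?/gm;

// Suggested pattern: a continuation line is consumed only if it does not
// start with another "N. " question marker or "X. " answer marker.
const answerRe = /^[A-Z]\.\s.*(?:\n(?!\s*(?:\d+|[A-Z])\.\s).*)*/gm;

const before = [...sample.matchAll(originalAnswerRe)].map((m) => m[0]);
const after = [...sample.matchAll(answerRe)].map((m) => m[0]);

// before[0] === "A. This is an answer\nB. This is an answer\n" (two answers in one match)
// after holds one answer per match, six in total;
// after[4] === "A. This is an multiline\nanswer"
```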
Matching the whole paragraph (a question with all of its answers, assuming paragraphs are separated by blank lines) can be done with
^\d+\..*(?:\n.+)*
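One caveat, shown in this JavaScript sketch: (?:\n.+)* only continues over non-empty lines, so this pattern relies on a blank line between question blocks. The blank-line-separated input below is an assumption; the sample as posted has no blank lines, and then the whole text comes back as a single match:

```javascript
// Assumed input: question blocks separated by blank lines.
const withBlankLines = [
  "1. this is a question?",
  "A. This is an answer",
  "B. This is an answer",
  "",
  "2. this is a question?",
  "A. This is an answer",
  "B. This is an answer",
  "",
  "3. this is a question",
  "multiline?",
  "A. This is an multiline",
  "answer",
  "B. This is an answer",
].join("\n");

// .+ needs at least one character, so an empty line ends the block.
const blockRe = /^\d+\..*(?:\n.+)*/gm;

const blocks = [...withBlankLines.matchAll(blockRe)].map((m) => m[0]);
// blocks.length === 3

// Same text with the blank lines removed: nothing stops the match.
const withoutBlankLines = withBlankLines
  .split("\n")
  .filter((line) => line !== "")
  .join("\n");
const oneBlock = [...withoutBlankLines.matchAll(blockRe)].map((m) => m[0]);
// oneBlock.length === 1
```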
I think this regex is too clumsy and is unnecessarily long and complex here.
Here is an explanation of the question regex; the other two are quite similar:
^ - start of a line (make sure the m flag is used, otherwise ^ only matches at the start of the string)
\d+ - one or more digits
\. - a dot
\s - a whitespace character
.* - the rest of the line
(?:\n(?!\s*[A-Z]).*)* - zero or more occurrences of
  \n(?!\s*[A-Z]) - an LF char not followed by zero or more whitespaces and then an ASCII uppercase letter
  .* - the rest of the line.
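Putting the three patterns together, here is a minimal JavaScript sketch that builds the array of question/answers objects shown in the question. It assumes the blocks are separated by blank lines (the whole-paragraph pattern needs that); parseTest and oneLine are hypothetical names, and joining continuation lines with a space and dropping the "A. " marker are my own choices:

```javascript
// Patterns from this answer.
const blockRe = /^\d+\..*(?:\n.+)*/gm;
const questionRe = /^\d+\.\s.*(?:\n(?!\s*[A-Z]).*)*/gm;
const answerRe = /^[A-Z]\.\s.*(?:\n(?!\s*(?:\d+|[A-Z])\.\s).*)*/gm;

// Hypothetical helper: join the lines of a multiline match with single spaces.
function oneLine(text) {
  return text.split("\n").map((line) => line.trim()).join(" ");
}

// Hypothetical parser: one object per blank-line-separated block.
function parseTest(text) {
  const result = [];
  for (const [block] of text.matchAll(blockRe)) {
    // With the g flag, match() returns every question match in the block (or null).
    const question = block.match(questionRe);
    const answers = [...block.matchAll(answerRe)].map(([answer]) =>
      oneLine(answer).replace(/^[A-Z]\.\s/, "") // drop the "A. " marker
    );
    result.push({ question: question ? oneLine(question[0]) : null, answers });
  }
  return result;
}

// Assumed input: the sample test with blank lines between blocks.
const sampleTest = [
  "1. this is a question?",
  "A. This is an answer",
  "B. This is an answer",
  "",
  "2. this is a question?",
  "A. This is an answer",
  "B. This is an answer",
  "",
  "3. this is a question",
  "multiline?",
  "A. This is an multiline",
  "answer",
  "B. This is an answer",
].join("\n");

console.log(JSON.stringify(parseTest(sampleTest), null, 2));
```

For the third block this gives the question "3. this is a question multiline?" with the answers "This is an multiline answer" and "This is an answer".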