Can anyone help me write a regex pattern for the requirement below?
Test String:
<sectionb>2.3. Optimized test sentence<op>(</op>1,1<cp>)</cp></sectionb>
*<sectiona>2 Surface Model: ONGV<op>(</op>1,1<cp>)</cp></sectiona>*
<sectiona>3. Verification of MKJU<op>(</op>1,1<cp>)</cp> Entity</sectiona>
*<sectionc>3. 2. 1 <txt>Case 1</txt> Annual charges to SGX</sectionc>*
*<sectiona>Compound Interest<role>back</role></sectiona>*
Pattern:
<section[a-z]>[\d]*[^\.]*<\/section[a-z]
The regex pattern should match the strings below:
<sectiona>2 Surface Model: ONGV<op>(</op>1,1<cp>)</cp></sectiona>
<sectionc>3. 2 1 <txt>Case 1</txt> Annual charges to SGX</sectionc>
<sectiona>Compound Interest<role>back</role></sectiona>
This matches the updated requirements:
<section\w+>(((\d+\.\s*)*(\d+[^\.]))|[^\d]).*?<\/section\w>
<section\w+>
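\w is broader than [a-z]: it also matches uppercase letters, digits and the underscore. The + allows one or more characters, so both <sectiona> and <sectionabc> match (a bare <section> would need * instead). Remove the + to require exactly one character. The closing tag uses \w without +, so give it the same quantifier if longer suffixes are expected.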
(\d+\.\s*)*
Zero or more groups of digits, a dot, then any amount of whitespace. This handles the updated row, where the numbering is now 3. 2. 1 with spaces after the dots.
(\d+[^\.])
Then one or more digits followed by a character that is not a dot, so the numbering must not end with a dot.
((...)|[^\d])
Or the section content does not start with a digit at all (this matches row 5).
.*?
Followed by any characters, as few as possible, up to the next </section closing tag.
This could likely be simplified with a lookahead, but for me this version keeps the "no digits" clause separate.
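To check the breakdown above against the five test strings from the question, here is a minimal Python sketch. ANSWER is the pattern exactly as posted. LOOKAHEAD is only my guess at what a lookahead version might look like (it is not part of the answer): it rejects a section whose leading numbering ends in a dot. The names ANSWER, LOOKAHEAD, CASES and matches are made up for this example.

```python
import re

# The pattern from the answer, unchanged.
ANSWER = re.compile(r"<section\w+>(((\d+\.\s*)*(\d+[^\.]))|[^\d]).*?<\/section\w>")

# Hypothetical lookahead variant (an assumption, not from the answer):
# after the opening tag, refuse "digits, dot, optional spaces" groups
# that are followed by something other than a digit or whitespace.
LOOKAHEAD = re.compile(r"<section\w+>(?!(?:\d+\.\s*)+[^\d\s]).*?</section\w+>")

# The five test strings from the question with the expected outcome.
CASES = [
    ("<sectionb>2.3. Optimized test sentence<op>(</op>1,1<cp>)</cp></sectionb>", False),
    ("<sectiona>2 Surface Model: ONGV<op>(</op>1,1<cp>)</cp></sectiona>", True),
    ("<sectiona>3. Verification of MKJU<op>(</op>1,1<cp>)</cp> Entity</sectiona>", False),
    ("<sectionc>3. 2. 1 <txt>Case 1</txt> Annual charges to SGX</sectionc>", True),
    ("<sectiona>Compound Interest<role>back</role></sectiona>", True),
]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


for text, expected in CASES:
    # Columns: expected result, answer's pattern, lookahead variant, input.
    print(expected, matches(ANSWER, text), matches(LOOKAHEAD, text), text)
```

Both patterns agree with the expected column on these five inputs. That is all the sketch shows; other numbering styles would need their own test rows.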