javascript jquery regex regex-group

Regex pattern to find all digits that are not immediately followed by a dot character


Can anyone help me write a regex pattern for the requirements below?

  1. Match section tags whose content does not start with a number.
  2. Match section tags whose number is not followed by a dot character.
  3. Only the numbers immediately after the opening section tag should be considered.

Test String:

<sectionb>2.3. Optimized test sentence<op>(</op>1,1<cp>)</cp></sectionb>
*<sectiona>2 Surface Model: ONGV<op>(</op>1,1<cp>)</cp></sectiona>*
<sectiona>3. Verification of MKJU<op>(</op>1,1<cp>)</cp> Entity</sectiona>
*<sectionc>3. 2. 1 <txt>Case 1</txt> Annual charges to SGX</sectionc>*
*<sectiona>Compound Interest<role>back</role></sectiona>*

Pattern:

<section[a-z]>[\d]*[^\.]*<\/section[a-z]

The regex pattern should match the strings below:

<sectiona>2 Surface Model: ONGV<op>(</op>1,1<cp>)</cp></sectiona>
<sectionc>3. 2. 1 <txt>Case 1</txt> Annual charges to SGX</sectionc>
<sectiona>Compound Interest<role>back</role></sectiona>

Solution

  • This matches the updated requirements:

    <section\w+>(((\d+\.\s*)*(\d+[^\.]))|[^\d]).*?<\/section\w>
    

    <section\w+> \w matches any letter, digit, or underscore (a superset of [a-z]), and + means one or more, so both <sectiona> and <sectionabc> match. Remove the + for exactly one character, or use * to also allow a bare <section>. The closing tag uses \w without +, so add + there too if tag names can be longer than one character.

    (\d+\.\s*)* zero or more groups of digits, a dot, and any amount of whitespace. This handles the updated row that now reads 3. 2. 1, with spaces after the dots.

    (\d+[^\.]) one or more digits followed by a character that is not a dot.

    ((...)|[^\d]) or the section content does not start with a digit (matches row 5).

    .*? followed by any characters, as few as possible, up to the following </section. A lookahead could likely simplify the regex, but for me this form keeps the "no digits" clause separate.
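As a quick check, here is a minimal JavaScript sketch that runs the pattern above over the five test rows from the question. The helper name matchedRows and the variable names are only illustrative and not part of the original answer; the rows are entered without the surrounding * markers.

```javascript
// Pattern copied verbatim from the answer.
const sectionPattern = /<section\w+>(((\d+\.\s*)*(\d+[^\.]))|[^\d]).*?<\/section\w>/;

// The five test rows from the question, one string per row.
const rows = [
  '<sectionb>2.3. Optimized test sentence<op>(</op>1,1<cp>)</cp></sectionb>',
  '<sectiona>2 Surface Model: ONGV<op>(</op>1,1<cp>)</cp></sectiona>',
  '<sectiona>3. Verification of MKJU<op>(</op>1,1<cp>)</cp> Entity</sectiona>',
  '<sectionc>3. 2. 1 <txt>Case 1</txt> Annual charges to SGX</sectionc>',
  '<sectiona>Compound Interest<role>back</role></sectiona>',
];

// Hypothetical helper: keep only the rows the pattern accepts.
// The pattern has no g flag, so test() carries no state between calls.
function matchedRows(pattern, lines) {
  return lines.filter((line) => pattern.test(line));
}

const matched = matchedRows(sectionPattern, rows);
console.log(matched); // rows 2, 4 and 5 of the test string
```

One caveat worth checking against real data: \d+ can backtrack, so a multi-digit number followed by a dot, such as 12., still satisfies (\d+[^\.]) by reading 1 as the digits and 2 as the non-dot character. A row like <sectiona>12. Intro</sectiona> would therefore also match.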

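For the lookahead idea mentioned in the answer, one possible form is sketched below. This is my own assumption of how it could look, not the answerer's pattern: a negative lookahead right after the opening tag rejects content whose leading number sequence ends in a dot. The names lookaheadPattern, sampleRows, and kept are illustrative.

```javascript
// Hypothetical lookahead variant; not taken from the original answer.
// The negative lookahead (?! ... ) rejects a leading number that ends in a dot:
//   \d+(?:\.\s*\d+)*   a number such as 2, 2.3 or 3. 2. 1
//   \.(?!\s*\d)        then a dot that is not followed by another number part
const lookaheadPattern =
  /<section\w+>(?!\d+(?:\.\s*\d+)*\.(?!\s*\d)).*?<\/section\w+>/;

// Same five test rows as in the question.
const sampleRows = [
  '<sectionb>2.3. Optimized test sentence<op>(</op>1,1<cp>)</cp></sectionb>',
  '<sectiona>2 Surface Model: ONGV<op>(</op>1,1<cp>)</cp></sectiona>',
  '<sectiona>3. Verification of MKJU<op>(</op>1,1<cp>)</cp> Entity</sectiona>',
  '<sectionc>3. 2. 1 <txt>Case 1</txt> Annual charges to SGX</sectionc>',
  '<sectiona>Compound Interest<role>back</role></sectiona>',
];

// Keep the rows whose leading number (if any) does not end in a dot.
const kept = sampleRows.filter((row) => lookaheadPattern.test(row));
console.log(kept); // rows 2, 4 and 5 of the test string
```

In this form the "no digits" case needs no separate alternative, because the lookahead simply does not fire when the content starts with a non-digit. It also rejects a multi-digit number followed by a dot, such as 12., since the inner check looks at the character after the whole number.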