How to capture a multi-line regex between two tags based on a match condition?


I have a text composed of text fragments delimited by "[1]" tags. I would like to use regular expressions to select (and eventually delete) those delimited fragments that do not contain asterisks.

example

[1] "Q 1  Gender * modal2"
Gender          1   0.0165 0.00144 0.6990  0.555   
modal2          2   0.1588 0.01387 3.3636  0.010 **                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Q 1  Gender * interv"
Gender          1   0.0165 0.00144 0.6876  0.495
interv          4   0.0563 0.00492 0.5868  0.765             
[1] "Q 1  Acad_categ * Acad_field"....

In the text above, the second fragment (the one between the second and third [1] tags) is the one that should be selected.


Solution

  • Something like this…

    /\[1\][^*]+?(?:(?=\[1\])|$)/
    

    Plain English Explanation

    Match [1], then one or more characters that are not asterisks (newlines are allowed), up to but not including either the next [1] or the end of the text being matched.

    Technical Explanation

    \[1\]
    

    Matches [1].

    [^*]+?
    

    Matches one or more characters other than an asterisk, non-greedily: it takes the shortest run that still lets the rest of the pattern match. Because this is a negated character class, it also matches newlines, so a fragment can span several lines.
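As a rough illustration of the non-greedy behaviour, the Python sketch below runs the pattern with `+?` and with plain `+` on a made-up three-fragment string. The sample text and variable names are assumptions for the demo, and Python's `\Z` (end of string only) stands in for `$`.

```python
import re

# Hypothetical sample: three fragments, none of which contains an asterisk.
text = "[1] a\nx 1\n[1] b\ny 2\n[1] c\nz 3\n"

# Same pattern twice; only the quantifier differs.
lazy = re.compile(r"\[1\][^*]+?(?:(?=\[1\])|\Z)")
greedy = re.compile(r"\[1\][^*]+(?:(?=\[1\])|\Z)")

lazy_matches = lazy.findall(text)
greedy_matches = greedy.findall(text)

# The lazy version stops at the first upcoming [1], giving one match per fragment.
print(lazy_matches)    # ['[1] a\nx 1\n', '[1] b\ny 2\n', '[1] c\nz 3\n']

# The greedy version runs to the end of the string and swallows every fragment.
print(greedy_matches)  # ['[1] a\nx 1\n[1] b\ny 2\n[1] c\nz 3\n']
```

With the greedy quantifier, a single asterisk-free stretch covering several fragments is returned as one match, which is why the `?` matters when fragments are to be handled one by one.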

    (?:(?=\[1\])|$)
    

    Without capturing anything - (?: ... ) - match either…

    (?=\[1\])
    

    A positive lookahead assertion: it checks that the next characters are [1] without consuming them, so the tag stays out of the match and the next match can start at it.
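To show why the lookahead is used instead of simply matching the next tag, here is a small Python sketch comparing the two. The sample string and variable names are invented for the illustration, and the `|$` alternative is left out to keep the comparison focused.

```python
import re

# Hypothetical sample: four tagged fragments without asterisks.
text = "[1] a\nx 1\n[1] b\ny 2\n[1] c\nz 3\n[1] d\n"

# Lookahead: the closing [1] is checked but not consumed.
with_lookahead = re.findall(r"\[1\][^*]+?(?=\[1\])", text)

# Consuming variant: the closing [1] becomes part of the match.
consuming = re.findall(r"\[1\][^*]+?\[1\]", text)

print(with_lookahead)  # ['[1] a\nx 1\n', '[1] b\ny 2\n', '[1] c\nz 3\n']
print(consuming)       # ['[1] a\nx 1\n[1]', '[1] c\nz 3\n[1]']

# After a lookahead match, the text right after the match is still the next tag.
first = re.search(r"\[1\][^*]+?(?=\[1\])", text)
next_three = text[first.end():first.end() + 3]
print(next_three)      # [1]
```

The consuming variant uses up the tag that opens fragment "b", so that fragment is never matched; the lookahead leaves each tag available as the start of the next match.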

    Or…

    $
    

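    The end of the string that is being matched against. This assumes multiline mode is off: with the m flag, $ also matches at the end of every line, and the non-greedy match would stop at the first line break.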
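Putting the pieces together, the sketch below tries the pattern on a shortened copy of the question's sample in Python. Two things in it are assumptions rather than part of the answer: Python's `\Z` is used in place of `$`, and `header_aware` is a hypothetical variant that skips the header line before testing for asterisks. The variant is there because the headers in the sample (for example "Gender * interv") themselves contain an asterisk, so the pattern as written rejects every fragment of that sample.

```python
import re

# Shortened copy of the question's sample (the "---" and "Signif. codes" lines are omitted).
text = (
    '[1] "Q 1  Gender * modal2"\n'
    'Gender          1   0.0165 0.00144 0.6990  0.555\n'
    'modal2          2   0.1588 0.01387 3.3636  0.010 **\n'
    '[1] "Q 1  Gender * interv"\n'
    'Gender          1   0.0165 0.00144 0.6876  0.495\n'
    'interv          4   0.0563 0.00492 0.5868  0.765\n'
    '[1] "Q 1  Acad_categ * Acad_field"\n'
)

# The answer's pattern, with \Z instead of $.
original = re.compile(r"\[1\][^*]+?(?:(?=\[1\])|\Z)")

# Assumed variant: consume the header line first, then require an asterisk-free body.
header_aware = re.compile(r"\[1\][^\n]*\n[^*]+?(?:(?=\[1\])|\Z)")

original_matches = original.findall(text)
matches = header_aware.findall(text)

print(len(original_matches))      # 0: every header contains "*"
print(len(matches))               # 1
print(matches[0].splitlines()[0]) # [1] "Q 1  Gender * interv"

# Deleting the selected fragments is a substitution with an empty string.
cleaned = header_aware.sub("", text)
print(cleaned)
```

If the headers in the real data never contain asterisks, the original pattern is enough and the header-skipping part can be dropped.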