I have a text composed of text fragments delimited by "[1]" tags. I would like to use regular expressions to select (and eventually delete) those delimited fragments that do not contain asterisks.
Example:
[1] "Q 1 Gender * modal2"
Gender 1 0.0165 0.00144 0.6990 0.555
modal2 2 0.1588 0.01387 3.3636 0.010 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "Q 1 Gender * interv"
Gender 1 0.0165 0.00144 0.6876 0.495
interv 4 0.0563 0.00492 0.5868 0.765
[1] "Q 1 Acad_categ * Acad_field"....
In the text above, the second fragment (the one between the second and third [1] tags) is the one that should be selected.
Something like this…
/\[1\][^*]+?(?:(?=\[1\])|$)/
Plain English Explanation
Match [1] followed by one or more characters that are not an asterisk (newlines included), followed by (but not included in the match) either [1] or the end of the text being matched.
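As a rough illustration of that description, here is a minimal Python sketch. The sample text and the names PATTERN, text, matches and cleaned are made up for the example, and the header lines are deliberately free of asterisks.

```python
import re

# Same pattern as above, written for Python's re module (no /.../ delimiters).
# re.MULTILINE is deliberately not set, so $ is not tried at every line end.
PATTERN = re.compile(r"\[1\][^*]+?(?:(?=\[1\])|$)")

# Hypothetical, simplified input: three fragments, each starting with [1].
text = (
    '[1] "block A"\n'
    "x 1 0.1588 0.010 **\n"
    '[1] "block B"\n'
    "y 4 0.0563 0.765\n"
    '[1] "block C"\n'
    "z 2 0.2000 0.049 *\n"
)

# Select the fragments that contain no asterisk at all.
matches = PATTERN.findall(text)

# Delete them from the text.
cleaned = PATTERN.sub("", text)

print(matches)
print(cleaned)
```

Only the "block B" fragment is selected here, because the other two run into an asterisk before reaching the next [1] or the end of the text.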
Technical Explanation
\[1\]
Matches the literal text [1] (the brackets are escaped so they are not read as a character class).
[^*]+?
Matches one or more characters other than an asterisk, newlines included, in a non-greedy way: it takes the shortest run that still lets the next part of the pattern match.
(?:(?=\[1\])|$)
A non-capturing group, (?: ... ), that matches either…
(?=\[1\])
The next characters are [1], but the lookahead doesn’t consume them in the match; it just checks that they are there (positive lookahead assertion).
Or…
$
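The end of the string that is being matched against. This assumes multiline mode is off; with it on, $ would also match at the end of every line and the non-greedy match would stop after the first line.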
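One caveat when trying this on the text from the question: every header line there contains an asterisk (for example "Q 1 Gender * interv"), so the pattern as written matches none of those fragments. If the intent is to test only the lines below the header, a variant along the following lines might work. This is a sketch built on that assumption, not part of the pattern above; BODY_PATTERN, ORIGINAL and sample are names chosen for the example, and the third fragment is cut off after its header because the question truncates it.

```python
import re

# The pattern from the answer, for comparison.
ORIGINAL = re.compile(r"\[1\][^*]+?(?:(?=\[1\])|$)")

# Assumed variant: the header line after [1] may contain anything
# ([^\n]*\n), and only the lines after it must be asterisk-free.
# \Z anchors at the very end of the string.
BODY_PATTERN = re.compile(r"\[1\][^\n]*\n[^*]*?(?=\[1\]|\Z)")

sample = (
    '[1] "Q 1 Gender * modal2"\n'
    "Gender 1 0.0165 0.00144 0.6990 0.555\n"
    "modal2 2 0.1588 0.01387 3.3636 0.010 **\n"
    "---\n"
    "Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1\n"
    '[1] "Q 1 Gender * interv"\n'
    "Gender 1 0.0165 0.00144 0.6876 0.495\n"
    "interv 4 0.0563 0.00492 0.5868 0.765\n"
    '[1] "Q 1 Acad_categ * Acad_field"'  # truncated in the question
)

# The original pattern stops at the asterisk inside each header line.
original_hits = ORIGINAL.findall(sample)

# The variant skips the header and checks only the body lines.
selected = BODY_PATTERN.findall(sample)
cleaned = BODY_PATTERN.sub("", sample)

print(original_hits)
print(selected)
print(cleaned)
```

With this input the variant selects only the "Gender * interv" fragment. The trailing header is left alone because no newline follows it in this sample, so the header part of the pattern cannot match there.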