Regex: Catch lines not followed by '---'


Here is a short text that looks like reStructuredText.

This is a sentence to catch.

Title that should not be caught
-------------------------------

Another sentence to catch.

I want a regex that catches the two lines that are not headers and skips the line that is a title.

  1. Test #1: How to skip the --- line. I tried /^(?!(---))[^\n]+/gm. It skips the line under the header and gives me:
This is a sentence to catch.
Title that should not be caught
Another sentence to catch.
  2. Test #2: How to also skip the sentence above the header underline (Title that should not be caught)? I tried /^(?!(---))[^\n]+(?!\n---)/gm, and that gave me:
This is a sentence to catch.
Title that should not be caught
Another sentence to catch.

The problem is that it only drops the last letter before \n---, whereas I want the whole sentence before it to be skipped. What I want is:

This is a sentence to catch.
Another sentence to catch.

How should I do this?

EDIT:

Thanks Tân for your response, which works well (I'm not sure I understand everything, but I'll meditate on it...).

If you agree, let's extend the problem with an additional complexity. New toy example:

This is another title not to catch, Ha !
========================================

This is a sentence to catch.

Title that should not be caught
-------------------------------

Another sentence to catch.

As you can see, I added another type of heading, underlined with an === line. With Tân's regex, I get:

=======
This is a sentence to catch.
Another sentence to catch.
  1. Test 1bis: I've tested .+(?![\w\s\n-=]+).+ but nothing is caught :(

Just for info, I'm implementing something with Parsimonious in Python.


Solution

  • If you want to match the single lines from the example data, one option is to make sure that the line you match does not start with --- or ===.

    After matching the line up to its end ($, which matches at the end of each line when the multiline flag is set), use another negative lookahead to assert that the following line does not start with --- or === either.

    ^(?!(?:---|===)).+$(?!\r?\n(?:---|===))
    
    • ^ Start of the line (with the multiline flag)
    • (?! Negative lookahead, assert what is directly on the right is not
      • (?:---|===) Match either --- or ===
    • ) Close lookahead
    • .+$ Match any char except a newline 1+ times and assert the end of the line
    • (?!\r?\n(?:---|===)) Another negative lookahead like the first, with a newline prepended
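
    The pattern can be tried in Python with the following minimal sketch. It assumes the re.MULTILINE flag (the equivalent of the m flag used in the question) and uses the extended toy example as input. The names TEXT, catch_lines and attempt_lines are made up for illustration. It also runs the second attempt from the question, to show that the trailing lookahead there is satisfied by backtracking one character instead of rejecting the whole title line.

```python
import re

# Extended toy example from the question.
TEXT = """\
This is another title not to catch, Ha !
========================================

This is a sentence to catch.

Title that should not be caught
-------------------------------

Another sentence to catch.
"""

# Second attempt from the question: [^\n]+ is not anchored at the end of the
# line, so the engine can give back one character to satisfy (?!\n---).
ATTEMPT = re.compile(r"^(?!(---))[^\n]+(?!\n---)", re.MULTILINE)

# Pattern from the answer: $ forces the match to span the whole line, so the
# trailing lookahead accepts or rejects the line as a whole.
PATTERN = re.compile(r"^(?!(?:---|===)).+$(?!\r?\n(?:---|===))", re.MULTILINE)


def attempt_lines(text):
    """Return the full matches of the question's second attempt."""
    # finditer + group(0) is used because the pattern has a capturing group.
    return [m.group(0) for m in ATTEMPT.finditer(text)]


def catch_lines(text):
    """Return the lines that are neither titles nor title underlines."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    for line in catch_lines(TEXT):
        print(line)
```

    Because .+ has to reach $ before the second lookahead is tested, a title line cannot be matched partially: it is either matched in full or not at all.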
