php regex pcre

Match text followed by multiple line breaks with spaces


I would like to match text (numbers, strings, special characters, spaces, single line breaks, ...) that is followed by at least two line breaks (each of the blank lines that follow consists of a space and then a line break). At the moment I can only match the multiple line breaks, but I want to match the text before them. I am using this regular expression: \n+\s*\n+ and this is my input:

        Test Test TestTester TestTestt                              Test Test TestTestTestTest: 29724 @erq
        Test Test we                                Test Test, iuow, 0202220
        Test Test  962ert64






                             Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest 
                                      Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest 
                                      Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest 
Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest Test Test TestTestTestTest 

The output should be:

Test Test TestTester TestTestt                              Test Test TestTestTestTest: 29724 @erq
        Test Test we                                Test Test, iuow, 0202220
        Test Test  962ert64
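To see why the pattern \n+\s*\n+ only returns the line breaks, here is a minimal PHP sketch. The sample string is hypothetical, shaped like the input above with blank lines that hold a single space. The pattern has no part that consumes non-whitespace characters, so whatever it matches can only be the run of breaks.

```php
<?php
// Hypothetical input shaped like the question's: text lines, then blank
// lines that contain a single space, then more text.
$str = "Test Test TestTester\nTest Test we\nTest Test  962ert64\n \n \nMore text\n";

// The pattern from the question: newlines, optional whitespace, newlines.
preg_match('/\n+\s*\n+/', $str, $m, PREG_OFFSET_CAPTURE);

$matched = $m[0][0]; // the matched text (whitespace only)
$offset  = $m[0][1]; // byte offset where the match starts

echo json_encode($matched), " at offset ", $offset, "\n";
```

The match starts right after "962ert64", so the text the question wants is everything before that offset, which the pattern itself never captures.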

Solution

  • This one should help:

    $re = '/(.+\n)\n\s*\n/sU';
    preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE, 0);
    // $matches[1][0] holds the captured text, $matches[1][1] its byte offset
    

    The flags s and U are really important here!

    s (PCRE_DOTALL) means that . will also match newlines, and U (PCRE_UNGREEDY) makes the quantifiers ungreedy (lazy) by default.
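To illustrate why both flags matter, here is a minimal sketch that runs the same pattern with no flags, with s only, and with s plus U. The sample string is hypothetical; its blank lines are completely empty, which this pattern relies on.

```php
<?php
// Hypothetical sample: a three-line block, empty lines, a second block,
// more empty lines, and a tail.
$str = "Line one\nLine two\nLine three\n\n\n\nSecond block\n\n\nTail\n";

$patterns = [
    'no flags' => '/(.+\n)\n\s*\n/',   // "." stops at newlines
    's only'   => '/(.+\n)\n\s*\n/s',  // greedy: runs to the LAST break run
    's and U'  => '/(.+\n)\n\s*\n/sU', // lazy: stops at the FIRST break run
];

$captured = [];
foreach ($patterns as $label => $re) {
    preg_match($re, $str, $m);
    $captured[$label] = $m[1];
    echo $label, ': ', json_encode($m[1]), "\n";
}
```

Without s the capture cannot span lines, so only "Line three" is captured; with s but greedy quantifiers the capture runs on through "Second block"; only sU stops at the first run of blank lines.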

    And here is a working example: https://regex101.com/r/G0KS3g/1
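A self-contained version of the snippet, as a sketch: the sample string is hypothetical and uses completely empty blank lines. With PREG_OFFSET_CAPTURE each entry of $matches is a [text, byte offset] pair, and the capture keeps the block's final newline, which rtrim() can strip to get the output asked for.

```php
<?php
// Hypothetical sample with completely empty blank lines: this pattern
// needs an empty line directly after the captured text.
$str = "  Test Test TestTester\n  Test Test we\n  Test Test  962ert64\n\n\n\n  More text\n";

$re = '/(.+\n)\n\s*\n/sU';

if (preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE, 0) === 1) {
    // With PREG_OFFSET_CAPTURE every entry is a [text, byte offset] pair.
    [$text, $offset] = $matches[1];

    // The capture ends with the block's last "\n"; rtrim() removes it.
    $block = rtrim($text, "\n");
    echo $block, "\n";
    echo "starts at byte ", $offset, "\n";
}
```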

    UPD: If you can't use flags, try this one:

    ([\S\s]*?)\n\s*\n

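    Here *? is a lazy quantifier, and [\S\s] matches any character at all, newlines included: \S matches any non-whitespace character and \s any whitespace character (\n among them). This makes it a common substitute for . when the s flag is not available.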
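A minimal sketch of the flag-free variant on a hypothetical input where every blank line holds a single space, as the question describes. Because \s* can absorb those spaces, this pattern still finds the block. The /sU pattern, by contrast, needs two consecutive "\n" characters, which this input never contains.

```php
<?php
// Hypothetical sample where each "blank" line holds a single space.
$str = "  Test Test TestTester\n  Test Test we\n  Test Test  962ert64\n \n \n \n  More text\n";

// [\S\s]*? lazily matches any character, newlines included, up to the
// first "\n" that is followed by optional whitespace and another "\n".
$found = preg_match('/([\S\s]*?)\n\s*\n/', $str, $m);
$block = $found === 1 ? $m[1] : null;
echo $block, "\n";

// The /sU pattern requires "\n\n" right after the captured text, so it
// finds no match on this input.
$foundSU = preg_match('/(.+\n)\n\s*\n/sU', $str);
echo "sU pattern matches: ", $foundSU, "\n";
```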

    However, the regex dialect of your software may impose further limitations.