I am trying to match consecutive lines that start with an arbitrary amount of space followed by the character |. I am using the s flag, so that . matches newlines.
What I have so far works only with a finite amount of whitespace before |.
I am having issues with the part that detects that a line has been reached which does not meet the requirements. For some reason \n\s*[^\|] does not do the trick. What I am doing right now is the following:
(?P<terminating>
\n( # when newline is encountered...
[^\|\s] # check if next character is not: (| or space)
|
[\s][^\|\s] # check if next characters are not: space + (| or space)
|
[\s][\s][^\|\s] # check if next characters are not: space + space + (| or space)... And so on....
)
|
$
)
This obviously only works for up to two spaces. I would like to make this work for an arbitrary number of spaces. I looked into recursion, but that seems like quite a heavy gun to wield in this case. So here is my question: why does \n\s*[^\|] not work, and is there another way of solving this without recursion?
Below is an example of input and the resulting match I would like to get:
Input string:
Lorem ipsum dolor sit amet,
consectetur adipisicing
elit,
|sed do
|eiusmod tempor incididunt
|ut labore et dolore magna aliqua.
Ut enim ad minim veniam,
quis nostrud exercitation
ullamco laboris nisi ut
aliquip ex ea commodo consequat.
Output is one string with content:
|sed do\n |eiusmod tempor incididunt\n |ut labore et dolore magna aliqua.
I don't want three separate matches, one for each of the lines that have | in them.
I solved it myself. I guess I have to exclude whitespace from the character class I am negating:
\n\s*[^\|\s]
Not quite sure why this is though, I stumbled upon this by sheer accident. I would be grateful if someone could explain the reasoning behind this.
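A likely explanation is backtracking: when [^\|] fails on the |, the engine makes \s* give back one space, and that space is itself "not a |", so the negated class matches it. The sketch below reproduces this with Python's re module, which is an assumption on my part (the post does not say which language hosts the regex), and the two-space indentation in the test string is made up for illustration.

```python
import re

# Hypothetical input: one ordinary line, then a "|" line indented by two spaces.
text = "a\n  |b"

# \s* first takes both spaces, then [^|] fails on "|".
# The engine backtracks: \s* keeps one space and [^|] matches the other one.
loose = re.search(r"\n\s*[^|]", text)
print(repr(loose.group()))

# With \s excluded from the class, the space that \s* gives back
# can no longer satisfy it, so the "|" line is not mistaken for a terminator.
strict = re.search(r"\n\s*[^|\s]", text)
print(strict)
```

So the looser pattern reports a "terminating" line even though the line starts with whitespace and |, while the stricter class only succeeds when the first non-whitespace character really is something other than |.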
The full expression now is as follows:
'/
(?:
(^|\n)\s*\|
)
(?P<main>
.*?
)
(?=
\n\s*[^\|\s]
|
$
)
/sx'
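To check the expression against the sample input, here is a minimal sketch that ports it to Python's re module with re.S | re.X standing in for the s and x flags. The choice of Python and the two-space indentation before each | are assumptions; the original indentation of the sample is not recoverable from the post.

```python
import re

# Assumption: every "|" line in the sample is indented by two spaces.
text = (
    "Lorem ipsum dolor sit amet,\n"
    "consectetur adipisicing\n"
    "elit,\n"
    "  |sed do\n"
    "  |eiusmod tempor incididunt\n"
    "  |ut labore et dolore magna aliqua.\n"
    "Ut enim ad minim veniam,\n"
    "quis nostrud exercitation\n"
    "ullamco laboris nisi ut\n"
    "aliquip ex ea commodo consequat."
)

pattern = re.compile(
    r"""
    (?:
        (^|\n)\s*\|      # start of text or a newline, optional whitespace, then a pipe
    )
    (?P<main>
        .*?              # as little as possible; the dot also matches newlines under re.S
    )
    (?=
        \n\s*[^\|\s]     # stop before a line whose first non-whitespace char is not a pipe
        |
        $                # or at the end of the text
    )
    """,
    re.S | re.X,
)

matches = list(pattern.finditer(text))
main = matches[0].group("main")
print(len(matches))
print(repr(main))
```

Note that the leading whitespace and the first | are consumed by the non-capturing group, so the main group starts at "sed do" rather than at the pipe; move \| inside (?P<main>...) if the first pipe should be part of the captured text.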