
Within a multiline string, how does one match runs of consecutive lines where each line starts with the pipe character?


Hey, I have a string that looks something like this:

String:

foo | bar
|foo | bar
|foo bar
|bar foo foo
foo bar
|bar foo
|foo bar
bar

And I need a regular expression that matches everything from the first "|" at a line start to the first line that doesn't start with a "|" (as one match). So the match in the shown example would be like this:

Correct Match:

[
|foo | bar
|foo bar
|bar foo foo
,
|bar foo
|foo bar
]

I came up with this /\|([\s\S]+)(\n\|)/g but it matches the following, which is incorrect:

Incorrect Match:

[
| bar
|foo | bar
|foo bar
|bar foo foo
foo bar
|bar foo
|
]

I hope you can understand what I need and thank you for your time!


Solution

  • Answer

    To match everything from the first "|" at a line start up to the first line that doesn't start with a "|", you can use this regex:

    /^\|([\s\S]*?)(?:(?!^[^\|])[\s\S])*/gm
    

    It's important to note that the m (multiline) flag is required: it makes ^ match at the start of every line rather than only at the start of the whole string.
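
The effect of the flag can be seen on a small input. This is a minimal sketch; the `text` string and the variable names are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical three-line input; only lines 2 and 3 start with "|".
const text = "foo\n|bar\n|baz";

// Without m, ^ matches only at index 0, where the character is "f".
const withoutM = text.match(/^\|.*/g);

// With m, ^ also matches right after each line break.
const withM = text.match(/^\|.*/gm);

console.log(withoutM); // null
console.log(withM);    // [ '|bar', '|baz' ]
```

Note that `match` with the g flag returns `null`, not an empty array, when nothing matches.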

    Given your example string of

    foo | bar
    |foo | bar
    |foo bar
    |bar foo foo
    foo bar
    |bar foo
    |foo bar
    bar
    

    this will match

    |foo | bar
    |foo bar
    |bar foo foo
    

    and

    |bar foo
    |foo bar
    

    How it works:

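    With the m flag, ^\| matches a "|" at the start of a line. The group ([\s\S]*?) lazily matches zero or more of any character, including newlines; because it is lazy and nothing after it forces it to grow, it matches the empty string here and could be dropped. The real work is done by (?:(?!^[^\|])[\s\S])*, which consumes one character at a time, newlines included, and stops at the first position that is both at the start of a line (^) and followed by a character other than "|" ([^\|]). As a result, each match includes the newline that ends its last "|" line.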
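
To see these pieces in action, the sketch below runs the same regex with `matchAll` and records the full match and the first capture group for each block. The names `sample`, `pattern`, and `blocks` are illustrative only.

```javascript
// The example string from the question, written with explicit "\n" line breaks.
const sample = [
  "foo | bar",
  "|foo | bar",
  "|foo bar",
  "|bar foo foo",
  "foo bar",
  "|bar foo",
  "|foo bar",
  "bar",
].join("\n");

const pattern = /^\|([\s\S]*?)(?:(?!^[^\|])[\s\S])*/gm;

// matchAll requires the g flag and yields one match object per block.
const blocks = [...sample.matchAll(pattern)].map((m) => ({
  text: m[0],   // the whole block, including its trailing newline
  group1: m[1], // the lazy group, which stays empty
}));

console.log(JSON.stringify(blocks, null, 2));
```

Each `text` ends in "\n" and each `group1` is the empty string, which is why the lazy group can be removed without changing the result.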

    Here is a link to Regexr that shows more about how it works and an example of it in action.

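    // Example input from the question
    const str =
    `foo | bar
    |foo | bar
    |foo bar
    |bar foo foo
    foo bar
    |bar foo
    |foo bar
    bar`
    
    // With the g flag, match() returns an array of every full match;
    // each block keeps the newline that ends its last "|" line.
    console.log(str.match(/^\|([\s\S]*?)(?:(?!^[^\|])[\s\S])*/gm))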
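
If the trailing newline in each match is unwanted, a line-oriented pattern may be easier to read. The following is an alternative sketch, not the regex from the answer above: it matches one "|" line and then any number of directly following "|" lines. It assumes "\n" or "\r\n" line endings, and the names `input`, `simpler`, and `found` are illustrative.

```javascript
// The example string from the question, joined with "\n".
const input = [
  "foo | bar",
  "|foo | bar",
  "|foo bar",
  "|bar foo foo",
  "foo bar",
  "|bar foo",
  "|foo bar",
  "bar",
].join("\n");

// ^\|.*            a line that starts with "|" (. does not cross line breaks)
// (?:\r?\n\|.*)*   zero or more immediately following lines that also start with "|"
const simpler = /^\|.*(?:\r?\n\|.*)*/gm;

// match() returns null when nothing matches, so fall back to an empty array.
const found = input.match(simpler) || [];

console.log(found);
// [ '|foo | bar\n|foo bar\n|bar foo foo', '|bar foo\n|foo bar' ]
```

Here each block ends at the last character of its final "|" line, so no trimming is needed afterwards.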