Tags: regex, visual-studio-code, find-replace

Match multiple lines using regex


I have the following text:

^0001   HeadOne


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

^0002   HeadTwo


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.


^0003   HeadThree


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

^0004   HeadFour


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

This is the regex I'm using in Find:

@@([\n\r\s]*)(.*)([\n\r\s]+)\^

However, it only matches ^0001 and ^0003, because those entries have a single paragraph; other entries in my text contain several paragraphs.
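To see why the pattern above stops at single-paragraph entries, here is a minimal Python sketch. Python's `re` module is only a stand-in for the VS Code and Notepad++ engines, and the sample text is a shortened, hypothetical version of the text above. Because `(.*)` cannot cross a line break, a match only succeeds when the single line after `@@` is followed directly by whitespace and `^`.

```python
import re

# Hypothetical sample modelled on the question: entries 1 and 3 have one
# paragraph after "@@", entries 2 and 4 have two.
TEXT = (
    "^0001   HeadOne\n\n\n@@\nPara A1.\n\n"
    "^0002   HeadTwo\n\n\n@@\nPara B1.\n\nPara B2.\n\n\n"
    "^0003   HeadThree\n\n\n@@\nPara C1.\n\n"
    "^0004   HeadFour\n\n\n@@\nPara D1.\n\nPara D2."
)

# The question's pattern: "(.*)" covers only one line, and the match must be
# followed by whitespace and a literal "^".
QUESTION_PATTERN = re.compile(r"@@([\n\r\s]*)(.*)([\n\r\s]+)\^")

# Group 2 holds the single captured line after "@@".
bodies = [m.group(2) for m in QUESTION_PATTERN.finditer(TEXT)]
print(bodies)  # ['Para A1.', 'Para C1.']
```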

I'm using VS Code. How can I capture such multi-paragraph strings with a regex in VS Code or Notepad++?

Thanks


Solution

  • One quirk of VS Code regex is that \s does not match all line break characters; use [\s\r] to match all of them.

    Keeping that in mind, you want to match every substring that starts with @@ and extends up to a ^ at the start of a line, or to the end of the string.

    I suggest:

    @@.*(?:[\n\r]+(?!\s*\^).*)*
    


    NOTE: To match @@ only at the start of a line, add ^ at the start of the pattern: ^@@.*(?:[\n\r]+(?!\s*\^).*)*
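A minimal Python sketch of the suggested pattern and its anchored variant, run against a shortened, hypothetical version of the question's text. Python's `re` is an assumption here, not the VS Code engine: in Python `\s` already matches `\r`, and `re.MULTILINE` is needed for `^` to mean "start of a line".

```python
import re

# Hypothetical sample modelled on the question (entries 2 and 4 span two paragraphs).
TEXT = (
    "^0001   HeadOne\n\n\n@@\nPara A1.\n\n"
    "^0002   HeadTwo\n\n\n@@\nPara B1.\n\nPara B2.\n\n\n"
    "^0003   HeadThree\n\n\n@@\nPara C1.\n\n"
    "^0004   HeadFour\n\n\n@@\nPara D1.\n\nPara D2."
)

# "@@", the rest of that line, then any number of "line break(s) + line" steps,
# as long as the next non-whitespace character is not a literal "^".
ANSWER_PATTERN = re.compile(r"@@.*(?:[\n\r]+(?!\s*\^).*)*")

# The NOTE variant: with re.MULTILINE, the leading "^" anchors "@@" to the
# start of a line, while "\^" inside the lookahead is still a literal caret.
ANCHORED_PATTERN = re.compile(r"^@@.*(?:[\n\r]+(?!\s*\^).*)*", re.MULTILINE)

blocks = [m.group(0) for m in ANSWER_PATTERN.finditer(TEXT)]
anchored_blocks = [m.group(0) for m in ANCHORED_PATTERN.finditer(TEXT)]

for block in blocks:
    print(repr(block))
```

Each match runs from `@@` to the last non-blank line before the next `^` line (or the end of the text), so the multi-paragraph entries 2 and 4 are captured whole.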

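    NOTE 2: Starting with VS Code 1.29, you need to enable the search.usePCRE2 option to use lookaheads in regex patterns in the Search view. (Later versions deprecated this setting and use PCRE2 automatically when a pattern needs it.)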

    Details

    • ^ - start of a line
    • @@ - a literal @@
    • .* - the rest of the line (0+ chars other than line break chars)
    • (?:[\n\r]+(?!\s*\^).*)* - 0 or more consecutive occurrences of:
      • [\n\r]+(?!\s*\^) - one or more line breaks not followed by 0+ whitespace chars and then a ^ char
      • .* - the rest of the line
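The Details breakdown can be checked by writing the anchored pattern in Python's `re.VERBOSE` mode, where each commented line corresponds to one bullet. This is a sketch in Python rather than VS Code, and the sample string is hypothetical; it simply confirms that the annotated form behaves the same as the compact one.

```python
import re

# re.VERBOSE ignores unescaped whitespace and "#" comments in the pattern,
# so each piece can be annotated like the "Details" list.
VERBOSE_PATTERN = re.compile(
    r"""
    ^@@              # "@@" at the start of a line (needs re.MULTILINE)
    .*               # the rest of that line (no line breaks)
    (?:              # zero or more times:
        [\n\r]+      #   one or more line breaks
        (?!\s*\^)    #   not followed by optional whitespace and a literal caret
        .*           #   then the rest of the next line
    )*
    """,
    re.MULTILINE | re.VERBOSE,
)

COMPACT_PATTERN = re.compile(r"^@@.*(?:[\n\r]+(?!\s*\^).*)*", re.MULTILINE)

# Hypothetical sample: the first entry has two paragraphs, the second has one.
TEXT = "^0001 HeadOne\n\n@@\nOne.\n\nTwo.\n\n^0002 HeadTwo\n\n@@\nThree."

verbose_blocks = [m.group(0) for m in VERBOSE_PATTERN.finditer(TEXT)]
compact_blocks = [m.group(0) for m in COMPACT_PATTERN.finditer(TEXT)]
print(verbose_blocks)  # ['@@\nOne.\n\nTwo.', '@@\nThree.']
```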

    In Notepad++, use ^@@.*(?:\R(?!\h*\^).*)* (with ". matches newline" unchecked), where \R matches a line break and \h* matches 0 or more horizontal whitespace chars (remove \h* if ^ is always the first char on a delimiting line).
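Python's `re` module has no `\R` or `\h`, so the following sketch approximates the Notepad++ pattern with hand-written stand-ins (assumptions, not exact equivalents of the Boost escapes Notepad++ uses) and runs it on hypothetical CRLF text. `.` is also replaced with `[^\r\n]`, because in Python `.` matches `\r`.

```python
import re

# Simplified stand-ins for escapes that Python's re lacks (assumptions,
# not exact equivalents):
LINE_BREAK = r"(?:\r\n|\r(?!\n)|\n)"  # ~ \R: CRLF, lone CR, or LF
HSPACE = r"[ \t]"                      # ~ \h: space or tab
NOT_BREAK = r"[^\r\n]"                 # ~ "." with ". matches newline" unchecked

# Notepad++ original: ^@@.*(?:\R(?!\h*\^).*)*
NPP_LIKE_PATTERN = re.compile(
    rf"^@@{NOT_BREAK}*(?:{LINE_BREAK}(?!{HSPACE}*\^){NOT_BREAK}*)*",
    re.MULTILINE,
)

# Hypothetical sample with Windows (CRLF) line endings.
TEXT = (
    "^0001 HeadOne\r\n\r\n@@\r\nOne.\r\n\r\nTwo.\r\n\r\n"
    "^0002 HeadTwo\r\n\r\n@@\r\nThree."
)

npp_blocks = [m.group(0) for m in NPP_LIKE_PATTERN.finditer(TEXT)]
for block in npp_blocks:
    print(repr(block))
```

A match followed by a blank line ends with one line break, because `\h*` (here `[ \t]*`) cannot cross into the next line, so the lookahead only rejects the break directly before the `^` line. The `\r(?!\n)` branch keeps a CRLF pair from being split, roughly mimicking how `\R` treats `\r\n` as one unit.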