PowerShell - find a string in a text file and split the file at the start of the line containing the text


I have a very large text file containing many documents, and I need to split it every time "Page: 1 of" appears. The split has to happen at the start of the line where the text is found, not at the text itself.

Example data:

[Image: sample input data]

I have the code below, which splits the file every time it sees "Page: 1 of". However, it splits at the match itself rather than at the beginning of the matching line. So the numbers at the start of the line, which I need to keep, are cut off from that line, and the text no longer lines up on the right.

(Get-Content -Raw TEST.TXT) -split '(?=Page:  1 of)' | Set-Content -LiteralPath { 'E:\test\TEST_OUT{0}.txt' -f $script:index++ }

It comes out like this. Any ideas?

[Image: resulting output]


Solution

    • Turn on the multi-line regex option with the inline option (?m), so that ^ and $ match the start and end of each line rather than of the whole input.

    • Adapt the look-ahead assertion to start at the beginning of each line (^), followed by one or more (+) non-newline characters (.) before the substring Page: 1 of, with the substring starting at a word boundary (\b).

    (Get-Content -Raw TEST.TXT) -split '(?m)(?=^.+\bPage:  1 of)' |
      Set-Content -LiteralPath { 'E:\test\TEST_OUT{0}.txt' -f $script:index++ }
    

    Note: If needed, you could further constrain the look-ahead assertion so as to require that the line starts with a sequence of digits (\d+), for instance.
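
    As a minimal sketch of that digit-constrained variant, the snippet below runs the split on an in-memory string rather than on TEST.TXT. The sample text and the $sample and $chunks variable names are made up for illustration. The -ne '' filter is an addition that is not part of the command above: it drops the empty leading element that -split can return when the very first line already matches.

```powershell
# Hypothetical sample standing in for the contents of TEST.TXT
# (note the two spaces after 'Page:', as in the original regex).
$sample = @(
  '001 Report A            Page:  1 of 2'
  '002 body line'
  '003 Report A            Page:  2 of 2'
  '004 Report B            Page:  1 of 1'
  '005 body line'
) -join "`n"

# Split before every line that starts with digits and contains 'Page:  1 of'.
# The look-ahead consumes nothing, so each chunk keeps its full first line.
# -ne '' removes empty elements, such as the one before a match at position 0.
$chunks = $sample -split '(?m)(?=^\d+.*\bPage:  1 of)' -ne ''

$chunks.Count   # number of documents found
$chunks[1]      # second document, beginning at the start of its line
```

    The filtered result can be piped to Set-Content in the same way as in the command above.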