I have a very large text file of documents that needs to be split every time "Page: 1 of" appears. However, the split must happen at the start of the line that contains that text, not at the text itself.
Example data:
The code below splits the file every time it sees "Page: 1 of". However, it splits at the match itself rather than at the beginning of the line containing the match. As a result, the numbers at the start of the line, which I need to keep, are cut off, and the text is no longer right-aligned properly.
(Get-Content -Raw TEST.TXT) -split '(?=Page: 1 of)' | Set-Content -LiteralPath { 'E:\test\TEST_OUT{0}.txt' -f $script:index++ }
It comes out like this. Any ideas?
Turn on the multi-line regex option with the inline option (?m) so as to make ^ and $ match the start and end of each line.
Adapt the look-ahead assertion to look at the start of each line (^), followed by a non-empty number (+) of non-newline characters (.) before the substring Page: 1 of, with the substring starting at a word boundary (\b).
(Get-Content -Raw TEST.TXT) -split '(?m)(?=^.+\bPage: 1 of)' |
Set-Content -LiteralPath { 'E:\test\TEST_OUT{0}.txt' -f $script:index++ }
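To see where the split points land, here is a minimal in-memory sketch. The sample lines are hypothetical, since the real file's layout is not shown; only the regex is taken from the command above. A look-ahead that matches at the very start of the input can produce an empty first element, so the sketch filters empty strings out with -ne ''.

```powershell
# Hypothetical sample text; the real file's layout is an assumption.
$sample = @(
    '001 Report A      Page: 1 of 2'
    'body line A1'
    '002 Report A      Page: 2 of 2'
    'body line A2'
    '003 Report B      Page: 1 of 1'
    'body line B1'
) -join "`n"

# Split before every line that contains "Page: 1 of".
# -ne '' drops any empty element produced by a match at the start of the input.
$chunks = @($sample -split '(?m)(?=^.+\bPage: 1 of)' -ne '')

# Print each chunk with a separator so the boundaries are visible.
$chunks | ForEach-Object { "--- chunk ---`n$_" }
```

Each chunk begins at the start of its "Page: 1 of" line, so the leading numbers stay with the document they belong to. If the real file also starts with such a line, the same -ne '' filter could be applied before piping to Set-Content to avoid writing an empty first file.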
Note: If needed, you could further constrain the look-ahead assertion so as to require that the line starts with a sequence of digits (\d+), for instance.
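A possible form of that stricter pattern, as a sketch only: it assumes the digits sit at the very start of the line with no leading whitespace, which the question does not confirm. The sample lines are again hypothetical, and include a non-numbered line that mentions "Page: 1 of" to show the difference.

```powershell
# Hypothetical sample; the second line contains the phrase but has no leading digits.
$sample = @(
    '001 Report A      Page: 1 of 1'
    'note: see Page: 1 of the appendix'
    '002 Report B      Page: 1 of 1'
) -join "`n"

# Unconstrained look-ahead: also splits before the "note:" line.
$loose = @($sample -split '(?m)(?=^.+\bPage: 1 of)' -ne '')

# Constrained look-ahead: the line must start with one or more digits (\d+).
$strict = @($sample -split '(?m)(?=^\d+.+\bPage: 1 of)' -ne '')

"loose:  $($loose.Count) chunks"
"strict: $($strict.Count) chunks"
```

With the digit requirement, the "note:" line stays inside the first chunk instead of starting a new one.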