regex, powershell, powershell-2.0, powershell-3.0, powershell-4.0

How to include the line above the keyword line when splitting a big txt file using Regex and PowerShell


We have got a big txt file ("C:\temp\longmessages.txt") like below:

Americas

This is Start

some text 1

some text 2

some text 3

etc. etc

End

Europe

This is Start

some text 4

some text 5

some text 6

some text 7

etc. etc

End

Asia

This is Start

some text 8

some text 9

some text 10

etc. etc

End

Using the PS script below, I am able to split "C:\temp\longmessages.txt" into smaller files 1.txt, 2.txt, 3.txt etc., where each smaller file runs from one "Start" line to the next. However, each smaller file begins at "This is Start" and leaves out the line above it. We want each split file to also include the one line above "This is Start" at the top, meaning Americas, Europe etc. should appear above the "Start" line in its own file.

$InputFile = "C:\temp\longmessages.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
$a = 1
While (($Line = $Reader.ReadLine()) -ne $null) {
    If ($Line -match "START") {
  
       $OutputFile = "C:\temp\output\$a.txt"
       $filename
  if ($filename -eq $null){
  
  $OutputFile = $filename
  }
       
        $a++
    }
     
     
    Add-Content $OutputFile $Line
  
}

Solution

  • Continuing from my comment, I think it would be far easier to make the split on the line that says End.

    Try

    $path  = 'C:\temp\longmessages.txt'
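    # create a List object to add lines to
    # (New-Object also works on PowerShell versions before 5.0, where the ::new() syntax is not available)
    $lines = New-Object 'System.Collections.Generic.List[string]'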
    $count = 1
    
    # use 'switch' to parse the log file line-by-line
    switch -Regex -File $path {
        '^End$' { 
            # add 'End' to the list
            $lines.Add($_)
            # if the top line is empty or whitespace only, remove that line
            if ([string]::IsNullOrWhiteSpace($lines[0])) { $lines.RemoveAt(0) }
            # create the full name of the output file and increment the file counter
            $OutputFile = 'C:\temp\output\{0}.txt' -f $count++
            # write the file
            $lines | Set-Content -Path $OutputFile -Force
            # clear the list for the next file
            $lines.Clear()
        }
        default { $lines.Add($_) }
    }
    

    Using your example this results in three files:

    1.txt

    Americas
    
    This is Start
    
    some text 1
    
    some text 2
    
    some text 3
    
    etc. etc
    
    End
    

    2.txt

    Europe
    
    This is Start
    
    some text 4
    
    some text 5
    
    some text 6
    
    some text 7
    
    etc. etc
    
    End
    

    3.txt

    Asia
    
    This is Start
    
    some text 8
    
    some text 9
    
    some text 10
    
    etc. etc
    
    End
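
If the split really has to happen on the "Start" line, as in the question, one option is to hold back the most recent non-blank line (and any blank lines after it) until the next line shows whether it is a header. The sketch below is only an illustration of that idea, not part of the answer above: the function name Split-OnStartWithHeader and its parameters are hypothetical, it assumes the output folder already exists, and it drops anything that appears before the first header line.

```powershell
# Hypothetical helper: splits on lines containing 'Start' and keeps the
# last non-blank line before each 'Start' as the first line of the new file.
function Split-OnStartWithHeader {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][string]$OutputDir   # assumed to exist
    )

    $count   = 0
    $outFile = $null
    # lines read but not yet written: the last non-blank line plus any blank lines after it
    $pending = New-Object 'System.Collections.Generic.List[string]'

    switch -Regex -File $Path {
        'Start' {
            # the held-back lines turn out to be the header of a new file
            $count++
            $outFile = Join-Path $OutputDir "$count.txt"
            $pending.Add($_)
            Set-Content -Path $outFile -Value $pending.ToArray()
            $pending.Clear()
        }
        '^\s*$' {
            # blank lines travel with the line they follow
            $pending.Add($_)
        }
        default {
            # the held-back lines were ordinary content: flush them to the current file
            if ($outFile -and $pending.Count) {
                Add-Content -Path $outFile -Value $pending.ToArray()
            }
            $pending.Clear()
            $pending.Add($_)
        }
    }

    # write whatever is still held back after the last line
    if ($outFile -and $pending.Count) {
        Add-Content -Path $outFile -Value $pending.ToArray()
    }
}
```

With the paths from the question it could be called as Split-OnStartWithHeader -Path 'C:\temp\longmessages.txt' -OutputDir 'C:\temp\output'. Like the -match test in the question, the 'Start' pattern is case-insensitive and matches any line that contains the word, so it may need anchoring if "start" can also occur in ordinary text.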