powershell, powershell-4.0

Parsing and splitting files based on the string


I have a very large file (hence .ReadLines) that I need to parse and split into other files quickly and efficiently. Each line that contains a keyword should be copied and appended to a specific file. This is what I have so far; the script runs, but the files aren't getting populated.

$filename = "C:\dev\powershell\test1.csv"

foreach ($line in [System.IO.File]::ReadLines($filename)) {
    if    ($line | %{$_ -match "Apple"}){Out-File -Append Apples.txt}
    elseif($line | %{$_ -match "Banana"}){Out-File -Append Bananas.txt}
    elseif($line | %{$_ -match "Pear"}){Out-File -Append Pears.txt}
}

Example content of the csv file:

Apple,Test1,Cross1
Apple,Test2,Cross2
Apple,Test3,Cross3
Banana,Test4,Cross4
Pear,Test5,Cross5

I want Apples.txt to contain:

Apple,Test1,Cross1
Apple,Test2,Cross2
Apple,Test3,Cross3

Solution

  • A couple of things:

    Your if conditions don't need %/ForEach-Object; -match works on its own:

    foreach ($line in [System.IO.File]::ReadLines($filename)) {
      if($line -match "Apple"){
        # output to apple.txt
      }
      elseif($line -match "Banana"){
        # output to banana.txt
      }
      # etc...
    }
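
    As a quick illustration of why the pipeline is unnecessary, here is a minimal sketch of how -match behaves on a single string. The variable names are made up for the example; the sample line comes from the question's CSV.

    ```powershell
    # -match on a single string returns a Boolean directly.
    $line = 'Apple,Test1,Cross1'

    $matchesApple   = $line -match 'Apple'    # $true
    $matchesLower   = $line -match 'apple'    # $true: -match ignores case by default
    $matchesCase    = $line -cmatch 'apple'   # $false: -cmatch is the case-sensitive variant
    $matchesAtStart = $line -match '^Apple,'  # $true: the right-hand side is a regex, so it can be anchored
    ```

    Anchoring the pattern (as in the last line) is one way to avoid matching a keyword that happens to appear in a later column.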
    

    The files aren't getting populated because you're not actually sending any output to Out-File:

    foreach ($line in [System.IO.File]::ReadLines($filename)) {
      if($line -match "Apple"){
        # send $line to the file
        $line |Out-File apple.txt -Append
      }
      # etc...
    }
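
    Putting both points together, a complete version of the script might look like the sketch below. The temporary folder and the sample data are assumptions added so the example runs on its own; substitute your real $filename and output paths.

    ```powershell
    # Hypothetical scratch folder and sample data, so the sketch can run on its own.
    $dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
    New-Item -ItemType Directory -Path $dir | Out-Null
    $filename = Join-Path $dir 'test1.csv'
    Set-Content -Path $filename -Value @(
      'Apple,Test1,Cross1'
      'Apple,Test2,Cross2'
      'Apple,Test3,Cross3'
      'Banana,Test4,Cross4'
      'Pear,Test5,Cross5'
    )

    # Stream the input line by line and append each match to its file.
    foreach ($line in [System.IO.File]::ReadLines($filename)) {
      if ($line -match 'Apple') {
        $line | Out-File (Join-Path $dir 'Apples.txt') -Append
      }
      elseif ($line -match 'Banana') {
        $line | Out-File (Join-Path $dir 'Bananas.txt') -Append
      }
      elseif ($line -match 'Pear') {
        $line | Out-File (Join-Path $dir 'Pears.txt') -Append
      }
    }

    # Read the result back to check it.
    $apples  = @(Get-Content (Join-Path $dir 'Apples.txt'))
    $bananas = @(Get-Content (Join-Path $dir 'Bananas.txt'))
    ```

    Full paths are used throughout because .NET methods such as ReadLines resolve relative paths against the process working directory, which is not necessarily the current PowerShell location.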
    

    If your files are really massive and you expect a lot of matching lines, I'd recommend using a StreamWriter for each output file; otherwise Out-File has to open and close the file for every matching line:

    $OutFiles = @{
      'apple'  = New-Object System.IO.StreamWriter $PWD\apples.txt
      'banana' = New-Object System.IO.StreamWriter $PWD\bananas.txt
      'pear'   = New-Object System.IO.StreamWriter $PWD\pears.txt
    }
    
    foreach ($line in [System.IO.File]::ReadLines($filename)) {
      foreach($keyword in $OutFiles.Keys){
        if($line -match $keyword){
          $OutFiles[$keyword].WriteLine($line)
          break   # stop checking keywords after the first match, like if/elseif
        }
      }
    }
    
    foreach($Writer in $OutFiles.Values){
      try{
        $Writer.Close()
      }
      finally{
        $Writer.Dispose()
      }
    }
    

    This way, you only have to maintain the $OutFiles hashtable if, for example, you need to update the keywords.
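
    A slightly more defensive variant of the StreamWriter approach is sketched below. It is not part of the answer above, just one possible arrangement: an [ordered] dictionary (available from PowerShell 3.0) keeps the keywords in declaration order, so the first declared keyword wins when a line matches more than one, and a single try/finally makes sure the writers are disposed even if reading fails partway. The scratch folder, sample data, and the keyword + 's.txt' naming scheme are assumptions for the example.

    ```powershell
    # Hypothetical scratch folder and sample input, so the sketch can run on its own.
    $dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
    New-Item -ItemType Directory -Path $dir | Out-Null
    $filename = Join-Path $dir 'test1.csv'
    Set-Content -Path $filename -Value 'Apple,Test1,Cross1', 'Banana,Test4,Cross4', 'Pear,Test5,Cross5'

    # Ordered dictionary: keywords are checked in the order they were added.
    $OutFiles = [ordered]@{}
    try {
      # Assumed naming scheme: apple -> apples.txt, banana -> bananas.txt, ...
      foreach ($keyword in 'apple', 'banana', 'pear') {
        $path = Join-Path $dir ($keyword + 's.txt')
        $OutFiles[$keyword] = New-Object System.IO.StreamWriter $path
      }

      foreach ($line in [System.IO.File]::ReadLines($filename)) {
        foreach ($keyword in $OutFiles.Keys) {
          if ($line -match $keyword) {
            $OutFiles[$keyword].WriteLine($line)
            break   # leave the keyword loop; the outer loop moves on to the next line
          }
        }
      }
    }
    finally {
      # Dispose flushes any buffered text and closes the underlying file.
      foreach ($Writer in $OutFiles.Values) { $Writer.Dispose() }
    }

    # Read the results back to check them.
    $apples = @(Get-Content (Join-Path $dir 'apples.txt'))
    $pears  = @(Get-Content (Join-Path $dir 'pears.txt'))
    ```

    Because each key is used as a regular expression by -match, keywords containing regex metacharacters would need escaping, for example with [regex]::Escape.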