
CSV splitting causes errors


Could you please help me with the issue described below?

I wrote a PowerShell script that splits a large CSV file (30,000 rows / 6 MB) into smaller ones. Each new file is named from a combination of the first and second columns' content. If a file already exists, the script only appends the new rows.

Main CSV file example:

Site;OS.Type;Hostname;IP address
Amsterdam;Server;AMS_SRVDEV01;10.10.10.12
Warsaw;Workstation;WAR-L4D6;10.10.20.22
Ankara;Workstation;AN-D5G36;10.10.13.22
Warsaw;Workstation;WAR-SRVTST02;10.10.20.33
Amsterdam;Server;LON-SRV545;10.10.10.244

PowerShell Version: 5.1.17134.858

function Csv-Splitter {
    $fileName = Read-Host "Pass file name to process: "

    $FileToProcess = Import-Csv "$fileName.csv" -Delimiter ';'
    $MyList = New-Object System.Collections.Generic.List[string]                                         

    foreach ($row in $FileToProcess) {
        if ("$($row.'OS.Type')-$($row.Site)" -notin $MyList) {
            $MyList.Add("$($row.'OS.Type')-$($row.Site)")
            $row | Export-Csv -Delimiter ";" -Append -NoTypeInformation "$($row.'OS.Type')-$($row.Site).csv"
        }
        else {
            $row | Export-Csv -Delimiter ";" -Append -NoTypeInformation "$($row.'OS.Type')-$($row.Site).csv"
        }
    }

}

Basically, the code works fine; however, it intermittently generates errors as it processes the loop. Each error causes a row to be missing from the new files - the number of missing rows equals the number of errors:

Export-Csv : The process cannot access the file 'C:\xxx\xxx\xxx.csv' because it is being used by another process.


Solution

  • Export-Csv is synchronous - by the time it returns, the output file has already been closed - so the code in the question does not explain the problem.

    As you've since confirmed in a comment, based on a suggestion by Lee_Dailey, the culprit was the McAfee On-Access Scan anti-virus (AV) module, which accessed each newly created file behind the scenes, thereby locking it temporarily and causing Export-Csv to fail intermittently.

    The problem should go away if each output file can be fully created with a single Export-Csv call, after the loop, as also suggested by Lee. This is preferable for performance anyway, but it assumes that the entire CSV file fits into memory as a whole.

    Here's a Group-Object-based solution that uses a single pipeline to implement write-each-output-file-in-full functionality:

    function Csv-Splitter {
    
      $fileName = Read-Host "Pass file name to process: "
    
      Import-Csv "$fileName.csv" -Delimiter ';' | 
        Group-Object { $_.'OS.Type' + '_' + $_.Site + '.csv' } |
          ForEach-Object { $_.Group | Export-Csv -Delimiter ';' -NoTypeInformation $_.Name }
    
    }
    
    
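    If the input is too large to hold in memory, a streaming variant can process the file row by row while keeping one open StreamWriter per output file. Because each output file is created exactly once and held open until the end, there is no repeated open/append/close cycle for the AV scanner to interfere with. This is a sketch under those assumptions, not part of the original answer:

    ```powershell
    function Csv-Splitter {

      $fileName = Read-Host "Pass file name to process: "

      $writers = @{}   # maps output file name -> open StreamWriter
      try {
        Import-Csv "$fileName.csv" -Delimiter ';' | ForEach-Object {
          # ConvertTo-Csv returns two strings per object: the header line
          # and the data line.
          $lines   = $_ | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
          $outFile = "$($_.'OS.Type')-$($_.Site).csv"
          if (-not $writers.ContainsKey($outFile)) {
            # StreamWriter resolves relative paths against the process
            # working directory, so anchor the path to the current
            # PowerShell location explicitly.
            $writers[$outFile] = [System.IO.StreamWriter]::new(
              (Join-Path (Get-Location) $outFile))
            $writers[$outFile].WriteLine($lines[0])  # header, once per file
          }
          $writers[$outFile].WriteLine($lines[1])    # data row
        }
      }
      finally {
        # Close all writers even if the pipeline throws midway.
        $writers.Values | ForEach-Object { $_.Close() }
      }

    }
    ```

    The trade-off is that you hold one file handle open per distinct OS.Type/Site combination, which is fine for a handful of groups but worth reconsidering if the grouping key has very high cardinality.
    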

    Your own answer shows alternative solutions that eliminate interference from the AV software.