Could you please help me with the issue described below?
I wrote a script in PowerShell that tries to split a large CSV file (30,000 rows / 6 MB) into smaller ones. The new files are named after a combination of the 1st and 2nd column contents. If a file already exists, the script only appends new lines.
Main CSV file example:
Site;OS.Type;Hostname;IP address
Amsterdam;Server;AMS_SRVDEV01;10.10.10.12
Warsaw;Workstation;WAR-L4D6;10.10.20.22
Ankara;Workstation;AN-D5G36;10.10.13.22
Warsaw;Workstation;WAR-SRVTST02;10.10.20.33
Amsterdam;Server;LON-SRV545;10.10.10.244
PowerShell Version: 5.1.17134.858
function Csv-Splitter {
    $fileName = Read-Host "Pass file name to process: "
    $FileToProcess = Import-Csv "$fileName.csv" -Delimiter ';'
    $MyList = New-Object System.Collections.Generic.List[string]

    foreach ($row in $FileToProcess) {
        if ("$($row.'OS.Type')-$($row.Site)" -notin $MyList) {
            $MyList.Add("$($row.'OS.Type')-$($row.Site)")
            $row | Export-Csv -Delimiter ";" -Append -NoTypeInformation "$($row.'OS.Type')-$($row.Site).csv"
        }
        else {
            $row | Export-Csv -Delimiter ";" -Append -NoTypeInformation "$($row.'OS.Type')-$($row.Site).csv"
        }
    }
}
Basically, the code works fine; however, it generates some errors from time to time as it processes the loop. This causes some rows to be missing from the new files - the number of missing rows equals the number of errors:
Export-Csv : The process cannot access the file 'C:\xxx\xxx\xxx.csv' because it is being used by another process.
Export-Csv is synchronous - by the time it returns, the output file has already been closed - so the code in the question does not explain the problem.
As you've since confirmed in a comment, based on a suggestion by Lee_Dailey, the culprit was the McAfee On-Access Scan anti-virus (AV) module, which accessed each newly created file behind the scenes, thereby locking it temporarily and causing Export-Csv to fail intermittently.
The problem should go away if each output file can be fully created with a single Export-Csv call after the loop, as also suggested by Lee. This is preferable for performance anyway, but assumes that the entire CSV file fits into memory as a whole.
Here's a Group-Object-based solution that uses a single pipeline to implement write-each-output-file-in-full functionality:
function Csv-Splitter {
    $fileName = Read-Host "Pass file name to process: "
    Import-Csv "$fileName.csv" -Delimiter ';' |
        Group-Object { $_.'OS.Type' + '_' + $_.Site + '.csv' } |
        ForEach-Object { $_.Group | Export-Csv -Delimiter ';' -NoTypeInformation $_.Name }
}
Your own answer shows alternative solutions that eliminate interference from the AV software.