windows powershell csv scripting powershell-cmdlet

Process multiple CSV files and delete rows that contain a double semicolon (;;) using PowerShell


Consider the CSV file below.

input:

ID;ITEM_ID;STATUS;
001;;RELEASED;
002;36530;RELEASED;
003;86246;RELEASED;
004;;RELEASED;

I want to remove the rows where ITEM_ID is missing (the ones containing ;;) and save the file. I tried this on one sample file and it worked as expected.

Import-Csv -Path ".\TestFile.CSV" | where {$_.ITEM_ID -ne ""} | Export-Csv -Path ".\TestFile-temp.CSV" -NoTypeInformation
Remove-Item -Path '.\TestDir\TestFile.csv'
Rename-Item -Path '.\TestDir\TestFile-temp.csv' -NewName 'TestFile.csv'

output:

ID;ITEM_ID;STATUS;
002;36530;RELEASED;
003;86246;RELEASED;

The challenge is that I have multiple CSV files, and when I open them in Excel the values are not split into separate columns; everything appears in a single column. As a result, the condition < where {$_.ITEM_ID -ne ""} > does not match anything. I now need to go through each row of each CSV file, look for the character sequence ;; in that row, delete the line if it is found, and save the file.

I am comfortable with shell scripting, but I am very new to PowerShell. Can anybody help me with the logic here, or suggest another cmdlet that can do the job?

$fileDirectory = "C:\Users\Administrator\Documents\check";
foreach($file in Get-ChildItem $fileDirectory)
{
    $csvFileToCheck = Import-Csv -Path $fileDirectory\$file
    $noDoubleSemiComma = foreach($line in $csvFileToCheck)
            {
                if(Select-String << i want the logic here>>)
                {
                $line
                }               
            }
    $noDoubleSemiComma | Export-Csv -Path $fileDirectory\tmp.csv -NoTypeInformation
    Remove-Item -Path $fileDirectory\$file
    Rename-Item -Path $fileDirectory\tmp.csv -NewName $file
}

Solution

  • As commented, you need to add the parameter -Delimiter ';' to Import-Csv; otherwise the default delimiter, a comma, is used to parse the fields, and each line of your file ends up as one single field.
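
    To make the effect of the delimiter visible, here is a small sketch that parses the sample rows from the question in memory with ConvertFrom-Csv, once with the default delimiter and once with -Delimiter ';'. The variable names are only illustrative, and the trailing ';' of each line is left out on purpose (an assumption of this demo) so the header contains no empty column name.

    ```powershell
    # Sample rows from the question; the trailing ';' on each line is dropped
    # here (an assumption of this demo) so the header has no empty column name.
    $lines = 'ID;ITEM_ID;STATUS',
             '001;;RELEASED',
             '002;36530;RELEASED',
             '003;86246;RELEASED',
             '004;;RELEASED'

    # Default delimiter (comma): every line is read as ONE column.
    $defaultRows  = @($lines | ConvertFrom-Csv)
    $defaultNames = @($defaultRows[0].PSObject.Properties.Name)   # ID;ITEM_ID;STATUS

    # There is no ITEM_ID property, so $_.ITEM_ID is $null and nothing is filtered out.
    $keptDefault = @($defaultRows | Where-Object { $_.ITEM_ID -ne '' })   # 4 rows

    # With -Delimiter ';' the fields are split and the same condition works.
    $keptSplit = @($lines | ConvertFrom-Csv -Delimiter ';' |
                   Where-Object { $_.ITEM_ID -ne '' })

    $keptSplit.ID   # 002 and 003
    ```

    The same reasoning applies to Import-Csv, which reads from a file instead of from strings.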

    As I understand it, you also want to remove the quotes that Export-Csv puts around all fields and headers. In PowerShell 7 you can use the parameter -UseQuotes AsNeeded for that.

    As this parameter is not available in version 5.1, I wrote a function, ConvertTo-CsvNoQuotes, some time ago to remove the quotes in a safe way (simply replacing them all with an empty string is dangerous, because some values do need quotes).

    Copy that function into your script at the top, then below that, your code could be simplified like this:

    $fileDirectory = "C:\Users\Administrator\Documents\check"
    
    Get-ChildItem -Path $fileDirectory -Filter '*.csv' -File | ForEach-Object {
        # for better readability store the full path of the file in a variable
        $filePath = $_.FullName
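        # read the whole file first (hence the parentheses), drop the rows with an
        # empty ITEM_ID, then overwrite the same file
        (Import-Csv -Path $filePath -Delimiter ';') |
            Where-Object { $_.ITEM_ID -ne '' } |
            ConvertTo-CsvNoQuotes -Delimiter ';' |
            Set-Content $filePath -Force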
        Write-Host "File '$filePath' modified"
    }