How to split a CSV file into multiple plain text files in PowerShell?


I'm trying to split a CSV file (Excel correctly shows two columns and multiple rows) into multiple plain text files, one file for each row. This is how far I got:

    $InputFilename = Get-Content 'csv-full.csv'
    $OutputFilenamePattern = 'output_done_'
    $LineLimit = 1
    $line = 0
    $i = 0
    $file = 0
    $start = 0

    while ($line -le $InputFilename.Length) {
        if ($i -eq $LineLimit -Or $line -eq $InputFilename.Length) {
            $file++
            $Filename = "$OutputFilenamePattern$file.txt"
            $InputFilename[$start..($line-1)] | Out-File $Filename -Force
            $start = $line;
            $i = 0
            Write-Host "$Filename"
        }
        $i++;
        $line++
    }

csv-full.csv is the name of my file.

The problem with this approach is that it creates dozens of files for a 4-row test CSV. The first file contains the header. Of the rest, some are blank and others contain paragraphs of text that belong to the same row, and I don't know why they end up split into different files.

Please note that the first column is a title and the second column is the text, which in some cases spans multiple paragraphs. The text includes accented characters and symbols (it is in Spanish: áéíóúñ¿?). The real CSV file has thousands of rows.

Thank you. If possible, please make the output files UTF-8 encoded.

Edit:

These are a few lines of the CSV file as plain text:

Pregunta,Respuesta
¿Qué una casa?,"

Una casa es un edificio para habitar. El término suele utilizarse para nombrar a la construcción de una o pocas plantas que está destinada a la vivienda de una única familia, en oposición a los edificios de múltiples departamentos, apartamentos o pisos.

Por ejemplo: “Mi tía vive en una casa con jardín y piscina”, “Me encantaría mudarme a una casa, porque en el departamento me siento como encerrada”, “Los delincuentes ingresaron a la casa y amenazaron al matrimonio”.

"
¿Qué un pregunta?,"

El término pregunta tiene su origen etimológico en el latín. Y es que fruto de la suma de dos componentes de dicha lengua: el prefijo “pre-“, que significa “antes”, y el verbo “cunctari”, que puede traducirse como “dudar” o “demorar”.

Una pregunta es una interpelación que se realiza con la intención de obtener algún tipo de información. Al pronunciar esta interrogación, se espera recibir una respuesta que incluya los datos buscados.

Solution

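  • Because your CSV contains multi-line strings, doing this without a CSV parser would be very difficult: Get-Content returns one element per physical line, so a quoted field that spans several lines gets broken apart. Instead, use Import-Csv to read and parse the file, then Export-Csv to write each row to a separate file: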

    $outputfolder = 'path\to\outputFolder'
    Import-Csv path\to\inputCsv.csv -Encoding utf8 | ForEach-Object { $i = 0 } {
        $path = Join-Path $outputfolder -ChildPath ('fileName-Part{0:D2}.csv' -f $i++)
        $_ | Export-Csv $path -NoTypeInformation -Encoding utf8
    }
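
To make the difference concrete, here is a minimal sketch that compares what Get-Content and Import-Csv see in the same file. The sample file name and its contents are made up for illustration; only the column names come from the question.

```powershell
# Hypothetical sample file in the temp folder, for illustration only.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'multiline-sample.csv'

# Two records; the first has a quoted field spanning several physical lines.
$sampleLines = @(
    'Pregunta,Respuesta'
    'Q1,"'
    'first paragraph'
    ''
    'second paragraph'
    '"'
    'Q2,"single paragraph"'
)
Set-Content -Path $sample -Value $sampleLines -Encoding UTF8

# Get-Content yields one string per physical line (header included).
$lines = @(Get-Content -Path $sample -Encoding UTF8)

# Import-Csv yields one object per CSV record, keeping quoted newlines inside the field.
$records = @(Import-Csv -Path $sample -Encoding UTF8)

Write-Host "Physical lines: $($lines.Count)"
Write-Host "CSV records:    $($records.Count)"
```

The file has seven physical lines but only two records, which is why a line-based split produces many more files than there are rows.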
    

    To exclude the CSV header, there are two options. In PowerShell 7.4 or later you can simply add -NoHeader to Export-Csv. Otherwise, pipe each row through ConvertTo-Csv, skip the first line, and write the result with Set-Content:

    $outputfolder = 'path\to\outputFolder'
    Import-Csv path\to\inputCsv.csv -Encoding utf8 | ForEach-Object { $i = 0 } {
        $path = Join-Path $outputfolder -ChildPath ('fileName-Part{0:D2}.csv' -f $i++)
        $_ | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 | Set-Content $path -Encoding UTF8
    }
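
Since the question asks for plain text files rather than CSV files, the same loop can write the column values directly. This is only a sketch under a few assumptions: the column names Pregunta and Respuesta are taken from the sample in the question, the output_done_N.txt naming follows the question's pattern, and the temp folder paths and sample rows are hypothetical stand-ins for the real file.

```powershell
# Hypothetical working folder and paths; replace with your own.
$workDir      = Join-Path ([System.IO.Path]::GetTempPath()) 'csv-split-demo'
$outputFolder = Join-Path $workDir 'out'
$inputCsv     = Join-Path $workDir 'csv-full.csv'
New-Item -ItemType Directory -Path $outputFolder -Force | Out-Null

# Small made-up sample with the same shape as the question's data.
$sampleLines = @(
    'Pregunta,Respuesta'
    '¿Qué es una casa?,"'
    ''
    'Una casa es un edificio para habitar.'
    ''
    'Por ejemplo: una casa con jardín.'
    ''
    '"'
    '¿Qué es una pregunta?,"'
    'Una pregunta es una interpelación.'
    '"'
)
Set-Content -Path $inputCsv -Value $sampleLines -Encoding UTF8

$i = 0
Import-Csv -Path $inputCsv -Encoding UTF8 | ForEach-Object {
    $i++
    $path = Join-Path $outputFolder ('output_done_{0}.txt' -f $i)

    # Title first, a blank line, then the text with surrounding blank lines trimmed.
    $text = ($_.Pregunta, '', $_.Respuesta.Trim()) -join [Environment]::NewLine

    Set-Content -Path $path -Value $text -Encoding UTF8
    Write-Host $path
}
```

Note that in Windows PowerShell 5.1, -Encoding UTF8 writes a byte order mark at the start of each file, while PowerShell 6 and later write UTF-8 without one.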