Tags: powershell, csv, search, import-csv, export-csv

Split content from one .csv into multiple files based on content using PowerShell


I've got a .csv file with two types of rows. The first type contains the header information and always starts with AB. The second type contains the content and always starts with CD. Each header row is followed by one or more content rows, which belong to it until the next header row (starting with AB again).

Example:

header1; header2; header3; header4; header5; header6; header7
AB; 12345; AB123456789; 10.03.2021; GT; BC987654321; EUR
CD; 456789; 22.24; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR
CD; 354345; 85.45; Text; SW;
CD; 123556; 94.63; Text; SW;
CD; 354564; 12.34; Text; SW;
CD; 135344; 32.23; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654323; EUR
CD; 354564; 12.34; Text; SW;
CD; 852143; 34.97; Text; SW;

How can I split this file into several .csv files, one for each header row (AB), using PowerShell? My desired outcome would be:

BC987654321.csv

header1; header2; header3; header4; header5; header6; header7
AB; 12345; AB123456789; 10.03.2021; GT; BC987654321; EUR
CD; 456789; 22.24; Text; SW;

BC987654322.csv

header1; header2; header3; header4; header5; header6; header7
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR
CD; 354345; 85.45; Text; SW;
CD; 123556; 94.63; Text; SW;
CD; 354564; 12.34; Text; SW;
CD; 135344; 32.23; Text; SW;

and BC987654323.csv

header1; header2; header3; header4; header5; header6; header7
AB; 12345; AB123456789; 10.03.2021; GT; BC987654323; EUR
CD; 354564; 12.34; Text; SW;
CD; 852143; 34.97; Text; SW;

I'm not used to PowerShell at all, so I'd appreciate a newbie-friendly solution.

Thank you very much in advance.


Solution

  • If I understand correctly, you want to split the csv on every row where 'header1' is equal to 'AB' and then use what is in that row under 'header6' for the output file name.

    $path = 'D:\Test'
    $fileIn = Join-Path -Path $path -ChildPath 'input.csv'
    $fileOut = $null   # will get a value in the loop
    $splitValue = 'AB' # the header1 value that decides to start a new file
    $csv = Import-Csv -Path $fileIn -Delimiter ';'
    # get an array of the column headers
    $allHeaders = $csv[0].PsObject.Properties.Name
    foreach ($item in $csv) {
        if ($item.header1 -eq $splitValue) { 
            # start a new file
            $fileOut = Join-Path -Path $path -ChildPath ('{0}.csv' -f $item.header6)
            # create the new csv file with the first row of data already in it
            $item | Select-Object $allHeaders | Export-Csv -Path $fileOut -Delimiter ';' -NoTypeInformation
        }
        else {
            # rows with header1 not 'AB' are added to that file
            if ([string]::IsNullOrEmpty($fileOut)) {
                Write-Warning "Could not find a starting row (header1 = '$splitValue') for the file"
            }
            else {
                $item | Select-Object $allHeaders | Export-Csv -Path $fileOut -Delimiter ';' -Append
            }
        }
    }
    

    Of course, change the paths to match your environment.

    Output:

    BC987654321.csv

    "header1";"header2";"header3";"header4";"header5";"header6";"header7"
    "AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654321";"EUR"
    "CD";"456789";"22.24";"Text";"SW";"";
    

    BC987654322.csv

    "header1";"header2";"header3";"header4";"header5";"header6";"header7"
    "AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654322";"EUR"
    "CD";"354345";"85.45";"Text";"SW";"";
    "CD";"123556";"94.63";"Text";"SW";"";
    "CD";"354564";"12.34";"Text";"SW";"";
    "CD";"135344";"32.23";"Text";"SW";"";
    

    BC987654323.csv

    "header1";"header2";"header3";"header4";"header5";"header6";"header7"
    "AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654323";"EUR"
    "CD";"354564";"12.34";"Text";"SW";"";
    "CD";"852143";"34.97";"Text";"SW";"";