Search code examples
powershellcsvvariablesdynamicunique

CSV Dynamic Unique Columns


I need to import a number of CSV files, dynamically create variable names based on header row values, assign column values to column arrays, and unique column values to unique column arrays.

ID;PlayGolf;Day;Temperature;Outlook;Humidity;Windy;
1;no;05-Jul;hot;sunny;high;FALSE;
2;no;06-Jul;hot;sunny;high;TRUE;
3;yes;07-Jul;hot;overcast;high;FALSE;
4;yes;09-Jul;cool;rain;normal;FALSE;
5;yes;10-Jul;cool;overcast;normal;TRUE;
6;no;12-Jul;mild;sunny;high;FALSE;
7;yes;14-Jul;cool;sunny;normal;FALSE;
8;yes;15-Jul;mild;rain;normal;FALSE;
9;yes;20-Jul;mild;sunny;normal;TRUE;
10;yes;21-Jul;mild;overcast;high;TRUE;
11;yes;22-Jul;hot;overcast;normal;FALSE;

For example, in the above CSV file, I would create $ID, $PlayGolf, and so on. The array $PlayGolf would contain the values @("no", "no", "yes", "yes",..."yes") and array $PlayGolfUnique would contain the values @("no", "yes").

The following snippet outlines how far I have got towards a solution:

Param(
    [Parameter(Position = 0, HelpMessage = "Input data: (e.g. Input.csv")]
    [string]$inputFile = "Input.csv",
    [Parameter(Position = 2, HelpMessage = "Data delimiter: (e.g. ;")]
    [string]$dataDelimiter = ";"
)

$main = {
    Begin {
        Write-Host "SO Question Begin..." -ForegroundColor Black -BackgroundColor Green
    }
    Process {
        try {
            $line = (Get-Content $inputFile -TotalCount 2)[0]
            $delimiterCount = ([char[]]$line -eq $dataDelimiter).Count

            $colHeaders = @(((Get-Content $inputFile)[0..($delimiterCount - 1)] -split ($dataDelimiter))[0..($delimiterCount - 1)])
        } catch {
            Write-Host "Error: $($_.Exception)" -ForegroundColor White -BackgroundColor Red
            Break
        }
    }
    End {
        if ($?) {
            Write-Host "Completed Successfully." -ForegroundColor Black -BackgroundColor Green
            Write-Host "SO Question End..." -ForegroundColor Black -BackgroundColor Green
        }
    }
}

& $main

Solution

  • Don't try to re-invent the wheel. Import the CSV, select the values of the column in question into one variable, then get the unique values from that variable and assign it to another variable.

    $csv = Import-Csv 'C:\path\to\your.csv' -Delimiter ';'
    
    $PlayGolf       = $csv | Select-Object -Expand 'PlayGolf'
    $PlayGolfUnique = $PlayGolf | Select-Object -Unique
    

    If you want to do that for all columns without knowing the number of columns and/or header names I'd recommend using a hashtable instead of individual variables:

    $csv = Import-Csv 'C:\path\to\your.csv' -Delimiter ';'
    
    $ht = @{}
    $csv[0].PSObject.Properties | ForEach-Object {
        $col = $_.Name
        $ht[$col] = $csv | Select-Object -Expand $col
        $ht["${col}_Unique"] = $csv | Select-Object -Expand $col -Unique
    }