Tags: csv, powershell, powershell-5.0

How To Access Specific Rows in an Import-Csv Array?


I need to split a large file upload into many parallel processes and want to use a single CSV file as input. Is it possible to access blocks of rows from an Import-Csv object, something like this:

$SODAData = Import-Csv $CSVPath -Delimiter "|" |
            Where $_.Rownum == 20,000..29,999 | 
            Foreach-Object { ... }

What is the syntax for such an extraction? I'm using PowerShell 5.


Solution

  • Import-Csv imports the file as an array of objects, so you can index into it with the range operator. Indexes are zero-based, so 20000..29999 selects the 20,001st through 30,000th data rows:

    $csv = Import-Csv $CSVPath -Delimiter '|'
    $SODAData = $csv[20000..29999] | ForEach-Object { ... }
    

    An alternative would be to use Select-Object:

    $offset = 20000
    $count  = 10000
    $csv = Import-Csv $CSVPath -Delimiter '|'
    $SODAData = $csv |
                Select-Object -Skip $offset -First $count |
                ForEach-Object { ... }
    

    If you want to avoid reading the entire file into memory, you can change the above to a single pipeline (Select-Object -First stops the pipeline as soon as it has collected enough rows):

    $offset = 20000
    $count  = 10000
    $SODAData = Import-Csv $CSVPath -Delimiter '|' |
                Select-Object -Skip $offset -First $count |
                ForEach-Object { ... }
    

    Beware, though, that with this approach every chunk requires reading the file again from the beginning, because the skipped rows still have to be parsed before they are discarded.
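
If re-reading the file for every chunk is too expensive, one possible workaround is to group the rows into chunks during a single pass. The following is only a minimal sketch, not part of the original answer: the function name Split-IntoChunk, its -Size parameter, and the temporary demo file are hypothetical and exist purely for illustration.

```powershell
# Hypothetical helper (not a built-in cmdlet): collects pipeline input into
# fixed-size arrays so the CSV file only has to be read once.
function Split-IntoChunk {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $InputObject,
        [int] $Size = 10000
    )
    begin { $buffer = [System.Collections.Generic.List[object]]::new() }
    process {
        $buffer.Add($InputObject)
        if ($buffer.Count -ge $Size) {
            # The leading comma emits the chunk as one array instead of unrolling it
            , $buffer.ToArray()
            $buffer.Clear()
        }
    }
    end {
        # Emit the final, possibly shorter, chunk
        if ($buffer.Count -gt 0) { , $buffer.ToArray() }
    }
}

# Demo data (assumption): a small pipe-delimited file with 25 rows
$CSVPath = Join-Path ([System.IO.Path]::GetTempPath()) 'chunk-demo.csv'
1..25 |
    ForEach-Object { [pscustomobject]@{ Rownum = $_; Value = "item$_" } } |
    Export-Csv $CSVPath -Delimiter '|' -NoTypeInformation

# One pass over the file; each element of $chunks is an array of rows
$chunks = @(Import-Csv $CSVPath -Delimiter '|' | Split-IntoChunk -Size 10)
```

Each element of $chunks could then be handed to a separate worker. Collecting all chunks in a variable keeps every row in memory, so for very large files it may be preferable to pipe the output of Split-IntoChunk straight into ForEach-Object and process one chunk at a time.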