Search code examples
powershellcsvwhile-loopsplitchunks

In Powershell Import a CSV , split into multiple files every 10 lines


i'm new at powershell coding and what i'm trying to do is to import a CSV files with an unknown number of lines and then split it every 10 lines to a new CSV file.

Example if I input a CSV file of 97 lines, I expect 9 files of 10 lines and 1 file of 7 lines.

i tried the following, the Hostlist.csv contains 35 hostname, i'm getting 3 files, the first file missing the first host in the list and there is no 4th file with remaining host.

$Hostlist = Get-Content '.\Hostlist.csv'
$BatchID = Get-Random
New-Item -ItemType Directory -Force -Path .\$BatchID
$MaxBatch = 10
$line = 0
$i = 0
$File = 0

While ($line -lt $Hostlist.Length) {

    if ($i -gt $MaxBatch) {
        $Start = $line - $MaxBatch
        $File++
        $Hostlist[$Start..($line-1)] | Out-File ".\$BatchID\Batch$File.csv" -Append -force
        $i = 0
    }
    $i++;
    $line++
}

Solution

  • As an aside:

    • If your input file is a CSV file and you want your output files to be CSV files too, you'd have to write a header line to each.

    • Your code makes no attempt to do so, so I'm assuming you're simply dealing with line-oriented plain-text files, despite the .csv filename extension.

    the first file missing the first host

    Since your condition for processing a batch is $i -gt $MaxBatch, $i and $line are both 11 when you enter the if block first, and $Start = $line - $MaxBatch is therefore 1, i.e. the second line, given that $Start is used as a 0-based array index.

    there is no 4th file with remaining host.

    Since you're only processing a batch if $i -gt $MaxBatch, an input file whose line count that isn't evenly divisible by $MaxBatch will always be missing its last batch, because the last, incomplete batch then never satisfies the if condition.


    I suggest simplifying your code by calculating the number of batches and looping batch by batch, as shown in the following simplified example, which partitions a 10-line input into batches of 3:

    # Simulate the list of hosts
    $HostList = 1..10 -replace '^', 'host$&' # 'host1', 'host2', ...
    
    # Batch size
    $MaxBatch = 3
    
    foreach ($batch in 1..[math]::Ceiling($Hostlist.Count / $MaxBatch)) {
      write-verbose -Verbose "File index (batch number): $batch"
      $startNdx = ($batch-1) * $MaxBatch
      $Hostlist[$startNdx..($startNdx + $MaxBatch - 1)]
    }
    

    Note how use of [math]::Ceiling() on the batch-count calculation $Hostlist.Count / $MaxBatch ensures that an incomplete batch at the end is also processed.

    Unless you have Set-StrictMode -Version 3 or higher in effect, it is fine to exceed the upper array bound in the .. range expression used to slice the array in a final, incomplete batch - PowerShell will simply ignore indices beyond the upper bound.

    The above yields:

    VERBOSE: File index (batch number): 1
    host1
    host2
    host3
    VERBOSE: File index (batch number): 2
    host4
    host5
    host6
    VERBOSE: File index (batch number): 3
    host7
    host8
    host9
    VERBOSE: File index (batch number): 4
    host10