
Processing large arrays in PowerShell


I am having a difficult time understanding the most efficient way to process large datasets/arrays in PowerShell. I have arrays with several million items that I need to process and group. The size of the list varies from run to run: it could be 3.5 million items or 10 million items.

Example: with 3.5 million items, the items are grouped in fours like the following:

Items 0, 1, 2, 3 group together; items 4, 5, 6, 7 group together; and so on.

I have tried processing the array on a single thread by looping through the list and assigning each group to a PSCustomObject. This works, but it takes 45-50+ minutes to complete.

I have also attempted to break up the array into smaller arrays, but that causes the process to run even longer.

$i=0
$d_array = @()
$item_array # Large dataset


While ($i -lt $item_array.length){

    $o = "Test"
    $oo = "Test"
    $n = $item_array[$i];$i++
    $id = $item_array[$i];$i++
    $ir = $item_array[$i];$i++
    $cs = $item_array[$i];$i++

    $items = [PSCustomObject]@{
        'field1' = $o
        'field2' = $oo
        'field3' = $n
        'field4' = $id
        'field5' = $ir
        'field6'= $cs
    }
    $d_array += $items

}

I would imagine that using a job scheduler to run multiple jobs in parallel would cut the processing time down significantly, but I wanted to get others' takes on a quick, effective way to tackle this.


Solution

  • If you are working with large data, compiling a small C# helper with Add-Type is also effective.

    Add-Type -TypeDefinition @"
    using System.Collections.Generic;
    
    public static class Test
    {
        public static List<object> Convert(object[] src)
        {
            var result = new List<object>();
            for(var i = 0; i <= src.Length - 4; i+=4)
            {
                result.Add( new {
                    field1 = "Test",
                    field2 = "Test",
                    field3 = src[i + 0],
                    field4 = src[i + 1],
                    field5 = src[i + 2],
                    field6 = src[i + 3]
                });
            }
            return result;
        }
    }
    "@
    
    $item_array = 1..10000000
    $result = [Test]::Convert($item_array)
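
  • A PowerShell-only variation is also worth trying. In the question's loop, `$d_array += $items` is a likely bottleneck: appending to an array with `+=` builds a new array and copies the existing elements, so the cost grows with every iteration. The sketch below swaps the array for a `System.Collections.Generic.List[object]`, which appends in place. It assumes PowerShell 5 or later (for the `::new()` syntax), the name `$d_list` is made up for illustration, and the small `1..20` input is only a stand-in for the real data. It has not been timed against the 3.5 million item case.

```powershell
# Stand-in input for illustration; the real $item_array comes from your data source.
$item_array = 1..20

# A generic list grows in place, unlike an array extended with +=.
$d_list = [System.Collections.Generic.List[object]]::new()

# Step through the input four items at a time.
# The loop condition skips a trailing partial group (fewer than 4 items).
for ($i = 0; $i -le $item_array.Length - 4; $i += 4) {
    $d_list.Add([PSCustomObject]@{
        field1 = 'Test'
        field2 = 'Test'
        field3 = $item_array[$i]
        field4 = $item_array[$i + 1]
        field5 = $item_array[$i + 2]
        field6 = $item_array[$i + 3]
    })
}
```

    With the sample input, `$d_list.Count` is 5 and `$d_list[0]` holds the values 1 through 4 in `field3` to `field6`. If a plain array is needed afterwards, `$d_list.ToArray()` converts the list once at the end.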