I'm processing large amounts of data, and after pulling and manipulating it, I have the results stored in memory in a variable.
I now need to split this data into separate variables. This was easily done by piping to Where-Object, but it has slowed down now that I have much more data (1 million plus members). Note: it takes about 5+ minutes.
$DCEntries = $DNSQueries | ? {$_.ClientIP -in $DCs.ipv4address -Or $_.ClientIP -eq '127.0.0.1'}
$NonDCEntries = $DNSQueries | ? {$_.ClientIP -notin $DCs.ipv4address -And $_.ClientIP -ne '127.0.0.1'}
#Note:
#$DCs is an array of 60 objects of type Microsoft.ActiveDirectory.Management.ADDomainController, with two properties: Name, ipv4address
#$DNSQueries is a collection of pscustomobjects that has 6 properties, all strings.
I immediately realized I'm enumerating $DNSQueries (the large object) twice, which is obviously costing me some time. So I decided to go about this a different way, enumerating it once and using a Switch statement, but this seems to have made the run time INCREASE dramatically, which is not what I was going for.
$DNSQueries | ForEach-Object {
Switch ($_) {
{$_.ClientIP -in $DCs.ipv4address -Or $_.ClientIP -eq '127.0.0.1'} {
# Query is from a DC
$DCEntries += $_
}
default {
# Query is not from DC
$NonDCEntries += $_
}
}
}
I'm wondering if someone can explain to me why the second code takes so much more time. Further, perhaps offer a better way to accomplish what I want.
Is the Foreach-Object and/or appending of the sub variables costing that much time?
ForEach-Object is actually the slowest way to enumerate a collection, and the switch with a script block condition inside it adds even more overhead.
If the collection is already in memory, nothing can beat a foreach loop for linear enumeration.
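If you want to check the enumeration cost on your own machine, a minimal sketch along these lines should do. The `$data` collection is a hypothetical stand-in for `$DNSQueries`, and the timings will vary by PowerShell version and hardware, so no figures are claimed here.

```powershell
# Hypothetical sample data: 100,000 integers already held in memory.
$data = 1..100000

# Pipeline enumeration: every item is bound to and run through ForEach-Object's script block.
$pipeline = Measure-Command {
    $data | ForEach-Object { $_ }
}

# Language-level foreach loop over the same in-memory collection.
$loop = Measure-Command {
    foreach ($item in $data) { $item }
}

'ForEach-Object: {0:N2} ms' -f $pipeline.TotalMilliseconds
'foreach loop  : {0:N2} ms' -f $loop.TotalMilliseconds
```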
Your biggest problem, though, is the use of += to add items to an array. An array is a collection of a fixed size, so PowerShell has to create a new array and copy all existing items each time a new item is added, which is very inefficient. See this answer as well as this awesome documentation for more details.
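A small sketch makes the "new array on every +=" behavior visible. The variable names are illustrative only.

```powershell
$array = @(1, 2, 3)

# Arrays report themselves as fixed-size collections.
$array.IsFixedSize                            # True

# Keep a second reference to the original array object.
$before = $array

# += does not grow the array in place: it builds a new, larger array
# and reassigns the variable to it.
$array += 4

[object]::ReferenceEquals($before, $array)    # False
$before.Count                                 # 3
$array.Count                                  # 4
```

The old three-element array is left untouched, which is why every append has to copy all existing items.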
In this case you can combine a List<T> with PowerShell's explicit assignment (assigning the output of the loop directly to a variable).
$NonDCEntries = [Collections.Generic.List[object]]::new()
$DCEntries = foreach($item in $DNSQueries) {
if($item.ClientIP -eq '127.0.0.1' -or $item.ClientIP -in $DCs.IPv4Address) {
$item
continue
}
$NonDCEntries.Add($item)
}
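One further tweak, offered as a sketch rather than something measured here: `$DCs.IPv4Address` is re-evaluated and scanned linearly for every item, so you could build the set of DC addresses once and use a hash lookup instead. The `$dcIPs` name and the sample `$DCs` / `$DNSQueries` data below are hypothetical, and the code assumes the addresses are strings as described in the question.

```powershell
# Hypothetical sample data standing in for the real $DCs and $DNSQueries.
$DCs = @(
    [pscustomobject]@{ Name = 'DC01'; IPv4Address = '10.0.0.1' }
    [pscustomobject]@{ Name = 'DC02'; IPv4Address = '10.0.0.2' }
)
$DNSQueries = @(
    [pscustomobject]@{ ClientIP = '10.0.0.1' }
    [pscustomobject]@{ ClientIP = '127.0.0.1' }
    [pscustomobject]@{ ClientIP = '192.168.1.50' }
)

# Build the lookup once; case-insensitive to mirror the behavior of -in on strings.
$dcIPs = [Collections.Generic.HashSet[string]]::new([StringComparer]::OrdinalIgnoreCase)
foreach ($ip in $DCs.IPv4Address) { $null = $dcIPs.Add($ip) }
$null = $dcIPs.Add('127.0.0.1')

$NonDCEntries = [Collections.Generic.List[object]]::new()
$DCEntries = foreach ($item in $DNSQueries) {
    if ($dcIPs.Contains($item.ClientIP)) {
        # Query is from a DC (or localhost): emit it to $DCEntries.
        $item
        continue
    }
    # Query is not from a DC.
    $NonDCEntries.Add($item)
}
```

With only 60 DCs the gain may be modest compared to dropping +=, so treat it as optional.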
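To put into perspective how badly += on an array scales (every append copies the whole array, so the total work grows roughly quadratically rather than linearly), this is a performance test comparing PowerShell explicit assignment from a loop and adding to a List<T> versus adding to an array with +=.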
$tests = @{
'PowerShell Explicit Assignment' = {
param($count)
$result = foreach($i in 1..$count) {
$i
}
}
'.Add(..) to List<T>' = {
param($count)
$result = [Collections.Generic.List[int]]::new()
foreach($i in 1..$count) {
$result.Add($i)
}
}
'+= Operator to Array' = {
param($count)
$result = @()
foreach($i in 1..$count) {
$result += $i
}
}
}
5000, 10000, 25000, 50000, 75000, 100000 | ForEach-Object {
$groupresult = foreach($test in $tests.GetEnumerator()) {
$totalms = (Measure-Command { & $test.Value -Count $_ }).TotalMilliseconds
[pscustomobject]@{
CollectionSize = $_
Test = $test.Key
TotalMilliseconds = [math]::Round($totalms, 2)
}
[GC]::Collect()
[GC]::WaitForPendingFinalizers()
}
$groupresult = $groupresult | Sort-Object TotalMilliseconds
$groupresult | Select-Object *, @{
Name = 'RelativeSpeed'
Expression = {
$relativespeed = $_.TotalMilliseconds / $groupresult[0].TotalMilliseconds
[math]::Round($relativespeed, 2).ToString() + 'x'
}
}
}
Below the test results:
CollectionSize Test                           TotalMilliseconds RelativeSpeed
-------------- ----                           ----------------- -------------
          5000 PowerShell Explicit Assignment              0.56 1x
          5000 .Add(..) to List<T>                         7.56 13.5x
          5000 += Operator to Array                     1357.74 2424.54x
         10000 PowerShell Explicit Assignment              0.77 1x
         10000 .Add(..) to List<T>                        18.20 23.64x
         10000 += Operator to Array                     5411.23 7027.57x
         25000 PowerShell Explicit Assignment              1.39 1x
         25000 .Add(..) to List<T>                        47.14 33.91x
         25000 += Operator to Array                    26168.67 18826.38x
         50000 PowerShell Explicit Assignment              3.49 1x
         50000 .Add(..) to List<T>                        97.38 27.9x
         50000 += Operator to Array                   129537.09 37116.64x
         75000 PowerShell Explicit Assignment             14.59 1x
         75000 .Add(..) to List<T>                       243.47 16.69x
         75000 += Operator to Array                   247419.68 16958.17x
        100000 PowerShell Explicit Assignment             14.85 1x
        100000 .Add(..) to List<T>                       177.13 11.93x
        100000 += Operator to Array                   473824.71 31907.39x
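As a rough back-of-the-envelope check on why the += rows blow up: appending the i-th item copies the i - 1 items already in the array, so building n items costs 0 + 1 + ... + (n - 1) = n(n - 1) / 2 element copies. The sketch below only counts those copies and ignores allocation and garbage collection.

```powershell
# Element copies needed to build an n-item array with +=: n(n - 1) / 2.
$copies = foreach ($n in 5000, 10000, 100000) {
    [pscustomobject]@{
        CollectionSize = $n
        ElementCopies  = [long]$n * ($n - 1) / 2
    }
}
$copies
```

That works out to about 12.5 million copies for 5,000 items, about 50 million for 10,000 and about 5 billion for 100,000. Ten times the items means roughly a hundred times the copying, which is in line with the += timings above going from about 5.4 seconds to about 474 seconds.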