Would like to know if there is a chance to improve the performance of the following array search. At the moment it takes 16 seconds.
Measure-Command -Expression {
    $a = @()
    $b = @()
    1..10000 | %{$a += $_}
    1..10000 | %{$b += $_}
    #Tried to resize but it still runs 16 seconds
    [array]::Resize([ref]$a,10000)
    [array]::Resize([ref]$b,10000)
    foreach ($i in $a){
        if ($b -contains $i) {
            #write-host $i
        }
    }
}
+= on an array recreates the entire array: it copies the old contents into a new array along with the new element. It's arguably the worst possible method to populate a large array in PowerShell. Use += only if it's a one-time operation outside a loop, or the array is small and the time doesn't matter.
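A rough side-by-side timing (a sketch only; exact numbers depend on the machine and PowerShell version) shows what the repeated copying costs:

(Measure-Command { $a = @(); foreach ($i in 1..10000) { $a += $i } }).TotalMilliseconds   # array copied on every +=
(Measure-Command { $a = 1..10000 }).TotalMilliseconds                                     # single allocation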
Populate the array directly:
$a = 1..10000
Or collect the output of the foreach statement directly:
$a = foreach ($i in 1..10000) { $i }
Or use an ArrayList:
$a = [Collections.ArrayList]@()
foreach ($i in 1..10000) { $a.Add($i) >$null }
Note, in this particular case it's faster to convert an existing array to an ArrayList:
$a = [Collections.ArrayList]@(1..10000)
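A generic List works along the same lines (a sketch, not part of the original comparison; it avoids the untyped storage of ArrayList):

$a = [Collections.Generic.List[int]]@()
foreach ($i in 1..10000) { $a.Add($i) }   # List[int].Add returns nothing, so there's no output to discard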
Pipelining is a complex operation; it's several or many times slower than flow control statements like foreach (the statement, not the cmdlet), while, do.
A ScriptBlock (the code in curly braces { } passed to the ForEach-Object cmdlet, aliased as %) takes a lot of time to create the execution context for each element compared to the simple code inside.
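A quick, machine-dependent comparison of the two forms (a sketch only) makes the per-element overhead visible:

(Measure-Command { 1..100000 | % { $_ } }).TotalMilliseconds              # ForEach-Object: pipeline + ScriptBlock per element
(Measure-Command { foreach ($i in 1..100000) { $i } }).TotalMilliseconds  # foreach statement: plain loop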
-contains checks each element until a match is found, so the total number of iterations is $a.count * $b.count in the worst case, or half as much on average.
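To illustrate (a sketch; timings vary), compare looking up a value that is missing from $b with one that sits at its start:

$b = 1..10000
(Measure-Command { foreach ($i in 1..1000) { $b -contains 0 } }).TotalMilliseconds   # never found: scans all 10000 elements each time
(Measure-Command { foreach ($i in 1..1000) { $b -contains 1 } }).TotalMilliseconds   # first element: stops immediately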
Index $b for a fast lookup, for example as a hashtable:
$a = 1..10000
$b = 1..10000
$bIndex = @{}
foreach ($i in $b) { $bIndex[$i] = $true }
foreach ($i in $a) {
    if ($bIndex[$i]) {
        #write-host $i
    }
}
15 milliseconds on an i7 CPU
Or as a HashSet (.NET 3.5 and newer, built-in since Win7, installable on XP):
$a = 1..10000
$b = 1..10000
$bIndex = [Collections.Generic.HashSet[int]]$b
foreach ($i in $a) {
    if ($bIndex.Contains($i)) {
        #write-host $i
    }
}
7 milliseconds on an i7 CPU
Or intersect the arrays using a HashSet, then iterate the result:
$a = 1..10000
$b = 1..10000
$inBoth = [Collections.Generic.HashSet[int]]$a
$inBoth.IntersectWith([Collections.Generic.HashSet[int]]$b)
foreach ($i in $inBoth) {
    #write-host $i
}
7 milliseconds on an i7 CPU
It'll be many times faster when the arrays' contents differ, that is, when the intersection is much smaller than the arrays themselves.
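For example (numbers picked arbitrarily for illustration), if only a small range overlaps, the final loop only has to walk that small result:

$a = 1..10000
$b = 9001..20000                                        # only 9001..10000 occurs in both
$inBoth = [Collections.Generic.HashSet[int]]$a
$inBoth.IntersectWith([Collections.Generic.HashSet[int]]$b)
$inBoth.Count                                           # 1000, so the foreach runs over 1000 items instead of 10000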
It's also possible to build a real index that maps each value to a list of its original positions in the array, which is useful in case of duplicate values.
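A minimal sketch of such an index (assuming $b may contain duplicate values; the variable names are illustrative):

$bIndex = @{}
for ($n = 0; $n -lt $b.Count; $n++) {
    if (!$bIndex.ContainsKey($b[$n])) { $bIndex[$b[$n]] = [Collections.ArrayList]@() }
    $bIndex[$b[$n]].Add($n) >$null                      # remember every position where this value occurs
}
foreach ($i in $a) {
    if ($bIndex.ContainsKey($i)) {
        #write-host "$i found in `$b at indices $($bIndex[$i] -join ', ')"
    }
}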