Background
It is quite common in PowerShell to build a hash table to quickly access objects by a specific property, e.g. to base an index on the LastName
:
$List = ConvertFrom-Csv @'
Id, LastName, FirstName, Country
1, Aerts, Ronald, Belgium
2, Berg, Ashly, Germany
3, Cook, James, England
4, Duval, Frank, France
5, Lyberg, Ash, England
6, Fischer, Adam, Germany
'@
$Index = @{}
$List |ForEach-Object { $Index[$_.LastName] = $_ }
$Index.Cook
Id LastName FirstName Country
-- -------- --------- -------
3 Cook James England
In some cases it is required to build the index on two (or even more) properties, e.g. the FirstName
and the LastName
. For this you might create a multi dimensional key, e.g.:
$Index = @{}
$List |ForEach-Object {
$Index[$_.FirstName] = @{}
$Index[$_.FirstName][$_.LastName] = $_
}
$Index.James.Cook
Id LastName FirstName Country
-- -------- --------- -------
3 Cook James England
But it is easier (and possibly even faster) to just concatenate the two properties. If only for checking for the existence of the entry: $Index.ContainsKey('James').ContainsKey('Cook')
where an error might occur if the FirstName
doesn't exist.
To join the properties, it is required to use a delimiter between the property otherwise different property lists might end up as the same key. As this example: AshlyBerg
and AshLyberg
.
$Index = @{}
$List |ForEach-Object { $Index["$($_.FirstName)`t$($_.LastName)"] = $_ }
$Index."James`tCook"
Id LastName FirstName Country
-- -------- --------- -------
3 Cook James England
Note: the above are Minimal, Reproducible Examples. In real life, I come several times to the questions below, which includes generally joining objects where the background - and number of properties used in the index are variable.
Questions:
There is no built-in separator for multi-component hashtable (dictionary) keys.
As for a custom separator: Your best bet for a character that is very unlikely to occur in the components themselves is NUL
(the character with code point 0x0
), which you can represent as "`0"
in PowerShell. However, performing convention-based string operations on every lookup is awkward (e.g. $Index."James`0Cook"
) and generally only works if stringifying the key components is feasible - or if they're all strings to begin with, as in your example.
Using arrays for multi-component keys is syntactically preferable, but using collections generally does not work as-is, because .NET reference types in general do not compare distinct instances meaningfully, even if they happen to represent the same data - see this answer.
zett42's helpful answer uses tuples, which do perform meaningful comparisons of distinct instances whose members contain equal data.
However, the need to construct a tuple instance for each addition / modification / lookup is syntactically awkward (e.g., $Index.([Tuple]::Create('James', 'Cook'))
There is a way of making regular PowerShell arrays work as hastable keys, in a manner that only adds complexity to creating the hashtable (calling a constructor), while allowing regular array syntax for additions / updates and lookups (e.g., $Index.('James', 'Cook')
).
[ordered]
hashtables, which, however, must be referred to by their true type name so as to be able to call a construct, namely [System.Collections.Specialized.OrderedDictionary]
.[System.Collections.Generic.Dictionary[TKey, TValue]]
).# Sample objects for the hashtable.
$list = ConvertFrom-Csv @'
Id, LastName, FirstName, Country
1, Aerts, Ronald, Belgium
2, Berg, Ashly, Germany
3, Cook, James, England
4, Duval, Frank, France
5, Lyberg, Ash, England
6, Fischer, Adam, Germany
'@
# Initialize the hashtable with a structural equality comparer, i.e.
# a comparer that compares the *elements* of the array and only returns $true
# if *all* compare equal.
# This relies on the fact that [System.Array] implements the
# [System.Collections.IStructuralEquatable] interface.
$dict = [hashtable]::new([Collections.StructuralComparisons]::StructuralEqualityComparer)
# Add entries that map the combination of first name and last name
# to each object in $list.
# Note the regular array syntax.
$list.ForEach({ $dict.($_.FirstName, $_.LastName) = $_ })
# Use regular array syntax for lookups too.
# Note: CASE MATTERS
$dict.('James', 'Cook')
Important: The above performs case-SENSITIVE comparisons (as zett42's tuple solution does), unlike regular PowerShell hashtables.
Making the comparisons case-INSENSITIVE requires more work, because a custom implementation of the [System.Collections.IEqualityComparer]
interface is required, namely a case-insensitive implementation of what [System.Collections.StructuralComparisons]::StructuralEqualityComparer
provides:
# Case-insensitive IEqualityComparer implementation for
# use of arrays as dictionary keys.
# Note: Dictionary keys cannot be $null, so there is no need for $null checks.
class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer {
[bool] Equals([object] $o1, [object] $o2) {
return ([System.Collections.IStructuralEquatable] [array] $o1).Equals([array] $o2, [System.StringComparer]::InvariantCultureIgnoreCase)
}
[int] GetHashCode([object] $o) {
return ([System.Collections.IStructuralEquatable] [array] $o).GetHashCode([StringComparer]::InvariantCultureIgnoreCase)
}
}
# Pass an instance of the custom equality comparer to the constructor.
$dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())
Note:
Santiago Squarzon discovered ([System.Collections.IStructuralEquatable] $o).GetHashCode([StringComparer]::InvariantCultureIgnoreCase)
as a built-in way to get a hash code for an array based on its elements' case-insensitive hash codes.
The original solutions below calculate the array's case-insensitive hash code element by element, which is both more cumbersome and less efficient. Perhaps they are still of interest in general with respect to how hash codes are calculated.
# Case-insensitive IEqualityComparer implementation for arrays.
# See the bottom section of this answer for a better .NET 7+ alternative.
class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer {
[bool] Equals([object] $o1, [object] $o2) {
return ([System.Collections.IStructuralEquatable] [array] $o1).Equals([array] $o2, [System.StringComparer]::InvariantCultureIgnoreCase)
}
[int] GetHashCode([object] $o) {
if ($o -isnot [Array]) { return $o.GetHashCode() }
[int] $hashCode = 0
foreach ($el in $o) {
if ($null -eq $el) {
continue
} elseif ($el -is [string]) {
$hashCode = $hashCode -bxor $el.ToLowerInvariant().GetHashCode()
} else {
$hashCode = $hashCode -bxor $el.GetHashCode()
}
}
return $hashCode
}
}
$list = ConvertFrom-Csv @'
Id, LastName, FirstName, Country
1, Aerts, Ronald, Belgium
2, Berg, Ashly, Germany
3, Cook, James, England
4, Duval, Frank, France
5, Lyberg, Ash, England
6, Fischer, Adam, Germany
'@
# Pass an instance of the custom equality comparer to the constructor.
$dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())
$list.ForEach({ $dict.($_.FirstName, $_.LastName) = $_ })
# Now, case does NOT matter.
$dict.('james', 'cook')
A note on the .GetHashCode()
implementation in the custom comparer class above:
A custom .GetHashCode()
implementation is required to return the same hash code (an [int]
value) for all objects that compare as equal (that is, if $o1 -eq $o2
is $true
, $o1.GetHashCode()
and $o2.GetHashCode()
must return the same value).
While hash codes aren't required to be unique (and cannot be in all cases), ideally as few objects as possible share the same hash code, as that reduces the number of so-called collisions, which decreases the lookup efficiency of hash tables - see the relevant Wikipedia article for background information.
The implementation above uses a fairly simple, -bxor
-based (bitwise XOR) algorithm, which results in the same hash code for two arrays that have the same elements, but in different order.
.GetHashCode()
help topic shows more sophisticated approaches, including using an auxiliary tuple instance, as its hash code algorithm is order-aware - while simple, this approach is computationally expensive, and more work is needed for better performance. See the bottom section for a .NET 7+ option.zett42's collision test code (adapted), which determines how many among 1000 arrays with a given number of elements that are random string values result in the same hash code, i.e. produce collisions, and calculates a collision percentage from that. If you need to improve the efficiency of the implementation above you can use this code to test it (possibly also to measure the tests' runtime to see how different implementations compare).
# Create an instance of the custom comparer defined above.
$cmp = [CaseInsensitiveArrayEqualityComparer]::new()
$numArrays = 1000
foreach ($elementCount in 2..5 + 10) {
$numUniqueHashes = (
1..$numArrays |
ForEach-Object {
$cmp.GetHashCode(@(1..$elementCount | ForEach-Object { "$(New-Guid)" }))
} |
Sort-Object -Unique
).Count
[pscustomobject] @{
ElementCount = $elementCount
CollisionPercentage = '{0:P2}' -f (($numArrays - $numUniqueHashes) / $numArrays)
}
}
The about outputs 0% for all tests, so it seems that the -bxor
approach is sufficient to prevent collisions, at least with random strings and without including variations of arrays that differ in element order only.
Read on for a superior .NET 7+ solution.
Superior custom equality comparer implementation in .NET 7+ (requires at least a preview version of PowerShell 7.3):
zett42 points out that [HashCode]::Combine()
, available in .NET 7+, allows for a more efficient implementation, as it:
Note:
The method is limited to at most 8
array elements - but for multi-component that should be enough.
The values to combine - the array elements in this case - must be passed as individual arguments to the methods - passing the array as a whole doesn't work as intended. This makes the implementation somewhat cumbersome.
# .NET 7+ / PowerShell 7.3+
# Case-insensitive IEqualityComparer implementation for arrays
# using [HashCode]::Combine() - limited to 8 elements.
class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer {
[bool] Equals([object] $o1, [object] $o2) {
return ([System.Collections.IStructuralEquatable] [array] $o1).Equals([array] $o2, [System.StringComparer]::InvariantCultureIgnoreCase)
}
[int] GetHashCode([object] $o) {
if ($o -isnot [Array] -or 0 -eq $o.Count) { return $o.GetHashCode() }
$o = $o.ForEach({ $_ -is [string] ? $_.ToLowerInvariant() : $_ })
$hashCode = switch ($o.Count) {
1 { [HashCode]::Combine($o[0]) }
2 { [HashCode]::Combine($o[0], $o[1]) }
3 { [HashCode]::Combine($o[0], $o[1], $o[2]) }
4 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3]) }
5 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4]) }
6 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5]) }
7 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5], $o[6]) }
8 { [HashCode]::Combine($o[0], $o[1], $o[2], $o[3], $o[4], $o[5], $o[6], $o[7]) }
default { throw 'Not implemented for more than 8 array elements.' }
}
return $hashCode
}
}
However, as zett42 points out, you can overcame the value-count limit by calling [HashCode]::Combine()
iteratively, in a loop.
In the case of a case-insensitive implementation, that isn't too much overhead, given that you need a loop anyway, namely in order to call .ToLowerInvariant()
on [string]
-typed values (which is what the .ForEach()
call above implicitly does).
Here is his implementation:
# .NET 7+ / PowerShell 7.3+
# Case-insensitive IEqualityComparer implementation for arrays
# using [HashCode]::Combine() *iteratively*, with *no* element-count limit.
class CaseInsensitiveArrayEqualityComparer: System.Collections.IEqualityComparer {
[bool] Equals([object] $o1, [object] $o2) {
return ([System.Collections.IStructuralEquatable] [array] $o1).Equals([array] $o2, [System.StringComparer]::InvariantCultureIgnoreCase)
}
[int] GetHashCode([object] $o) {
if ($o -isnot [Array] -or 0 -eq $o.Count) { return $o.GetHashCode() }
$hashCode = 0
$o.ForEach({
$value = $_ -is [string] ? $_.ToLowerInvariant() : $_
$hashCode = [HashCode]::Combine( $hashCode, $value )
})
return $hashCode
}
}