I am writing a PowerShell script to run on Windows 10. I am using the 'HTML Agility Pack' library, version 1.11.43.
In this library, there is a GetAttributeValue method for HTML element nodes in four versions:
public string GetAttributeValue(string name, string def)
public int GetAttributeValue(string name, int def)
public bool GetAttributeValue(string name, bool def)
public T GetAttributeValue<T>(string name, T def)
I have written a test script for these methods in PowerShell:
$libPath = "HtmlAgilityPack.1.11.43\lib\netstandard2.0\HtmlAgilityPack.dll"
Add-Type -Path $libPath
$dom = New-Object -TypeName "HtmlAgilityPack.HtmlDocument"
$dom.Load("test.html", [System.Text.Encoding]::UTF8)
foreach ($node in $dom.DocumentNode.DescendantNodes()) {
    if ("#text" -ne $node.Name) {
        $node.OuterHTML
        " " + $node.GetAttributeValue("class", "")
        " " + $node.GetAttributeValue("class", 0)
        " " + $node.GetAttributeValue("class", $true)
        " " + $node.GetAttributeValue("class", $false)
        " " + $node.GetAttributeValue("class", $null)
    }
}
File 'test.html':
<p class="true"></p>
<p class="false"></p>
<p></p>
<p class="any other text"></p>
Test script execution result:
<p class="true"></p>
true
0
True
True
True
<p class="false"></p>
false
0
False
False
False
<p></p>
0
True
False
False
<p class="any other text"></p>
any other text
0
True
False
False
I know that to get the attribute value of an HTML element, you can also use the expression $node.Attributes["class"]. I also understand what polymorphism and method overloading are, and I know what a generic method is. I don't need those explained.
I have three questions:
When $node.GetAttributeValue("class", $null) is called from a PowerShell script, which of the four variants of the GetAttributeValue method works?
I think the fourth option works (generic method). Then why does a call with the second parameter $null work exactly the same as a call with the second parameter $false?
In the C# source code, the fourth option is compiled only under the following condition:
#if !(METRO || NETSTANDARD1_3 || NETSTANDARD1_6)
I tried the library builds for NETSTANDARD1_6 and for NETSTANDARD2_0. The test script works the same way with both. But with NETSTANDARD1_6 the fourth option should be unavailable, right? In that case, which version of the GetAttributeValue method handles a $null second parameter?
tl;dr
To achieve what you unsuccessfully attempted with $node.GetAttributeValue("class", $null), i.e., to return the attribute value as a [string] and default to $null if there is none, use:
$node.GetAttributeValue("class", [string] [NullString]::Value)
[string] $null works too, but makes "" (the empty string) rather than $null the default value.
While the overload resolution that you're seeing is surprising, you can resolve ambiguity during PowerShell's method overload resolution with casts:
$dom = [HtmlAgilityPack.HtmlDocument]::new()
$dom.LoadHtml(@'
<p class="true"></p>
<p class=42></p>
<p></p>
<p class="any other text"></p>
'@)
$nodes = $dom.DocumentNode.SelectNodes('p')
# Note the use of explicit casts (e.g., [string]) to guide overload resolution.
$nodes[0].GetAttributeValue('class', [bool] $false)
$nodes[1].GetAttributeValue('class', [int] 0)
$nodes[2].GetAttributeValue('class', [string] 'default')
$nodes[3].GetAttributeValue('class', [string] [NullString]::Value)
Output:
True
42
default
any other text
Alternatively, in PowerShell (Core) 7.3+[1], you can now call generic methods with explicit type arguments:
# PS 7.3+
# Note the generic type argument directly after the method name.
# Calls the one and only generic overload, with various types substituted for T:
# public T GetAttributeValue<T>(string name, T def)
# Note how the 2nd argument doesn't need a cast anymore.
$nodes[0].GetAttributeValue[bool]('class', $false)
$nodes[1].GetAttributeValue[int]('class', 0)
$nodes[2].GetAttributeValue[string]('class', 'default')
$nodes[3].GetAttributeValue[string]('class', [NullString]::Value)
Note:
When you pass $null to a [string]-typed parameter (both in cmdlets and .NET methods), PowerShell quietly converts it to "" (the empty string). [NullString]::Value tells PowerShell to pass a true null instead, and is mostly needed for calling .NET methods whose behavior differs depending on whether they receive null or "".
Therefore, if you were to call $nodes[3].GetAttributeValue('class', [string] $null) or, in PS 7.3+, $nodes[3].GetAttributeValue[string]('class', $null), you'd get "" (the empty string) as the default value if the class attribute doesn't exist.
By contrast, [NullString]::Value, as used in the commands above, causes a true $null value to be returned if the attribute doesn't exist; you can test for that with $null -eq ...
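The difference between the two can be observed without the library. The following is a minimal sketch, assuming a hypothetical NullProbe helper class compiled ad hoc with Add-Type (it is not part of HTML Agility Pack); it reports what a [string]-typed .NET parameter actually receives:

```powershell
# Hypothetical helper, compiled on the fly; it is not part of any library.
# It reports what the .NET method actually receives in its string parameter.
Add-Type -TypeDefinition @'
public static class NullProbe
{
    public static string Describe(string s)
    {
        return s == null ? "null" : "string of length " + s.Length;
    }
}
'@

# $null is quietly converted to "" before the method sees it.
$fromNull = [NullProbe]::Describe($null)

# [NullString]::Value passes a true null reference.
$fromNullString = [NullProbe]::Describe([NullString]::Value)

$fromNull        # -> string of length 0
$fromNullString  # -> null
```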
As for your questions:
On a general note, PowerShell's overload resolution is complex, and for the ultimate source of truth you'll have to consult the source code. The following is based on the de-facto behavior as of PowerShell 7.2.6 and musings about logic that could be applied.
When calling $node.GetAttributeValue("class", $null) from a PowerShell script, which of the four variants of the GetAttributeValue method works?
In practice, the public bool GetAttributeValue(string name, bool def) overload is chosen. Why it, specifically, is chosen among the available overloads is ultimately immaterial: the fundamental problem is that, to PowerShell, $null provides insufficient information about the type it is a stand-in for, so PowerShell cannot generally be expected to select a specific overload (for that, you need a cast, as shown at the top):
In C#, passing null to the second parameter in a non-generic call unambiguously implies the overload with the string-typed def parameter, because among the non-generic overloads, string as the type of the def parameter is the only .NET reference type, and therefore the only type that can directly accept a null argument.
This is not true in PowerShell, which has much more flexible, implicit type-conversion rules: from PowerShell's perspective, $null can bind to any of the types among the def parameters, because it allows $null to be converted to those types; specifically, [bool] $null yields $false, [int] $null yields 0, and - perhaps surprisingly, as discussed above - [string] $null yields "" (the empty string).
However, curiously, even using [NullString]::Value by itself (without a [string] cast) doesn't make a difference, even though PowerShell should know that this special value represents a $null value for a string parameter - see GitHub issue #18072.
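To observe overload resolution without the library, one option is a stand-in class with the same parameter shapes. The sketch below assumes a hypothetical OverloadProbe class (not part of HTML Agility Pack) whose four Which overloads mirror the four GetAttributeValue signatures and return a label naming the overload that ran. Which label appears for each call, especially for $null, is de-facto behavior and may differ between PowerShell versions, so no expected output is given:

```powershell
# Hypothetical stand-in that mirrors the parameter shapes of the four
# GetAttributeValue overloads. Each overload returns a label so that the
# one PowerShell selected becomes visible.
Add-Type -TypeDefinition @'
public class OverloadProbe
{
    public string Which(string name, string def) { return "string"; }
    public string Which(string name, int def)    { return "int"; }
    public string Which(string name, bool def)   { return "bool"; }
    public string Which<T>(string name, T def)   { return "generic<" + typeof(T).Name + ">"; }
}
'@

$probe = New-Object -TypeName OverloadProbe

# Casts supply the type information that guides overload resolution;
# the last entry passes a bare $null, which carries none.
$results = [ordered] @{
    StringCast = $probe.Which('class', [string] 'x')
    IntCast    = $probe.Which('class', [int] 0)
    BoolCast   = $probe.Which('class', [bool] $false)
    DoubleCast = $probe.Which('class', [double] 1.5)
    BareNull   = $probe.Which('class', $null)
}

# Prints one label per call.
$results
```

Running this in the PowerShell version you target shows which overload each argument form actually reaches there.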
I think the fourth option works (generic method). Then why does a call with the second parameter $null work exactly the same as a call with the second parameter $false?
With the generic invocation syntax available in v7.3+, the generic overload definitely works - and a $null as the default-value argument is converted to the type specified as the type argument (assuming PowerShell allows such a conversion; it wouldn't work with [datetime], for instance, because [datetime] $null causes an error).
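These conversions of $null can be checked directly. A short sketch using only built-in types (the variable names are illustrative):

```powershell
# $null converts to a default-like value for these types.
$asBool   = [bool] $null      # $false
$asInt    = [int] $null       # 0
$asString = [string] $null    # "" (the empty string, not $null)

# [datetime] has no conversion from $null, so the cast throws.
$dateCastFailed = $false
try {
    $null = [datetime] $null
}
catch {
    $dateCastFailed = $true
}

$asBool               # -> False
$asInt                # -> 0
$asString.Length      # -> 0
$null -eq $asString   # -> False
$dateCastFailed       # -> True
```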
Even with the non-generic syntax, PowerShell does select the generic overload by inference, as the following example shows, but only when you pass an actual object rather than $null:
# Try to retrieve a non-existent attribute and provide a [double]
# default value.
# The fact that a [double] instance is returned implies that the
# generic overload was chosen.
# -> 'System.Double'
$nodes[0].GetAttributeValue('nosuch', [double] $null).GetType().FullName
In the C# source code, the fourth option requires the following condition to work [...]
When you pass $null, the generic overload is not considered - and cannot be, in the absence of type information - so this doesn't make a difference.
[1] As of this writing, v7.3 hasn't been released yet, but preview versions are available - see the repo.