In PowerShell, child nodes and attributes of XML elements are accessible as properties:
([xml]"<book genre='novel'/>").Book.genre
novel
It is a Property, not a NoteProperty:
([xml]"<book genre='novel'/>").Book | Get-Member genre
TypeName: System.Xml.XmlElement
Name MemberType Definition
---- ---------- ----------
genre Property string genre {get;set;}
The documentation says:
Attributes are properties of the element, ...
However, in C# there are no such properties:
XmlDocument doc = new XmlDocument();
doc.LoadXml("<book genre='novel'/>");
// Console.WriteLine(doc.Book); // Does not work
// Console.WriteLine(doc.DocumentElement.genre); // Does not work
'XmlDocument' does not contain a definition for ...
Is it a PowerShell feature? Or is it a .NET feature I am using incorrectly?
What is the mechanism used to implement these properties?
tl;dr
You're seeing PowerShell's adaptation of the XML DOM (.NET type System.Xml.XmlDocument, available via type accelerator [xml] in PowerShell), which presents XML child elements and attributes as properties, allowing for convenient access to them via dot notation.
To see only the .NET type-native members of a given type's instance, pass -View Base to Get-Member; e.g.:
# -View Base excludes the *adapted* and possibly other ETS properties.
[xml] '<foo/>' | Get-Member -View Base
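To make the difference between the two views concrete, here is a minimal sketch that looks up the genre property from the question in both the adapted and the base view. The variable names ($book, $adapted, $base) are made up for illustration.

```powershell
# The <book> element from the question; it has an attribute, so dot notation
# returns the XmlElement itself.
$book = ([xml] "<book genre='novel'/>").book

# Adapted view: includes the property PowerShell derived from the XML content.
$adapted = $book | Get-Member -View Adapted -Name genre

# Base view: only what System.Xml.XmlElement itself defines, so no 'genre'.
$base = $book | Get-Member -View Base -Name genre

"Adapted view has 'genre': $([bool] $adapted)"
"Base view has 'genre':    $([bool] $base)"
```

This is consistent with the C# compiler error in the question: the property exists only in PowerShell's view of the object, not on the .NET type.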
PowerShell decorates the object hierarchy contained in System.Xml.XmlDocument instances (created with cast [xml], for instance):
with properties named for the input document's specific elements and attributes[1] at every level; e.g.:
([xml] '<foo><bar>baz</bar></foo>').foo.bar # -> 'baz'
([xml] '<foo><bar id="1" /></foo>').foo.bar.id # -> '1'
turning multiple elements of the same name at a given hierarchy level implicitly into arrays (specifically, of type [object[]]); e.g.:
([xml] '<foo><C>one</C><C>two</C></foo>').foo.C[1] # -> 'two'
As the examples (and your own code in the question) show, this allows for access via convenient dot notation.
Note:
If you use dot notation to target an element that has at least one attribute and/or child elements, the element itself is returned (an XmlElement instance); otherwise, it is the element's text content; e.g.:
# The <bar> element's *text content* is returned, as a [string] ('baz'),
# because it has only a text child node and no attributes
([xml] '<foo><bar>baz</bar></foo>').foo.bar
# The <bar> element is returned as an XmlElement instance,
# because it has an *attribute*.
([xml] '<foo><bar id="1">baz</bar></foo>').foo.bar
# The <bar> element is returned as an XmlElement instance,
# because it has *child elements*.
([xml] '<foo><bar><baz>quux</baz></bar></foo>').foo.bar
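The distinction can be checked directly with the -is operator. The following is a small sketch with made-up variable names ($simple, $withAttr):

```powershell
# <bar> with only text content: dot notation yields a [string].
$simple = ([xml] '<foo><bar>baz</bar></foo>').foo.bar

# <bar> with an attribute: dot notation yields the XmlElement itself.
$withAttr = ([xml] '<foo><bar id="1">baz</bar></foo>').foo.bar

$simple -is [string]                    # -> True
$withAttr -is [System.Xml.XmlElement]   # -> True

# The text content of the element is still available via the
# type-native .InnerText property.
$withAttr.InnerText                     # -> 'baz'
```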
Updating XML documents via dot notation is limited to simple, non-structural changes; the difference above comes into play:
# OK - direct updating of the text content of a simple
# element (no child nodes, no attributes).
# $xml.foo.bar then yields 'new'
($xml = [xml] '<foo><bar>baz</bar></foo>').foo.bar = 'new'
# OK - direct updating of the attribute of an
# element.
# $xml.foo.bar.id then yields '2'
($xml = [xml] '<foo><bar id="1">baz</bar></foo>').foo.bar.id = '2'
# !! FAILS - because <bar> isn't a simple element in this case,
# !! due to the presence of an *attribute*, you cannot directly assign new text content.
# !! -> Error "Cannot set "bar" because only strings can be used as values to set XmlNode properties."
($xml = [xml] '<foo><bar id="1">baz</bar></foo>').foo.bar = 'new'
# OK - assign to the type-native .InnerText property
# $xml.foo.bar.InnerText then yields 'new'
($xml = [xml] '<foo><bar id="1">baz</bar></foo>').foo.bar.InnerText = 'new'
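To illustrate where dot-notation updating stops: adding a new element is a structural change and goes through the type-native DOM methods instead. A minimal sketch, in which the element name qux and its content are made up for illustration:

```powershell
$xml = [xml] '<foo><bar>baz</bar></foo>'

# Create a new element owned by the document and give it text content.
$new = $xml.CreateElement('qux')   # 'qux' is a hypothetical element name
$new.InnerText = 'added'

# Append it to <foo>; $null = ... suppresses the returned node.
$null = $xml.foo.AppendChild($new)

$xml.OuterXml   # -> <foo><bar>baz</bar><qux>added</qux></foo>
```

Once the element exists, it can be read via dot notation like any other ($xml.foo.qux).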
For structural changes (such as adding or removing elements), direct use of the methods of the underlying System.Xml.XmlDocument type is required, which is significantly more complex. See this answer for an example.
The downside of dot notation is that there can be name collisions, if an incidental input-XML element name happens to be the same as either an intrinsic [System.Xml.XmlElement] property name (for single-element properties), or an intrinsic [Array] property name (for array-valued properties; [System.Object[]] derives from [Array]).
In the event of a name collision: If the property being accessed contains:
a single child element ([System.Xml.XmlElement]), the incidental properties win; e.g.:
# -> 'foo': i.e. the <foo> element's own name, using
# XmlElement's type-native .Name property.
([xml] '<foo><Bar>bar</Bar></foo>').foo.Name
# -> !! 'bar': That is, the *adapted* .Name property - i.e. the
# !! child element whose name happened to be "Name" takes precedence.
([xml] '<foo><Name>bar</Name></foo>').foo.Name
The workaround to get predictable access to the type-native properties is to call the underlying property accessor method, .get_<propertyName>(), directly:
# -> 'foo', thanks to .get_Name() workaround
([xml] '<foo><Name>bar</Name></foo>').foo.get_Name()
An alternative is to use the intrinsic psbase property:
# -> 'foo', thanks to .psbase workaround
([xml] '<foo><Name>bar</Name></foo>').foo.psbase.Name
an array of child elements, the [Array] type's properties win.
Therefore, the following element names break dot notation with array-valued properties (obtained with reflection command Get-Member -InputObject 1, 2 -Type Properties, ParameterizedProperty):
Item Count IsFixedSize IsReadOnly IsSynchronized Length LongLength Rank SyncRoot
For example, trying to use member-access enumeration to get all item attribute values across all <bar> elements:
# !! Outputs the definition of the parameterized .Item property
# !! of type [Array],
# !! *not* the values of the "Item" attributes of the <bar> child elements.
([xml] '<foo><bar item="one" /><bar item="two" /></foo>').foo.bar.item
The workaround is to use explicit enumeration of array-valued properties, e.g. via the intrinsic .ForEach() method:
# -> 'one', 'two'
([xml] '<foo><bar item="one" /><bar item="two" /></foo>').foo.bar.ForEach('item')
Dot notation is invariably case-insensitive, unlike XML itself. This is an inevitable consequence of representing elements and attributes as properties, given that property access in PowerShell is generally case-insensitive:
# Dot notation: case-INSENSITIVE
# -> 'bar', despite the case mismatch
([xml] '<FOO>bar</FOO>').foo
# -> 'BAR', 'bar', i.e. *both* elements that match case-insensitively
([xml] '<root><FOO>BAR</FOO><foo>bar</foo></root>').root.foo
# Type-native XML method: case-SENSITIVE
# -> NO output, due to the case mismatch
([xml] '<FOO>bar</FOO>').SelectSingleNode('foo')
Dot notation ignores XML namespaces - unlike XML-native functionality:
# Dot notation: Namespaces are ignored.
# -> the <foo> element (as a whole, because it has attributes)
# despite not specifying the namespace.
([xml] '<ns1:foo xmlns:ns1="https://example.org">bar</ns1:foo>').foo
# Type-native XML methods: Explicit namespace handling required:
# -> No output
([xml] '<ns1:foo xmlns:ns1="https://example.org">bar</ns1:foo>')['foo']
# -> OK - explicit use of the namespace prefix;
([xml] '<ns1:foo xmlns:ns1="https://example.org">bar</ns1:foo>')['ns1:foo']
# -> No output; see below.
([xml] '<ns1:foo xmlns:ns1="https://example.org">bar</ns1:foo>').SelectSingleNode('foo')
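The last call returns nothing because the XPath query 'foo' names no namespace. Below is a minimal sketch of a namespace-aware query; the prefix n is an arbitrary choice made for this example (it need not match the ns1 prefix used in the document), and the [type]::new() syntax assumes PowerShell v5 or later.

```powershell
$xml = [xml] '<ns1:foo xmlns:ns1="https://example.org">bar</ns1:foo>'

# Create a namespace manager and map a query prefix to the namespace URI.
# The prefix 'n' is only used in the query.
$nsm = [System.Xml.XmlNamespaceManager]::new($xml.NameTable)
$nsm.AddNamespace('n', 'https://example.org')

# Pass the manager as the second argument; the query must use the prefix.
$node = $xml.SelectSingleNode('/n:foo', $nsm)
$node.InnerText   # -> 'bar'

# Select-Xml accepts the same prefix-to-URI mapping as a hashtable.
$viaCmdlet = (Select-Xml -Xml $xml -XPath '/n:foo' -Namespace @{ n = 'https://example.org' }).Node
$viaCmdlet.InnerText   # -> 'bar'
```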
To use the .SelectSingleNode() / .SelectNodes() methods with namespaces, you need not only namespace prefixes in the query, but you must first create a namespace manager that maps the prefixes used in queries to their namespace URIs - see this answer for an example of this technique.
Namespace prefixes must likewise be mapped when using the Select-Xml cmdlet, which uses XPath queries too - see this answer for an example.
[1] If a given element has both an attribute and an element by the same name, PowerShell reports both, as the elements of an array ([object[]]).
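A brief sketch of the situation footnote [1] describes, using a made-up document in which <foo> has both a bar attribute and a <bar> child element:

```powershell
# <foo> carries an attribute named 'bar' AND a child element named 'bar'.
$both = ([xml] '<foo bar="attr"><bar>elem</bar></foo>').foo.bar

$both -is [array]   # -> True
$both.Count         # -> 2 (one entry for the attribute, one for the element)
```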