In PowerShell, XML children and attributes are object properties. How does it work?


In PowerShell, child nodes and attributes of XML elements are accessible as properties:

([xml]"<book genre='novel'/>").Book.genre
novel

It is a Property, not a NoteProperty:

([xml]"<book genre='novel'/>").Book | Get-Member genre
   TypeName: System.Xml.XmlElement

Name  MemberType Definition
----  ---------- ----------
genre Property   string genre {get;set;}

The documentation says:

Attributes are properties of the element, ...

In C#, however, there are no such properties:

XmlDocument doc = new XmlDocument();
doc.LoadXml("<book genre='novel'/>");
// Console.WriteLine(doc.Book);  // Does not work
// Console.WriteLine(doc.DocumentElement.genre);  // Does not work
'XmlDocument' does not contain a definition for ...

Is it a PowerShell feature? Or is it a .NET feature I am using incorrectly?

What is the mechanism used to implement these properties?


Solution

  • tl;dr

    • You're seeing PowerShell's adaptation of the XML DOM (.NET type System.Xml.XmlDocument, available via type accelerator [xml] in PowerShell), which presents XML child elements and attributes as properties, allowing for convenient access to them via dot notation.

    • To see only the .NET type-native members of a given type's instance, pass -View Base to Get-Member; e.g.:

      # -View Base excludes the *adapted* and possibly other ETS properties.
      [xml] '<foo/>' | Get-Member -View Base
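
To tie this back to the question's example, here is a quick check (a sketch only; the variable names are illustrative) suggesting that genre exists only as an adapted property and is not among the type-native members of XmlElement:

```powershell
# $book is an illustrative variable name; the XML is the question's own sample.
$book = ([xml] "<book genre='novel'/>").book

# Default view (adapted + extended members): the attribute surfaces as a property.
$inDefaultView = ($book | Get-Member).Name -contains 'genre'

# Base view (.NET type-native members only): no such property exists.
$inBaseView = ($book | Get-Member -View Base).Name -contains 'genre'

$inDefaultView  # -> True
$inBaseView     # -> False
```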
      

    Background information:

    PowerShell decorates the object hierarchy contained in System.Xml.XmlDocument instances (created with cast [xml], for instance):

    • with properties named for the input document's specific elements and attributes[1] at every level; e.g.:

       ([xml] '<foo><bar>baz</bar></foo>').foo.bar # -> 'baz'
       ([xml] '<foo><bar id="1" /></foo>').foo.bar.id # -> '1'
      
    • by implicitly turning multiple elements of the same name at a given hierarchy level into arrays (specifically, of type [object[]]); e.g.:

       ([xml] '<foo><C>one</C><C>two</C></foo>').foo.C[1] # -> 'two'
      

    As the examples (and your own code in the question) show, this allows for access via convenient dot notation.
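
For comparison, here is a sketch of the same lookups done with the type-native System.Xml API, which is roughly what you would have to write in C#, where no adaptation takes place. The variable names are illustrative:

```powershell
$xml = [xml] '<foo><bar id="1">baz</bar></foo>'

# Dot notation (PowerShell's adaptation)
$viaDot = $xml.foo.bar.id

# Type-native equivalents
$bar      = $xml.DocumentElement['bar']                  # XmlNode indexer: child element by name
$viaApi   = $bar.GetAttribute('id')                      # XmlElement.GetAttribute()
$viaXPath = $xml.SelectSingleNode('/foo/bar/@id').Value  # XPath query for the attribute

$viaDot, $viaApi, $viaXPath   # -> '1', '1', '1'
$bar.InnerText                # -> 'baz'
```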

    Note:

    • If you use dot notation to target an element that has at least one attribute and/or child elements, the element itself is returned (an XmlElement instance); otherwise, it is the element's text content; e.g.:

      # The <bar> element's *text content* is returned, as a [string] ('baz'),
      # because it has only a text child node and no attributes
      ([xml] '<foo><bar>baz</bar></foo>').foo.bar
      
      # The <bar> element is returned as an XmlElement instance,
      # because it has an *attribute*.
      ([xml] '<foo><bar id="1">baz</bar></foo>').foo.bar
      
      # The <bar> element is returned as an XmlElement instance,
      # because it has *child elements*.
      ([xml] '<foo><bar><baz>quux</baz></bar></foo>').foo.bar
      
    • Updating XML documents via dot notation is limited to simple, non-structural changes; the distinction above (text content vs. XmlElement instance) comes into play:

      # OK - direct updating of the text content of a simple
      # element (no child nodes, no attributes).
      # $xml.foo.bar then yields 'new'
      ($xml = [xml] '<foo><bar>baz</bar></foo>').foo.bar = 'new'
      
      # OK - direct updating of the attribute of an
      # element.
      #  $xml.foo.bar.id then yields '2'
      ($xml = [xml] '<foo><bar id="1">baz</bar></foo>').foo.bar.id = 2
      
      # !! FAILS - because <bar> isn't a simple element in this case,
      # !! due to the presence of an *attribute*, you cannot directly assign new text content.
      # !! -> Error "Cannot set "bar" because only strings can be used as values to set XmlNode properties."
      ($xml = [xml] '<foo><bar id="1">baz</bar></foo>').foo.bar = 'new'
      
      # OK - assign to the type-native .InnerText property
      #  $xml.foo.bar.InnerText then yields 'new'
      ($xml = [xml] '<foo><bar id="1">baz</bar></foo>').foo.bar.InnerText = 'new'
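
Structural changes (adding or removing elements) go through the type-native DOM methods instead. A minimal sketch, in which the element name qux is made up for illustration:

```powershell
$xml = [xml] '<foo><bar>baz</bar></foo>'

# Create the new element via the owning document, then attach it.
$new = $xml.CreateElement('qux')      # 'qux' is a hypothetical element name
$new.InnerText = 'new'
$null = $xml.foo.AppendChild($new)    # AppendChild() returns the node; discard it

# Remove the original <bar> element via its parent.
$null = $xml.foo.RemoveChild($xml.foo['bar'])

$xml.OuterXml   # -> <foo><qux>new</qux></foo>
```

Note that dot notation is still handy here for navigating to the parent element ($xml.foo); only the modification itself requires the DOM methods.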
      

    The downside of dot notation is that there can be name collisions, if an incidental input-XML element name happens to be the same as either an intrinsic [System.Xml.XmlElement] property name (for single-element properties), or an intrinsic [Array] property name (for array-valued properties; [System.Object[]] derives from [Array]).

    In the event of a name collision: If the property being accessed contains:

    • a single child element ([System.Xml.XmlElement]), the incidental properties win, i.e. the adapted properties shadow the type-native ones; e.g.:

      # -> 'foo': i.e. the <foo> element's own name, using 
      # XmlElement's type-native .Name property.
      ([xml] '<foo><Bar>bar</Bar></foo>').foo.Name
      
      # -> !! 'bar': That is, the *adapted* .Name property - i.e. the 
      #    !! child element whose name happens to be "Name" - takes precedence.
      ([xml] '<foo><Name>bar</Name></foo>').foo.Name
      
      • The workaround to get predictable access to the type-native properties is to call the underlying property accessor method, .get_<propertyName>(), directly:

        # -> 'foo', thanks to .get_Name() workaround
        ([xml] '<foo><Name>bar</Name></foo>').foo.get_Name()
        
        • An alternative is to use the intrinsic psbase property:

          # -> 'foo', thanks to .psbase workaround
          ([xml] '<foo><Name>bar</Name></foo>').foo.psbase.Name
          
    • an array of child elements, the [Array] type's properties win.

      • Therefore, the following element names break dot notation with array-valued properties (obtained with reflection command
        Get-Member -InputObject 1, 2 -Type Properties, ParameterizedProperty):

        Item Count IsFixedSize IsReadOnly IsSynchronized Length LongLength Rank SyncRoot
        
        • For example, trying to use member-access enumeration to get all item attribute values across all <bar> elements:

          # !! Outputs the definition of the parameterized .Item property
          # !! of type [Array], 
          # !! *not* the values of the "Item" attributes of the <bar> child elements.
          ([xml] '<foo><bar item="one" /><bar item="two" /></foo>').foo.bar.item
          
        • The workaround is to use explicit enumeration of array-valued properties, e.g. via the intrinsic .ForEach() method:

          # -> 'one', 'two'
          ([xml] '<foo><bar item="one" /><bar item="two" /></foo>').foo.bar.ForEach('item')
          

    Dot notation is invariably case-insensitive, unlike XML itself. This is an inevitable consequence of representing elements and attributes as properties, given that property access in PowerShell is generally case-insensitive:

     # Dot notation: case-INSENSITIVE
     # -> 'bar', despite the case mismatch
     ([xml] '<FOO>bar</FOO>').foo
     # -> 'BAR', 'bar', i.e. *both* elements that match case-insensitively
     ([xml] '<root><FOO>BAR</FOO><foo>bar</foo></root>').root.foo
    
     # Type-native XML method: case-SENSITIVE
     # -> NO output, due to the case mismatch
     ([xml] '<FOO>bar</FOO>').SelectSingleNode('foo')
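
If you need case-exact access to a child element, one option is the type-native indexer (XmlNode's Item property), which matches names case-sensitively. A sketch, with illustrative variable names:

```powershell
$xml = [xml] '<root><FOO>BAR</FOO><foo>bar</foo></root>'

# Dot notation: both elements match, so an array is returned.
$both = $xml.root.foo

# Type-native indexer: only the element whose name matches exactly.
$lower = $xml.root['foo'].InnerText
$upper = $xml.root['FOO'].InnerText

$both.Count   # -> 2
$lower        # -> 'bar'
$upper        # -> 'BAR'
```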
    

    Dot notation ignores XML namespaces - unlike XML-native functionality:

     # Dot notation: Namespaces are ignored.
     # -> the <foo> element (as a whole, because it has an attribute,
     #    namely the namespace declaration) despite not specifying the namespace.
     ([xml] '<ns1:foo xmlns:ns1="https://example.org">bar</ns1:foo>').foo
    
     # Type-native XML methods: Explicit namespace handling required:
     # -> No output
     ([xml] '<ns1:foo xmlns:ns1="https://example.org">bar</ns1:foo>')['foo']
    
     # -> OK - explicit use of the namespace prefix;
     ([xml] '<ns1:foo xmlns:ns1="https://example.org">bar</ns1:foo>')['ns1:foo']
    
     # -> No output; see below.
     ([xml] '<ns1:foo xmlns:ns1="https://example.org">bar</ns1:foo>').SelectSingleNode('foo')
    
    • For XPath queries with the type-native .SelectSingleNode() / .SelectNodes() methods, using namespace prefixes in the query isn't enough: you must first create a namespace manager that maps the prefixes used in queries to their namespace URIs - see this answer for an example of this technique.
      The same applies analogously to the Select-Xml cmdlet, which uses XPath queries too - see this answer for an example.
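
A minimal sketch of both techniques, using the sample document from above. The prefix n is an arbitrary choice made for the query; it need not match the prefix used in the document, only the namespace URI matters:

```powershell
$xml = [xml] '<ns1:foo xmlns:ns1="https://example.org">bar</ns1:foo>'

# Type-native method: map a query prefix to the namespace URI via a namespace manager.
$nsm = [System.Xml.XmlNamespaceManager]::new($xml.NameTable)
$nsm.AddNamespace('n', 'https://example.org')
$viaMethod = $xml.SelectSingleNode('/n:foo', $nsm).InnerText

# Select-Xml: pass the same mapping as a hashtable via -Namespace.
$viaCmdlet = (Select-Xml -Xml $xml -XPath '/n:foo' -Namespace @{ n = 'https://example.org' }).Node.InnerText

$viaMethod   # -> 'bar'
$viaCmdlet   # -> 'bar'
```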

    [1] If a given element has both an attribute and a child element of the same name, PowerShell reports both, as the elements of an array ([object[]]).
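
A sketch of the footnote's scenario, with a made-up document in which <foo> has both a bar attribute and a <bar> child element:

```powershell
# Hypothetical sample: attribute "bar" and child element <bar> on the same element.
$values = ([xml] '<foo bar="attr"><bar>elem</bar></foo>').foo.bar

$values.Count   # -> 2
$values         # contains both 'attr' and 'elem'
```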