
How to load or read an XML file using ConvertTo-Xml and Select-Xml?


How can I accomplish something like this:

PS /home/nicholas/powershell> 
PS /home/nicholas/powershell> $date=(Get-Date | ConvertTo-Xml)                                         
PS /home/nicholas/powershell> 
PS /home/nicholas/powershell> $date

xml                            Objects
---                            -------
version="1.0" encoding="utf-8" Objects

PS /home/nicholas/powershell> 
PS /home/nicholas/powershell> $date.OuterXml
<?xml version="1.0" encoding="utf-8"?><Objects><Object Type="System.DateTime">12/12/2020 2:43:46 AM</Object></Objects>
PS /home/nicholas/powershell> 

but, instead, reading in a file?


How do I load, import, or read an XML file with ConvertTo-Xml so that I can parse it with Select-Xml using XPath?

PS /home/nicholas/powershell> 
PS /home/nicholas/powershell> $xml=ConvertTo-Xml ./bookstore.xml
PS /home/nicholas/powershell> 
PS /home/nicholas/powershell> $xml                              

xml                            Objects
---                            -------
version="1.0" encoding="utf-8" Objects

PS /home/nicholas/powershell> 
PS /home/nicholas/powershell> $xml.InnerXml                     
<?xml version="1.0" encoding="utf-8"?><Objects><Object Type="System.String">./bookstore.xml</Object></Objects>
PS /home/nicholas/powershell> 
PS /home/nicholas/powershell> $xml.OuterXml                     
<?xml version="1.0" encoding="utf-8"?><Objects><Object Type="System.String">./bookstore.xml</Object></Objects>
PS /home/nicholas/powershell> 
PS /home/nicholas/powershell> cat ./bookstore.xml

<?xml version="1.0"?>
<!-- A fragment of a book store inventory database -->
<bookstore xmlns:bk="urn:samples">
  <book genre="novel" publicationdate="1997" bk:ISBN="1-861001-57-8">
    <title>Pride And Prejudice</title>
    <author>
      <first-name>Jane</first-name>
      <last-name>Austen</last-name>
    </author>
    <price>24.95</price>
  </book>
  <book genre="novel" publicationdate="1992" bk:ISBN="1-861002-30-1">
    <title>The Handmaid's Tale</title>
    <author>
      <first-name>Margaret</first-name>
      <last-name>Atwood</last-name>
    </author>
    <price>29.95</price>
  </book>
  <book genre="novel" publicationdate="1991" bk:ISBN="1-861001-57-6">
    <title>Emma</title>
    <author>
      <first-name>Jane</first-name>
      <last-name>Austen</last-name>
    </author>
    <price>19.95</price>
  </book>
  <book genre="novel" publicationdate="1982" bk:ISBN="1-861001-45-3">
    <title>Sense and Sensibility</title>
    <author>
      <first-name>Jane</first-name>
      <last-name>Austen</last-name>
    </author>
    <price>19.95</price>
  </book>
</bookstore>

PS /home/nicholas/powershell> 

Creating the XML within the REPL console itself works as expected, as in this related question:

How to parse XML in Powershell with Select-Xml and Xpath?


Solution

  • Properly reading an XML document in PowerShell works like this:

    $doc = New-Object xml
    $doc.Load( (Convert-Path bookstore.xml) )
    

    XML files can use many different encodings. The XmlDocument.Load method detects the encoding from the byte order mark or the XML declaration, so the file is read properly without prior knowledge of the encoding.

    Reading a file with the wrong encoding results in mangled data or errors, except in very basic or very lucky cases.

    The often-seen method of using Get-Content and casting the resulting string to [xml] is the wrong way of dealing with XML for this very reason. So don't do that.

    You can get a correct result with Get-Content, but that requires:

    1. Prior knowledge of the file encoding (e.g. Get-Content bookstore.xml -Encoding UTF8)
    2. Hard-coding the file encoding into your script (meaning it will break if the XML encoding ever changes unexpectedly)
    3. Limiting yourself to the very few file encodings that Get-Content supports (XML supports more)

    In other words, you end up manually solving a problem that XML was specifically designed to handle for you.

    Doing things correctly with Get-Content takes a lot of unnecessary extra work and comes with limitations. And doing things incorrectly is pointless when doing it right is so easy.
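
    The encoding point can be illustrated with a small sketch. The file name, the temp location, and the sample content are made up for this illustration; the file is written as ISO-8859-1 and declares that encoding in its XML declaration.

```powershell
# Hypothetical sample file: written as ISO-8859-1, with a matching XML declaration.
$path   = Join-Path ([System.IO.Path]::GetTempPath()) 'latin1-sample.xml'
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$name   = 'Bront' + [char]0xEB   # "Bronte" with e-diaeresis, built from a code point
$text   = '<?xml version="1.0" encoding="iso-8859-1"?><author>' + $name + '</author>'
[System.IO.File]::WriteAllText($path, $text, $latin1)

# XmlDocument.Load reads the bytes itself and honors the declared encoding.
$doc = New-Object xml
$doc.Load($path)
$loaded = $doc.DocumentElement.InnerText

# Get-Content decodes the bytes before the XML parser ever sees them, so the
# result depends on the shell's default encoding and may come back mangled.
$viaString = ([xml](Get-Content -Path $path -Raw)).DocumentElement.InnerText

"Load():      $loaded"
"Get-Content: $viaString"
```

    Whether the second line matches the first depends on the PowerShell version and its default encoding, which is exactly the kind of luck described above.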


    Examples, after loading $doc as shown above:

    $doc.bookstore.book
    

    prints a list of <book> elements and their properties

    genre           : novel
    publicationdate : 1997
    ISBN            : 1-861001-57-8
    title           : Pride And Prejudice
    author          : author
    price           : 24.95
    
    genre           : novel
    publicationdate : 1992
    ISBN            : 1-861002-30-1
    title           : The Handmaid's Tale
    author          : author
    price           : 29.95
    
    genre           : novel
    publicationdate : 1991
    ISBN            : 1-861001-57-6
    title           : Emma
    author          : author
    price           : 19.95
    
    genre           : novel
    publicationdate : 1982
    ISBN            : 1-861001-45-3
    title           : Sense and Sensibility
    author          : author
    price           : 19.95
    

    $doc.bookstore.book | Format-Table
    

    prints the same thing as a table

    genre publicationdate ISBN          title                 author price
    ----- --------------- ----          -----                 ------ -----
    novel 1997            1-861001-57-8 Pride And Prejudice   author 24.95
    novel 1992            1-861002-30-1 The Handmaid's Tale   author 29.95
    novel 1991            1-861001-57-6 Emma                  author 19.95
    novel 1982            1-861001-45-3 Sense and Sensibility author 19.95
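
    The author column shows only the element name because <author> is a nested element rather than a plain value. One way to flatten it, sketched here with a calculated property (the inline XML is an abbreviated copy of the bookstore data so that the snippet runs on its own):

```powershell
# Abbreviated inline copy of the bookstore data, loaded from a string for illustration.
$doc = New-Object xml
$doc.LoadXml(@'
<bookstore>
  <book publicationdate="1997">
    <title>Pride And Prejudice</title>
    <author><first-name>Jane</first-name><last-name>Austen</last-name></author>
    <price>24.95</price>
  </book>
  <book publicationdate="1992">
    <title>The Handmaid's Tale</title>
    <author><first-name>Margaret</first-name><last-name>Atwood</last-name></author>
    <price>29.95</price>
  </book>
</bookstore>
'@)

# A calculated property builds one string from the two child elements of <author>.
# Element names containing a hyphen have to be quoted.
$authorColumn = @{
    Name       = 'author'
    Expression = { '{0} {1}' -f $_.author.'first-name', $_.author.'last-name' }
}
$rows = $doc.bookstore.book | Select-Object title, $authorColumn, price
$rows | Format-Table
```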
    

    $doc.bookstore.book | Where-Object publicationdate -lt 1992 | Format-Table
    

    filters the data

    genre publicationdate ISBN          title                 author price
    ----- --------------- ----          -----                 ------ -----
    novel 1991            1-861001-57-6 Emma                  author 19.95
    novel 1982            1-861001-45-3 Sense and Sensibility author 19.95
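
    Note that attribute values are strings, so the comparison above is a string comparison. That happens to work for four-digit years. A sketch of the same filter with an explicit cast, which may be the safer habit for numeric data (the inline XML is again an abbreviated copy of the bookstore data):

```powershell
# Abbreviated inline copy of the bookstore data, loaded from a string for illustration.
$doc = New-Object xml
$doc.LoadXml(@'
<bookstore>
  <book publicationdate="1997"><title>Pride And Prejudice</title></book>
  <book publicationdate="1992"><title>The Handmaid's Tale</title></book>
  <book publicationdate="1991"><title>Emma</title></book>
  <book publicationdate="1982"><title>Sense and Sensibility</title></book>
</bookstore>
'@)

# Cast the attribute to [int] so that filtering and sorting compare numbers, not text.
$old = $doc.bookstore.book |
    Where-Object { [int]$_.publicationdate -lt 1992 } |
    Sort-Object  { [int]$_.publicationdate }

$old | Select-Object title, publicationdate
```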
    

    $doc.bookstore.book | Where-Object publicationdate -lt 1992 | Sort-Object publicationdate | Select-Object title
    

    sorts and prints only the <title> field

    title                
    -----                
    Sense and Sensibility
    Emma
    

    There are many more ways of slicing and dicing the data. It all depends on what you want to do.
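
    Since the question asks about Select-Xml specifically: that cmdlet can read the file itself through its -Path parameter, so ConvertTo-Xml is not needed at all. A minimal sketch, assuming an abbreviated copy of the bookstore file written to a temp location (the file name is made up for this illustration):

```powershell
# Write an abbreviated copy of the bookstore file to a hypothetical temp path.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bookstore-sample.xml'
@'
<?xml version="1.0"?>
<bookstore xmlns:bk="urn:samples">
  <book genre="novel" publicationdate="1997" bk:ISBN="1-861001-57-8">
    <title>Pride And Prejudice</title>
    <author><first-name>Jane</first-name><last-name>Austen</last-name></author>
    <price>24.95</price>
  </book>
  <book genre="novel" publicationdate="1992" bk:ISBN="1-861002-30-1">
    <title>The Handmaid's Tale</title>
    <author><first-name>Margaret</first-name><last-name>Atwood</last-name></author>
    <price>29.95</price>
  </book>
</bookstore>
'@ | Set-Content -Path $path -Encoding UTF8

# Select-Xml returns match objects; the matched XML node is in the Node property.
$austenTitles = Select-Xml -Path $path -XPath '//book[author/last-name="Austen"]/title' |
    ForEach-Object { $_.Node.InnerText }

# Namespaced names in the XPath need a prefix mapping passed via -Namespace.
$isbns = Select-Xml -Path $path -XPath '//book/@bk:ISBN' -Namespace @{ bk = 'urn:samples' } |
    ForEach-Object { $_.Node.Value }

$austenTitles
$isbns
```

    The same XPath expressions also work against an already loaded document via Select-Xml -Xml $doc.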