Search code examples
xmlpowershellcsvexport-to-csvdata-manipulation

How to read XML and write CSV with Powershell?


How is books.xml converted to CSV and written to a file?

PS /home/nicholas/xml> 
PS /home/nicholas/xml> $books = Import-Clixml books.xml 
PS /home/nicholas/xml> 
PS /home/nicholas/xml> $books.InnerXml                  
<?xml version="1.0"?><catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications 
      with XML.</description></book><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book><book id="bk103"><author>Corets, Eva</author><title>Maeve Ascendant</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-11-17</publish_date><description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description></book><book id="bk104"><author>Corets, Eva</author><title>Oberon's Legacy</title><genre>Fantasy</genre><price>5.95</price><publish_date>2001-03-10</publish_date><description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description></book><book id="bk105"><author>Corets, Eva</author><title>The Sundered Grail</title><genre>Fantasy</genre><price>5.95</price><publish_date>2001-09-10</publish_date><description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description></book><book id="bk106"><author>Randall, Cynthia</author><title>Lover Birds</title><genre>Romance</genre><price>4.95</price><publish_date>2000-09-02</publish_date><description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description></book><book id="bk107"><author>Thurman, Paula</author><title>Splish Splash</title><genre>Romance</genre><price>4.95</price><publish_date>2000-11-02</publish_date><description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description></book><book id="bk108"><author>Knorr, Stefan</author><title>Creepy Crawlies</title><genre>Horror</genre><price>4.95</price><publish_date>2000-12-06</publish_date><description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description></book><book id="bk109"><author>Kress, Peter</author><title>Paradox Lost</title><genre>Science Fiction</genre><price>6.95</price><publish_date>2000-11-02</publish_date><description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description></book><book id="bk110"><author>O'Brien, Tim</author><title>Microsoft .NET: The Programming Bible</title><genre>Computer</genre><price>36.95</price><publish_date>2000-12-09</publish_date><description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description></book><book id="bk111"><author>O'Brien, Tim</author><title>MSXML3: A Comprehensive Guide</title><genre>Computer</genre><price>36.95</price><publish_date>2000-12-01</publish_date><description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description></book><book id="bk112"><author>Galos, Mike</author><title>Visual Studio 7: A Comprehensive Guide</title><genre>Computer</genre><price>49.95</price><publish_date>2001-04-16</publish_date><description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description></book></catalog>
PS /home/nicholas/xml> 
PS /home/nicholas/xml> $books | Export-Csv books.csv      
PS /home/nicholas/xml> 
PS /home/nicholas/xml> cat ./books.csv
"xml","catalog"
"version=""1.0""","System.Xml.XmlElement"
PS /home/nicholas/xml> 

It would seem that $csv is converted, but how is it written as a CSV file?

Converting to JSON didn't go much better:

PS /home/nicholas/xml> 
PS /home/nicholas/xml> 
PS /home/nicholas/xml> $books | ConvertTo-Json          
WARNING: Resulting JSON is truncated as serialization has exceeded the set depth of 2.
[
  [],
  [
    [
      "System.Xml.XmlElement",
      "System.Xml.XmlElement",
      "System.Xml.XmlElement",

..

even with a depth of 9 or greater.

Here's the output of $books:

PS /home/nicholas/xml> 
PS /home/nicholas/xml> $books

xml           catalog
---           -------
version="1.0" catalog

PS /home/nicholas/xml> $books.ChildNodes

Version         : 1.0
Encoding        : 
Standalone      : 
Value           : version="1.0"
InnerText       : version="1.0"
Name            : xml
LocalName       : xml
NodeType        : XmlDeclaration
PreviousSibling : 
NextSibling     : catalog
ParentNode      : #document
ChildNodes      : {}
Attributes      : 
OwnerDocument   : #document
FirstChild      : 
LastChild       : 
HasChildNodes   : False
NamespaceURI    : 
Prefix          : 
IsReadOnly      : False
OuterXml        : <?xml version="1.0"?>
InnerXml        : 
SchemaInfo      : System.Xml.Schema.XmlSchemaInfo
BaseURI         : 
PreviousText    : 

book : {book, book, book, book…}

PS /home/nicholas/xml>

which is far from CSV.


Solution

  • Import-Clixml is not meant for importing arbitrary XML documents; it is solely designed to work with XML documents that contain CLIXML data, which is an XML-based serialization format for representing instances of .NET types for cross-process communication, such as produced by Export-Clixml. CLIXML is typically used behind the scenes in PowerShell remoting (and, more generally, cross-process communication).

    Perhaps surprisingly, there is no general-purpose cmdlet for importing arbitrary XML documents (however, there is a cmdlet for querying and extracting data from XML documents, namely Select-Xml).

    You have to work with the [xml] type (System.Xml.XmlDocument) to import XML data into a DOM, which then allows you to transform the data to a different format:

    # Load and parse the XML file into a DOM.
    ($xmlDoc = [xml]::new()).Load((Convert-Path books.xml))
    
    # Export the book child elements of the catalog element to a CSV file.
    $xmlDoc.catalog.book | Export-Csv books.csv
    
    # Parse the CSV file into objects ([pscustomobject] instances)
    # and output them to the console with friendly formatting.
    Import-Csv books.csv
    

    Note: In Windows PowerShell, consider using -NoTypeInformation with Export-Csv, as well as the -Encoding parameter to control the output character encoding, which defaults to ASCII(!). In PowerShell (Core) 7+, fortunately, -NoTypeInformation is now implied and the default character encoding is (BOM-less) UTF-8.

    Output:

    id           : bk101
    author       : Gambardella, Matthew
    title        : XML Developer's Guide
    genre        : Computer
    price        : 44.95
    publish_date : 2000-10-01
    description  : An in-depth look at creating applications 
                         with XML.
    
    id           : bk102
    author       : Ralls, Kim
    title        : Midnight Rain
    genre        : Fantasy
    price        : 5.95
    publish_date : 2000-12-16
    description  : A former architect battles corporate zombies, 
                         an evil sorceress, and her own childhood to become queen 
                         of the world.
    
    id           : bk103
    author       : Corets, Eva
    title        : Maeve Ascendant
    genre        : Fantasy
    price        : 5.95
    publish_date : 2000-11-17
    description  : After the collapse of a nanotechnology 
                         society in England, the young survivors lay the 
                         foundation for a new society.
    
    id           : bk104
    author       : Corets, Eva
    title        : Oberon's Legacy
    genre        : Fantasy
    price        : 5.95
    publish_date : 2001-03-10
    description  : In post-apocalypse England, the mysterious 
                         agent known only as Oberon helps to create a new life 
                         for the inhabitants of London. Sequel to Maeve 
                         Ascendant.
    
    id           : bk105
    author       : Corets, Eva
    title        : The Sundered Grail
    genre        : Fantasy
    price        : 5.95
    publish_date : 2001-09-10
    description  : The two daughters of Maeve, half-sisters, 
                         battle one another for control of England. Sequel to 
                         Oberon's Legacy.
    
    id           : bk106
    author       : Randall, Cynthia
    title        : Lover Birds
    genre        : Romance
    price        : 4.95
    publish_date : 2000-09-02
    description  : When Carla meets Paul at an ornithology 
                         conference, tempers fly as feathers get ruffled.
    
    id           : bk107
    author       : Thurman, Paula
    title        : Splish Splash
    genre        : Romance
    price        : 4.95
    publish_date : 2000-11-02
    description  : A deep sea diver finds true love twenty 
                         thousand leagues beneath the sea.
    
    id           : bk108
    author       : Knorr, Stefan
    title        : Creepy Crawlies
    genre        : Horror
    price        : 4.95
    publish_date : 2000-12-06
    description  : An anthology of horror stories about roaches,
                         centipedes, scorpions  and other insects.
    
    id           : bk109
    author       : Kress, Peter
    title        : Paradox Lost
    genre        : Science Fiction
    price        : 6.95
    publish_date : 2000-11-02
    description  : After an inadvertant trip through a Heisenberg
                         Uncertainty Device, James Salway discovers the problems 
                         of being quantum.
    
    id           : bk110
    author       : O'Brien, Tim
    title        : Microsoft .NET: The Programming Bible
    genre        : Computer
    price        : 36.95
    publish_date : 2000-12-09
    description  : Microsoft's .NET initiative is explored in 
                         detail in this deep programmer's reference.
    
    id           : bk111
    author       : O'Brien, Tim
    title        : MSXML3: A Comprehensive Guide
    genre        : Computer
    price        : 36.95
    publish_date : 2000-12-01
    description  : The Microsoft MSXML3 parser is covered in 
                         detail, with attention to XML DOM interfaces, XSLT processing, 
                         SAX and more.
    
    id           : bk112
    author       : Galos, Mike
    title        : Visual Studio 7: A Comprehensive Guide
    genre        : Computer
    price        : 49.95
    publish_date : 2001-04-16
    description  : Microsoft Visual Studio 7 is explored in depth,
                         looking at how Visual Basic, Visual C++, C#, and ASP+ are 
                         integrated into a comprehensive development 
                         environment.