How to convert XML into a PsCustomObject to allow final export as JSON?


I am looking for a PowerShell function to convert XML into a PsCustomObject which can finally be exported as JSON. For this I created this small XML test object:

[xml]$Xml = @"
<Action name="Test" id="1">
    <Text>sample</Text>
    <sub name="s1" id="2" /> 
    <sub name="s2" id="3" />
    <end details="no" />
</Action>
"@

This gives me an XML document element, which I need to convert into the same object as the one produced by this call:

$Json = convertfrom-json @"
{
    "Action": {
        "name": "Test", "id": "1", "Text": "sample",
        "sub": [
            {"name": "s1","id": "2"},
            {"name": "s2","id": "3"}
        ],
        "end": {"details": "no"}
    }
}
"@

Is there any smart way to get this done? I tested multiple functions from similar questions here but nothing really works as expected.


Solution

  • EDIT: There is a very succinct solution for PowerShell 7 or newer; scroll down to the bottom of this answer.

    Original answer

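    XML and JSON have different data models (for example, XML distinguishes attributes from child elements and has no notion of arrays), so there is no standard way of converting XML to JSON. You really have to roll your own function that interprets the XML in the way that matches your desired output.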

    Here is a generic solution:

    Function ConvertFrom-MyXml( [xml.XmlNode] $node ) {
    
        # Create an ordered hashtable
        $ht = [ordered] @{}
    
        # Copy the XML attributes to the hashtable
        $node.Attributes.ForEach{ $ht[ $_.Name ] = $_.Value }
    
        $node.ChildNodes.ForEach{
            if( $_.FirstChild -is [xml.XmlText] ) {
                # Add content of XML text node
                Add-DictionaryArrayItem -Dict $ht -Key $_.LocalName -Value $_.FirstChild.InnerText
            }
            elseif( $_ -is [xml.XmlElement] ) {
                # Add nested hashtable for the XML child elements (recursion)
                Add-DictionaryArrayItem -Dict $ht -Key $_.LocalName -Value (ConvertFrom-MyXml $_)
            }
        }
    
        $ht  # Output
    }
    
    Function Add-DictionaryArrayItem( $Dict, $Key, $Value ) {
    
        if( $Dict.Contains( $Key ) ) {
            $curValue = $Dict[ $Key ]
            # If existing value is not already a list...
            if( $curValue -isnot [Collections.Generic.List[object]] ) {
                # ...turn it into a list.
                $curValue = [Collections.Generic.List[object]] @($curValue)
                $Dict[ $Key ] = $curValue
            }
            # Add next value to the list. This updates the list stored in the hashtable,
            # because $curValue is a reference.
            $curValue.Add( $Value )
        }
        else {
            # Key doesn't exist in the hashtable yet, so simply add it.
            $Dict[ $Key ] = $Value
        }
    }
    
    [xml]$Xml = @"
    <Action name="Test" id="1">
        <Text>sample</Text>
        <sub name="s1" id="2" /> 
        <sub name="s2" id="3" />
        <end details="no" />
    </Action>
    "@
    
    ConvertFrom-MyXml $Xml | ConvertTo-Json -Depth 100
    

    Output:

    {
        "Action":  {
            "name":  "Test",
            "id":  "1",
            "Text":  "sample",
            "sub":  [
                {
                    "name":  "s1",
                    "id":  "2"
                },
                {
                    "name":  "s2",
                    "id":  "3"
                }
            ],
            "end":  {
                "details":  "no"
            }
        }
    }
    
    • Function ConvertFrom-MyXml outputs an ordered hashtable. There is no need to convert it to a PSCustomObject, as ConvertTo-Json works with hashtables as well, so we can keep the code simpler.
    • ConvertFrom-MyXml loops over the attributes and elements (recursively) of the given XML node. It calls the helper function Add-DictionaryArrayItem to create an array if a key already exists in the hashtable. Actually this is not a raw, fixed-size array (like @(1,2,3) creates), but a dynamically resizable List, which behaves very similarly to an array but is much more efficient when adding many elements.
    • Note that a single sub element won't be turned into an array. If some elements should always be converted to arrays, you'd have to pass some kind of schema to the function (e.g. a list of element names) or add metadata to the XML itself.
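
    The "list of element names" idea from the last point could look roughly like the sketch below. The function name ConvertFrom-MyXmlWithSchema and its -ArrayElements parameter are hypothetical and not part of the code above; the function is assumed to behave like ConvertFrom-MyXml, except that elements named in -ArrayElements always end up in a list, even when they occur only once.

```powershell
# Sketch only: the function name and the -ArrayElements parameter are assumptions for illustration.
function ConvertFrom-MyXmlWithSchema {
    param(
        [Parameter(Mandatory)] [xml.XmlNode] $Node,
        [string[]] $ArrayElements = @()   # element names that must always become lists
    )

    $ht = [ordered] @{}

    # foreach tolerates a $null Attributes collection (e.g. on the document node)
    foreach( $attr in $Node.Attributes ) { $ht[ $attr.Name ] = $attr.Value }

    foreach( $child in $Node.ChildNodes ) {
        # Skip anything that is not an element (declaration, comments, bare text)
        if( $child -isnot [xml.XmlElement] ) { continue }

        $key   = $child.LocalName
        $value = if( $child.FirstChild -is [xml.XmlText] ) {
            $child.FirstChild.InnerText
        } else {
            ConvertFrom-MyXmlWithSchema -Node $child -ArrayElements $ArrayElements
        }

        if( $ht.Contains( $key ) ) {
            # Repeated key: promote the existing value to a list if necessary
            $curValue = $ht[ $key ]
            if( $curValue -isnot [Collections.Generic.List[object]] ) {
                $curValue = [Collections.Generic.List[object]] @($curValue)
                $ht[ $key ] = $curValue
            }
            $curValue.Add( $value )
        }
        elseif( $key -in $ArrayElements ) {
            # First occurrence of a "schema" element: start a list right away
            $list = [Collections.Generic.List[object]]::new()
            $list.Add( $value )
            $ht[ $key ] = $list
        }
        else {
            $ht[ $key ] = $value
        }
    }

    $ht  # Output
}

[xml] $single = '<Action name="Test"><sub name="s1" /></Action>'
$result = ConvertFrom-MyXmlWithSchema -Node $single -ArrayElements 'sub'
$result | ConvertTo-Json -Depth 100
```

    With this input, "sub" should be serialized as a one-element JSON array instead of a plain object. Matching on the local name alone is a simplification; a real schema might need to take the element's parent or namespace into account.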

    As suggested by the OP, here is an alternative version of the code that consists of only a single function:

    Function ConvertFrom-MyXml( [xml.XmlNode] $node ) {
    
        $ht = [ordered] @{}
    
        $node.Attributes.ForEach{ $ht[ $_.Name ] = $_.Value }
    
        foreach( $child in $node.ChildNodes ) {
            $key = $child.LocalName
    
            $value = if( $child.FirstChild -is [xml.XmlText] ) {
                $child.FirstChild.InnerText
            } elseif( $child -is [xml.XmlElement] ) {
                ConvertFrom-MyXml $child
            } else {
                continue
            }
    
            if( $ht.Contains( $key ) ) {
                $curValue = $ht[ $key ]
                if( $curValue -isnot [Collections.Generic.List[object]] ) {
                    $curValue = [Collections.Generic.List[object]] @($curValue)
                    $ht[ $key ] = $curValue
                }
                $curValue.Add( $value )
            }
            else {
                $ht[ $key ] = $value
            }
            }
        }
    
        $ht  # Output
    }
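
    Both versions return an ordered hashtable. If a PSCustomObject tree is really needed, as asked in the question, one option is to round-trip the hashtable through JSON. The sketch below uses a hand-built hashtable ($ht is only a stand-in for the function's output) so that it runs on its own:

```powershell
# Stand-in for the output of ConvertFrom-MyXml (built by hand for illustration)
$ht = [ordered] @{
    Action = [ordered] @{
        name = 'Test'
        sub  = @(
            [ordered] @{ name = 's1'; id = '2' },
            [ordered] @{ name = 's2'; id = '3' }
        )
    }
}

# Hashtable -> JSON -> PSCustomObject
$obj = $ht | ConvertTo-Json -Depth 100 | ConvertFrom-Json

$obj.GetType().Name        # PSCustomObject
$obj.Action.sub[1].name    # s2
```

    The round trip costs an extra serialization pass, so for large documents it may be preferable to keep working with the hashtable.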
    

    Succinct solution for PowerShell 7+

    This makes use of Newtonsoft.Json.JsonConvert, the JSON library that ships with PowerShell 7, so no extra module needs to be installed.

    $xml = [xml]@"
    <Action name="Test" id="1">
        <Text>sample</Text>
        <sub name="s1" id="2" /> 
        <sub name="s2" id="3" />
        <end details="no" />
    </Action>
    "@
    
    # Convert XML to indented JSON
    $json = [Newtonsoft.Json.JsonConvert]::SerializeXmlNode($xml, 'Indented')
    $json  # Output
    

    This outputs:

    {
      "Action": {
        "@name": "Test",
        "@id": "1",
        "Text": "sample",
        "sub": [
          {
            "@name": "s1",
            "@id": "2"
          },
          {
            "@name": "s2",
            "@id": "3"
          }
        ],
        "end": {
          "@details": "no"
        }
      }
    }
    

    It is relatively easy to get rid of the @ prefix of the XML attributes. Be aware, though, that an attribute and a child element with the same name then collide, producing duplicate keys and potentially making the JSON invalid:

    $json = $json -replace '"@([^"\\]+)":', '"$1":'
    

    Normally I'm strongly against using RegEx on serialized forms of complex data, as it is very hard to do safely. In the case of converting XML to JSON, the above RegEx should be pretty safe, because XML attribute names may contain neither double quotation marks nor backslashes (either would cause the .NET XML parser to fail). If anyone proves me wrong, please drop a comment.
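
    To illustrate both the collision risk and why text content is left alone, here is a small check on hand-written JSON strings. The samples are made up to resemble serializer output and are not actual output of SerializeXmlNode:

```powershell
$pattern = '"@([^"\\]+)":'

# Attribute "x" and child element "x" share a name: after the replace
# the object contains the key "x" twice.
$collision       = '{"a":{"@x":"1","x":"2"}}'
$collisionResult = $collision -replace $pattern, '"$1":'
$collisionResult   # {"a":{"x":"1","x":"2"}}

# Text that merely looks like an attribute key is not touched, because
# quotes inside a JSON string value are escaped with a backslash.
$textValue  = '{"a":{"note":"see \"@x\": here"}}'
$textResult = $textValue -replace $pattern, '"$1":'
$textResult        # unchanged
```

    How a consumer treats the duplicated "x" key depends on the JSON parser, so it seems safer to apply the replace only when attribute and element names are known not to overlap.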