I am looking for a PowerShell function to convert XML into a PSCustomObject, which can finally be exported as JSON. For this I created this small XML test object:
[xml]$Xml = @"
<Action name="Test" id="1">
<Text>sample</Text>
<sub name="s1" id="2" />
<sub name="s2" id="3" />
<end details="no" />
</Action>
"@
This gives me an XML DocumentElement, which I need to convert into the same object as the one returned by this call:
$Json = ConvertFrom-Json @"
{
"Action": {
"name": "Test", "id": "1", "Text": "sample",
"sub": [
{"name": "s1","id": "2"},
{"name": "s2","id": "3"}
],
"End": {"details": "no"}
}
}
"@
Is there any smart way to get this done? I tested multiple functions from similar questions here but nothing really works as expected.
EDIT: There is a very succinct solution for PowerShell 7 or newer; scroll down to the bottom of this answer.
Because of the ambiguities, there is no standard way of converting XML to JSON. So you really have to roll your own function that interprets the XML in the way that matches your desired output.
Here is a generic solution:
Function ConvertFrom-MyXml( [xml.XmlNode] $node ) {
# Create an ordered hashtable
$ht = [ordered] @{}
# Copy the XML attributes to the hashtable
$node.Attributes.ForEach{ $ht[ $_.Name ] = $_.Value }
$node.ChildNodes.ForEach{
if( $_.FirstChild -is [xml.XmlText] ) {
# Add content of XML text node
Add-DictionaryArrayItem -Dict $ht -Key $_.LocalName -Value $_.FirstChild.InnerText
}
elseif( $_ -is [xml.XmlElement] ) {
# Add nested hashtable for the XML child elements (recursion)
Add-DictionaryArrayItem -Dict $ht -Key $_.LocalName -Value (ConvertFrom-MyXml $_)
}
}
$ht # Output
}
Function Add-DictionaryArrayItem( $Dict, $Key, $Value ) {
if( $Dict.Contains( $Key ) ) {
$curValue = $Dict[ $Key ]
# If existing value is not already a list...
if( $curValue -isnot [Collections.Generic.List[object]] ) {
# ...turn it into a list.
$curValue = [Collections.Generic.List[object]] @($curValue)
$Dict[ $Key ] = $curValue
}
# Add next value to the array. This updates the array in the hashtable,
# because $curValue is a reference.
$curValue.Add( $Value )
}
else {
# Key doesn't exist in the hashtable yet, so simply add it.
$Dict[ $Key ] = $Value
}
}
[xml]$Xml = @"
<Action name="Test" id="1">
<Text>sample</Text>
<sub name="s1" id="2" />
<sub name="s2" id="3" />
<end details="no" />
</Action>
"@
ConvertFrom-MyXml $Xml | ConvertTo-Json -Depth 100
Output:
{
"Action": {
"name": "Test",
"id": "1",
"Text": "sample",
"sub": [
{
"name": "s1",
"id": "2"
},
{
"name": "s2",
"id": "3"
}
],
"end": {
"details": "no"
}
}
}
ConvertFrom-MyXml outputs an ordered hashtable. There is no need to convert to PSCustomObject, as ConvertTo-Json works with hashtables as well. So we can keep the code simpler.
ConvertFrom-MyXml loops over attributes and elements (recursively) of the given XML node. It calls the helper function Add-DictionaryArrayItem to create an array if a key already exists in the hashtable. Actually this is not a raw, fixed-size array (like @(1,2,3) creates), but a dynamically resizable List, which behaves very similarly to an array but is much more efficient when adding many elements.
Note that a single sub element won't be turned into an array. If some elements should always be converted to arrays, you'd have to pass some kind of schema to the function (e.g. a list of element names) or add metadata to the XML itself.
As suggested by OP, here is an alternative version of the code that consists of only a single function:
Function ConvertFrom-MyXml( [xml.XmlNode] $node ) {
$ht = [ordered] @{}
$node.Attributes.ForEach{ $ht[ $_.Name ] = $_.Value }
foreach( $child in $node.ChildNodes ) {
$key = $child.LocalName
$value = if( $child.FirstChild -is [xml.XmlText] ) {
$child.FirstChild.InnerText
} elseif( $child -is [xml.XmlElement] ) {
ConvertFrom-MyXml $child
} else {
continue
}
if( $ht.Contains( $key ) ) {
$curValue = $ht[ $key ]
if( $curValue -isnot [Collections.Generic.List[object]] ) {
$curValue = [Collections.Generic.List[object]] @($curValue)
$ht[ $key ] = $curValue
}
$curValue.Add( $value )
}
else {
$ht[ $key ] = $value
}
}
$ht # Output
}
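The schema idea mentioned earlier (a list of element names that should always become arrays) could be sketched roughly as follows. This is only an illustration under assumptions: the function name ConvertFrom-MyXmlWithArrays and the -ArrayElements parameter are made up for this example and are not part of the code above.

```powershell
# Hypothetical variant of ConvertFrom-MyXml. Element names passed via
# -ArrayElements (an assumed parameter) always become lists, even when
# they occur only once.
Function ConvertFrom-MyXmlWithArrays( [xml.XmlNode] $node, [string[]] $ArrayElements = @() ) {
    $ht = [ordered] @{}

    # foreach simply does nothing when Attributes is $null (e.g. for the document node)
    foreach( $attr in $node.Attributes ) { $ht[ $attr.Name ] = $attr.Value }

    foreach( $child in $node.ChildNodes ) {
        # Ignore everything that is not an element (declaration, comments, ...)
        if( $child -isnot [xml.XmlElement] ) { continue }

        $key   = $child.LocalName
        $value = if( $child.FirstChild -is [xml.XmlText] ) {
            $child.FirstChild.InnerText
        } else {
            ConvertFrom-MyXmlWithArrays $child -ArrayElements $ArrayElements
        }

        if( -not $ht.Contains( $key ) ) {
            if( $key -in $ArrayElements ) {
                # First occurrence of a "schema" element: start a list right away
                $list = [Collections.Generic.List[object]]::new()
                $list.Add( $value )
                $ht[ $key ] = $list
            } else {
                $ht[ $key ] = $value
            }
        }
        else {
            # Repeated key: same list handling as in the original function
            $curValue = $ht[ $key ]
            if( $curValue -isnot [Collections.Generic.List[object]] ) {
                $curValue = [Collections.Generic.List[object]] @($curValue)
                $ht[ $key ] = $curValue
            }
            $curValue.Add( $value )
        }
    }
    $ht # Output
}

# Usage: only one <sub> element, but it is listed in -ArrayElements
[xml] $doc = '<Action name="Test"><sub name="s1" id="2" /></Action>'
ConvertFrom-MyXmlWithArrays $doc -ArrayElements sub | ConvertTo-Json -Depth 100
```

With -ArrayElements sub, the lone sub element should end up as a one-element JSON array instead of a plain object, while elements not named in the list behave as before.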
The solution for PowerShell 7 or newer makes use of Newtonsoft.Json.JsonConvert.
$xml = [xml]@"
<Action name="Test" id="1">
<Text>sample</Text>
<sub name="s1" id="2" />
<sub name="s2" id="3" />
<end details="no" />
</Action>
"@
# Convert XML to JSON
$json = [Newtonsoft.Json.JsonConvert]::SerializeXmlNode($xml, 'Indented')
This outputs:
{
"Action": {
"@name": "Test",
"@id": "1",
"Text": "sample",
"sub": [
{
"@name": "s1",
"@id": "2"
},
{
"@name": "s2",
"@id": "3"
}
],
"end": {
"@details": "no"
}
}
}
It is relatively easy to get rid of the @ prefix of the XML attributes, though this may cause collisions between XML attribute and element names, potentially making the JSON invalid:
$json = $json -replace '"@([^"\\]+)":', '"$1":'
Normally I'm strongly against using RegEx with serialized forms of complex data, as it is very hard to do safely. In the case of converting XML to JSON, the above RegEx should be pretty safe, because XML attribute names are not allowed to contain double quotation marks or backslashes (either would cause the .NET XML parser to fail). If anyone proves me wrong, please drop a comment.
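To get from this JSON to the PSCustomObject that the question asks for, the result can be piped to ConvertFrom-Json. A minimal sketch, assuming PowerShell 7 or newer so that the Newtonsoft.Json types are available in the session; the variable name $obj is only chosen for this example:

```powershell
$xml = [xml] @"
<Action name="Test" id="1">
<Text>sample</Text>
<sub name="s1" id="2" />
<sub name="s2" id="3" />
<end details="no" />
</Action>
"@

# XML -> JSON, then strip the @ prefix from the attribute names
$json = [Newtonsoft.Json.JsonConvert]::SerializeXmlNode( $xml, 'Indented' )
$json = $json -replace '"@([^"\\]+)":', '"$1":'

# JSON -> PSCustomObject
$obj = $json | ConvertFrom-Json

$obj.Action.name          # Test
$obj.Action.sub[1].name   # s2
$obj.Action.end.details   # no
```

Keep in mind that the collision caveat still applies: if an element has an attribute and a child element with the same name, removing the prefix produces duplicate keys, so this round trip is only a sketch for XML like the sample above.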