Search code examples
xmlpowershellcsvparsingxml-parsing

Parsing .XML to .CSV Writing Nodes to Columns


I'm at my wits' end here. Try to parse a large xml file with blocks into a .csv file. Each child node found should have its own columns. Not exactly sure why I am unable to have the inner text be written to the .csv file.

Issue: The code does not appear to write any of the objects into the .csv. All I see are the headers but no data despite the fact the input file has over 100 blocks in it.

The Code

# Define the directory containing the XML files
$directory = "c:\batch1\test"

# Get all XML files in the directory
$xmlFiles = Get-ChildItem -Path $directory -Filter *.xml

foreach ($xmlFile in $xmlFiles) {
    # Load the XML content
    [xml]$xmlContent = Get-Content -Path $xmlFile.FullName

    # Extract all AuditRecord nodes
    $auditRecords = $xmlContent.SelectNodes("//AuditRecord")

    # Create a list to store the data
    $data = @()

    foreach ($record in $auditRecords) {
        # Create a hashtable to store the record data
        $recordData = @{}
        foreach ($node in $record.ChildNodes) {
            $recordData[$node.Name] = $node.InnerText
        }
        $data += [pscustomobject]$recordData
    }

    # Define the output CSV file name
    $csvFileName = [System.IO.Path]::ChangeExtension($xmlFile.FullName, "csv")

    # Export the data to a CSV file
    $data | Export-Csv -Path $csvFileName -NoTypeInformation
}

Sample XML Block within the large .xml files

<AuditRecord AuditRecordID="ca7a433e-b45f-4ed1-8169-5fc70547b35e">
    <PIUser UserID="1" Name="theadmin"/>
    <PITime UTCSeconds="1708987340" LocalDate="2024-02-26T17:42:20-05:00"/>
    <PIConfigurationDB>
        <PIModules Action="Edit">
            <PIModule UniqueID="7c03e74e-ad46-4ada-8234-50c7cd2699f0, 31-Dec-69 16:00:01" Name="MDB-AFMigrationData">
                <PIModuleAttributes>
                    <PIModuleAttribute Name="ModifyDate">
                        <Value>
                            <Before Type="xs:dateTime">2024-02-26T17:42:05-05:00</Before>
                            <After Type="xs:dateTime">2024-02-26T17:42:20-05:00</After>
                        </Value>
                    </PIModuleAttribute>
                    <PIModuleAttribute Name="Revision">
                        <Value>
                            <Before Type="xs:long">222325</Before>
                            <After Type="xs:long">222326</After>
                        </Value>
                    </PIModuleAttribute>
                </PIModuleAttributes>
                <PIProperties>
                    <PIProperty Name="AFCST" Action="Edit" ParentUNC_Name="\\PIProperties">
                        <Value>
                            <Before Type="xs:double">1708987319.220131</Before>
                            <After Type="xs:double">1708987325.796927</After>
                        </Value>
                    </PIProperty>
                </PIProperties>
            </PIModule>
        </PIModules>
    </PIConfigurationDB>
</AuditRecord>

Desired output (Example)

userid   name     utcseconds  localdate                  action
-----    ----     ----------  ---------                  ------
1        theadmin 1722272177  2024-07-29T12:56:17-04:00  edit... .etc.. etc.. etc

Solution

  • For huge xml files you need to use XmlReader. Here is code that uses Xml Linq with XmlReader. Code only does one file. You need to add another for loop to loop through all the files.

    using assembly System.Xml
    using assembly System.Xml.Linq
    
    $xmlFilename = 'c:\temp\test.xml'
    $csvFilename = 'c:\temp\test.csv'
    
    $table = [System.Collections.Generic.List[pscustomobject]]::new()
    $reader = [System.Xml.XmlReader]::Create($xmlFilename)
    While(-not $reader.EOF)
    {
       if($reader.Name -ne 'AuditRecord')
       {
          $reader.ReadToFollowing('AuditRecord') | out-null;
       }
       if(-not $reader.EOF)
       {
          $auditRecord = [System.Xml.Linq.XElement][System.Xml.Linq.XElement]::ReadFrom($reader);
          $PIUser = $auditRecord.Element('PIUser')
          $UserID = $PIUser.Attribute('UserID').Value
          $Name = $PIUser.Attribute('Name').Value
    
          $PITime = $auditRecord.Element('PITime')
          $UTCSeconds = $PITime.Attribute('UTCSeconds').Value
          $LocalDate = $PITime.Attribute('LocalDate').Value
    
          $PIModules = $auditRecord.Descendants('PIModules')[0]
          $Action = $PIModules.Attribute('Action').Value
    
    
          foreach($PIModule in $auditRecord.Descendants('PIModule'))
          {
             $UniqueID = $PIModule.Attribute('UniqueID').Value
             $PIModuleName = $PIModule.Attribute('Name').Value
             foreach($PIModuleAttribute in $PIModule.Descendants('PIModuleAttribute'))
             {
                $Before = $PIModuleAttribute.Descendants('Before')[0].Value
                $After = $PIModuleAttribute.Descendants('After')[0].Value
    
                $newRow = [pscustomobject]@{
                   UserID = $UserID
                   Name = $Name 
                   UTCSeconds = $UTCSeconds
                   LocalDate = $LocalDate 
                   Action= $Action
                   UniqueID = $UniqueID 
                   'PIModule Name' = $PIModuleName  
                   Before = $Before 
                   After = $After  
                }
                $table.Add($newRow) | out-null
             }
          }
        }
    }
    $table | Export-Csv -Path $csvFilename -NoTypeInformation