Search code examples
xmlpowershell

How to read an XML file using PowerShell and filter the required data


How to read an XML file and extract data from that file which has large number of tags using PowerShell? I am using the below code to extract tag, but am unable to read data from sub tag.

$xmlFile= "D:\Testing\TestcasesOutput\1ac.xml"
$xmlConfig = [System.Xml.XmlDocument](Get-Content $xmlFile)
$XmlDocument.Breakfast_menu.price

I expect the output to read the whole xml file but unable to read whole xml file.

<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
    <food>
        <food>Belgian Waffles</food>
        <price>$5.95</price>
        <description>Two of our famous Belgian Waffles with plenty of real maple 
        syrup</description>
        <calories>650</calories>
    </food>
    <food>
        <food>Strawberry Belgian Waffles</food>
        <price>$7.95</price>
        <description>Light Belgian waffles covered with strawberries and whipped 
        cream</description>
        <calories>900</calories>
    </food>
    <food>
        <food>Berry-Berry Belgian Waffles</food>
        <price>$8.95</price>
        <description>Light Belgian waffles covered with an assortment of fresh 
        berries and whipped cream</description>
        <calories>900</calories>
    </food>
    <food>
        <food>French Toast</food>
        <price>$4.50</price>
        <description>Thick slices made from our homemade sourdough 
        bread</description>
        <calories>600</calories>
    </food>
    <food>
    <food>Homestyle Breakfast</food>
        <price>$6.95</price>
        <description>Two eggs, bacon or sausage, toast, and our ever-popular hash 
        browns</description>
        <calories>950</calories>
    </food>
</breakfast_menu>

Solution

  • Reading XML with PowerShell is really easy.

    Suppose your xml file looks similar to this:

    <?xml version="1.0" encoding="UTF-8"?> 
    <breakfast_menu> 
        <food> 
            <food>Belgian Waffles</food>
            <price>$5.95</price>
            <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
            <calories>650</calories>
        </food>
        <food> 
            <food>Fried Egg</food>
            <price>$1.80</price>
            <description>blahblah</description>
            <calories>3500</calories>
        </food>
    </breakfast_menu>
    

    You simply read and have PowerShell parse the file into an object, using this

    [xml]$xml = Get-Content 'D:\Testing\TestcasesOutput\1ac.xml'
    

    Next, you can use the properties of this $xml object to get whatever it is you want to extract from it:

    For instance, loop through all <food> items and output the information you want

    $xml.breakfast_menu.food | ForEach-Object {
        [PSCustomObject]@{
            'MenuItem' = $_.food
            'Price'    = $_.price
        }
    }
    

    results in this output:

    MenuItem        Price
    --------        -----
    Belgian Waffles $5.95
    Fried Egg       $1.80
    

    Or select just the one the item for 'Belgian Waffles':

    $xml.breakfast_menu.food | Where-Object { $_.food -eq 'Belgian Waffles' } | 
                               Select-Object @{Name = 'MenuItem'; Expression = {$_.food}}, Price
    

    outputs:

    MenuItem        price
    --------        -----
    Belgian Waffles $5.95
    

    If all you are after is the price for a certain food item, you can do this:

    $xml.breakfast_menu.food | Where-Object { $_.food -eq 'Belgian Waffles' } | 
                               Select-Object -ExpandProperty Price
    

    or even shorten that code:

    ($xml.breakfast_menu.food | Where-Object { $_.food -eq 'Belgian Waffles' }).price
    

    Hope that explains


    Edit

    If you need to do this on multiple xml files and these files are inside the same root path, you could loop through using Get-ChildItem to get the xml files and process them like in the examples I gave.

    Get-ChildItem -Path 'ROOTFOLDER OF THE FOLDERS WHERE THE XML FILES ARE KEPT' -Filter '*.xml' -File -Recurse | 
        ForEach-Object {
            [xml]$xml = Get-Content -Path $_.FullName
            # in this example simply output the menu items and their price for each xml file
            foreach ($item in $xml.breakfast_menu.food) {
                [PSCustomObject]@{
                    'File'     = $_.FullName    # added the file FullName so you know where the item came from
                    'MenuItem' = $item.food
                    'Price'    = $item.price
                }
            }
        }
    

    Or from several locations:

    $folders = 'D:\Testing\TestcasesOutput\1ac7b5a0-2d62-403c-8394-5bd33330cbe7',
               'D:\Testing\TestcasesOutput\227c619a-b7d1-4da6-8fe5-f2c923ddcb7a',
               'D:\Testing\TestcasesOutput\d4370ae1-643f-4c44-ba41-7f640afcc276'
    
    $result = Get-ChildItem -Path $folders -Filter '*.xml' -File | 
        ForEach-Object {
            [xml]$xml = Get-Content -Path $_.FullName
            # in this example simply output the menu items and their price for each xml file
            foreach ($item in $xml.breakfast_menu.food) {
                [PSCustomObject]@{
                    'File'     = $_.FullName
                    'MenuItem' = $item.food
                    'Price'    = $item.price
                }
            }
        }
    
    #output to screen:
    $result
    
    # output to CSV
    $result | Export-Csv -Path 'PATH AND FILENAME FOR THE OUTPUT CSV FILE' -NoTypeInformation