Get data from elements within multiple XML files for output to another, single XML file using PowerShell


I'll begin by confessing that I'm a PowerShell (and coding) noob. I've stumbled my way through a few scripts, but I make no claims to anything even approaching competence. I'm hopeful that some more experienced folks can set me on the right track.

I'm trying to pull specific element data from multiple XML files, which will be used to populate another XML file. The files from which I'm pulling the data are invoices, and I'd like to grab the invoice number and timestamp and drop those values into a manifest. The manifest structure is as follows:

<?xml version="1.0" encoding="utf-8"?>
<Manifest>
    <Invoice>
        <InvoiceID></InvoiceID>
        <Timestamp></Timestamp>
    </Invoice>
</Manifest>

The XMLs from which I am pulling are in a sub-directory of the directory in which the manifest will be saved. For the sake of simplicity, the element names within the invoices are identical to the corresponding elements within the manifest. The folder structure for the manifest is "C:\Projects\Powershell\Manifest\Manifest.xml" and for the invoices it is "C:\Projects\Powershell\Manifest\Invoices\*.xml".

With the following code I am able to grab the data from the elements "InvoiceID" and "Timestamp" of only the first XML in the sub-directory "\Invoices". The code does, however, create one entry for each invoice file; it just fills each entry with the values taken from the first file. For example, with three invoice XML files in the "\Invoices" directory, I get three instances of the <Invoice> complex element, each populated with the InvoiceID and Timestamp found in the first file. So it's counting the files and outputting a corresponding number of elements, but it isn't getting data from any file but the first.

Here is the code:

$files = Get-ChildItem "C:\Projects\Powershell\Manifest\Invoices\*.xml"

$xmlData = @"
    <Invoice>
        <InvoiceID>$InvID</InvoiceID>
        <Timestamp>$Timestamp</Timestamp>
    </Invoice>
"@
$Manifest = "C:\Projects\Powershell\Manifest\Manifest.xml"

ForEach ($file in $files) {
    $xmldoc = [xml](Get-Content $file)
    $InvID = $xmldoc.Manifest.Invoice.InvoiceID
    $Timestamp = $xmldoc.Manifest.Invoice.Timestamp
    ForEach ($xml in $xmldoc) {
        Add-Content $Manifest $xmlData
    }
}

I can deal with properly formatting the closing tag of the output file once I have this piece figured out.

I know I must be looping incorrectly, but after reading up on this until my brain hurts, I've finally resorted to asking the question. What obvious thing am I missing/messing up?


Solution

  • String interpolation (expansion) in "..." and @"<newline>...<newline>"@ strings happens immediately, at the point where the string is defined, using the values that the referenced variables contain at that time.
    As a result, the same string, whose value was fixed before the loop started, is output in each iteration of your foreach loop.
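
    A minimal sketch of that immediate expansion, using a made-up invoice number purely for illustration:

    ```powershell
    # A double-quoted string is expanded once, at assignment time.
    $InvID = 'A-001'   # hypothetical value, not from the question's data
    $eager = "<InvoiceID>$InvID</InvoiceID>"

    # Changing the variable afterwards does not touch the already-built string.
    $InvID = 'B-002'
    $eager   # still <InvoiceID>A-001</InvoiceID>
    ```

    If the variables were not defined at all when the string was created (for example in a fresh session, without strict mode), they would expand to empty strings instead.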

    Your use case calls for a templating approach, where string interpolation is deferred, to be invoked on demand with the then-current variable values, using $ExecutionContext.InvokeCommand.ExpandString()[1]:

    # Define the *template* string as a *literal* - with *single* quotes.
    $xmlData = @'
        <Invoice>
            <InvoiceID>$InvID</InvoiceID>
            <Timestamp>$Timestamp</Timestamp>
        </Invoice>
    '@
    
    # ...
    # ForEach ($file in $files) { ...
        # Perform interpolation *on demand* with $ExecutionContext.InvokeCommand.ExpandString()
        Add-Content $Manifest -Value $ExecutionContext.InvokeCommand.ExpandString($xmlData)
    # }
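
    Put together, the whole loop might look roughly like the sketch below. To keep it runnable anywhere, it works in a temporary folder with two made-up invoice files (the folder name, file names and values are assumptions for illustration only); with the question's setup you would point $invoiceDir and $Manifest at the C:\Projects\Powershell\Manifest paths instead. The element path Manifest.Invoice.InvoiceID is taken from the question's code, and the template is built from single-quoted lines rather than a here-string only so that the sketch is insensitive to indentation.

    ```powershell
    # Hypothetical sandbox folder standing in for the question's directories.
    $root = Join-Path ([IO.Path]::GetTempPath()) ("ManifestDemo_" + [guid]::NewGuid())
    $invoiceDir = Join-Path $root 'Invoices'
    New-Item -ItemType Directory -Path $invoiceDir -Force | Out-Null

    # Two made-up invoices, laid out the way the question's code reads them.
    '<Manifest><Invoice><InvoiceID>1001</InvoiceID><Timestamp>2020-01-01T10:00:00</Timestamp></Invoice></Manifest>' |
        Set-Content (Join-Path $invoiceDir 'a.xml')
    '<Manifest><Invoice><InvoiceID>1002</InvoiceID><Timestamp>2020-01-02T11:30:00</Timestamp></Invoice></Manifest>' |
        Set-Content (Join-Path $invoiceDir 'b.xml')

    $Manifest = Join-Path $root 'Manifest.xml'
    $files = Get-ChildItem (Join-Path $invoiceDir '*.xml')

    # Literal template: single quotes, so $InvID and $Timestamp are not expanded yet.
    $xmlData = @(
        '    <Invoice>'
        '        <InvoiceID>$InvID</InvoiceID>'
        '        <Timestamp>$Timestamp</Timestamp>'
        '    </Invoice>'
    ) -join [Environment]::NewLine

    # Write the XML declaration and the opening root tag (overwrites an existing file).
    Set-Content $Manifest -Encoding UTF8 -Value '<?xml version="1.0" encoding="utf-8"?>', '<Manifest>'

    foreach ($file in $files) {
        $xmldoc = [xml](Get-Content -Raw $file.FullName)
        $InvID = $xmldoc.Manifest.Invoice.InvoiceID
        $Timestamp = $xmldoc.Manifest.Invoice.Timestamp
        # Expand the template with the values read from the current file.
        Add-Content $Manifest -Encoding UTF8 -Value $ExecutionContext.InvokeCommand.ExpandString($xmlData)
    }

    # Close the root element.
    Add-Content $Manifest -Encoding UTF8 -Value '</Manifest>'
    Get-Content $Manifest
    ```

    Two caveats with this string-based approach: values containing characters such as & or < are not XML-escaped, and ExpandString() also evaluates any $(...) found in the template, so the template should come from a trusted source.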
    

    Note:

    • Variable names can also be delimited explicitly by enclosing them in {...}, e.g., ${InvID}, which is sometimes required to separate the name from the text that follows it.

    • In order to embed expressions / command output, use $(), the subexpression operator, as demonstrated below.

    • In order to embed verbatim $ instances, escape them as `$.


    A simple example:

    # Define a template string, *single-quoted*, with *literal contents*:
    #  - '$InvID' is simply literally part of the string, not a variable reference (yet).
    #  - Ditto for $((Get-Date).TimeOfDay)
    $strTempl = 'Invoice ID $InvID extracted at $((Get-Date).TimeOfDay).'
    
    # Echo the template string as-is - unexpanded - ...
    $strTempl
    
    # ... and expand it on demand
    $InvID = 1
    $ExecutionContext.InvokeCommand.ExpandString($strTempl)
    
    # ... and again, after assigning a different value to $InvID
    $InvID = 2
    $ExecutionContext.InvokeCommand.ExpandString($strTempl)
    

    The above yields something like:

    Invoice ID $InvID extracted at $((Get-Date).TimeOfDay).  # template literal
    Invoice ID 1 extracted at 11:38:12.2719300.              # first on-demand expansion
    Invoice ID 2 extracted at 11:38:12.2766010.              # second on-demand expansion
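
    The other two points from the note, {...} delimiting and `$ escaping, can be sketched the same way. The template text and the variable $tmpl are made up for illustration:

    ```powershell
    $InvID = 7
    # ${InvID} delimits the name, so "_copy" is not read as part of the variable name
    # (a bare $InvID_copy would refer to a variable named InvID_copy).
    # `$ produces a literal dollar sign when the template is expanded.
    $tmpl = 'File ${InvID}_copy.xml costs `$5 (invoice $InvID).'
    $expanded = $ExecutionContext.InvokeCommand.ExpandString($tmpl)
    $expanded   # File 7_copy.xml costs $5 (invoice 7).
    ```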
    

    [1] Surfacing the functionality of the $ExecutionContext.InvokeCommand.ExpandString() method in a more discoverable way, via an Expand-String cmdlet, is the subject of a GitHub feature request.