I have a PowerShell script that returned an output that's close to what I want, however there are a few lines and HTML-style tags I need to remove. I already have the following code to filter out:
get-content "txtfile.txt" | select-string -Pattern '<fields>' -Context 1
However, if I attempt to pipe that output into a second "select-string", I won't get any results back. I was looking at the regex examples online, but most of what I've seen involves the use of coding loops to achieve their objective. I'm more used to the Linux shell, where you can pipe output into multiple greps to filter out text. Is there a way to achieve the same thing or something similar with PowerShell? Here's the file I'm working with, as requested:
<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
<actionOverrides>
<actionName>Accept</actionName>
<type>Default</type>
</actionOverrides>
<actionOverrides>
<actionName>CancelEdit</actionName>
<type>Default</type>
</actionOverrides>
<actionOverrides>
<actionName>Today</actionName>
<type>Default</type>
</actionOverrides>
<actionOverrides>
<actionName>View</actionName>
<type>Default</type>
</actionOverrides>
<compactLayoutAssignment>SYSTEM</compactLayoutAssignment>
<enableFeeds>false</enableFeeds>
<fields>
<fullName>ActivityDate</fullName>
</fields>
<fields>
<fullName>ActivityDateTime</fullName>
</fields>
<fields>
<fullName>Guid</fullName>
</fields>
<fields>
<fullName>Description</fullName>
</fields>
</CustomObject>
So, I only want the text between the <fullName> descriptor and I have the following so far:
get-content "txtfile.txt" | select-string -Pattern '<fields>' -Context 1
This will give me everything between the <fields> descriptor, however I essentially need the <fullName> line without the XML tags.
The simplest PSv3+ solution is to use PowerShell's built-in XML DOM support, which makes an XML document's nodes accessible as a hierarchy of objects with dot notation:
PS> ([xml] (Get-Content -Raw txtfile.txt)).CustomObject.fields.fullName
ActivityDate
ActivityDateTime
Guid
Description
Note: Even though the [xml] (Get-Content -Raw ...) approach to parsing an XML document is convenient, it isn't fully robust with respect to character encoding; see this answer.
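One way to sidestep the encoding caveat is to let the XML parser read the file itself through the .NET XmlDocument.Load() method, which takes the encoding from the XML declaration rather than from Get-Content's default. The following is a minimal sketch, not a definitive recipe; the temporary file name txtfile-sample.xml and the trimmed-down document are assumptions made only to keep the example self-contained.

```powershell
# Hypothetical stand-in for txtfile.txt, written to the temp folder
# so that the sketch can run on its own.
$path = Join-Path ([IO.Path]::GetTempPath()) 'txtfile-sample.xml'
@'
<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
<fields>
<fullName>ActivityDate</fullName>
</fields>
<fields>
<fullName>Guid</fullName>
</fields>
</CustomObject>
'@ | Set-Content -LiteralPath $path -Encoding UTF8

# Let the XML parser open the file itself so that it can honor the
# encoding stated in the XML declaration. .NET methods need a full
# path, hence Convert-Path.
$doc = New-Object System.Xml.XmlDocument
$doc.Load((Convert-Path -LiteralPath $path))

# Dot notation works on the loaded document exactly as before.
$names = $doc.CustomObject.fields.fullName
$names
```

Convert-Path matters here because .NET resolves relative paths against the process's working directory, which can differ from PowerShell's current location.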
Note how even though .fields is an array - representing all child <fields> elements of top-level element <CustomObject> - .fullName was directly applied to it and returned the values of child elements <fullName> across all array elements (<fields> elements) as an array.
This ability to access a property on a collection and have it implicitly applied to the collection's elements, with the results getting collected in an array, is a generic PSv3+ feature called member-access enumeration.
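To see member-access enumeration in isolation, independent of XML, here is a small sketch using two made-up objects; the variable names and sample values are illustrative assumptions only.

```powershell
# Two sample objects, each with a Name property (illustrative data only).
$items = @(
    [pscustomobject] @{ Name = 'ActivityDate' }
    [pscustomobject] @{ Name = 'Guid' }
)

# The array itself has no Name property, so PowerShell applies .Name
# to each element and collects the results in an array.
$viaEnumeration = $items.Name

# The explicit pipeline form that the shortcut above abbreviates.
$viaPipeline = $items | ForEach-Object { $_.Name }

# A property that the collection itself has takes precedence:
# this is the array's own Length, not a property of the elements.
$arrayLength = $items.Length

$viaEnumeration
```

The last assignment shows the main thing to watch for: if the collection type has a property of the same name, that property is returned instead of the elements' values.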
As an alternative, consider using the Select-Xml cmdlet (available in PSv2 too), which supports XPath queries that generally allow for more complex extraction logic (though not strictly needed here); Select-Xml is a high-level wrapper around the [xml] .NET type's .SelectNodes() method.
The following is the equivalent of the solution above:
$namespaces = @{ ns="http://soap.force.com/2006/04/metadata" }
$xpathQuery = '/ns:CustomObject/ns:fields/ns:fullName'
(Select-Xml -LiteralPath txtfile.txt $xpathQuery -Namespace $namespaces).Node.InnerText
Note:
As reflected in this snippet - unlike with dot notation - XML namespaces must be considered when using Select-Xml, as follows:
Given that <CustomObject> and all its descendants are in the default namespace declared via xmlns, identified via URI http://soap.force.com/2006/04/metadata, you must:
Define this namespace in the hashtable that you pass as the -Namespace argument. Caveat: xmlns is special in that it cannot be used as the key in the hashtable; instead, choose an arbitrary key name such as ns, but be sure to use that chosen key name as the node-name prefix (see next point).
Prefix every node name in the XPath query with the chosen key name followed by a colon; e.g., ns:CustomObject
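The effect of the namespace rules can be checked with a short sketch. It parses a trimmed-down copy of the sample document from a string (an assumption made for self-containment) and compares an unprefixed query with a prefixed one. The final lines show roughly what calling .SelectNodes() directly would look like; there, a namespace manager plays the role of the -Namespace hashtable.

```powershell
# Trimmed-down copy of the sample document, parsed from a string.
$doc = [xml] @'
<CustomObject xmlns="http://soap.force.com/2006/04/metadata">
<fields>
<fullName>ActivityDate</fullName>
</fields>
<fields>
<fullName>Guid</fullName>
</fields>
</CustomObject>
'@

$namespaces = @{ ns = 'http://soap.force.com/2006/04/metadata' }

# Without prefixes, the query targets elements in no namespace
# and therefore matches nothing.
$unprefixed = @(Select-Xml -Xml $doc -XPath '/CustomObject/fields/fullName')

# With the ns: prefix mapped via -Namespace, the elements are found.
$query = '/ns:CustomObject/ns:fields/ns:fullName'
$prefixed = @(Select-Xml -Xml $doc -XPath $query -Namespace $namespaces)

"Unprefixed matches: $($unprefixed.Count)"
"Prefixed matches:   $($prefixed.Count)"

# The same query via the underlying .SelectNodes() method, which
# expects an XmlNamespaceManager instead of a hashtable.
$nsm = New-Object System.Xml.XmlNamespaceManager($doc.NameTable)
$nsm.AddNamespace('ns', $namespaces.ns)
$direct = $doc.SelectNodes($query, $nsm) | ForEach-Object { $_.InnerText }
$direct
```

An unprefixed query that silently returns nothing is the usual symptom of a forgotten namespace mapping, so comparing the two match counts is a quick diagnostic.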