I have the file test.dat, shown below.
<category>Games</category>
</game>
<category>Applications</category>
</game>
<category>Demos</category>
</game>
<category>Games</category>
<description>MLB 2002 (USA)</description>
</game>
<category>Bonus Discs</category>
</game>
<category>Multimedia</category>
</game>
<category>Add-Ons</category>
</game>
<category>Educational</category>
</game>
<category>Coverdiscs</category>
</game>
<category>Video</category>
</game>
<category>Audio</category>
</game>
<category>Games</category>
</game>
How do I use Get-Content and Select-String to print the following to the terminal, given the file above as input? This is the output I need:
<category>Games</category>
</game>
<category>Games</category>
</game>
This is the command I'm currently using but it isn't working.
Get-Content '.\test.dat' | Select-String -pattern '(^\s+<category>Games<\/category>\n^\s+<\/game>$)'
First, you need to read the whole file in as a single string so the pattern can match across lines.
Get-Content '.\test.dat' -Raw
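To see why -Raw matters, here is a minimal sketch. The temp file name and its two-line contents are made up for illustration and are not your test.dat. Without -Raw you get one string per line, so no single string ever contains the line break the pattern is looking for.

```powershell
# Hypothetical sample file in the temp directory (not the real test.dat).
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-demo.dat'
Set-Content -Path $sample -Value @('<category>Games</category>', '</game>')

# Without -Raw: an array with one string per line; none of them contains a newline.
$lines = @(Get-Content -Path $sample)
$perLineHits = @($lines -match 'category>\r?\n</game>')   # filters the array, finds nothing

# With -Raw: a single string that still contains the line break.
$text = Get-Content -Path $sample -Raw
$rawHit = $text -match 'category>\r?\n</game>'            # the pattern can now span two lines

Remove-Item $sample
```

Here $perLineHits ends up empty while $rawHit is true, which is the same reason the original command finds nothing.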
Since it seems you want to exclude the entry that has a <description> line, you can use this pattern, which only matches when nothing but a line break separates the Games category line from </game>:
'(?s)\s+<category>Games\S+\r?\n</game>'
Select-String returns a MatchInfo object, and you need to extract the Value property of each item in its Matches property. You can do that a few different ways.
Get-Content '.\test.dat' -Raw |
Select-String '(?s)\s+<category>Games\S+\r?\n</game>' -AllMatches |
ForEach-Object Matches | ForEach-Object Value
or
$output = Get-Content '.\test.dat' -Raw |
Select-String '(?s)\s+<category>Games\S+\r?\n</game>' -AllMatches
$output.Matches.Value
or
(Get-Content '.\test.dat' -Raw |
Select-String '(?s)\s+<category>Games\S+\r?\n</game>' -AllMatches).Matches.Value
Output
<category>Games</category>
</game>
<category>Games</category>
</game>
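If you want to check the pattern without touching the real file, something like the following sketch should do. The in-memory sample is a shortened, hypothetical stand-in for test.dat, joined with LF line breaks; it includes one Games entry that has a description so you can see it being skipped.

```powershell
# Hypothetical, shortened stand-in for test.dat, built in memory.
$text = @(
    '<category>Applications</category>'
    '</game>'
    '<category>Games</category>'
    '<description>MLB 2002 (USA)</description>'
    '</game>'
    '<category>Games</category>'
    '</game>'
    '<category>Demos</category>'
    '</game>'
    '<category>Games</category>'
    '</game>'
) -join "`n"

# A single multi-line string is treated as one input object by Select-String.
$found = $text | Select-String '(?s)\s+<category>Games\S+\r?\n</game>' -AllMatches

# Each match starts with the whitespace consumed by \s+, so trim it for display.
$values = $found.Matches | ForEach-Object { $_.Value.Trim() }
$values
```

Note that the leading \s+ means an entry on the very first line of the input would not match, because there is no whitespace in front of it.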
You could also use the [regex] type accelerator.
$str = Get-Content '.\test.dat' -Raw
[regex]::Matches($str,'(?s)\s+<category>Games\S+\r?\n</game>').value
EDIT
Based on your additional info, as I understand it you want to remove any Games entries that are empty. We can simplify this greatly by using a here-string.
$pattern = @'
<category>Games</category>
</game>

'@
The additional blank line is intentional to capture the final newline character. You could also write it like this
$pattern = @'
<category>Games</category>
</game>\r?\n
'@
Now, if we replace every match of the pattern with nothing, you should get what I believe is your expected final result.
(Get-Content $inputfile -Raw) -replace $pattern
And to finish it off, you can just put the above command inside a Set-Content command. Since the Get-Content command is enclosed in parentheses, the file is read completely into memory before it is written to.
Set-Content -Path $inputfile -Value ((Get-Content $inputfile -Raw) -replace $pattern)
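If you would rather try the round trip on a throwaway copy first, here is a rough sketch. The temp file name, its contents, and the single-line pattern are my own assumptions for the demo. It also uses -NoNewline (available from PowerShell 5.0), because Set-Content otherwise appends one more line break after a string that already ends with one.

```powershell
# Hypothetical throwaway file with CRLF line endings (not the real input file).
$inputfile = Join-Path ([System.IO.Path]::GetTempPath()) 'replace-demo.dat'
$original  = "<category>Demos</category>`r`n</game>`r`n<category>Games</category>`r`n</game>`r`n"
[System.IO.File]::WriteAllText($inputfile, $original)

# Assumed single-line equivalent of the here-string pattern, with explicit line breaks.
$pattern = '<category>Games</category>\r?\n</game>\r?\n'

# The parentheses force the whole file to be read before anything is written back.
$updated = (Get-Content $inputfile -Raw) -replace $pattern
Set-Content -Path $inputfile -Value $updated -NoNewline

$result = Get-Content $inputfile -Raw
Remove-Item $inputfile
```

After this, $result holds only the Demos entry with its original line endings intact.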
EDIT 2
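Well, it seems the here-string version works in the ISE but not in the PowerShell console, probably because the line breaks inside the here-string do not match the line endings in the file. In case you run into the same thing, try this pattern with explicit line breaks instead.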
$pattern = '(?s)\s+<category>Games</category>\r?\n\s+</game>'
Set-Content -Path $inputfile -Value ((Get-Content $inputfile -Raw) -replace $pattern)
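To convince yourself that the explicit \r?\n version does not care about line endings, you can run it against the same text in both styles. This is only a sketch: the indented sample string is made up, and it assumes the entries in your real file are indented in a similar way.

```powershell
$pattern = '(?s)\s+<category>Games</category>\r?\n\s+</game>'

# Hypothetical indented sample, once with CRLF and once with LF line endings.
$crlf = "  <category>Demos</category>`r`n  </game>`r`n  <category>Games</category>`r`n  </game>`r`n  <category>Audio</category>`r`n  </game>"
$lf   = $crlf -replace "`r`n", "`n"

# -replace with no replacement string deletes every match.
$resultCrlf = $crlf -replace $pattern
$resultLf   = $lf -replace $pattern

$resultCrlf
$resultLf
```

Both results keep the Demos and Audio entries and drop the empty Games entry, including the line break in front of it.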