
How to Select-String across Multiple Lines with PowerShell


I have the file test.dat shown below:

        <category>Games</category>
</game>

        <category>Applications</category>
</game>

        <category>Demos</category>
</game>

        <category>Games</category>
        <description>MLB 2002 (USA)</description>
</game>

        <category>Bonus Discs</category>
</game>

        <category>Multimedia</category>
</game>

        <category>Add-Ons</category>
</game>

        <category>Educational</category>
</game>

        <category>Coverdiscs</category>
</game>

        <category>Video</category>
</game>

        <category>Audio</category>
</game>

        <category>Games</category>
</game>

How do I use Get-Content and Select-String to print the following to the terminal, given the file above as input? This is the output I need:

            <category>Games</category>
    </game>
            <category>Games</category>
    </game>

This is the command I'm currently using, but it isn't working:

    Get-Content '.\test.dat' | Select-String -pattern '(^\s+<category>Games<\/category>\n^\s+<\/game>$)'


Solution

  • First, you need to read the whole file in as a single string so the pattern can match across lines.

    Get-Content '.\test.dat' -Raw
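
To see why -Raw matters, here is a minimal sketch that compares the two ways of reading a file. The temp file name raw-demo.dat and its two-line content are made up for this illustration; they are not part of the original question.

```powershell
# Hypothetical temp file, created only for this demonstration.
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-demo.dat'
Set-Content -Path $tmp -Value '        <category>Games</category>', '</game>'

$lines = @(Get-Content $tmp)      # array of strings, one element per line
$text  = Get-Content $tmp -Raw    # one string that still contains the line breaks

# A pattern that has to cross a line break.
$crossLine = 'category>\r?\n</game>'

$rawMatches  = $text -match $crossLine                       # True
$lineMatches = [bool]($lines | Select-String $crossLine)     # False: each line is tested on its own

$lines.Count
$rawMatches
$lineMatches

Remove-Item $tmp
```

Without -Raw, Select-String receives one line at a time, so a pattern containing \n can never match.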
    

    Since it seems you want to exclude the entry that also contains a <description> line, you can use this pattern, which matches a Games category line only when the closing </game> tag follows directly on the next line.

    '(?s)\s+<category>Games\S+\r?\n</game>'
    

    Select-String returns a MatchInfo object, and the matched text is in the Value property of each entry in its Matches property. You can extract it in a few different ways.

    Get-Content '.\test.dat' -Raw |
        Select-String '(?s)\s+<category>Games\S+\r?\n</game>' -AllMatches |
            ForEach-Object Matches | ForEach-Object Value
    

    or

    $output = Get-Content '.\test.dat' -Raw |
        Select-String '(?s)\s+<category>Games\S+\r?\n</game>' -AllMatches
    
    $output.Matches.Value
    

    or

    (Get-Content '.\test.dat' -Raw |
        Select-String '(?s)\s+<category>Games\S+\r?\n</game>' -AllMatches).Matches.Value
    

    Output

            <category>Games</category>
    </game>
    
    
            <category>Games</category>
    </game>
    

    You could also use the [regex] type accelerator.

    $str = Get-Content '.\test.dat' -Raw
    
    [regex]::Matches($str,'(?s)\s+<category>Games\S+\r?\n</game>').value
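
To try the pattern without a file on disk, the following sketch builds a small sample inline. The $sample variable and its three entries are assumptions made for illustration, loosely modelled on test.dat, with explicit LF line breaks.

```powershell
# Hypothetical sample: two plain Games entries and one with a <description> line.
$sample = @(
    '        <category>Games</category>'
    '</game>'
    ''
    '        <category>Games</category>'
    '        <description>MLB 2002 (USA)</description>'
    '</game>'
    ''
    '        <category>Games</category>'
    '</game>'
) -join "`n"

# Same pattern as in the answer above.
$found = [regex]::Matches($sample, '(?s)\s+<category>Games\S+\r?\n</game>')

$found.Count                           # 2: the entry with <description> is skipped
$found | ForEach-Object { $_.Value }
```

Because the pattern starts with \s+, the second match also swallows the blank line before it, which is where the empty lines in the output come from.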
    

    EDIT

    Based on your additional info, my understanding is that you want to remove any Games entries that are otherwise empty. We can simplify this greatly by using a here-string.

    $pattern = @'
            <category>Games</category>
        </game>
    
    '@
    

    The additional blank line is intentional: it captures the final newline character. You could also write it like this:

    $pattern = @'
            <category>Games</category>
        </game>\r?\n
    '@
    

    Replacing that pattern with nothing (here $inputfile holds the path to your file) should give what I believe is your expected final result.

    (Get-Content $inputfile -Raw) -replace $pattern
    

    To finish it off, you can put the above command inside a Set-Content command. Since the Get-Content command is enclosed in parentheses, the file is read completely into memory before it is written to.

    Set-Content -Path $inputfile -Value ((Get-Content $inputfile -Raw) -replace $pattern)
    

    EDIT 2

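    Well, it seems the here-string version works in the ISE but not in the PowerShell console, probably because the line breaks inside the here-string differ between the two hosts. In case you encounter the same thing, try this single-line pattern, which spells out the line break explicitly.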

    $pattern = '(?s)\s+<category>Games</category>\r?\n\s+</game>'
    
    Set-Content -Path $inputfile -Value ((Get-Content $inputfile -Raw) -replace $pattern)
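
If you prefer to keep the readable here-string, one possible workaround (a sketch, not something tested against the original file) is to build the regex from it so that the result no longer depends on which line endings the host put inside the here-string. The names $literal and $text and the sample content are hypothetical.

```powershell
# The literal block to remove, written as a here-string for readability.
$literal = @'
        <category>Games</category>
</game>
'@

# Split on either line-ending style, escape each line for regex use,
# then rejoin with a sub-pattern that accepts CRLF or LF.
$pattern = (($literal -split '\r?\n' | ForEach-Object { [regex]::Escape($_) }) -join '\r?\n') + '\r?\n'

# Hypothetical sample text with CRLF line endings.
$text = "        <category>Games</category>`r`n</game>`r`n        <category>Demos</category>`r`n</game>`r`n"

$result = $text -replace $pattern
$result    # only the Demos entry remains
```

Note that this matches the indentation in the here-string exactly, so it has to mirror the indentation used in the file.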