
Need to get the string in between two patterns in PowerShell


I need to match a pattern, e.g. "Commodity Name", and get the string on the next line between the tags "<dd>" and "</dd>".

Sample Input file:

C:\Users\rpm\Desktop\sample.txt:133:    <dt>Commodity Name</dt>
C:\Users\rpm\Desktop\sample.txt:134:    <dd>Grocery</dd>
C:\Users\rpm\Desktop\sample.txt:136:    <dt>IP address</dt>
C:\Users\rpm\Desktop\sample.txt:137:    <dd>XXX.XXX.XXX.XXX port 8000</dd>
C:\Users\rpm\Desktop\sample.txt:144:    <dt>Commodity Serial #</dt>
C:\Users\rpm\Desktop\sample.txt:145:    <dd>0055500000</dd>
C:\Users\rpm\Desktop\sample.txt:147:    <dt>Client IP</dt>
C:\Users\rpm\Desktop\sample.txt:148:    <dd>xxx.xxx.xxx.xxx</dd>
C:\Users\rpm\Desktop\sample.txt:150:    <dt>Client Logged In As</dt>
C:\Users\rpm\Desktop\sample.txt:151:    <dd>rpm123</dd>
C:\Users\rpm\Desktop\sample.txt:153:    <dt>User is member of</dt>
C:\Users\rpm\Desktop\sample.txt:154:    <dd>BP-RPM\COMD_CSO_ITM-AVAI_Def,BP-RPM\user</dd>

Need to match patterns such as

  • Commodity Name
  • IP address
  • Commodity Serial #
  • Client IP
  • Client Logged In As
  • User is member of

and get the value on the line after each matched pattern, between the tags <dd> and </dd>.

Desired output:

Grocery | XXX.XXX.XXX.XXX port 8000 | 0055500000 | xxx.xxx.xxx.xxx | rpm123 | BP-RPM\COMD_CSO_ITM-AVAI_Def,BP-RPM\user

Solution

  • I would start by creating an array that defines your keywords:

    $keywords = @(
        '<dt>Commodity Name</dt>'
        '<dt>IP address</dt>'
        '<dt>Commodity Serial #</dt>'
        '<dt>Client IP</dt>'
        '<dt>Client Logged In As</dt>'
        '<dt>User is member of</dt>'
    )
    

    Now you can join the keywords with | (regex alternation) and pass the resulting pattern to the Select-String cmdlet:

    $file = 'C:\Users\rpm\Desktop\sample.txt'
    $content = Get-Content $file
    $content | Select-String -Pattern ($keywords -join '|')
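
    The keywords above contain no characters that are special in a regular expression, so joining them directly works. If a keyword could contain characters such as ( or ., one option is to escape each keyword before joining. A minimal sketch, assuming a made-up keyword "Price (USD)" and made-up input lines that are not part of the question:

    ```powershell
    $keywords = @(
        '<dt>Commodity Serial #</dt>'
        '<dt>Price (USD)</dt>'          # hypothetical keyword, for illustration only
    )

    # Escape regex metacharacters in every keyword, then join with | (alternation)
    $pattern = ($keywords | ForEach-Object { [regex]::Escape($_) }) -join '|'

    # Hypothetical input: only the first line contains the literal keyword
    $lines = @(
        '<dt>Price (USD)</dt>'
        '<dt>Price USD</dt>'
    )
    $hits = @($lines | Select-String -Pattern $pattern)
    $hits | ForEach-Object { $_.Line }
    ```

    Without the escaping step, the parentheses would be read as a regex group and the pattern would match the second line instead of the first.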
    

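    This gives you one match object per matched keyword, including its LineNumber. LineNumber counts from 1 while the $content array is indexed from 0, so $content[$_.LineNumber] is the line right after the match. Now you can iterate over the results, read that next line by index, and strip the <dd> prefix and </dd> suffix:

    $content | Select-String -Pattern ($keywords -join '|') |
        ForEach-Object {
            [regex]::Match($content[$_.LineNumber], '<dd>(.+)</dd>').Groups[1].Value
        }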
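
    To see why indexing with LineNumber lands on the following line, here is a small sketch that uses an in-memory array in place of the file. The variable name $hit is only for illustration:

    ```powershell
    # Stand-in for Get-Content: one array element per line
    $content = @(
        '    <dt>Commodity Name</dt>'
        '    <dd>Grocery</dd>'
        '    <dt>IP address</dt>'
        '    <dd>XXX.XXX.XXX.XXX port 8000</dd>'
    )

    $hit = $content | Select-String -Pattern '<dt>IP address</dt>'

    $hit.LineNumber                  # 1-based position of the matched line
    $content[$hit.LineNumber - 1]    # the matched <dt> line itself (0-based index)
    $content[$hit.LineNumber]        # the <dd> line that follows it
    ```

    If a keyword ever matched the very last line, $content[$_.LineNumber] would be $null, so a guard may be worth adding for less regular input.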
    

    Output:

    Grocery
    XXX.XXX.XXX.XXX port 8000
    0055500000
    xxx.xxx.xxx.xxx
    rpm123
    BP-RPM\COMD_CSO_ITM-AVAI_Def,BP-RPM\user
    

    Finally, join the results with ' | ' to get the desired output. Here is the whole script:

    $keywords = @(
        '<dt>Commodity Name</dt>'
        '<dt>IP address</dt>'
        '<dt>Commodity Serial #</dt>'
        '<dt>Client IP</dt>'
        '<dt>Client Logged In As</dt>'
        '<dt>User is member of</dt>'
    )
    
    $file = 'C:\Users\rpm\Desktop\sample.txt'
    $content = Get-Content $file
    
    ($content | Select-String -Pattern ($keywords -join '|') | 
        ForEach-Object {
            [regex]::Match($content[$_.LineNumber], '<dd>(.+)</dd>').Groups[1].Value
        }) -join ' | '
    

    Output:

    Grocery | XXX.XXX.XXX.XXX port 8000 | 0055500000 | xxx.xxx.xxx.xxx | rpm123 | BP-RPM\COMD_CSO_ITM-AVAI_Def,BP-RPM\user
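
    As a possible variation (not the approach used above), Select-String can also capture the following line itself through its -Context parameter, which avoids indexing into $content. A minimal sketch on an in-memory sample, where $pattern, $values and $result are illustrative names:

    ```powershell
    # Stand-in for Get-Content: one array element per line
    $content = @(
        '    <dt>Commodity Name</dt>'
        '    <dd>Grocery</dd>'
        '    <dt>IP address</dt>'
        '    <dd>XXX.XXX.XXX.XXX port 8000</dd>'
    )
    $pattern = '<dt>Commodity Name</dt>|<dt>IP address</dt>'

    # -Context 0,1 keeps zero lines before and one line after each match
    $values = $content | Select-String -Pattern $pattern -Context 0,1 |
        ForEach-Object {
            $next = $_.Context.PostContext | Select-Object -First 1
            # Emit the text between <dd> and </dd>; skip matches with no usable next line
            if ($next -match '<dd>(.+)</dd>') { $Matches[1] }
        }

    $result = $values -join ' | '
    $result
    ```

    The same -Context 0,1 argument can be combined with -Path to read the file directly instead of piping in $content.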