regex powershell text

Select-String pattern finds only partial string match of -cmatch


I am trying to put together a string replacement routine. I have got as far as isolating the substring matches for two strings stored in an array of strings, $lines, but there is a problem:

[string[]]$lines = "160 FROG Kermit  164 Big Bird_Road, Wellsville Singer","161 PIGGY Miss Pretty 1640 Really Long Main_Road, Whathellville Prima Donna"
# match string from last number to comma
foreach ($line in $lines) {
    if ($line -cmatch '\d\s\w[a-z]*\s.*,') {
        Write-Host "Found match!"
        $line | Select-String -Pattern '\d\s\w[a-z]*\s.' -AllMatches |
            ForEach-Object { 
                $x = $_.Matches[1].Value 
                Write-Host "x is:" $x
            }
    }
}

The first regex, in $line -cmatch '\d\s\w[a-z]*\s.*,', is correct according to testing in Expresso. I want the address part of the string, from the last street number to the comma, so that I can replace the spaces in the street base name with underscores, e.g. Big Bird_Road becomes Big_Bird_Road and Really Long Main_Road becomes Really_Long_Main_Road. The problem is that the second regex, in $line | Select-String -Pattern '\d\s\w[a-z]*\s.' -AllMatches, cannot be completed in the same way. With the truncated pattern shown above, the output is:

Found match!
x is: 4 Big B
Found match!
x is: 0 Really L

The full substring has not been captured yet! And if I add the remaining *, to complete the pattern, nothing at all is printed after x is:

Why doesn't the first regex (used with -cmatch) work in the same way when used as a Select-String pattern?


Solution

  • If you want to do a replacement for that format in the strings, you can use -replace with a pattern that matches only the spaces to be replaced with an underscore:

    (?<=\d\s+\w[a-zA-Z\s_]*)\s(?=[^\d,]*,)
    

    Explanation

    • (?<= Positive lookbehind, assert that what is directly to the left is
      • \d\s+\w[a-zA-Z\s_]* Match a digit, 1+ whitespace chars, a word char and optionally repeat the listed characters in the character class
    • ) Close the lookbehind
    • \s Match a whitespace char (or \s+ to match 1 or more)
    • (?=[^\d,]*,) Assert a comma to the right after matching optional chars other than a digit or comma
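
The difference between \s and \s+ mentioned above only shows up when the street name contains a run of several whitespace characters. A minimal sketch, assuming a hypothetical sample line (not from the question) with two spaces between "Big" and "Bird_Road":

```powershell
# Hypothetical sample line with two spaces between "Big" and "Bird_Road"
$sample = '12 Big  Bird_Road, Wellsville'

# \s replaces every whitespace character individually
$single = $sample -replace '(?<=\d\s+\w[a-zA-Z\s_]*)\s(?=[^\d,]*,)', '_'

# \s+ replaces the whole run of whitespace with a single underscore
$collapsed = $sample -replace '(?<=\d\s+\w[a-zA-Z\s_]*)\s+(?=[^\d,]*,)', '_'

$single      # 12 Big__Bird_Road, Wellsville
$collapsed   # 12 Big_Bird_Road, Wellsville
```

Which variant is preferable depends on whether doubled spaces in the input should survive as doubled underscores.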

    [string[]]$lines = "160 FROG Kermit  164 Big Bird_Road, Wellsville Singer","161 PIGGY Miss Pretty 1640 Really Long Main_Road, Whathellville Prima Donna"
    
    foreach ($line in $lines) {
        # Replace each space in the street name (between street number and comma) with "_"
        $line -replace '(?<=\d\s+\w[a-zA-Z\s_]*)\s(?=[^\d,]*,)', '_'
    }
    

    Output

    160 FROG Kermit  164 Big_Bird_Road, Wellsville Singer
    161 PIGGY Miss Pretty 1640 Really_Long_Main_Road, Whathellville Prima Donna
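
As for why the Select-String call in the question behaves differently from -cmatch, two things appear to be at play. Select-String matches case-insensitively unless -CaseSensitive is given, so [a-z]* also consumes uppercase letters and the match starts earlier in the line. In addition, $_.Matches[1] refers to the second match, not the first. A minimal sketch using the first sample line from the question ($loose and $strict are illustrative names):

```powershell
$line = '160 FROG Kermit  164 Big Bird_Road, Wellsville Singer'
$pattern = '\d\s\w[a-z]*\s.*,'

# Default: case-insensitive, so [a-z]* also matches "ROG" and the match starts at "0 FROG"
$loose = ($line | Select-String -Pattern $pattern -AllMatches).Matches

# -CaseSensitive mirrors -cmatch: the match can only start at "4 Big"
$strict = ($line | Select-String -Pattern $pattern -AllMatches -CaseSensitive).Matches

$loose.Count       # 1, so $loose[1] is $null
$loose[0].Value    # 0 FROG Kermit  164 Big Bird_Road,
$strict[0].Value   # 4 Big Bird_Road,
```

With the complete pattern the greedy .* leaves only a single match per line, which would explain why indexing Matches[1] printed nothing.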