
PowerShell Regex for optional equal sign separator in string


In the test string below, the key and the value are separated either by an equal sign or by one or more space characters. If the equal sign is present, it may be preceded and/or followed by any number of spaces (including none).

$MyTstString = "KeyX =  ValueY"
$RegExString = "^(?<Key>.+)(?<Sep>\s*=\s*)(?<Value>.*)$"
$MyTstString -match $RegExString | Foreach {$Matches}

What regular expression would do that for me?

Changing the RegExString to

$RegExString = "^(?<Key>.+)(?<Sep>\s*=\s*|\s+)(?<Value>.*)$"

causes Key and Sep to be captured incorrectly when the test string is "KeyA = ValueB".
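To see exactly what goes wrong, the sketch below runs both regexes above against "KeyA = ValueB" and prints each capture group in brackets so that leading and trailing spaces are visible. The variable names are illustrative only.

```powershell
# The two original regexes from the question.
$greedy1 = '^(?<Key>.+)(?<Sep>\s*=\s*)(?<Value>.*)$'
$greedy2 = '^(?<Key>.+)(?<Sep>\s*=\s*|\s+)(?<Value>.*)$'

$results = foreach ($regex in $greedy1, $greedy2) {
  if ('KeyA = ValueB' -match $regex) {
    # Brackets make leading/trailing spaces in each capture visible.
    'Key=[{0}] Sep=[{1}] Value=[{2}]' -f $Matches.Key, $Matches.Sep, $Matches.Value
  }
}
$results
```

The first regex leaves a trailing space in Key (Key=[KeyA ], Sep=[= ]); the second captures "KeyA =" as the key and only the final space as the separator.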


Solution

  • Note: \s matches any whitespace character, not just spaces; that includes tabs, newlines, and other whitespace. To match spaces only, use a literal space ( ) instead of \s in the regexes below.
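As a sketch of the difference, assuming a hypothetical input whose key and value are separated by a tab, here is the -split pattern used in this answer, once with \s and once with literal spaces (variable names are made up for illustration):

```powershell
# Hypothetical input: key and value separated by a TAB, not a space.
$tabbed = "KeyT`tValueT"

# \s matches the tab, so the string is split into key and value.
$key, $value = $tabbed -split '\s*=\s*|\s+', 2

# With literal spaces instead of \s, the tab is not a separator:
# nothing is split, the whole string lands in $key2, and $value2 is $null.
$key2, $value2 = $tabbed -split ' *= *| +', 2
```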


    I suggest using a -split operation combined with a multi-assignment instead:

    $key, $value = $MyTstString -split '\s*=\s*|\s+', 2
    

    Note: The , 2 part (which supplies the optional <Max-substrings> operand) ensures that at most two tokens are returned; otherwise, the value itself could end up split if it happens to contain whitespace or =. Thanks, iRon.
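A minimal illustration of why the limit matters, using a made-up line whose value contains a space:

```powershell
# Hypothetical input whose value contains a space.
$line = 'Name = John Smith'

# Without the limit, the space inside the value is treated as a separator too,
# so $rest receives the array 'John', 'Smith'.
$k, $rest = $line -split '\s*=\s*|\s+'

# With ', 2', at most two tokens are returned: the key and the intact value.
$key, $value = $line -split '\s*=\s*|\s+', 2
```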

    If you also want to capture the separator string:

    $key, $sep, $value = $MyTstString -split '(\s*=\s*|\s+)', 2
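Applied to the three sample strings used further down, the capturing version might be exercised like this. This is a sketch; the bracketed output format only serves to make spaces visible. The separator captured by the parentheses is included in the output without counting toward the limit of 2, which is why three variables receive values:

```powershell
$rows = foreach ($s in 'KeyX =  ValueY', 'KeyX2 ValueY2', 'KeyX3=ValueY3') {
  # The capturing group makes -split emit the separator as its own token.
  $key, $sep, $value = $s -split '(\s*=\s*|\s+)', 2
  '[{0}] [{1}] [{2}]' -f $key, $sep, $value
}
$rows
```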
    

    As for what you tried:

    As Bender the Greatest points out, a -match operation with a scalar LHS:

    • returns $true or $false to indicate whether the regex matched.
    • if it did match, populates the automatic $Matches variable with the one match it found (it never looks for further matches).

    (By contrast, with an array (collection) as the LHS, -match returns the (potentially empty) sub-array of matching elements, and does not populate $Matches.)
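A short sketch of the two behaviors; the sample strings and the ^Key pattern are made up for illustration:

```powershell
# Scalar LHS: -match returns a Boolean and populates $Matches.
$isMatch = 'KeyX = ValueY' -match '^(?<Key>\w+)'
$scalarKey = $Matches.Key

# Array LHS: -match acts as a filter and returns the matching elements.
$filtered = @('KeyA', 'no key here', 'KeyB') -match '^Key'
```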

    Using your original approach, with a corrected and streamlined form of your regex:

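    • The problem with your regex (both variations) is that (?<Key>.+) is greedy: it captures as much as it can while still letting the rest of the regex match. With your first regex, Key therefore retains the space(s) before the =; with your second, Key extends up to the last whitespace character, so for "KeyA = ValueB" it captures "KeyA =" and Sep captures only the final space.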

    • The simplest solution is to make the + quantifier non-greedy (lazy): (?<Key>.+?). Alternatively, restrict the characters Key may match to anything other than whitespace and =: (?<Key>[^\s=]+).

    $RegExString = '^(?<Key>.+?)(?<Sep>\s*=\s*|\s+)(?<Value>.*)$'
    
    'KeyX1 =  ValueY1', 'KeyX2 ValueY2', 'KeyX3=ValueY3' | 
      ForEach-Object {
        if ($_ -match $RegExString) {
          [pscustomobject] @{ Key = $Matches.Key; Sep = $Matches.Sep; Value = $Matches.Value}
        }
      }
    

    The above yields:

    Key   Sep  Value
    ---   ---  -----
    KeyX1  =   ValueY1
    KeyX2      ValueY2
    KeyX3 =    ValueY3
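For comparison, the character-class alternative, (?<Key>[^\s=]+), can be dropped into the same loop. This is a sketch; for these three sample strings it should yield the same Key, Sep, and Value columns as the non-greedy version:

```powershell
# Key may not contain whitespace or '=', so it stops at the first separator
# without relying on a lazy quantifier.
$RegExAlt = '^(?<Key>[^\s=]+)(?<Sep>\s*=\s*|\s+)(?<Value>.*)$'

$rows = 'KeyX1 =  ValueY1', 'KeyX2 ValueY2', 'KeyX3=ValueY3' |
  ForEach-Object {
    if ($_ -match $RegExAlt) {
      [pscustomobject] @{ Key = $Matches.Key; Sep = $Matches.Sep; Value = $Matches.Value }
    }
  }
$rows
```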