PowerShell regular expression to interpret time strings 12h34m56s with optional groups


Using PowerShell and regex, I'm trying to interpret time strings in a data file. For example, 12h30m means 12 hours and 30 minutes and 15m means 15 minutes; other values look like 1m30s, 2h, etc. I figured I could use a regex to split these strings into hours, minutes and seconds, capturing only the digits (leaving out the h, m, s), and then do some calculations.

I managed to get a regex working, but only when the captures also include the h, m, s. When I add a "positive lookahead" to the regex to exclude those letters, the final result is not what I expected.

Here is the PowerShell script:

$input = "12h34m56s"
#$input = "4h30m"
#$input = "1m15s"
#$input = "14400"

Write-Output "--(test 1)--------------------"
$input -match '(\d*h)?(\d*m)?(\d*s)?(\d*)?'
Write-Output $Matches

Write-Output "--(test 2)--------------------"
$input -match '(\d*(?=h))?(\d*(?=m))?(\d*(?=s))?(\d*)?'
Write-Output $Matches

The output is this:

--(test 1)--------------------
True

Name                           Value
----                           -----
4
3                              56s
2                              34m
1                              12h
0                              12h34m56s
--(test 2)--------------------
True
4
1                              12
0                              12

The first part, "test 1", is what I expected. For the second part, "test 2", however, I was expecting this output:

--(test 2)--------------------
True

Name                           Value
----                           -----
4
3                              56
2                              34
1                              12
0                              12h34m56s

I tested this on regex101 and, looking at the syntax coloring, it seems correct, but at the top it states "27 matches" and I would expect just 5 matches (because 5 lines). So I suspect it has something to do with grouping. I tried adding extra parentheses around the whole expression, but that didn't help. Any help would be appreciated.



Solution

  • Using [timespan]::ParseExact() as shown in the other, helpful answers, is definitely preferable if/once you have individual tokens representing timespans.

    However, at least hypothetically you may (first) have to extract such tokens out of a larger text, in which case a regex is needed - see below.
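
For reference, a minimal sketch of the [timespan]::ParseExact() approach mentioned above. The list of format strings is an assumption made for illustration (it is not taken from the other answers) and has to be extended for any other unit combinations that occur in your data. As far as I know, the h custom specifier only accepts hour values up to 23.

```powershell
# Candidate formats, tried in order. A backslash makes the next character a literal.
# This list is an illustrative assumption; extend it to fit the actual data.
$formats = [string[]] ('h\hm\ms\s', 'h\hm\m', 'h\h', 'm\ms\s', 'm\m', 's\s')

foreach ($token in '12h34m56s', '4h30m', '1m15s', '2h', '15m') {
  # ParseExact throws a FormatException if none of the formats fits the token.
  $ts = [timespan]::ParseExact($token, $formats, [cultureinfo]::InvariantCulture)
  '{0,-10} -> {1}' -f $token, $ts
}
```

Once a token is a [timespan], properties such as .TotalSeconds or .TotalMinutes give the values needed for further calculations.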


    The problem with your regex is that the lookahead assertions (e.g., (?=h)) prevent it from recognizing a token such as 12h34m56s as a single match, because lookaround assertions do not consume the substrings they match: after the first group has captured 12, the next group would have to start at the h, which \d cannot match.

    Therefore, just match those characters directly, and enclose the subexpression in (?:…), i.e. a non-capturing group to avoid unnecessary capture groups; e.g., instead of (\d*h)?, use (?:(\d+)h)?.

    Also note the use of + instead of *, as at least one digit should be present, whereas the subexpression as a whole may be absent ((?:…)?).
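
To make the difference concrete, here is a small sketch that runs the lookahead pattern from the question and a consuming pattern against the same string with -match. The variable names are chosen for illustration only.

```powershell
$sample = '12h34m56s'   # not $input, which is an automatic variable in PowerShell

# Lookahead version from the question: (?=h) checks for 'h' without consuming it,
# so the overall match ends right before the 'h'.
$null = $sample -match '(\d*(?=h))?(\d*(?=m))?(\d*(?=s))?(\d*)?'
$lookaheadMatch = $Matches[0]

# Consuming version: the unit letters are matched, but sit outside the capture groups.
$null = $sample -match '(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?(\d+)?'
$wholeMatch = $Matches[0]
$hours, $minutes, $seconds = $Matches[1], $Matches[2], $Matches[3]

"Lookahead match: $lookaheadMatch"                                  # 12
"Consuming match: $wholeMatch -> $hours / $minutes / $seconds"      # 12h34m56s -> 12 / 34 / 56
```

Because every part of the pattern is optional, it can also match the empty string. This matters when all matches are requested rather than just the first one, and is why empty matches are filtered out in the code below.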

    To put it all together, using named capture groups, which make it easier to tell which group captured which unit:

    $str = @'
    12h34m56s545294385
    1h2m3s
    4h30m
    1m15s
    14400
    '@
    
    [regex]::Matches($str, '(?:(?<hrs>\d+)h)?(?:(?<mins>\d+)m)?(?:(?<secs>\d+)s)?(?<num>\d+)?') |
      ForEach-Object { 
        if ($_.Value) { # Only consider nonempty matches.
          [pscustomobject] @{ 
            Match = $_.Value
            Groups = $_.Groups | Select-Object -Skip 1 | Select-Object Name, Value | Out-String
          } 
        }
      } | Format-Table -Wrap
    

    Output:

    Match              Groups
    -----              ------
    12h34m56s545294385
                       Name Value
                       ---- -----
                       hrs  12
                       mins 34
                       secs 56
                       num  545294385
                      
                      
    1h2m3s            
                       Name Value
                       ---- -----
                       hrs  1
                       mins 2
                       secs 3
                       num
                      
                      
    4h30m             
                       Name Value
                       ---- -----
                       hrs  4
                       mins 30
                       secs
                       num
                      
                      
    1m15s             
                       Name Value
                       ---- -----
                       hrs
                       mins 1
                       secs 15
                       num
                      
                      
    14400             
                       Name Value
                       ---- -----
                       hrs
                       mins
                       secs
                       num  14400
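
For the calculations the question mentions, the captured groups can be turned into a [timespan]. The following is only a sketch: the function name ConvertTo-TimeSpanFromToken is hypothetical, and treating a bare number such as 14400 as seconds is an assumption about the data.

```powershell
# Hypothetical helper; the name and the "bare number = seconds" rule are assumptions.
function ConvertTo-TimeSpanFromToken {
  param([Parameter(Mandatory)] [string] $Token)

  # Same pattern as above, anchored with ^...$ so that the whole token must match.
  $pattern = '^(?:(?<hrs>\d+)h)?(?:(?<mins>\d+)m)?(?:(?<secs>\d+)s)?(?<num>\d+)?$'
  if ($Token -notmatch $pattern) {
    throw "Unrecognized time string: '$Token'"
  }

  # $Matches only contains groups that took part in the match;
  # a missing key yields $null, which [int] converts to 0.
  $h = [int] $Matches['hrs']
  $m = [int] $Matches['mins']
  $s = [int] $Matches['secs']
  $n = [int] $Matches['num']    # assumed to be a plain number of seconds

  New-TimeSpan -Hours $h -Minutes $m -Seconds ($s + $n)
}

'12h34m56s', '4h30m', '1m15s', '14400' | ForEach-Object {
  $ts = ConvertTo-TimeSpanFromToken $_
  '{0,-10} -> {1} ({2} seconds)' -f $_, $ts, $ts.TotalSeconds
}
```

With these inputs, 12h34m56s comes out as 45296 seconds and 14400 as a time span of 4 hours.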