Using PowerShell and regex, I'm trying to interpret time strings in a data file. Examples are 12h30m (meaning 12 hours and 30 minutes), 15m (meaning 15 minutes), 1m30s, 2h, etc. I figured I could use a regex to separate the hours, minutes, and seconds and capture only the digits, leaving out the h, m, and s suffixes, and then do some calculations.
I managed to get a regex working, but only when the captures also include the h, m, and s. When I add a "positive lookahead" to the regex to exclude them, the result is not what I expected.
Here is the PowerShell script:
$input = "12h34m56s"
#$input = "4h30m"
#$input = "1m15s"
#$input = "14400"
Write-Output "--(test 1)--------------------"
$input -match '(\d*h)?(\d*m)?(\d*s)?(\d*)?'
Write-Output $Matches
Write-Output "--(test 2)--------------------"
$input -match '(\d*(?=h))?(\d*(?=m))?(\d*(?=s))?(\d*)?'
Write-Output $Matches
The output is this:
--(test 1)--------------------
True
Name Value
---- -----
4
3 56s
2 34m
1 12h
0 12h34m56s
--(test 2)--------------------
True
4
1 12
0 12
The output of the first part, "test 1", is what I expected. For the second part, "test 2", however, I was expecting this output:
--(test 2)--------------------
True
Name Value
---- -----
4
3 56
2 34
1 12
0 12h34m56s
I tested this on regex101, and judging by the syntax coloring it seems correct, but at the top it states "27 matches" where I would expect just 5 (because there are 5 lines). So I suspect it has something to do with grouping. I tried adding extra parentheses around the whole expression, but that didn't help. Any help would be appreciated.
Using [timespan]::ParseExact(), as shown in the other, helpful answers, is definitely preferable if/once you have individual tokens representing timespans.
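For illustration, a minimal sketch of the [timespan]::ParseExact() approach for tokens that have already been isolated. The list of format strings is an assumption made up for this example (it is not taken from the other answers) and would need extending for other unit combinations; the backslashes escape the literal unit letters in the custom format strings.

```powershell
# Assumed set of custom TimeSpan formats; '\h', '\m', '\s' are the literal unit letters.
$formats = [string[]] @('h\hm\ms\s', 'h\hm\m', 'm\ms\s')

$tokens = '12h34m56s', '4h30m', '1m15s'

$spans = foreach ($token in $tokens) {
  # The string[] overload tries each format until one fits the whole token.
  [timespan]::ParseExact($token, $formats, [cultureinfo]::InvariantCulture)
}

# Show each token next to the parsed value and its length in seconds.
for ($i = 0; $i -lt $tokens.Count; $i++) {
  '{0} -> {1} ({2} seconds)' -f $tokens[$i], $spans[$i], $spans[$i].TotalSeconds
}
```

A token that fits none of the listed formats makes ParseExact() throw, so plain numbers such as 14400 would have to be handled separately.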
However, at least hypothetically you may (first) have to extract such tokens out of a larger text, in which case a regex is needed - see below.
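The problem with your regex is that using lookahead assertions (e.g., (?=h)) prevents it from recognizing tokens such as 12h34m56s as a single match, because lookaround assertions do not consume the substrings they match. After (\d*(?=h)) has matched 12, the h is still unconsumed, so the next subexpression, (\d*(?=m)), has to start at the h: \d* matches nothing there, the lookahead for m fails, and the overall match ends after 12.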
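The non-consuming behavior can be seen in isolation with a small sketch; the variable names are made up for this illustration.

```powershell
$s = '12h34m56s'

# The lookahead only checks that an 'h' follows; the match itself is just the digits.
$digits = [regex]::Match($s, '\d+(?=h)').Value           # 12

# After the lookahead, matching resumes AT the 'h', so a following \d+ cannot match.
$continues = [regex]::Match($s, '\d+(?=h)\d+').Success   # False

# Matching the 'h' directly consumes it, so the match can continue into the minutes.
$consumed = [regex]::Match($s, '\d+h\d+').Value          # 12h34

$digits, $continues, $consumed
```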
Therefore, just match those characters directly, and enclose the subexpression in (?:…), i.e., a non-capturing group, to avoid unnecessary capture groups; e.g., instead of (\d*h)?, use (?:(\d+)h)?.
Also note the use of + instead of *, as at least one digit should be present, whereas the subexpression as a whole may be absent ((?:…)?).
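As a quick check, here is a sketch that applies this idea to the sample input from the question with the -match operator. The ^ and $ anchors are an addition for this example, so that the whole string has to be a valid token.

```powershell
# Non-capturing groups wrap each "digits + unit letter" pair; only the digits are captured.
$pattern = '^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?(\d+)?$'

$ok = '12h34m56s' -match $pattern
# Save the captures right away: the next successful -match overwrites $Matches.
$hrs, $mins, $secs = $Matches[1], $Matches[2], $Matches[3]   # 12, 34, 56

# With \d*, a bare unit letter without any digits is accepted ...
$starAcceptsBareUnit = 'h' -match '^(?:(\d*)h)?$'   # True
# ... whereas \d+ requires at least one digit before the unit letter.
$plusAcceptsBareUnit = 'h' -match '^(?:(\d+)h)?$'   # False

"$hrs $mins $secs"
```

Groups that did not take part in a match (e.g., the hours for 1m15s) have no entry in $Matches at all, so indexing them returns $null.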
To put it all together, along with using named capture groups, which make it easier to identify which group captured which unit:
$str = @'
12h34m56s545294385
1h2m3s
4h30m
1m15s
14400
'@
[regex]::Matches($str, '(?:(?<hrs>\d+)h)?(?:(?<mins>\d+)m)?(?:(?<secs>\d+)s)?(?<num>\d+)?') |
  ForEach-Object {
    if ($_.Value) { # Only consider nonempty matches.
      [pscustomobject] @{
        Match  = $_.Value
        Groups = $_.Groups | Select-Object -Skip 1 | Select-Object Name, Value | Out-String
      }
    }
  } | Format-Table -Wrap
Output:
Match              Groups
-----              ------
12h34m56s545294385
                   Name Value
                   ---- -----
                   hrs  12
                   mins 34
                   secs 56
                   num  545294385
1h2m3s
                   Name Value
                   ---- -----
                   hrs  1
                   mins 2
                   secs 3
                   num
4h30m
                   Name Value
                   ---- -----
                   hrs  4
                   mins 30
                   secs
                   num
1m15s
                   Name Value
                   ---- -----
                   hrs
                   mins 1
                   secs 15
                   num
14400
                   Name Value
                   ---- -----
                   hrs
                   mins
                   secs
                   num  14400
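Since the goal in the question was to do calculations with the parts, the named groups could then be turned into [timespan] values along these lines. This is only a sketch: treating a bare number (the num group) as a count of seconds is an assumption, and the sample text is shortened.

```powershell
$pattern = '(?:(?<hrs>\d+)h)?(?:(?<mins>\d+)m)?(?:(?<secs>\d+)s)?(?<num>\d+)?'
$str = "4h30m`n1m15s`n14400"

$results = [regex]::Matches($str, $pattern) |
  Where-Object Value |   # Only consider nonempty matches.
  ForEach-Object {
    $g = $_.Groups
    # A group that did not match has Value '', which [int] converts to 0.
    $seconds = ([int] $g['hrs'].Value) * 3600 +
               ([int] $g['mins'].Value) * 60 +
               ([int] $g['secs'].Value) +
               ([int] $g['num'].Value)   # Assumption: a bare number means seconds.
    [pscustomobject] @{
      Match    = $_.Value
      TimeSpan = [timespan]::FromSeconds($seconds)
    }
  }

$results | Format-Table
```

[int] is enough for the short samples here; very long digit runs would need [long] or [double] instead.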