I am trying to write a regular expression that will let me parse and modify strings that may include an instruction to time a specific action, using capture groups to identify the "hour", "minute", and "second" values in the input string.
In Ruby, I have a regex that gets close to the matches and capture groups I need:
(?<hour_digit>\d+\s)[a-z]*\s?hour[s\b|b\s]|(?<minute_digit>\d+\s)[a-z]*\s?minute[s\b|b\s][a-z]*|(?<second_digit>\d+\s)[a-z]*\s?second[s\b|b]
I want an expression that captures strings containing several values as a single match, instead of matching each unit independently: "5 hours and 15 minutes" should be one match, and "30 minutes up to 1 hour" should be one match.
You can use
(?<!\w)\b(?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)?\b(?!\w)
See the regex demo. Details:
(?<!\w)\b - a left-hand side word boundary ([[:<:]], \< or \m does this in some flavors)
(?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)? - an optional occurrence of
  (?<hour_digit>\d+) - Group "hour_digit": one or more digits
  (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespaces and then the word more or another
  \s*hours? - zero or more whitespaces, then hour or hours
(?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)? - an optional occurrence of
  (?:\s*(?:or|up to|and|to))* - zero or more occurrences of zero or more whitespaces followed by one of the words or, up to, and, to
  \s* - zero or more whitespaces
  (?<minute_digit>\d+) - Group "minute_digit": one or more digits
  (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespaces and then the word more or another
  \s*minutes? - zero or more whitespaces, then minute or minutes
(?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)? - an optional occurrence of
  (?:\s*(?:or|up to|and|to))* - zero or more occurrences of zero or more whitespaces followed by one of the words or, up to, and, to
  \s* - zero or more whitespaces
  (?<second_digit>\d+) - Group "second_digit": one or more digits
  (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespaces and then the word more or another
  \s*seconds? - zero or more whitespaces, then second or seconds
\b(?!\w) - a right-hand side word boundary (in some other regex flavors, it is \M, \> or [[:>:]]).
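
To try the pattern in Ruby, here is a minimal sketch. The pattern is copied verbatim from above; the constant DURATION and the helpers parse_durations and mark_durations are hypothetical names introduced only for this illustration, and the sample strings are made up.

```ruby
# Pattern copied verbatim from the answer above.
DURATION = /(?<!\w)\b(?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)?\b(?!\w)/

# Hypothetical helper: returns one hash per match, with nil for absent units.
def parse_durations(text)
  results = []
  pos = 0
  # Regexp#match with a start offset keeps the MatchData for each hit
  # (String#scan would return only arrays of captures here).
  while (m = DURATION.match(text, pos))
    results << {
      text:    m[0],
      hours:   m[:hour_digit]&.to_i,
      minutes: m[:minute_digit]&.to_i,
      seconds: m[:second_digit]&.to_i
    }
    # The boundary checks rule out empty matches, so pos always advances.
    pos = m.end(0)
  end
  results
end

# Hypothetical helper: wraps every matched duration in square brackets.
def mark_durations(text)
  text.gsub(DURATION) { |match| "[#{match}]" }
end

p parse_durations("bake for 5 hours and 15 minutes")
puts mark_durations("stir for 30 seconds, then rest 1 hour")
```

With these inputs, the first call should return a single entry with hours 5 and minutes 15, and the second should bracket "30 seconds" and "1 hour" separately. Note that the groups are tried in the fixed order hour, minute, second, so an input such as "30 minutes up to 1 hour" comes back as two matches ("30 minutes" and "1 hour"); matching minutes before hours as one unit would need the pattern extended.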