regex ruby capture-group

Regex capture groups for "timer" sentence pattern


I am trying to write a regular expression that will let me parse and modify strings that may include an instruction to time a specific action, using capture groups to identify the "hour", "minute", and "second" values in the input string.

In Ruby, I have a regex that gets close to the matches and capture groups I need:

(?<hour_digit>\d+\s)[a-z]*\s?hour[s\b|b\s]|(?<minute_digit>\d+\s)[a-z]*\s?minute[s\b|b\s][a-z]*|(?<second_digit>\d+\s)[a-z]*\s?second[s\b|b]

I want an expression that treats a phrase containing several time values as a single match, instead of matching each value independently: "5 hours and 15 minutes" should be one match, and "30 minutes up to 1 hour" should be one match. The current regex matches each value separately.


Solution

  • You can use

    (?<!\w)\b(?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)?\b(?!\w)
    

    Details:

    • (?<!\w)\b - a left-hand side word boundary ([[:<:]] or \< or \m in some flavors does this)
    • (?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)? - an optional occurrence of
      • (?<hour_digit>\d+) - Group "hour_digit": one or more digits
      • (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespaces followed by the word more or another
      • \s*hours? - zero or more whitespaces, hour or hours
    • (?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)? - an optional occurrence of
      • (?:\s*(?:or|up to|and|to))* - zero or more occurrences of zero or more whitespaces followed by or, up to, and, or to
      • \s* - zero or more whitespaces
      • (?<minute_digit>\d+) - Group "minute_digit": one or more digits
      • (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespaces followed by the word more or another
      • \s*minutes? - zero or more whitespaces, minute or minutes
    • (?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)? - an optional occurrence of
      • (?:\s*(?:or|up to|and|to))* - zero or more occurrences of zero or more whitespaces followed by or, up to, and, or to
      • \s* - zero or more whitespaces
      • (?<second_digit>\d+) - Group "second_digit": one or more digits
      • (?:\s*(?:more|another))? - an optional occurrence of zero or more whitespaces followed by the word more or another
      • \s*seconds? - zero or more whitespaces, second or seconds
    • \b(?!\w) - a right-hand side word boundary (in some other regex flavors, it is \M, \> or [[:>:]]).
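To check the pattern against the sample phrases in Ruby, something like the following should work. This is only a sketch: the constant `TIMER_RE` and the method `timer_matches` are names made up for illustration, and the input sentences are invented test strings. The pattern itself is copied unchanged from the answer.

```ruby
# Pattern from the answer above, kept on one line because "up to" contains a
# literal space (the /x flag would ignore that space).
# TIMER_RE and timer_matches are hypothetical names used only for this sketch.
TIMER_RE = /(?<!\w)\b(?:(?<hour_digit>\d+)(?:\s*(?:more|another))?\s*hours?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<minute_digit>\d+)(?:\s*(?:more|another))?\s*minutes?)?(?:(?:\s*(?:or|up to|and|to))*\s*(?<second_digit>\d+)(?:\s*(?:more|another))?\s*seconds?)?\b(?!\w)/

# Return one hash per match: the matched text plus the three named groups.
# A group that did not take part in the match is nil.
def timer_matches(text)
  results = []
  text.scan(TIMER_RE) do
    m = Regexp.last_match # MatchData for the current match
    results << {
      text: m[0],
      hours: m[:hour_digit],
      minutes: m[:minute_digit],
      seconds: m[:second_digit]
    }
  end
  results
end

p timer_matches("bake for 5 hours and 15 minutes")
p timer_matches("rest 30 minutes up to 1 hour")
```

With the first string, the whole phrase "5 hours and 15 minutes" comes back as a single match with both `hour_digit` and `minute_digit` set. Note that the pattern expects the units in the order hours, minutes, seconds, so the second string, which lists minutes first, still comes back as two separate matches ("30 minutes" and "1 hour").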