
Greedy Python RegEx capturing group to include "and"


I need some help writing a regular expression. It should match the following patterns (which contain words, digits, spaces and commas):

  • Line 145
  • Line3544354
  • Lines 10,12
  • Line items 45,10,26
  • Lines 10 and 45

So far, I have written one expression that handles the first three patterns in any letter case:

r'(?i)(line item[\.*\,*\s*\d+]+|line[\.*\,*\s*\d+]+|lines[\.*\,*\s*\d+]+|line items[\.*\,*\s*\d+]+)'

I would like to include the last two patterns as well, but I am not sure how. For the pattern "Lines 10 and 45", I wrote this expression by modifying the capturing group as follows:

r'(Lines[\.*\,*\w*\s*\d+]+)'

However, it does not work as expected: it keeps matching every word character (and space) to the end of the string. I would like to keep my expressions greedy, but I am not sure how to handle the last two patterns in the list.
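The over-matching described above can be reproduced with a short Python sketch (the sample sentence is a made-up assumption). It also shows the likely cause: inside a character class, `*` and `+` are ordinary characters rather than quantifiers, so `[\.*\,*\w*\s*\d+]+` just means "one or more of these characters". Because `\w` and `\s` are both in that set, the match runs on to the end of the text.

```python
import re

# Expression copied from the question (meant to match "Lines 10 and 45").
attempt = re.compile(r'(Lines[\.*\,*\w*\s*\d+]+)')

# Hypothetical sample text: the target phrase followed by more words.
text = "Lines 10 and 45 are referenced below"

# Inside [...] the characters * and + are literals, not quantifiers, so the
# class means "any one of . * , \w \s \d +". Since \w and \s are both in it,
# the trailing + keeps consuming words and spaces to the end of the text.
match = attempt.search(text)
print(match.group(1))  # Lines 10 and 45 are referenced below

# The same literal-character behaviour: this string of punctuation is
# accepted by the class used in the question's first expression.
print(bool(re.fullmatch(r'[\.*\,*\s*\d+]+', '*+,.')))  # True
```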

Any suggestions please?


Solution

  • You may use

    (?i)lines?(?:\s+items?)?\s*\d+(?:\.\d+)?(?:\s*(?:,|and)\s*\d+(?:\.\d+)?)*
    

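A minimal sketch of using this pattern with Python's standard `re` module: it checks the five inputs from the question with `fullmatch()`, then runs `findall()` over a longer sentence. The name `LINE_REF` and the sample sentence are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern from the answer. It contains only non-capturing groups, so
# findall() returns the whole matched text for each hit.
LINE_REF = re.compile(
    r'(?i)lines?(?:\s+items?)?\s*\d+(?:\.\d+)?(?:\s*(?:,|and)\s*\d+(?:\.\d+)?)*'
)

# The five example inputs from the question.
samples = [
    "Line 145",
    "Line3544354",
    "Lines 10,12",
    "Line items 45,10,26",
    "Lines 10 and 45",
]

for s in samples:
    m = LINE_REF.fullmatch(s)
    print(f"{s!r}: {'match' if m else 'no match'}")

# Hypothetical longer text containing several references.
text = "See line 3, lines 10 and 45, and Line items 45,10,26."
print(LINE_REF.findall(text))
# ['line 3', 'lines 10 and 45', 'Line items 45,10,26']
```

Because the pattern has no capturing groups, `findall()` returns the full match for each hit; if a capturing group were added, `findall()` would return the group's contents instead.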

    Pattern details:

    • (?i) - ignore case inline modifier
    • lines? - line or lines (the ? quantifier makes the preceding s optional, matching it 1 or 0 times)
    • (?:\s+items?)? - an optional non-capturing group matching 1 or 0 occurrences of 1+ whitespaces followed by item and an optional s
    • \s* - 0+ whitespaces
    • \d+(?:\.\d+)? - 1+ digits followed by an optional sequence of . and 1+ digits
    • (?:\s*(?:,|and)\s*\d+(?:\.\d+)?)* - 0 or more repetitions of:
      • \s* - 0+ whitespaces
      • (?:,|and) - a comma or the and character sequence
      • \s* - 0+ whitespaces
      • \d+(?:\.\d+)? - 1+ digits followed by an optional sequence of . and 1+ digits
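To tie the breakdown above to code, the same pattern can be sketched in verbose form so that each part carries its own comment. This assumes the standard `re.VERBOSE` and `re.IGNORECASE` flags in place of the inline `(?i)`; the test strings are made up for illustration.

```python
import re

# Same pattern as the answer, written with re.VERBOSE so each part from the
# "Pattern details" list has a comment. re.IGNORECASE replaces the inline (?i).
# Verbose mode ignores unescaped whitespace, so all spacing is matched by \s.
LINE_REF = re.compile(
    r"""
    lines?              # "line" or "lines"
    (?:\s+items?)?      # optional: 1+ whitespaces, then "item" or "items"
    \s*                 # 0+ whitespaces
    \d+(?:\.\d+)?       # 1+ digits, optionally followed by . and 1+ digits
    (?:                 # 0 or more repetitions of:
        \s*(?:,|and)\s* #   a comma or "and", with optional whitespace
        \d+(?:\.\d+)?   #   another number
    )*
    """,
    re.IGNORECASE | re.VERBOSE,
)

print(LINE_REF.findall("Lines 1, 2 and 3.5"))  # ['Lines 1, 2 and 3.5']
print(LINE_REF.findall("LINE ITEM 7"))          # ['LINE ITEM 7']
# A trailing separator with no number after it is not consumed:
print(LINE_REF.findall("Lines 10 and"))         # ['Lines 10']
```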