python regex regex-group

Match non-capturing group multiple times


I tried really hard to make a good title, but I'm not sure if I'm asking this right. Here's my best attempt:

I'm using Python's flavor of regex.

I need to match numbers using named groups:

15x20x30    ->  'values': [15,20,30]
15bits      ->  'values': [15]
15          ->  'values': [15]
x15         ->  'values': [15]

But it should not match:

456.48
888,12
6,4.8,4684.,6

My best attempt so far has been:

((?:[\sa-z])(?P<values>\d+)(?:[\sa-z]))

I'm using [\sa-z] instead of a word boundary because 15x20 contains two different values.

But it fails to match both 15 and 20 for the 15x20 case. It does work if I put an extra space as in 15x 20. How do I tell it to "reset" the non-capturing group at the end so it also works for the non-capturing group at the beginning?
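The behaviour described above follows from the fact that regex matches cannot overlap: the trailing (?:[\sa-z]) consumes the x, so that character is no longer available as the leading character for the next number. A minimal sketch reproducing this with the pattern from the question (the helper name attempt_values is hypothetical, and the inputs are padded with spaces because the attempt requires a character on both sides of each number):

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r"((?:[\sa-z])(?P<values>\d+)(?:[\sa-z]))")


def attempt_values(text):
    """Return the 'values' group of every match of the original attempt."""
    return [m.group("values") for m in ATTEMPT.finditer(text)]


# The "x" after 15 is consumed by the first match, so 20 has no
# leading character left to match against and is skipped.
print(attempt_values(" 15x20 "))   # ['15']

# With an extra space, 20 gets its own leading character.
print(attempt_values(" 15x 20 "))  # ['15', '20']
```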


Solution

  • You may use

    (?<![^\sa-z])\d+(?![^\sa-z])
    

    Case insensitive version:

    (?i)(?<![^\sa-z])\d+(?![^\sa-z])
    

    Or, compile the pattern with the re.I flag (also spelled re.IGNORECASE).

    Details

    • (?<![^\sa-z]) - a negative lookbehind that fails the match if the character immediately to the left is anything other than whitespace or a lowercase letter (any ASCII letter if (?i) or re.I is used); the start of the string is allowed
    • \d+ - one or more digits
    • (?![^\sa-z]) - a negative lookahead that fails the match if the character immediately to the right is anything other than whitespace or a lowercase letter (any ASCII letter if (?i) or re.I is used); the end of the string is allowed
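Because lookarounds are zero-width, they only inspect the neighbouring character without consuming it, so the x in 15x20 can serve as the right-hand context of 15 and the left-hand context of 20. Below is a minimal sketch of how this could be used from Python against the examples in the question. The named group values is added here only to mirror the question's wording, and the names NUMBER, NUMBER_I and extract_values are hypothetical:

```python
import re

# Pattern from the answer, with the digits wrapped in a named group.
NUMBER = re.compile(r"(?<![^\sa-z])(?P<values>\d+)(?![^\sa-z])")
# Same pattern, compiled case-insensitively.
NUMBER_I = re.compile(NUMBER.pattern, re.IGNORECASE)


def extract_values(text, pattern=NUMBER):
    """Return every number matched by the pattern, converted to int."""
    return [int(m.group("values")) for m in pattern.finditer(text)]


samples = ["15x20x30", "15bits", "15", "x15",
           "456.48", "888,12", "6,4.8,4684.,6"]
for sample in samples:
    print(f"{sample!r} -> {extract_values(sample)}")

# Uppercase separators only match with the case-insensitive version.
print(extract_values("15X20"))            # []
print(extract_values("15X20", NUMBER_I))  # [15, 20]
```

The first four samples yield [15, 20, 30], [15], [15] and [15]; the last three yield empty lists, since every digit run there touches a dot, a comma or another digit that belongs to a rejected run.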