python regex regex-greedy

Python regex - greedy quantifier not working in all cases


Further to this question, I am attempting to isolate/return the first int or float before an x (multiplication sign).

Here are my test strings:

2 x 3 kg PPG etc #returns 2
bob 2 x 3 kg PPG etc #returns 2
1.5x1.5kgPPGetcFred #returns 1.5
BobFred1.5x1.5kgPPGetcFred #returns 1.5
1.5 x 2.3 kg PPG Fred Bob #returns 5 (should return 1.5)
bob Fred 1.5 x 2.3 kg PPG Fred Bob #returns 5 (should return 1.5)

Here is my regex:

.*?(\d+)(\.?)(\s*)(\d?)(x)(.*)

It works for all of the above test strings except the last two. Vos iss up??

RegEx101 Demo

Python code example:

import re

regex = r'.*?(\d+)(\.?)(\s*)(\d?)(x)(.*)'
regout = r'\1\2\4'
test_str = "1.5 x 2.3 kg PPG Fred Bob"

tmp = re.sub(regex, regout, test_str)
print(tmp)

Solution

  • To match a number (integer or decimal) that comes before an x, you can use this regex: (\d*\.?\d+)\s*(?=x)

    • (\d*\.?\d+) captures the number: optional leading digits, an optional dot, then at least one digit. It matches 1, 10, 1.3, 1.5, 22.10, and also .5.
    • \s* matches zero or more whitespace characters, since there may be spaces between the number and the x.
    • (?=x) is a positive lookahead: it asserts that an x comes next, without consuming it.
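As a minimal sketch of the pattern described above, the snippet below runs it with re.search() over the six test strings from the question. The helper name first_number_before_x and the list name test_strings are illustrative only and do not come from the original post.

```python
import re

# Pattern from the answer: the number (group 1), optional whitespace,
# then a lookahead that requires an "x" without consuming it.
NUMBER_BEFORE_X = re.compile(r'(\d*\.?\d+)\s*(?=x)')

def first_number_before_x(text):
    """Return the first number directly before an 'x' as a string, or None."""
    match = NUMBER_BEFORE_X.search(text)
    return match.group(1) if match else None

test_strings = [
    "2 x 3 kg PPG etc",
    "bob 2 x 3 kg PPG etc",
    "1.5x1.5kgPPGetcFred",
    "BobFred1.5x1.5kgPPGetcFred",
    "1.5 x 2.3 kg PPG Fred Bob",
    "bob Fred 1.5 x 2.3 kg PPG Fred Bob",
]

results = [first_number_before_x(s) for s in test_strings]
for s, r in zip(test_strings, results):
    print(f"{s!r} -> {r}")
```

With re.search() there is no need to match the whole string, because the search scans forward to the first position where the pattern fits.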

    If you'd like to use re.sub(), the pattern must match the entire string so that everything except the captured number is replaced. You can do that with .*?(\d*\.?\d+)\s*(?=x).* and the replacement \1, as you mentioned in the comments.
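The following sketch compares the question's pattern with the full-string pattern above on the failing test string. In the original pattern, whitespace is allowed only between the dot and the fractional digit, not between the number and the x. For "1.5 x" the attempt starting at "1" therefore fails, the lazy .*? absorbs "1.", and (\d+) ends up capturing only the 5. The variable names are illustrative.

```python
import re

# Pattern and replacement from the question.
original = r'.*?(\d+)(\.?)(\s*)(\d?)(x)(.*)'
# Full-string pattern from the answer; group 1 holds the whole number.
revised = r'.*?(\d*\.?\d+)\s*(?=x).*'

test_str = "1.5 x 2.3 kg PPG Fred Bob"

original_result = re.sub(original, r'\1\2\4', test_str)
revised_result = re.sub(revised, r'\1', test_str)

print(original_result)  # 5
print(revised_result)   # 1.5
```

Note that re.sub() returns the input unchanged when the pattern does not match at all, so a string without a number before an x comes back as-is.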


    EDIT: The OP also asked how to match the number right after the x.

    For this, the regex is almost the mirror image of the previous one: instead of the positive lookahead (?=...), use a positive lookbehind (?<=...). Here (?<=x) asserts that the current position is immediately preceded by an x.

    To match, you could use (?<=x)\s*?(\d*\.?\d+), and for re.sub() you could use .*?(?<=x)\s*?(\d*\.?\d+).*
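A short sketch of the lookbehind variant, again using illustrative names (first_number_after_x is not from the original post). It shows both the re.search() form and the full-string re.sub() form.

```python
import re

# Lookbehind variant from the answer: an "x" must come immediately before
# the optional whitespace and the number (group 1).
NUMBER_AFTER_X = re.compile(r'(?<=x)\s*?(\d*\.?\d+)')

def first_number_after_x(text):
    """Return the first number that follows an 'x' as a string, or None."""
    match = NUMBER_AFTER_X.search(text)
    return match.group(1) if match else None

after_spaced = first_number_after_x("1.5 x 2.3 kg PPG Fred Bob")
after_int = first_number_after_x("2 x 3 kg PPG etc")
after_compact = first_number_after_x("1.5x1.5kgPPGetcFred")

# Full-string form for re.sub(), keeping only the captured number.
sub_result = re.sub(r'.*?(?<=x)\s*?(\d*\.?\d+).*', r'\1',
                    "1.5 x 2.3 kg PPG Fred Bob")

print(after_spaced, after_int, after_compact, sub_result)
```

One caveat: the lookbehind accepts any x, including one at the end of a word such as "box", so input containing other x characters may need a stricter pattern.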

    Link for regex101 here.