Tags: python, regex, pandas, extract

Python Regex including Comma and Dot


I am trying to extract the physical dimensions of items from a column, and so far my regex works well. I got help with it in a previous question.

The only remaining issue is that the regex does not handle decimal values written with a dot or a comma.

My current regex:

r'(\d{1,3}\s*[xX*]\s*\d{1,3}(?:\s*[xX*]\s*\d{1,3})?)'

It's working fine for:

120 x 80 x 100
120x80
120 x 80
120X80x100
120*80 * 100

Now I also need it to accept a dot or a comma ([,.]) as a decimal separator, so that it matches:

120,3x80,9x1003
120.3x80.9
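A quick way to see why the current pattern falls short is to run it with Python's `re.findall` over all of the examples above. This is only an illustrative check: the list names are made up for the demo, and the pattern is written as a raw string. The whole-number inputs come back intact, but on the decimal inputs the pattern can only start matching after the separator, so it returns fragments.

```python
import re

# The current pattern from the question, as a raw string.
old_pattern = r'(\d{1,3}\s*[xX*]\s*\d{1,3}(?:\s*[xX*]\s*\d{1,3})?)'

# Illustrative lists built from the examples in the question.
integer_inputs = ["120 x 80 x 100", "120x80", "120 x 80", "120X80x100", "120*80 * 100"]
decimal_inputs = ["120,3x80,9x1003", "120.3x80.9"]

# Whole numbers: each input is matched in full.
for text in integer_inputs:
    print(repr(text), "->", re.findall(old_pattern, text))

# Decimals: only fragments are found, because "." and "," are not allowed
# between digits ("120,3x80,9x1003" gives ['3x80', '9x100'],
# "120.3x80.9" gives ['3x80']).
for text in decimal_inputs:
    print(repr(text), "->", re.findall(old_pattern, text))
```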

Any help would be appreciated. Thanks in advance!


Solution

  • You can use

    r'\b(\d+(?:[.,]\d+)*\s*[xX*]\s*\d+(?:[.,]\d+)*(?:\s*[xX*]\s*\d+(?:[.,]\d+)*)?)\b'
    

    See the regex demo.

    Details:

    • \b - a word boundary
    • ( - Group 1:
      • \d+ - one or more digits
      • (?:[.,]\d+)* - zero or more occurrences of a . or , followed by one or more digits
      • \s*[xX*]\s* - x, X or * surrounded by zero or more whitespace characters
      • \d+(?:[.,]\d+)* - one or more digits, then zero or more occurrences of a . or , followed by one or more digits
      • (?:\s*[xX*]\s*\d+(?:[.,]\d+)*)? - an optional occurrence of x, X or * surrounded by zero or more whitespace characters, followed by one or more digits and then zero or more occurrences of a . or , followed by one or more digits
    • ) - end of Group 1
    • \b - a word boundary
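To see how the pieces in the breakdown fit together, here is a sketch that builds the same pattern from two named parts (`NUM` and `SEP` are just illustrative names, not anything from a library) and runs it over every example from the question with `re.findall`. Because the pattern has a single capturing group, `findall` returns the text of Group 1 for each match.

```python
import re

# One number: digits, optionally followed by groups of "." or "," plus digits
# (so 120, 120.3, 120,3 and also 1,000.5 are accepted).
NUM = r'\d+(?:[.,]\d+)*'
# Separator: x, X or *, with optional whitespace on both sides.
SEP = r'\s*[xX*]\s*'

# The same pattern as in the answer, assembled from the two pieces above.
pattern = rf'\b({NUM}{SEP}{NUM}(?:{SEP}{NUM})?)\b'

samples = [
    "120 x 80 x 100", "120x80", "120 x 80", "120X80x100", "120*80 * 100",
    "120,3x80,9x1003", "120.3x80.9",
]

# Each line prints a one-element list holding the whole input string.
for text in samples:
    print(re.findall(pattern, text))
```

One consequence of the trailing `\b` worth knowing: digits and letters are both word characters, so there is no word boundary between `80` and `cm` in `120x80cm`, and that string produces no match at all. In pandas, the same pattern string can be passed to `Series.str.extract`; since it has exactly one capture group, `expand=False` returns a Series rather than a one-column DataFrame.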