I am trying to extract the physical dimensions of items from a column, and at the moment my regex is working fine for whole numbers. I was helped here.
The only issue now is that the regex does not match numbers containing dots or commas.
My current regex:
'(\d{1,3}\s*[xX*]\s*\d{1,3}(?:\s*[xX*]\s*\d{1,3})?)'
It's working fine for:
120 x 80 x 100
120x80
120 x 80
120X80x100
120*80 * 100
Now I need it to also match numbers that contain a decimal separator, [,.]:
120,3x80,9x1003
120.3x80.9
Any help? Thanks in advance
You can use
r'\b(\d+(?:[.,]\d+)*\s*[xX*]\s*\d+(?:[.,]\d+)*(?:\s*[xX*]\s*\d+(?:[.,]\d+)*)?)\b'
See the regex demo.
Details:
\b - a word boundary
( - start of Group 1
\d+ - one or more digits
(?:[.,]\d+)* - zero or more occurrences of . or , followed by one or more digits
\s*[xX*]\s* - x, X or * enclosed with zero or more whitespaces
\d+(?:[.,]\d+)* - one or more digits and then zero or more occurrences of . or , followed by one or more digits
(?:\s*[xX*]\s*\d+(?:[.,]\d+)*)? - an optional occurrence of x, X or * enclosed with zero or more whitespaces, followed by one or more digits and then zero or more occurrences of . or , followed by one or more digits
) - end of Group 1
\b - a word boundary
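
As a quick check, here is a minimal sketch that runs the pattern above against the sample strings from the question using Python's built-in re module. The names DIMENSIONS, samples and extract_dimensions are made up for this illustration and are not part of the original question or answer.

```python
import re

# The pattern from the answer, compiled once for reuse.
DIMENSIONS = re.compile(
    r'\b(\d+(?:[.,]\d+)*\s*[xX*]\s*\d+(?:[.,]\d+)*'
    r'(?:\s*[xX*]\s*\d+(?:[.,]\d+)*)?)\b'
)

# Sample inputs taken from the question.
samples = [
    "120 x 80 x 100",
    "120x80",
    "120 x 80",
    "120X80x100",
    "120*80 * 100",
    "120,3x80,9x1003",
    "120.3x80.9",
]


def extract_dimensions(text):
    """Return the first dimension string found in text, or None."""
    match = DIMENSIONS.search(text)
    return match.group(1) if match else None


for s in samples:
    print(repr(s), "->", extract_dimensions(s))
```

If the column lives in a pandas DataFrame (an assumption, since the question does not say which library is used), the same pattern string should also work with Series.str.extract, which returns the contents of Group 1 for each row.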