I am trying to extract the physical dimensions of items from a column "Description" in a df to create a new column with it.
Dimensions usually appear in this format (120x80x100) in the middle of long descriptions like:
Lorem ipsum dolor sit amet, consectetur adipiscing elit 120x80x100 ed do eiusmod tempor...
But sometimes they have spaces in between:
120 x 80 x 100
Or don't have height:
120x80
120 x 80
Any help? Thanks in advance
You can use the regex, \d+\s*x\s*\d+(?:\s*x\s*\d+)?
Explanation:
\d+ : One or more digits
\s* : Zero or more whitespace characters
x : Literal x
(?:\s*x\s*\d+)? : Optional non-capturing group

If you want the numbers to be of one to three digits, replace \d+ with \d{1,3}, as shown in the regex \d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?.
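To check the pattern against the formats listed in the question, here is a minimal sketch using Python's built-in re module. The sample strings and the helper name find_dimensions are made up for illustration; only the regex itself comes from the answer.

```python
import re

# Pattern from the answer: two dimensions, optionally followed by a third.
DIMENSIONS = re.compile(r"\d+\s*x\s*\d+(?:\s*x\s*\d+)?")

# Sample strings modeled on the formats in the question (illustrative only).
samples = [
    "Lorem ipsum dolor sit amet 120x80x100 ed do eiusmod tempor",
    "Lorem ipsum 120 x 80 x 100 dolor",
    "Lorem ipsum 120x80 dolor",
    "Lorem ipsum 120 x 80 dolor",
    "no dimensions here",
]


def find_dimensions(text):
    """Return the first dimension string found in text, or None."""
    match = DIMENSIONS.search(text)
    return match.group(0) if match else None


results = [find_dimensions(s) for s in samples]
print(results)
# ['120x80x100', '120 x 80 x 100', '120x80', '120 x 80', None]
```

Note that the matched text keeps any spaces around the x, so you may want to strip them afterwards if you need a uniform format.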
If your code requires you to use a group, do it as follows:
(\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?)
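One case where the group is needed is pandas' Series.str.extract, which requires at least one capture group. The following is a rough sketch, assuming the df from the question is a pandas DataFrame; the sample rows and the new column name "Dimensions" are assumptions, not something given in the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name "Description" comes from the question.
df = pd.DataFrame({
    "Description": [
        "Lorem ipsum dolor sit amet 120x80x100 ed do eiusmod tempor",
        "Lorem ipsum 120 x 80 dolor",
        "no dimensions here",
    ]
})

# str.extract needs a capture group; expand=False returns a Series, not a DataFrame.
pattern = r"(\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?)"
df["Dimensions"] = df["Description"].str.extract(pattern, expand=False)

# Optional: drop the spaces so every value has the same shape (e.g. 120x80).
df["Dimensions"] = df["Dimensions"].str.replace(" ", "", regex=False)

print(df["Dimensions"].tolist())
```

Rows without a match end up as NaN in the new column. Also be aware that \d{1,3} on its own can still match the tail of a longer number (for example 200x80 inside 1200x80); if that matters for your data, wrapping the pattern in \b word boundaries is one way to rule it out.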