Need to retrieve dimensions from the text where they can be specified in a couple of ways:
There can be two or three dimensions, and each one may or may not have its own measurement unit.
I am struggling to add the list of optional units and to make matching the third value optional in the regex:
dimensions = re.findall(r'(\d+\.?\d*)\s*inches?feet?\s*x\s*(\d+\.?\d*)\s*inches?feet?\s*x?\s*(\d+\.?\d*)?\s*inches?feet?',string)
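What you have is `inches?feet?`. A `?` applies only to the single character before it, so this says "match 'inche', an optional 's', 'fee', and an optional 't'". It would match something like "5 inchesfeet", but never "5 inches" or "5 feet" on its own.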
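A quick check with Python's `re` module shows how `?` binds only to the character before it. The second part runs the regex from the question against a sample string; the sample text is an assumption made up for illustration.

```python
import re

# '?' applies only to the single character before it ('s' or 't'),
# so 'inches?feet?' needs "inche" + optional "s" + "fee" + optional "t".
pattern = re.compile(r'inches?feet?')

for text in ["inches", "feet", "inchesfeet", "inchefee"]:
    # fullmatch: the whole string must match the pattern
    print(f"{text!r}: {bool(pattern.fullmatch(text))}")

# The regex from the question: no "fee" appears in the sample, so nothing matches.
question = r'(\d+\.?\d*)\s*inches?feet?\s*x\s*(\d+\.?\d*)\s*inches?feet?\s*x?\s*(\d+\.?\d*)?\s*inches?feet?'
print(re.findall(question, "10 inches x 5 inches x 3 inches"))
```

Only "inchesfeet" and "inchefee" match, and the question's regex finds no dimensions at all in the sample.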
You were fairly close. The key idea you missed is that `|` can be used to specify alternatives to match: `(?:inches|feet)?`. The alternatives are put in a non-capturing group so that `|` chooses only between "inches" and "feet"; without the group, the "feet" alternative would extend to everything after it in the pattern. The `?` at the end makes the entire group optional.
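A small sketch of the optional alternation on its own, plus a comparison showing why the group matters. The `\d+\s*...` patterns here are simplified illustrations, not part of the final regex.

```python
import re

# Alternation inside a non-capturing group, with the whole group made optional.
unit = re.compile(r'(?:inches|feet)?')

for text in ["inches", "feet", "", "inchesfeet"]:
    print(f"{text!r}: {bool(unit.fullmatch(text))}")

# Without the group, '|' splits the whole pattern: (\d+\s*inches) OR (feet)
ungrouped = re.fullmatch(r'\d+\s*inches|feet', '5 feet')
# With the group, '|' only chooses the unit
grouped = re.fullmatch(r'\d+\s*(?:inches|feet)', '5 feet')
print(ungrouped, grouped is not None)
```

The grouped unit accepts "inches", "feet", or nothing, and only the grouped version matches "5 feet".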
To make the entire third dimension optional, the pattern for it can be put in a non-capturing group, and then that group can be made optional with `?`:
(?:x\s*(\d+\.?\d*)?\s*(?:inches|feet)?)?
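To see what an optional group produces when it is absent, here is a simplified sketch with the units left out (an assumption to keep it short). A capturing group inside a skipped optional group comes back as `None` from `match.groups()` and as an empty string from `re.findall`.

```python
import re

# Two required dimensions, then an optional non-capturing group holding the third.
pattern = re.compile(r'(\d+)\s*x\s*(\d+)\s*(?:x\s*(\d+))?')

two = pattern.match("4 x 5")
three = pattern.match("4 x 5 x 6")

print(two.groups())    # ('4', '5', None)
print(three.groups())  # ('4', '5', '6')

# findall reports a group that did not participate as an empty string
print(pattern.findall("4 x 5 and 4 x 5 x 6"))  # [('4', '5', ''), ('4', '5', '6')]
```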
The final regex is
(\d+\.?\d*)\s*(?:inches|feet)?\s*x\s*(\d+\.?\d*)\s*(?:inches|feet)?\s*(?:x\s*(\d+\.?\d*)?\s*(?:inches|feet)?)?
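A possible way to use the final regex with `re.findall`, sketched under assumptions: the helper `parse_dimensions`, the constant `DIMENSIONS`, and the sample text are hypothetical, and converting values to `float` is just one choice. Because `findall` returns an empty string for a missing third dimension, the helper filters those out.

```python
import re

# The final regex from above; group 3 is '' in findall when there is no third dimension.
DIMENSIONS = re.compile(
    r'(\d+\.?\d*)\s*(?:inches|feet)?\s*x\s*(\d+\.?\d*)\s*(?:inches|feet)?'
    r'\s*(?:x\s*(\d+\.?\d*)?\s*(?:inches|feet)?)?'
)

def parse_dimensions(text):
    """Hypothetical helper: each match as a tuple of floats, skipping an empty third value."""
    return [
        tuple(float(value) for value in match if value)
        for match in DIMENSIONS.findall(text)
    ]

sample = "Box A: 10 inches x 5.5 inches x 3 inches. Box B: 12x4 feet."
print(parse_dimensions(sample))  # [(10.0, 5.5, 3.0), (12.0, 4.0)]
```

Note that the regex only matches the units; if you also need to know which unit each value used, the `(?:inches|feet)` groups would have to become capturing groups.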