python regex parentheses

How can I make group 1 differ based on the content of the whole string?


In our Python system, I'm trying to isolate the second part of a size so that I can save the values separately.

Since the data arrives in many different formats, I have to take a lot of scenarios into account. At the same time, our system requires the result to be in group 1 to be identified correctly, which adds to the complexity.

This is what I have so far:

(?<=[\/\-])\s*([A-Za-z]+|\w+)+?(?!\d*\s*\)|\d*\)|\w*\))(?!\s*[\/\-]+)

Examples

working

These examples work:

110/116
S/M
S / M
S/M(32-34)
110/116(10-12y)
110/116(S/M)

not working

However, my regex only works correctly on the examples above.

The following 7 are causing issues:

S/M / L /XL
S / M / L / XL
S/M / L/XL
S/M/L/XL
S/M/L/XL(30-32)
S/M / L/XL(30-32)
S/M / L / XL(30-32)

How can I capture those cases as shown in the table below?

Case | Input | Expected capture in group 1
1 | S/M / L /XL | "L /XL"
2 | S / M / L / XL | "L / XL"
3 | S/M / L/XL | "L/XL"
4 | S/M/L/XL | "L/XL"
5 | S/M/L/XL(30-32) | "L/XL"
6 | S/M / L/XL(30-32) | "L/XL"
7 | S/M / L / XL(30-32) | "L / XL"

Issue

How can I capture a "/" in the middle together with the whole part after it (like /XL), but without any trailing parentheses (that is, without the (30-32))?

For example, for S/M / L / XL(30-32) I want to capture L / XL only.


Solution

  • You can use

    (?<=[/-])\s*([A-Z]+(?:\s*/\s*[A-Z]+)?|\d+)\b(?!\s*[/)-])
    

    Details:

    • (?<=[/-]) - a position immediately preceded by / or -
    • \s* - zero or more whitespace characters
    • ([A-Z]+(?:\s*/\s*[A-Z]+)?|\d+) - Group 1: either one or more uppercase letters, optionally followed by a / (with optional whitespace on both sides) and one or more uppercase letters; or one or more digits
    • \b - a word boundary
    • (?!\s*[/)-]) - a negative lookahead that fails the match if, immediately to the right of the current location, there are zero or more whitespace characters followed by /, ) or -.
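
To check the pattern against the inputs from the question, a minimal Python sketch could look like the one below. The helper name `second_size` and the `CASES` mapping are illustrative assumptions, not part of the original system. The expected values for the first six inputs are also an assumption: the question lists those inputs without expected output, so they are inferred here as "the second part of the size".

```python
import re

# Pattern from the answer above; group 1 holds the wanted size part.
SIZE_RE = re.compile(r"(?<=[/-])\s*([A-Z]+(?:\s*/\s*[A-Z]+)?|\d+)\b(?!\s*[/)-])")


def second_size(text):
    """Return group 1 of the first match, or None when nothing matches.

    Hypothetical helper, used only for this illustration.
    """
    match = SIZE_RE.search(text)
    return match.group(1) if match else None


# Input -> expected group 1. The first six expectations are assumed
# (the question gives no explicit output for them); the last seven
# come from the table in the question.
CASES = {
    "110/116": "116",
    "S/M": "M",
    "S / M": "M",
    "S/M(32-34)": "M",
    "110/116(10-12y)": "116",
    "110/116(S/M)": "116",
    "S/M / L /XL": "L /XL",
    "S / M / L / XL": "L / XL",
    "S/M / L/XL": "L/XL",
    "S/M/L/XL": "L/XL",
    "S/M/L/XL(30-32)": "L/XL",
    "S/M / L/XL(30-32)": "L/XL",
    "S/M / L / XL(30-32)": "L / XL",
}

for text, expected in CASES.items():
    print(f"{text!r:24} -> {second_size(text)!r} (expected {expected!r})")
```

Note that `[A-Z]` matches uppercase letters only. If lowercase sizes such as `s/m` can occur in the data, compiling with `re.IGNORECASE` is one option, but that is untested against the cases above.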