regex, regex-lookarounds, regex-group, regex-greedy

RegEx for matching uppercase and dash followed by a comma


I'm trying to remove tags that start with Size: and that include the - character before the next comma (,).

Example:

Size: XS-S-M-L-XL-2XL,

or

Size: XS-S-M,

etc.

would get selected (including the comma),

but Size_S, would be ignored because there is no -.

I'm close with:

Size:(.*)-*(.?),

But it still doesn't stop at the first comma.

Here is one line of tags:

Athletics, Fitted, Mesh, Feature_Moisture Wicking, Material_Polyester 100%, , Material_Polyester 100%, Material_Polyester Over 50%,  School, Style_Short Sleeves, Size_2XL, Size_L, Size_M, Size_S, Size_XL, Size_XS, Size: XS-S-M-L-XL-2XL, Uniforms, Unisex, V-Neck, VisibleLogos, Youth

The goal is to remove all size 'range' tags from my cells and leave only the single-size tags.
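If a regex isn't strictly required, the rule described above (a tag that starts with Size: and contains a - before the next comma) can also be applied by splitting the line on commas. This is a minimal Python sketch of that non-regex alternative, not something from the question; the helper name drop_size_ranges is hypothetical, and it assumes commas only ever appear as tag separators.

```python
SAMPLE = (
    "Athletics, Fitted, Mesh, Feature_Moisture Wicking, Material_Polyester 100%, , "
    "Material_Polyester 100%, Material_Polyester Over 50%,  School, Style_Short Sleeves, "
    "Size_2XL, Size_L, Size_M, Size_S, Size_XL, Size_XS, Size: XS-S-M-L-XL-2XL, "
    "Uniforms, Unisex, V-Neck, VisibleLogos, Youth"
)


def drop_size_ranges(line: str) -> str:
    """Hypothetical helper: drop tags that start with 'Size:' and contain a hyphen."""
    tags = line.split(",")
    kept = [t for t in tags if not (t.strip().startswith("Size:") and "-" in t)]
    # Re-join with the original separator; each kept tag keeps its own leading space,
    # so no double space is left where the range tag used to be.
    return ",".join(kept)


print(drop_size_ranges(SAMPLE))
```

Tags such as V-Neck keep their hyphen because they don't start with Size:, and Size_S survives because it uses an underscore.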

The solution can be found here: regex101.com/r/VuTzba/1


Solution

  • In your pattern Size:(.*)-*(.?), you first match until the end of the string using (.*).

    After that, the hyphen -* and the single character in the group (.?) are both optional, so the engine only backtracks until the last comma in the string, as that is the only character that must be matched.
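The backtracking described above can be observed with Python's re module. This sketch uses a shortened excerpt of the tag line from the question and prints what the original pattern actually matches.

```python
import re

# Excerpt of the tag line from the question.
line = "Size_XS, Size: XS-S-M-L-XL-2XL, Uniforms, V-Neck, Youth"

# The question's pattern: (.*) first runs to the end of the line, then gives
# characters back until a literal comma can be matched after the optional parts.
original = re.search(r"Size:(.*)-*(.?),", line)
print(original.group(0))  # runs past the size tag up to the last comma in the line
print(repr(original.group(2)))  # (.?) ends up matching nothing
```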

    To get a more exact match, you could use a repeating pattern to match the sizes:

    Size: (?:\d*X[SL]|L|M|S)(?:-(?:\d*X[LS]|L|M|S))*,
    

    Explanation

    • Size:  Match Size: followed by a space
    • (?: Non capturing group
      • \d*X[SL]|L|M|S match one of the listed items in the alternation
    • ) Close group
    • (?: Non capturing group
      • -(?:\d*X[LS]|L|M|S) Match a hyphen followed by any of the listed items
    • )*, Close the group, repeat it 0+ times, then match a comma
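Applied with Python's re.sub, the pattern above removes the range tag from a tag line. A minimal sketch on a shortened line; RANGE_ONLY is a hypothetical variant, not part of the answer. Because the second group repeats 0+ times, the pattern also matches a single size written with a colon, such as Size: XL,, and swapping * for + would require at least one hyphenated part.

```python
import re

# The answer's pattern: one size, then zero or more "-size" parts, then a comma.
SIZE_RANGE = re.compile(r"Size: (?:\d*X[SL]|L|M|S)(?:-(?:\d*X[LS]|L|M|S))*,")

line = "Size_S, Size_XS, Size: XS-S-M-L-XL-2XL, Uniforms, V-Neck, Youth"
cleaned = SIZE_RANGE.sub("", line)
print(repr(cleaned))  # the space that followed the removed comma is left behind

# Hypothetical variant: + instead of * so at least one "-size" part is required.
RANGE_ONLY = re.compile(r"Size: (?:\d*X[SL]|L|M|S)(?:-(?:\d*X[LS]|L|M|S))+,")
print(bool(SIZE_RANGE.search("Size: XL,")), bool(RANGE_ONLY.search("Size: XL,")))
```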


    A broader pattern could use a character class listing all the allowed characters, Size: [XSML\d]+(?:-[XSML\d]+)*, or simply match until the first comma, Size:[^,]+,
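To compare the two broader patterns, this sketch runs them on a few sample tags taken from the question and the answer. The character class only accepts X, S, M, L and digits, so a size like 28W is rejected, while Size:[^,]+, accepts anything up to the next comma, including tags without a hyphen.

```python
import re

# The two broader alternatives from the answer.
CHAR_CLASS = re.compile(r"Size: [XSML\d]+(?:-[XSML\d]+)*,")
UNTIL_COMMA = re.compile(r"Size:[^,]+,")

for tag in ["Size: XS-S-M-L-XL-2XL,", "Size: 28W-30W,", "Size: XL,"]:
    # Print whether each pattern finds a match in the tag.
    print(tag, bool(CHAR_CLASS.search(tag)), bool(UNTIL_COMMA.search(tag)))
```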

    Edit

    To also match Size: 28W-30W-32W-34W-36W-38W-40W, or Size: 28W-30W-32W-34W at the end of the string, you could extend the alternation by adding |\d+W to it, and end the pattern by matching either a comma or asserting the end of the string with $:

    Size: (?:\d*X[SL]|L|M|S|\d+W)(?:-(?:\d*X[LS]|L|M|S|\d+W))*(?:,|$)
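A quick check of the extended pattern in Python. The sample line is hypothetical, built from the two W-size tags mentioned above, with the second one placed at the end of the string so that the $ branch is exercised.

```python
import re

# Extended alternation with \d+W, ending in either a comma or the end of the string.
SIZE_RANGE_W = re.compile(
    r"Size: (?:\d*X[SL]|L|M|S|\d+W)(?:-(?:\d*X[LS]|L|M|S|\d+W))*(?:,|$)"
)

line = "Uniforms, Size: 28W-30W-32W-34W-36W-38W-40W, Unisex, Size: 28W-30W-32W-34W"
print(SIZE_RANGE_W.findall(line))  # no capturing groups, so whole matches are returned
print(repr(SIZE_RANGE_W.sub("", line)))
```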
    
