Tags: regex, regex-group, regexp-replace, airtable

Regex extract: a single English-only string between commas


I have strings that contain a list of option names in random order, and I would like to extract only one option, COLOR="...", from between the commas. The option values sometimes include letters from other languages, so the result must contain English letters only.

Here are four example strings.

  1. COLOR="WHITE 화이트", DETAIL="NONE"
  2. SLEEVE="SHORT 쇼트", COLOR="BLUE"
  3. SLEEVE="LONG", COLOR="GRAY", TOP="DOUBLE"
  4. COLOR="YELLOW 노랑"

Desired output:

  1. COLOR="WHITE"
  2. COLOR="BLUE"
  3. COLOR="GRAY"
  4. COLOR="YELLOW"

I tried a regex extract with this pattern:

"COLOR=[a-zA-Z]+.*[a-zA-Z]+"

but obviously it also includes everything after the comma. Something with '*,' was recommended, but I do not know how to combine these.

Thanks in advance for the help!


Solution

  • If the value after COLOR=" always starts with an English color name, you can match that name and use a lookahead to assert that the closing double quote follows to the right.

    The match itself stops before the closing double quote. If you want the final result to include it, append it to the extracted match, since you already know that it is present in the source.

    \bCOLOR="[a-zA-Z]+(?=[^\n"]*")
    

    Explanation

    • \bCOLOR=" Match the word COLOR followed by ="
    • [a-zA-Z]+ Match 1+ chars a-zA-Z
    • (?= Positive lookahead, assert that to the right is
      • [^\n"]*" Match optional characters except a newline or " and then match the "
    • ) Close the lookahead
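
As a rough check of the lookahead pattern, here is a minimal Python sketch that runs it over the four sample strings from the question. Python's re module is used only for illustration (the question targets Airtable, whose regex engine may not accept lookaheads), and the helper name extract_color is hypothetical. The closing quote is appended by hand because the lookahead asserts it without consuming it.

```python
import re

# Pattern from the answer: COLOR=" plus the leading English word, followed by
# a lookahead that only checks for a closing quote later on the same line.
LOOKAHEAD_PATTERN = re.compile(r'\bCOLOR="[a-zA-Z]+(?=[^\n"]*")')

# The four sample strings from the question.
samples = [
    'COLOR="WHITE 화이트", DETAIL="NONE"',
    'SLEEVE="SHORT 쇼트", COLOR="BLUE"',
    'SLEEVE="LONG", COLOR="GRAY", TOP="DOUBLE"',
    'COLOR="YELLOW 노랑"',
]


def extract_color(text):
    """Return COLOR="<english word>" or None (hypothetical helper)."""
    match = LOOKAHEAD_PATTERN.search(text)
    if match is None:
        return None
    # The match ends before the closing quote, so add it back.
    return match.group(0) + '"'


results = [extract_color(s) for s in samples]
for source, result in zip(samples, results):
    print(f"{source!r} -> {result!r}")
```
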


    If lookarounds are not supported, you can capture the first part that you want in a capture group, and then match the closing double quote.

    \b(COLOR="[a-zA-Z]+)[^\n"]*"
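
A small Python sketch of the capture-group variant, assuming you either want just the captured part or want to rewrite the option in place. The function names capture_color and strip_non_english are made up for this example; in a tool with its own replace function, the same idea would be to replace the whole match with group 1 followed by a double quote.

```python
import re

# Group 1 holds COLOR=" plus the English word; the rest of the value and the
# closing quote are matched but left outside the group.
CAPTURE_PATTERN = re.compile(r'\b(COLOR="[a-zA-Z]+)[^\n"]*"')


def capture_color(text):
    """Return COLOR="<english word>" built from group 1, or None."""
    match = CAPTURE_PATTERN.search(text)
    return match.group(1) + '"' if match else None


def strip_non_english(text):
    """Rewrite the COLOR option in place, keeping the other options."""
    # \1 is the captured prefix; the literal quote closes the value again.
    return CAPTURE_PATTERN.sub(r'\1"', text)


print(capture_color('SLEEVE="LONG", COLOR="GRAY", TOP="DOUBLE"'))
print(strip_non_english('COLOR="WHITE 화이트", DETAIL="NONE"'))
```
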
    


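    Or, if the English color name is not the first word inside the quotes, skip any leading non-English characters and capture only the name. (You could use 2 capture groups as well, but you already know that you are matching the COLOR= part, so one group for the name is enough.)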

    \bCOLOR="[^\na-zA-Z"]*\b([a-zA-Z]+)\b[^\n"]*"
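
To illustrate this last pattern, the Python sketch below uses made-up inputs in which the English word comes after the non-English text; these inputs are assumptions and do not appear in the question. The helper name extract_color_name is hypothetical. Because only the bare name is captured, the COLOR=" prefix and closing quote are put back around it. The word boundaries mean the English word needs a non-word character, such as a space, next to it.

```python
import re

# Skip any non-English, non-quote characters after COLOR=", then capture
# the first whole English word; the rest of the value is matched up to the
# closing quote but not captured.
NAME_PATTERN = re.compile(r'\bCOLOR="[^\na-zA-Z"]*\b([a-zA-Z]+)\b[^\n"]*"')


def extract_color_name(text):
    """Return COLOR="<english word>" or None (hypothetical helper)."""
    match = NAME_PATTERN.search(text)
    if match is None:
        return None
    # Group 1 is only the name, so rebuild the option around it.
    return f'COLOR="{match.group(1)}"'


# Hypothetical inputs: English word second, English word first, no English word.
print(extract_color_name('COLOR="화이트 WHITE", DETAIL="NONE"'))
print(extract_color_name('SLEEVE="SHORT 쇼트", COLOR="BLUE"'))
print(extract_color_name('COLOR="노랑"'))
```
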
    
