Suppose I have a string like this:
exp = 'CASE WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\' WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN \'YES\' ELSE \'NO\' END'
exp2 = 'CASE WHEN ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 THEN ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE") END '
I want to return the all occurrences of WHEN and THEN along with the texts of that.
This is the expected output of exp1
['WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\'','WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN \'YES\'']
This is the expected output of exp2
['WHEN ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 THEN ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")']
What I have tried is this:
res = re.findall(r'\s*(WHEN|When|when)+\s*(.*)\s*(THEN|Then|then)+\s*')
But the resulting list shows this output in my case
['(WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\' WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN)']
Try:
WHEN (?:(?! +(?:WHEN|ELSE)).)* # with flags=re.I
WHEN
- Matches 'WHEN '(?:(?! +(?:WHEN|ELSE)).)
- Uses negative lookahead and states that as long as the current position does not match one or more space characters followed by either 'WHEN' or 'ELSE', then match one more character.import re
cases = [
'CASE WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\' WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN \'YES\' ELSE \'NO\' END',
'CASE WHEN ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 THEN ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE") END '
]
for case in cases:
res = re.findall(r'WHEN (?:(?! +(?:WHEN|ELSE)).)*', case, flags=re.I)
print(res)
Prints:
['WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\'', 'WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN \'YES\'']
['WHEN ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 THEN ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")']
Update
If you want to group the WHEN and ELSE parts (stripped of leading and trailing spaces), then use the following regex:
WHEN +(.*?) +THEN +((?:(?! +(?:WHEN|ELSE)).)*)
import re
cases = [
'CASE WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\' WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN \'YES\' ELSE \'NO\' END',
'CASE WHEN ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 THEN ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE") END '
]
for case in cases:
results = re.findall(r'WHEN +(.*?) +THEN +((?:(?! +(?:WHEN|ELSE)).)*)', case, flags=re.I)
for result in results:
print(result[0], result[1])
Prints:
"Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"='CPU' 'YES'
"Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"='RAM' 'YES'
("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")