I have this string
a="""SELECT
transform_abc.ds AS "ds",
SUM(transform_abc.dollars) AS "dollars",
transform_abc.unit AS "unit"
FROM fct_table_abc transform_abc
WHERE
(
transform_abc.is_charged > 0
OR transform_abc.account_status = 0
)
AND transform_abc.ds = '2020-02-20'
GROUP BY
transform_abc.ds,
transform_abc.unit"""
I need to remove column with ds from this string after SELECT and GROUP BY but not after WHERE.
Output needed:
a="""SELECT
SUM(transform_abc.dollars) AS "dollars",
transform_abc.unit AS "unit"
FROM fct_table_abc transform_abc
WHERE
(
transform_abc.is_charged > 0
OR transform_abc.account_status = 0
)
AND transform_abc.ds = '2020-02-20'
GROUP BY
transform_abc.unit"""
Tranform_abc is just a table name , it can be any table name. So we cant use that in regex. Not sure how to solve this
Here is a way to go:
import re
a="""SELECT
transform_abc.ds AS "ds",
SUM(transform_abc.dollars) AS "dollars",
transform_abc.unit AS "unit"
FROM fct_table_abc transform_abc
WHERE
(
transform_abc.is_charged > 0
OR transform_abc.account_status = 0
)
AND transform_abc.ds = '2020-02-20'
GROUP BY
transform_abc.ds,
transform_abc.unit"""
res = re.sub(r'((?:SELECT|GROUP BY)\s+(?:(?!WHERE)[\s\S])*?)\s+[\w.]+\.ds.+', r'\1', a)
print res
Output:
SELECT
SUM(transform_abc.dollars) AS "dollars",
transform_abc.unit AS "unit"
FROM fct_table_abc transform_abc
WHERE
(
transform_abc.is_charged > 0
OR transform_abc.account_status = 0
)
AND transform_abc.ds = '2020-02-20'
GROUP BY
transform_abc.unit
If the SQL is on a single line, use:
((?:SELECT|GROUP BY)\s+(?:(?!WHERE)[\s\S])*?)\s*[\w.]+\.ds[^,]*,