Search code examples
pythonregexregex-lookaroundsregexp-replace

How to remove .ds patterned string from a string using python


I have this string

a="""SELECT
    transform_abc.ds AS "ds",
    SUM(transform_abc.dollars) AS "dollars",
    transform_abc.unit AS "unit"
FROM fct_table_abc transform_abc
WHERE
    (
        transform_abc.is_charged > 0
        OR transform_abc.account_status = 0
    )
    AND transform_abc.ds = '2020-02-20'
GROUP BY
    transform_abc.ds,
    transform_abc.unit"""

I need to remove column with ds from this string after SELECT and GROUP BY but not after WHERE.

Output needed:

a="""SELECT
    SUM(transform_abc.dollars) AS "dollars",
    transform_abc.unit AS "unit"
FROM fct_table_abc transform_abc
WHERE
    (
        transform_abc.is_charged > 0
        OR transform_abc.account_status = 0
    )
    AND transform_abc.ds = '2020-02-20'
GROUP BY
    transform_abc.unit"""

Tranform_abc is just a table name , it can be any table name. So we cant use that in regex. Not sure how to solve this


Solution

  • Here is a way to go:

    import re
    
    a="""SELECT
        transform_abc.ds AS "ds",
        SUM(transform_abc.dollars) AS "dollars",
        transform_abc.unit AS "unit"
    FROM fct_table_abc transform_abc
    WHERE
        (
            transform_abc.is_charged > 0
            OR transform_abc.account_status = 0
        )
        AND transform_abc.ds = '2020-02-20'
    GROUP BY
        transform_abc.ds,
        transform_abc.unit"""
    
    res = re.sub(r'((?:SELECT|GROUP BY)\s+(?:(?!WHERE)[\s\S])*?)\s+[\w.]+\.ds.+', r'\1', a)
    print res
    

    Output:

    SELECT
    
        SUM(transform_abc.dollars) AS "dollars",
        transform_abc.unit AS "unit"
    FROM fct_table_abc transform_abc
    WHERE
        (
            transform_abc.is_charged > 0
            OR transform_abc.account_status = 0
        )
        AND transform_abc.ds = '2020-02-20'
    GROUP BY
    
        transform_abc.unit
    

    Demo & explanation


    If the SQL is on a single line, use:

    ((?:SELECT|GROUP BY)\s+(?:(?!WHERE)[\s\S])*?)\s*[\w.]+\.ds[^,]*,
    

    Demo & explanation