Tags: sql, pyspark, apache-spark-sql

How Do I Pass Later Defined Parameter Into PySpark SQL


I defined a query like this:

    sql_query_1 = f"""
                select
                    Country_Name as country,
                    State_Name as state,
                    City_Name as city,
                    avg(Order_Rate)
                 from last_day 
                 where Country_Name  ='USA' and Account_ID = "{account_id}" 
                 group by 1,2,3;
            """

    for orders in order_list:
        account_id = orders["Account_ID"]
        spark.sql(sql_query_1)

This raises a `NameError` for `account_id`: the f-string is evaluated at the moment `sql_query_1` is assigned, but `account_id` is only defined later, inside the for loop.

If I reverse the order, it says it cannot find `sql_query_1`, because the for loop runs first, and by the time `spark.sql` is called, `sql_query_1` has not been defined yet:

    for orders in order_list:
        account_id = orders["Account_ID"]
        spark.sql(sql_query_1)

    sql_query_1 = f"""
                select
                    Country_Name as country,
                    State_Name as state,
                    City_Name as city,
                    avg(Order_Rate)
                 from last_day 
                 where Country_Name  ='USA' and Account_ID = "{account_id}" 
                 group by 1,2,3;
            """

I could embed the SQL query inside the for loop, but is there a more elegant way to do this?


Solution

  • You have to defer the substitution until the loop runs. Instead of an f-string (which is evaluated immediately), define the query as a plain string and use `.format()`, which leaves the `{account_id}` placeholder in place until you call it:

    sql_query_1 = """
        select
            Country_Name as country,
            State_Name as state,
            City_Name as city,
            avg(Order_Rate)
         from last_day 
         where Country_Name  ='USA' and Account_ID = "{account_id}" 
         group by 1,2,3;
    """
    
    for orders in order_list:
        account_id = orders["Account_ID"]
        spark.sql(sql_query_1.format(account_id=account_id))
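
This works because of *when* each mechanism substitutes values: an f-string is evaluated at the line where the literal appears, while `str.format()` keeps the `{account_id}` placeholder inert until it is called. A minimal sketch of the deferred substitution, independent of Spark:

```python
# str.format() defers substitution: the template can be defined
# before the value it needs exists.
template = 'where Account_ID = "{account_id}"'

# Later, once the value is known, fill the placeholder in.
rendered = template.format(account_id="A-123")
print(rendered)  # where Account_ID = "A-123"
```

As a side note, if you are on Spark 3.4 or later, `spark.sql` also accepts named parameter markers directly, e.g. `spark.sql("select ... where Account_ID = :account_id", args={"account_id": account_id})`, which avoids string interpolation (and its quoting/injection pitfalls) entirely.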