Search code examples
pythonpandasfor-loopnested-loops

For loop with an SQL Query based on multiple columns


I'm looking to use 'for loops' in order to segment my SQL query as my current method is taking far too long on the server. I'm relatively new and unfamiliar with nested for loops over a pandas dataframe.

Promo Name Key Start_Date
Promo 1 6565 4/10/2023
Promo 1 5555 4/10/2023
Promo 1 5455 4/10/2023
Promo 1 5535 4/10/2023
Promo 2 2222 5/10/2023
Promo 2 2333 5/10/2023
Promo 2 2433 5/10/2023

What I want is based on the Promo Name column to be able to query the specific Keys from Promo 1 with that specific start date instead of querying all the keys and using the earliest start date (4/10/2023) in this example.

sql = "SELECT * FROM TABLE WHERE KEY IN {} AND TIME_KEY between {} and {}".format(key, start_date, end_date)

I already have the end_date set as the most recent sunday so it's a static value. I just need help being able to segment the promos so that I'm not querying Promo 2 with a start date of 4/10/2023 when it starts at 5/10/2023

Ideally what the loop would do is make it so that the sql statement reads as such:

"SELECT * FROM TABLE WHERE KEY IN {6565, 5555, 5455, 5535} AND TIME_KEY between{4/10/2023} and {7/9/2023}"

The results would then be appended and then it would query the next promo (Promo 2) and append the results to a dataframe:

"SELECT * FROM TABLE WHERE KEY IN {2222, 2333, 2433} AND TIME_KEY between{5/10/2023} and {7/9/2023}"

Solution

  • If I understood all the problem, the solution you expect is something like:

        end_date = '7/9/2023'
        sql_template = "SELECT * FROM TABLE WHERE KEY IN {} AND TIME_KEY BETWEEN {} AND {}"
        
        promo_groups = df.groupby('Promo Name')
        
        results_df = pd.DataFrame()
        
        for promo_name, promo_group in promo_groups:                       
            sql_query = sql_template.format(str(promo_group['Key'].tolist()), 
    promo_group['Start_Date'].iloc[0]  # Assuming each promo has a single start date
    end_date)
                
            promo_results = execute_sql_query(sql_query)  # Replace 'execute_sql_query' with your SQL execution function
            results_df = results_df.append(promo_results)
        
        print(results_df)