I'm looking to use 'for loops' in order to segment my SQL query as my current method is taking far too long on the server. I'm relatively new and unfamiliar with nested for loops over a pandas dataframe.
Promo Name | Key | Start_Date |
---|---|---|
Promo 1 | 6565 | 4/10/2023 |
Promo 1 | 5555 | 4/10/2023 |
Promo 1 | 5455 | 4/10/2023 |
Promo 1 | 5535 | 4/10/2023 |
Promo 2 | 2222 | 5/10/2023 |
Promo 2 | 2333 | 5/10/2023 |
Promo 2 | 2433 | 5/10/2023 |
What I want is based on the Promo Name column to be able to query the specific Keys from Promo 1 with that specific start date instead of querying all the keys and using the earliest start date (4/10/2023) in this example.
sql = "SELECT * FROM TABLE WHERE KEY IN {} AND TIME_KEY between {} and {}".format(key, start_date, end_date)
I already have the end_date set as the most recent sunday so it's a static value. I just need help being able to segment the promos so that I'm not querying Promo 2 with a start date of 4/10/2023 when it starts at 5/10/2023
Ideally what the loop would do is make it so that the sql statement reads as such:
"SELECT * FROM TABLE WHERE KEY IN {6565, 5555, 5455, 5535} AND TIME_KEY between{4/10/2023} and {7/9/2023}"
The results would then be appended and then it would query the next promo (Promo 2) and append the results to a dataframe:
"SELECT * FROM TABLE WHERE KEY IN {2222, 2333, 2433} AND TIME_KEY between{5/10/2023} and {7/9/2023}"
If I understood all the problem, the solution you expect is something like:
end_date = '7/9/2023'
sql_template = "SELECT * FROM TABLE WHERE KEY IN {} AND TIME_KEY BETWEEN {} AND {}"
promo_groups = df.groupby('Promo Name')
results_df = pd.DataFrame()
for promo_name, promo_group in promo_groups:
sql_query = sql_template.format(str(promo_group['Key'].tolist()),
promo_group['Start_Date'].iloc[0] # Assuming each promo has a single start date
end_date)
promo_results = execute_sql_query(sql_query) # Replace 'execute_sql_query' with your SQL execution function
results_df = results_df.append(promo_results)
print(results_df)