I have a pandas timeseries dataframe df with columns date, week, week_start_date, country, campaign_name, active
for some date(s) we have multiple campaign information.
for example
data = [["2023.01.02", 1, "2023.01.01", "BR", "SALE-1", 1],
["2023.01.02", 1, "2023.01.01", "BR", "SALE-2", 1],
["2023.01.02", 1, "2023.01.01", "NL", "SALE-1", 1],
["2023.01.02", 1, "2023.01.01", "DE", "SALE-1", 1]]
df = pd.DataFrame(data, columns=["date", "week", "week_start_date", "country", "campaign_name", "active"])
date week week_start_date country campaign_name active
2023.01.02 1 2023.01.01 BR SALE-1 1
2023.01.02 1 2023.01.01 BR SALE-2 1
2023.01.02 1 2023.01.01 NL SALE-1 1
2023.01.02 1 2023.01.01 DE SALE-1 1
I don't mind having separate date country time-series combination but for the same country in case we have 2 campaigns then I would like to pivot it
date week week_start_date country campaign_name active campaign_name_n active_n total_active
2023.01.02 1 2023.01.01 BR SALE-1 1 SALE-2 1 2
2023.01.02 1 2023.01.01 NL SALE-1 1 NaN NaN 1
2023.01.02 1 2023.01.01 DE SALE-1 1 NaN NaN 1
now campaign_name_n , active_n
could be any number based on the values we find while running the loop.
I am trying to use
import pandas as pd
# Load your data into a pandas DataFrame
df = pd.read_csv("data.csv")
# Group the data by date, week, week_start_date, country, and days_active
grouped = df.groupby(["date", "week", "week_start_date", "country", "days_active"])
# Create a dictionary to store the campaign names for each group
campaign_names = {}
# Iterate through the groups
for name, group in grouped:
# Check if there are multiple entries for a particular date
if len(group) > 1:
# Create new columns for the campaign names
for i, row in enumerate(group.itertuples()):
campaign_name = "campaign_name_{}".format(i + 1)
campaign_names[row.Index] = campaign_name
df.at[row.Index, campaign_name] = row.campaign_name
# Add the campaign name columns to the DataFrame
for index, campaign_name in campaign_names.items():
df.at[index, "campaign_name"] = campaign_name
# Drop the original campaign_name column
df = df.drop(columns=["campaign_name"])
# Save the grouped and modified data to a new file
df.to_csv("grouped_data.csv", index=False)
but I am getting all the campaigns pivoted. which is not intended. Would be great if someone can help here. Thank you!
Try:
x = df.groupby(["date", "week", "week_start_date", "country"]).agg(
{"campaign_name": list, "active": [list, "sum"]}
)
x.columns = (f"{a}_{b}".replace("_list", "") for a, b in x.columns)
tmp = pd.DataFrame(
x[["campaign_name", "active"]]
.apply(
lambda x: {
f"{a}_{i}": v for a, b in zip(x.index, x.values) for i, v in enumerate(b, 1)
},
axis=1,
)
.to_list()
)
x = pd.concat([x.reset_index(), tmp], axis=1).drop(columns=["campaign_name", "active"])
print(x)
Prints:
date week week_start_date country active_sum campaign_name_1 campaign_name_2 active_1 active_2
0 2023.01.02 1 2023.01.01 BR 2 SALE-1 SALE-2 1 1.0
1 2023.01.02 1 2023.01.01 DE 1 SALE-1 NaN 1 NaN
2 2023.01.02 1 2023.01.01 NL 1 SALE-1 NaN 1 NaN