Here is my data frame:
month | year | category | size | sold |
---|---|---|---|---|
6 | 2022 | shirt | M | 52 |
5 | 2022 | shirt | M | 45 |
1 | 2022 | shirt | S | 61 |
12 | 2021 | shirt | S | 89 |
12 | 2021 | pant | S | 72 |
7 | 2022 | shirt | M | 42 |
8 | 2022 | shirt | M | 55 |
8 | 2022 | pants | 41 | 9 |
What I would like is to roll up previous month to another column:
Like this:
current_month_year | previous_month_year | category | size | sold_current | sold_previous |
---|---|---|---|---|---|
6-2022 | 5-2022 | shirt | M | 52 | 45 |
1-2022 | 12-2021 | shirt | S | 61 | 89 |
12-2021 | pant | S | 72 | 0 | |
8-2022 | 7-2022 | shirt | M | 55 | 42 |
8-2022 | pant | 41 | 9 | 0 |
how would I do this?
I have no idea how to do this so don't have any code to show.
You can create a new DataFrame with the output columns you want and iterate over the original DataFrame to filter the size and category in adjacent months to get the previous sales. The current sales and the rest of the rows are easy to add by simply copying from one DataFrame to the other and doing a small transformation to calculate the previous month.
I would do something like this:
df = pd.DataFrame({'month': [6, 5, 1, 12, 12, 7, 8, 8], 'year': [2022, 2022, 2022, 2021, 2021, 2022, 2022, 2022], 'category': ['shirt', 'shirt', 'shirt', 'shirt', 'pant', 'shirt', 'shirt', 'pants'], 'size': ['M', 'M', 'S', 'S', 'S', 'M', 'M', '41'], 'sold': [52, 45, 61, 89, 72, 42, 55, 9]})
df_new = pd.DataFrame(columns=['current_month_year', 'previous_month_year', 'category', 'size', 'sold_current', 'sold_previous'])
df_new['current_month_year'] = df['month'].astype(str) + '-' + df['year'].astype(str)
for index, row in df.iterrows():
if row['month'] == 1:
df_new.loc[index, 'previous_month_year'] = '12-' + str(row['year'] - 1)
else:
df_new.loc[index, 'previous_month_year'] = str(row['month'] - 1) + '-' + str(row['year'])
df_new['category'] = df['category']
df_new['size'] = df['size']
df_new['sold_current'] = df['sold']
for index, row in df.iterrows():
sold_previous = df[(df['month'].astype(str) + '-' + df['year'].astype(str)) == df_new.loc[index, 'previous_month_year']]
sold_previous = sold_previous[sold_previous['category'] == df_new.loc[index, 'category']]
sold_previous = sold_previous[sold_previous['size'] == df_new.loc[index, 'size']]
sold_previous = sold_previous['sold'].values
if sold_previous.size > 0:
df_new.loc[index, 'sold_previous'] = sold_previous[0]
else:
df_new.loc[index, 'sold_previous'] = 0
This would be the output:
current_month_year previous_month_year category size sold_current sold_previous
0 6-2022 5-2022 shirt M 52 45
1 5-2022 4-2022 shirt M 45 0
2 1-2022 12-2021 shirt S 61 89
3 12-2021 11-2021 shirt S 89 0
4 12-2021 11-2021 pant S 72 0
5 7-2022 6-2022 shirt M 42 52
6 8-2022 7-2022 shirt M 55 42
7 8-2022 7-2022 pants 41 9 0