Have a dataframe in following format
data = {'regions':["USA", "USA", "USA", "FRANCE", "FRANCE","FRANCE"],
'dates':['2024-08-03', '2024-08-10', '2024-08-17','2024-08-03', '2024-08-10', '2024-08-17'],
'values': [3, 4, 5, 7, 8,0],
}
df = pd.DataFrame(data)
regions dates values
0 USA 2024-08-03 3
1 USA 2024-08-10 4
2 USA 2024-08-17 5
3 FRANCE 2024-08-03 7
4 FRANCE 2024-08-10 8
5 FRANCE 2024-08-17 0
Need to change this df from long to wide format. Use the most recent dates as current date, and the other two dates will be lagged dates. Expected output is like
regions dates values_lag2 values_lag1 values
USA 2024-08-17 3 4 5
FRANCE 2024-08-17 7 8 0
Currently I used a for loop to manually to change the format. Just wondering if there is a more elegant way to realize it. Thanks
If same dates per each regions
is possible convert column to datetimes, pivoting, change columns names and add column with maximal date
s:
df['dates'] = pd.to_datetime(df['dates'])
out = df.pivot(index='regions', columns='dates', values='values')
out.columns = [f'values_lag{i-1}' if i!=1 else 'values'
for i in range(len(out.columns), 0, -1)]
out = df.groupby('regions')['dates'].max().to_frame().join(out).reset_index()
print (out)
regions dates values_lag2 values_lag1 values
0 FRANCE 2024-08-17 7 8 0
1 USA 2024-08-17 3 4 5
Another idea if possible different datetimes and need only ordering add sorting, counter by groupby.cumcount
and pivoting with helper column g
:
df['dates'] = pd.to_datetime(df['dates'])
df = df.sort_values(['regions', 'dates'])
df['g'] = df.groupby('regions').cumcount(ascending=False)
out = (df.pivot(index='regions', columns='g', values='values')
.sort_index(ascending=False, axis=1))
out.columns=[f'values_lag{i}' if i!=0 else 'values' for i in out.columns]
out = df.groupby('regions')['dates'].max().to_frame().join(out).reset_index()
print (out)
regions dates values_lag2 values_lag1 values
0 FRANCE 2024-08-17 7 8 0
1 USA 2024-08-17 3 4 5