Suppose I have the following dataframe:
What I want is to reshape the dataframe into the following form:
I tried the pd.melt function, but unfortunately it does a poor job when melting multiple columns. Any ideas, please?
Regards
Copy all column names to each row, then add the new column Qte:
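import numpy as np
import pandas as pd
df1 = df.set_index('Date')
# repeat the column names once per row
df2 = pd.DataFrame(np.broadcast_to(df1.columns.values, df1.shape), index=df1.index)
# rename the positional columns 0..n-1 to Crit1..Critn, then take Qte from the last column
df2 = df2.rename(columns=lambda x: 'Crit{}'.format(x+1)).assign(Qte=df1.iloc[:, -1])
print (df2)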
        Crit1 Crit2 Crit3 Crit4  Qte
Date
2016-06    XX    YY    ZZ    FF  100
2016-07    XX    YY    ZZ    FF  300
2019-08    XX    YY    ZZ    FF  400
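For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The input frame is an assumption: the original table is not reproduced above, so the values below are made up to match the printed result, with every value column holding the same quantity per row. `crit_names` and `labels` are names introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical input (assumed, not taken from the question): each value
# column carries the same quantity for a given date.
df = pd.DataFrame({
    'Date': ['2016-06', '2016-07', '2019-08'],
    'XX': [100, 300, 400],
    'YY': [100, 300, 400],
    'ZZ': [100, 300, 400],
    'FF': [100, 300, 400],
})

df1 = df.set_index('Date')

# Repeat the row of column labels once per data row.
# np.broadcast_to returns a read-only view, so no labels are copied here.
labels = np.broadcast_to(df1.columns.to_numpy(), df1.shape)

# Name the label columns Crit1..CritN, then append the quantity,
# which is read from the last value column.
crit_names = ['Crit{}'.format(i + 1) for i in range(df1.shape[1])]
df2 = pd.DataFrame(labels, index=df1.index, columns=crit_names)
df2 = df2.assign(Qte=df1.iloc[:, -1])

print(df2)
```

Note that Qte comes from the last column only, so this relies on all value columns agreeing within a row. If they can differ, the result keeps just the last column's value.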
Timings:
N = 1000
a = np.random.randint(100, size=N)
df = pd.DataFrame({'FF':a, 'YY':a,'ZZ':a, 'XX':a}, index=pd.date_range('2000-01-01', periods=N, freq='M').to_period('M'))
df = df.reset_index().rename(columns={'index':'Date'})
print (df.head())
     Date  FF  XX  YY  ZZ
0 2000-01  49  49  49  49
1 2000-02  50  50  50  50
2 2000-03  72  72  72  72
3 2000-04  81  81  81  81
4 2000-05  29  29  29  29
In [165]: %%timeit
     ...: (df.set_index(['Date'])
     ...:    .stack()
     ...:    .reset_index(-1)
     ...:    .groupby('Date')
     ...:    .apply(lambda x: pd.concat([pd.Series(x.level_1.values),pd.Series(x[0].unique()[0])]))
     ...:    .set_axis(['Crit1','Crit2','Crit3','Crit4','Qte'],axis=1, inplace=False))
     ...:
1 loop, best of 3: 904 ms per loop
In [166]: %%timeit
     ...: df1 = df.set_index('Date')
     ...: pd.DataFrame(np.broadcast_to(df1.columns.values, df1.shape), index=df1.index).rename(columns=lambda x: 'Crit{}'.format(x+1)).assign(Qte=df1.iloc[:, -1])
     ...:
100 loops, best of 3: 2.89 ms per loop
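If you are not working in IPython, the broadcast timing can be reproduced roughly with the standard `timeit` module. This is only a sketch: `broadcast_labels` is a helper name made up here, the test frame is built with `pd.period_range` rather than the `date_range(...).to_period('M')` call above, and the numbers will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

N = 1000
a = np.random.randint(100, size=N)

# Same shape as the test frame above: a monthly Date column plus four
# identical value columns.
df = pd.DataFrame({
    'Date': pd.period_range('2000-01', periods=N, freq='M'),
    'FF': a, 'XX': a, 'YY': a, 'ZZ': a,
})


def broadcast_labels(frame):
    """Hypothetical helper wrapping the broadcast approach from the answer."""
    df1 = frame.set_index('Date')
    labels = np.broadcast_to(df1.columns.to_numpy(), df1.shape)
    out = pd.DataFrame(labels, index=df1.index)
    out = out.rename(columns=lambda x: 'Crit{}'.format(x + 1))
    return out.assign(Qte=df1.iloc[:, -1])


loops = 100
elapsed = timeit.timeit(lambda: broadcast_labels(df), number=loops)
print('{:.3f} ms per loop'.format(elapsed / loops * 1000))
```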