I am trying to optimize my runtime at applying a stack() functionality.
Initial Dataframe
ID SCORE1 SCORE2 YEAR
0 1111 3 4 2019
1 1111 NaN 3 2019
2 1111 5 4 2019
3 2222 6 7 2019
4 2222 2 NaN 2019
5 3333 NaN 9 2019
6 3333 4 NaN 2019
7 4444 NaN NaN 2019
8 4444 5 6 2019
This groupby.apply() below worked.
But, It takes forever on a bigger Dataset (3 Million records = 25 mins)
var = df.groupby('ID').apply(lambda x: x.iloc[:, 1:3].stack())
Output Achieved
ID
1111 0 SCORE1 3
SCORE2 4
1 SCORE2 3
2 SCORE1 5
SCORE2 4
2222 3 SCORE1 6
SCORE2 7
4 SCORE1 2
3333 5 SCORE2 9
6 SCORE1 4
4444 8 SCORE1 5
SCORE2 6
Desired output : Same
How can I Optimize this performance ?
Can I use transform() ? How ? It does not have a stack() call
Appreciate all your insights at handing such scenarios
You can do with melt
, and I do not think grouby
is necessary here
df.drop('YEAR',1).melt('ID').dropna()
df.set_index('ID').drop('YEAR',1).stack()