I have a data frame that looks as below:
5.29559 NaN 2.38176 NaN 0.51521 NaN 0.04454 0.00000 None None None None None None None None
0 NaN NaN NaN NaN 0 NaN NaN 0 NaN NaN 0 2 None None None
4.32454 NaN 1.77600 NaN 0.04454 NaN 0.00000 None None None None None None None None None
0 NaN NaN NaN NaN 0 NaN NaN 0 NaN NaN 2 None None None None
I am trying to generate a new data frame by removing all the NaN values and shifting the remaining values to the left, so that it looks like this:
5.29559 2.38176 0.51521 0.04454 0.00000
0 0 0 0 2
4.32454 1.77600 0.04454 0.00000
0 0 0 2
Can someone please help? I tried the dropna() method but it did not help.
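Let's try stacking to eliminate the NaNs, then reset the inner index level within each row, and finally unstack again:
(df.stack()                                     # long format; NaN cells are dropped
   .groupby(level=0)                            # one group per original row
   .apply(lambda s: s.reset_index(drop=True))   # renumber each row's values 0, 1, 2, ...
   .unstack())                                  # back to wide format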
0 1 2 3 4
0 5.29559 2.38176 0.51521 0.04454 0.0
1 0.00000 0.00000 0.00000 0.00000 2.0
2 4.32454 1.77600 0.04454 0.00000 NaN
3 0.00000 0.00000 0.00000 2.00000 NaN
Explanation:
First, stack to remove NaNs
df.stack()
0 0 5.29559
2 2.38176
4 0.51521
6 0.04454
7 0.00000
1 0 0.00000
5 0.00000
8 0.00000
11 0.00000
12 2.00000
2 0 4.32454
2 1.77600
4 0.04454
6 0.00000
3 0 0.00000
5 0.00000
8 0.00000
11 2.00000
dtype: float64
You'll notice the inner level of the index still holds the original column positions (0, 2, 4, 6, 7, ...), so it has gaps. Let's renumber it consecutively within each row using groupby.apply:
_.groupby(level=0).apply(lambda s: s.reset_index(drop=True))
0 0 5.29559
1 2.38176
2 0.51521
3 0.04454
4 0.00000
1 0 0.00000
1 0.00000
2 0.00000
3 0.00000
4 2.00000
2 0 4.32454
1 1.77600
2 0.04454
3 0.00000
3 0 0.00000
1 0.00000
2 0.00000
3 2.00000
dtype: float64
Now we unstack, which turns the inner level back into columns:
_.unstack()
0 1 2 3 4
0 5.29559 2.38176 0.51521 0.04454 0.0
1 0.00000 0.00000 0.00000 0.00000 2.0
2 4.32454 1.77600 0.04454 0.00000 NaN
3 0.00000 0.00000 0.00000 2.00000 NaN
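For anyone who wants to run the three steps end to end, here is a minimal sketch. The frame is a hypothetical reconstruction of the question's data: the column positions are taken from the df.stack() output above, the None cells are represented as NaN, and the name `cells` is only for illustration. The explicit dropna() is an addition, in case the installed pandas version's stack() keeps missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction: {row: {column position: value}}, taken from
# the df.stack() output shown above. Every other cell is NaN.
cells = {
    0: {0: 5.29559, 2: 2.38176, 4: 0.51521, 6: 0.04454, 7: 0.0},
    1: {0: 0.0, 5: 0.0, 8: 0.0, 11: 0.0, 12: 2.0},
    2: {0: 4.32454, 2: 1.77600, 4: 0.04454, 6: 0.0},
    3: {0: 0.0, 5: 0.0, 8: 0.0, 11: 2.0},
}

# Start from an all-NaN frame and fill in the known cells.
df = pd.DataFrame(np.nan, index=range(4), columns=range(13))
for row, row_cells in cells.items():
    for col, val in row_cells.items():
        df.loc[row, col] = val

result = (
    df.stack()
    .dropna()                                    # added: make sure no NaN survives stacking
    .groupby(level=0)                            # one group per original row
    .apply(lambda s: s.reset_index(drop=True))   # renumber values 0, 1, 2, ... per row
    .unstack()                                   # inner level becomes the new columns
)

print(result)
```

Rows with fewer values than the longest row are padded with NaN on the right, as in the output above, because a data frame has to stay rectangular.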