I am trying to use Pandas to transform a df from this:
Account | Value | OtherColumn | Another Column | |
---|---|---|---|---|
0 | A | 1.0 | 1.0 | 1.0 |
1 | A | NaN | NaN | NaN |
2 | A | NaN | NaN | NaN |
3 | A | 6.0 | NaN | NaN |
4 | A | NaN | NaN | NaN |
5 | B | 3.0 | 2.0 | NaN |
6 | B | NaN | NaN | NaN |
7 | B | 4.0 | NaN | NaN |
8 | B | NaN | NaN | NaN |
9 | C | NaN | NaN | NaN |
10 | C | 5.0 | 5.0 | NaN |
To:
Account | Value | OtherColumn | Another Column | |
---|---|---|---|---|
0 | A | 1.0 | 1.0 | 1.0 |
1 | A | 1.0 | 1.0 | 1.0 |
2 | A | 1.0 | 1.0 | 1.0 |
3 | A | 6.0 | NaN | NaN |
4 | A | 6.0 | NaN | NaN |
5 | B | 3.0 | 2.0 | NaN |
6 | B | 3.0 | 2.0 | NaN |
7 | B | 4.0 | NaN | NaN |
8 | B | 4.0 | NaN | NaN |
9 | C | 5.0 | 5.0 | NaN |
10 | C | 5.0 | 5.0 | NaN |
What I would like to do is, within a group, is to fill missing data though a combination of ffill() and bfill(). However, I have certain requirements:
If 'Value' is not NaN, then I don't want any columns in that row filled.
If 'Value' is NaN it should be filled with the first row above in which 'Value' is not NaN, within the group 'Account'. Then the process should be repeated by filling backward.
What I tried was this:
data = {'Account': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
'Value': [1.0, np.nan, np.nan, 6.0, np.nan, 3.0, np.nan, 4.0, np.nan, np.nan, 5.0],
'OtherColumn': [1.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan, np.nan, np.nan, np.nan, 5.0],
'Another Column': [1.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)
df_filled = df.groupby('Account').ffill().bfill()
print(df_filled)
However, this returns a dataframe that looks like:
Value | OtherColumn | Another Column | |
---|---|---|---|
0 | 1.0 | 1.0 | 1.0 |
1 | 1.0 | 1.0 | 1.0 |
2 | 1.0 | 1.0 | 1.0 |
3 | 6.0 | 1.0 | 1.0 |
4 | 6.0 | 1.0 | 1.0 |
5 | 3.0 | 2.0 | NaN |
6 | 3.0 | 2.0 | NaN |
7 | 4.0 | 2.0 | NaN |
8 | 4.0 | 2.0 | NaN |
9 | 5.0 | 5.0 | NaN |
10 | 5.0 | 5.0 | NaN |
As you can see, on rows 3 and 4, the OtherColumn and Another Column have been filled with 1.0. I want these rows to remain as NaN. Same with rows 7 and 8 which are filled with the value of 2.0.
I cannot use ffill(1) because the number of rows that need to be forward (or back) filled varies)
Here I'm doing ffill and bfill on Value column first because it looks like your business logic is to ffill and bfill OtherColumn
and Another Column
based on group of Account
and Value
or at least I think it should be based on what you have described.
Once we do that, then we can group the dataframe based on Account
and Value
and ffill and bfill the group. I'm using here apply
to do the ffill and bfill since chaining the two methods .ffill().bfill()
results in a behavior not expected (since result of ffill would be a series and we are applying bfill to a series instead of dataframe). See this link for more info
import pandas as pd
import numpy as np
data = {'Account': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
'Value': [1.0, np.nan, np.nan, 6.0, np.nan, 3.0, np.nan, 4.0, np.nan, np.nan, 5.0],
'OtherColumn': [1.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan, np.nan, np.nan, np.nan, 5.0],
'Another Column': [1.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)
df['Value'] = df.groupby('Account')['Value'].ffill().bfill()
df_filled = df.groupby(['Account', 'Value']).apply(lambda x: x.ffill().bfill())
print(df_filled)
Output
Account Value OtherColumn Another Column
Account Value
A 1.0 0 A 1.0 1.0 1.0
1 A 1.0 1.0 1.0
2 A 1.0 1.0 1.0
6.0 3 A 6.0 NaN NaN
4 A 6.0 NaN NaN
B 3.0 5 B 3.0 2.0 NaN
6 B 3.0 2.0 NaN
4.0 7 B 4.0 NaN NaN
8 B 4.0 NaN NaN
C 5.0 9 C 5.0 5.0 NaN
10 C 5.0 5.0 NaN
EDIT: You can drop level to get the dataframe without multilevel index
df_filled = df_filled.droplevel(['Account', 'Value'])
Output
Account Value OtherColumn Another Column
0 A 1.0 1.0 1.0
1 A 1.0 1.0 1.0
2 A 1.0 1.0 1.0
3 A 6.0 NaN NaN
4 A 6.0 NaN NaN
5 B 3.0 2.0 NaN
6 B 3.0 2.0 NaN
7 B 4.0 NaN NaN
8 B 4.0 NaN NaN
9 C 5.0 5.0 NaN
10 C 5.0 5.0 NaN