Pandas: How to fill NaN within a group, only if a certain column contains NaN

I am trying to use Pandas to transform a df from this:

	Account	Value	OtherColumn	Another Column
0	A	1.0	1.0	1.0
1	A	NaN	NaN	NaN
2	A	NaN	NaN	NaN
3	A	6.0	NaN	NaN
4	A	NaN	NaN	NaN
5	B	3.0	2.0	NaN
6	B	NaN	NaN	NaN
7	B	4.0	NaN	NaN
8	B	NaN	NaN	NaN
9	C	NaN	NaN	NaN
10	C	5.0	5.0	NaN

To:

	Account	Value	OtherColumn	Another Column
0	A	1.0	1.0	1.0
1	A	1.0	1.0	1.0
2	A	1.0	1.0	1.0
3	A	6.0	NaN	NaN
4	A	6.0	NaN	NaN
5	B	3.0	2.0	NaN
6	B	3.0	2.0	NaN
7	B	4.0	NaN	NaN
8	B	4.0	NaN	NaN
9	C	5.0	5.0	NaN
10	C	5.0	5.0	NaN

What I would like to do is, within a group, is to fill missing data though a combination of ffill() and bfill(). However, I have certain requirements:

If 'Value' is not NaN, then I don't want any columns in that row filled.

If 'Value' is NaN it should be filled with the first row above in which 'Value' is not NaN, within the group 'Account'. Then the process should be repeated by filling backward.

What I tried was this:

data = {'Account': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
        'Value': [1.0, np.nan, np.nan, 6.0, np.nan, 3.0, np.nan, 4.0, np.nan, np.nan, 5.0],
        'OtherColumn': [1.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan, np.nan, np.nan, np.nan, 5.0],
        'Another Column': [1.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}

df = pd.DataFrame(data)
df_filled = df.groupby('Account').ffill().bfill()
print(df_filled)

However, this returns a dataframe that looks like:

	Value	OtherColumn	Another Column
0	1.0	1.0	1.0
1	1.0	1.0	1.0
2	1.0	1.0	1.0
3	6.0	1.0	1.0
4	6.0	1.0	1.0
5	3.0	2.0	NaN
6	3.0	2.0	NaN
7	4.0	2.0	NaN
8	4.0	2.0	NaN
9	5.0	5.0	NaN
10	5.0	5.0	NaN

As you can see, on rows 3 and 4, the OtherColumn and Another Column have been filled with 1.0. I want these rows to remain as NaN. Same with rows 7 and 8 which are filled with the value of 2.0.

I cannot use ffill(1) because the number of rows that need to be forward (or back) filled varies)

Solution

Here I'm doing ffill and bfill on Value column first because it looks like your business logic is to ffill and bfill OtherColumn and Another Column based on group of Account and Value or at least I think it should be based on what you have described.

Once we do that, then we can group the dataframe based on Account and Value and ffill and bfill the group. I'm using here apply to do the ffill and bfill since chaining the two methods .ffill().bfill() results in a behavior not expected (since result of ffill would be a series and we are applying bfill to a series instead of dataframe). See this link for more info

import pandas as pd
import numpy as np

data = {'Account': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
        'Value': [1.0, np.nan, np.nan, 6.0, np.nan, 3.0, np.nan, 4.0, np.nan, np.nan, 5.0],
        'OtherColumn': [1.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan, np.nan, np.nan, np.nan, 5.0],
        'Another Column': [1.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}

df = pd.DataFrame(data)
df['Value'] = df.groupby('Account')['Value'].ffill().bfill()
df_filled = df.groupby(['Account', 'Value']).apply(lambda x: x.ffill().bfill())
print(df_filled)

Output

                 Account  Value  OtherColumn  Another Column
Account Value                                               
A       1.0   0        A    1.0          1.0             1.0
              1        A    1.0          1.0             1.0
              2        A    1.0          1.0             1.0
        6.0   3        A    6.0          NaN             NaN
              4        A    6.0          NaN             NaN
B       3.0   5        B    3.0          2.0             NaN
              6        B    3.0          2.0             NaN
        4.0   7        B    4.0          NaN             NaN
              8        B    4.0          NaN             NaN
C       5.0   9        C    5.0          5.0             NaN
              10       C    5.0          5.0             NaN

EDIT: You can drop level to get the dataframe without multilevel index

df_filled = df_filled.droplevel(['Account', 'Value'])

Output

   Account  Value  OtherColumn  Another Column
0        A    1.0          1.0             1.0
1        A    1.0          1.0             1.0
2        A    1.0          1.0             1.0
3        A    6.0          NaN             NaN
4        A    6.0          NaN             NaN
5        B    3.0          2.0             NaN
6        B    3.0          2.0             NaN
7        B    4.0          NaN             NaN
8        B    4.0          NaN             NaN
9        C    5.0          5.0             NaN
10       C    5.0          5.0             NaN

	Account	Value	OtherColumn	Another Column
0	A	1.0	1.0	1.0
1	A	NaN	NaN	NaN
2	A	NaN	NaN	NaN
3	A	6.0	NaN	NaN
4	A	NaN	NaN	NaN
5	B	3.0	2.0	NaN
6	B	NaN	NaN	NaN
7	B	4.0	NaN	NaN
8	B	NaN	NaN	NaN
9	C	NaN	NaN	NaN
10	C	5.0	5.0	NaN

	Account	Value	OtherColumn	Another Column
0	A	1.0	1.0	1.0
1	A	NaN	NaN	NaN
2	A	NaN	NaN	NaN
3	A	6.0	NaN	NaN
4	A	NaN	NaN	NaN
5	B	3.0	2.0	NaN
6	B	NaN	NaN	NaN
7	B	4.0	NaN	NaN
8	B	NaN	NaN	NaN
9	C	NaN	NaN	NaN
10	C	5.0	5.0	NaN

	Account	Value	OtherColumn	Another Column
0	A	1.0	1.0	1.0
1	A	NaN	NaN	NaN
2	A	NaN	NaN	NaN
3	A	6.0	NaN	NaN
4	A	NaN	NaN	NaN
5	B	3.0	2.0	NaN
6	B	NaN	NaN	NaN
7	B	4.0	NaN	NaN
8	B	NaN	NaN	NaN
9	C	NaN	NaN	NaN
10	C	5.0	5.0	NaN