pythonpandasdataframe# Pandas: How to fill NaN within a group, only if a certain column contains NaN

I am trying to use Pandas to transform a df from this:

Account | Value | OtherColumn | Another Column | |
---|---|---|---|---|

0 | A | 1.0 | 1.0 | 1.0 |

1 | A | NaN | NaN | NaN |

2 | A | NaN | NaN | NaN |

3 | A | 6.0 | NaN | NaN |

4 | A | NaN | NaN | NaN |

5 | B | 3.0 | 2.0 | NaN |

6 | B | NaN | NaN | NaN |

7 | B | 4.0 | NaN | NaN |

8 | B | NaN | NaN | NaN |

9 | C | NaN | NaN | NaN |

10 | C | 5.0 | 5.0 | NaN |

To:

Account | Value | OtherColumn | Another Column | |
---|---|---|---|---|

0 | A | 1.0 | 1.0 | 1.0 |

1 | A | 1.0 |
1.0 |
1.0 |

2 | A | 1.0 |
1.0 |
1.0 |

3 | A | 6.0 | NaN | NaN |

4 | A | 6.0 |
NaN | NaN |

5 | B | 3.0 | 2.0 | NaN |

6 | B | 3.0 |
2.0 |
NaN |

7 | B | 4.0 | NaN | NaN |

8 | B | 4.0 |
NaN | NaN |

9 | C | 5.0 |
5.0 |
NaN |

10 | C | 5.0 | 5.0 | NaN |

What I would like to do is, within a group, is to fill missing data though a combination of ffill() and bfill(). However, I have certain requirements:

If 'Value' is *not* NaN, then I don't want any columns in that row filled.

If 'Value' is NaN it should be filled with the first row above in which 'Value' is not NaN, within the group 'Account'. Then the process should be repeated by filling backward.

What I tried was this:

```
data = {'Account': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
'Value': [1.0, np.nan, np.nan, 6.0, np.nan, 3.0, np.nan, 4.0, np.nan, np.nan, 5.0],
'OtherColumn': [1.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan, np.nan, np.nan, np.nan, 5.0],
'Another Column': [1.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)
df_filled = df.groupby('Account').ffill().bfill()
print(df_filled)
```

However, this returns a dataframe that looks like:

Value | OtherColumn | Another Column | |
---|---|---|---|

0 | 1.0 | 1.0 | 1.0 |

1 | 1.0 | 1.0 | 1.0 |

2 | 1.0 | 1.0 | 1.0 |

3 | 6.0 | 1.0 |
1.0 |

4 | 6.0 | 1.0 |
1.0 |

5 | 3.0 | 2.0 | NaN |

6 | 3.0 | 2.0 | NaN |

7 | 4.0 | 2.0 |
NaN |

8 | 4.0 | 2.0 |
NaN |

9 | 5.0 | 5.0 | NaN |

10 | 5.0 | 5.0 | NaN |

As you can see, on rows 3 and 4, the OtherColumn and Another Column have been filled with 1.0. I want these rows to remain as NaN. Same with rows 7 and 8 which are filled with the value of 2.0.

I cannot use ffill(1) because the number of rows that need to be forward (or back) filled varies)

Solution

Here I'm doing ffill and bfill on Value column first because it looks like your business logic is to ffill and bfill `OtherColumn`

and `Another Column`

based on group of `Account`

and `Value`

or at least I think it should be based on what you have described.

Once we do that, then we can group the dataframe based on `Account`

and `Value`

and ffill and bfill the group. I'm using here `apply`

to do the ffill and bfill since chaining the two methods `.ffill().bfill()`

results in a behavior not expected (since result of ffill would be a series and we are applying bfill to a series instead of dataframe). See this link for more info

```
import pandas as pd
import numpy as np
data = {'Account': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
'Value': [1.0, np.nan, np.nan, 6.0, np.nan, 3.0, np.nan, 4.0, np.nan, np.nan, 5.0],
'OtherColumn': [1.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan, np.nan, np.nan, np.nan, 5.0],
'Another Column': [1.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)
df['Value'] = df.groupby('Account')['Value'].ffill().bfill()
df_filled = df.groupby(['Account', 'Value']).apply(lambda x: x.ffill().bfill())
print(df_filled)
```

Output

```
Account Value OtherColumn Another Column
Account Value
A 1.0 0 A 1.0 1.0 1.0
1 A 1.0 1.0 1.0
2 A 1.0 1.0 1.0
6.0 3 A 6.0 NaN NaN
4 A 6.0 NaN NaN
B 3.0 5 B 3.0 2.0 NaN
6 B 3.0 2.0 NaN
4.0 7 B 4.0 NaN NaN
8 B 4.0 NaN NaN
C 5.0 9 C 5.0 5.0 NaN
10 C 5.0 5.0 NaN
```

EDIT: You can drop level to get the dataframe without multilevel index

```
df_filled = df_filled.droplevel(['Account', 'Value'])
```

Output

```
Account Value OtherColumn Another Column
0 A 1.0 1.0 1.0
1 A 1.0 1.0 1.0
2 A 1.0 1.0 1.0
3 A 6.0 NaN NaN
4 A 6.0 NaN NaN
5 B 3.0 2.0 NaN
6 B 3.0 2.0 NaN
7 B 4.0 NaN NaN
8 B 4.0 NaN NaN
9 C 5.0 5.0 NaN
10 C 5.0 5.0 NaN
```

- Python Jinja2 LaTeX Table
- Getting attributes of a class
- How can I print many significant figures in Python?
- How to allow list append() method to return the new list
- Calculate Last Friday of Month in Pandas
- Python type hint for Iterable[str] that isn't str
- How to iterate over a list in chunks
- How to exit the entire application from a Python thread?
- Running shell command and capturing the output
- How do I pass a variable by reference?
- Convert range(r) to list of strings of length 2 in python
- How can I get the start and end dates for each week?
- how to use send_message() in python-telegram-bot
- Python conditional replacement based on element type
- How can I count the number of items in an arbitrary iterable (such as a generator)?
- Find longest consecutive range of numbers in list
- Insert text in braces with asyncpg
- How does one put a link / url to the web-site's home page in Django?
- How to determine if a path is a subdirectory of another?
- Custom Keybindings for Ipython terminal
- FastAPI asynchronous background tasks blocks other requests?
- How to make sure that information from one file is duplicated into several text documents, without specific lines
- Installing a Python environment with Anaconda
- sklearn pipeline model predicting same results for all input
- Brew command not found after installing Anaconda Python
- How to get an XPath from selenium webelement or from lxml?
- Pipe PuTTY console to Python script
- How to align the axes of a figure in matplotlib?
- Persist ParentDocumentRetriever of langchain
- How to reset index in a pandas dataframe?