Search code examples
pythonpandasdataframefillna

How Do I Fill Blank Space Before and After Filled Cells?


I am working to try to group my data by a Department property and then once I have it grouped fill in the space above and below two fields that are filled in at the middle of the dataset (Rating and Number).

I have tried getting the groupby to work but to no avail. My plan is to get the groupby working then apply the following code to see if I can get the fill to work correctly.

# This won't work on its own because I need to group the data first.    
df = df.mask(df == 0).ffill()

This is what I am starting with:

| Department   | Range    | Rating       | Number |
|--------------|----------|--------------|--------|
| Admin        | 0 (None) |              |        |
| Admin        | 01 to 3  |              |        |
| Admin        | 01 to 3  |              |        |
| Admin        | 01 to 3  |              |        |
| Admin        | 04 to 6  | 2. On Target | 2      |
| Admin        | 04 to 6  | 2. On Target | 2      |
| Admin        | 04 to 6  | 2. On Target | 2      |
| Admin        | 07 to 10 |              |        |
| Admin        | 07 to 10 |              |        |
| Admin        | 07 to 10 |              |        |
| Admin        | 07 to 10 |              |        |
| Distribution | 0 (None) |              |        |
| Distribution | 01 to 3  |              |        |
| Distribution | 01 to 3  |              |        |
| Distribution | 01 to 3  |              |        |
| Distribution | 04 to 6  | 2. On Target | 2      |
| Distribution | 04 to 6  | 2. On Target | 2      |
| Distribution | 04 to 6  | 2. On Target | 2      |
| Distribution | 07 to 10 |              |        |
| Distribution | 07 to 10 |              |        |
| Distribution | 07 to 10 |              |        |
| Distribution | 07 to 10 |              |        |

This is what I want

| Department   | Range    | Rating       | Number |
|--------------|----------|--------------|--------|
| Admin        | 0 (None) | 1. Too Low   | 1      |
| Admin        | 01 to 3  | 1. Too Low   | 1      |
| Admin        | 01 to 3  | 1. Too Low   | 1      |
| Admin        | 01 to 3  | 1. Too Low   | 1      |
| Admin        | 04 to 6  | 2. On Target | 2      |
| Admin        | 04 to 6  | 2. On Target | 2      |
| Admin        | 04 to 6  | 2. On Target | 2      |
| Admin        | 07 to 10 | 3. Too High  | 3      |
| Admin        | 07 to 10 | 3. Too High  | 3      |
| Admin        | 07 to 10 | 3. Too High  | 3      |
| Admin        | 07 to 10 | 3. Too High  | 3      |
| Distribution | 0 (None) | 1. Too Low   | 1      |
| Distribution | 01 to 3  | 1. Too Low   | 1      |
| Distribution | 01 to 3  | 1. Too Low   | 1      |
| Distribution | 01 to 3  | 1. Too Low   | 1      |
| Distribution | 04 to 6  | 2. On Target | 2      |
| Distribution | 04 to 6  | 2. On Target | 2      |
| Distribution | 04 to 6  | 2. On Target | 2      |
| Distribution | 07 to 10 | 3. Too High  | 3      |
| Distribution | 07 to 10 | 3. Too High  | 3      |
| Distribution | 07 to 10 | 3. Too High  | 3      |
| Distribution | 07 to 10 | 3. Too High  | 3      |

Is there any dynamic way to do this?


Solution

  • You can use pd.concat with groupby and utilise a custom function for filling logic:

    # convert to numeric
    df['Number'] = pd.to_numeric(df['Number'])
    
    # assign values by index
    def filler(x):
        idx = np.where(x['Number'].notnull())[0]
        x.iloc[:idx[0], -2:] = ['1. Too Low', 1]
        x.iloc[idx[-1]+1:, -2:] = ['3. Too High', 3]
        return x
    
    # concatenate transformed dataframe slices
    res = pd.concat(df_slice.pipe(filler) for _, df_slice in df.groupby('Department'))
    

    Result:

    print(res)
    
          Department     Range        Rating  Number
    0          Admin  0 (None)    1. Too Low     1.0
    1          Admin   01 to 3    1. Too Low     1.0
    2          Admin   01 to 3    1. Too Low     1.0
    3          Admin   01 to 3    1. Too Low     1.0
    4          Admin   04 to 6  2. On Target     2.0
    5          Admin   04 to 6  2. On Target     2.0
    6          Admin   04 to 6  2. On Target     2.0
    7          Admin  07 to 10   3. Too High     3.0
    8          Admin  07 to 10   3. Too High     3.0
    9          Admin  07 to 10   3. Too High     3.0
    10         Admin  07 to 10   3. Too High     3.0
    11  Distribution  0 (None)    1. Too Low     1.0
    12  Distribution   01 to 3    1. Too Low     1.0
    13  Distribution   01 to 3    1. Too Low     1.0
    14  Distribution   01 to 3    1. Too Low     1.0
    15  Distribution   04 to 6  2. On Target     2.0
    16  Distribution   04 to 6  2. On Target     2.0
    17  Distribution   04 to 6  2. On Target     2.0
    18  Distribution  07 to 10   3. Too High     3.0
    19  Distribution  07 to 10   3. Too High     3.0
    20  Distribution  07 to 10   3. Too High     3.0
    21  Distribution  07 to 10   3. Too High     3.0