Search code examples
pythonpandasdataframelookupfillna

Populate NaN cells in dataframe table based on reference table based on specific row(s) and column values


I have two tables. The first reference table is below:

| Name | Target  | Bonus |
|------|--------:|------:|
| Joe  |      40 |    46 |
| Phil |      38 |    42 |
| Dean |      65 |    70 |

The Python code to generate the table is:

# Data for the table
data = {
    'Name': ['Joe', 'Phil', 'Dean'],
    'Target': [40, 38, 65],
    'Bonus': [46, 42, 70]
}

# Creating the DataFrame
ref = pd.DataFrame(data)

My second table is below:

| week       | Metrics | Joe | Dean |
|------------|---------|----:|-----:|
| 11/6/2023  | Target  |  40 |   65 |
| 11/6/2023  | Bonus   |  46 |   70 |
| 11/6/2023  | Score   |  33 |   71 |
| 11/13/2023 | Target  |  40 |  NaN |
| 11/13/2023 | Bonus   |  46 |  NaN |
| 11/13/2023 | Score   |  45 |  NaN |
| 11/20/2023 | Target  |  40 |   65 |
| 11/20/2023 | Bonus   |  46 |   70 |
| 11/20/2023 | Score   |  35 |   68 |
| 11/27/2023 | Target  | NaN |   65 |
| 11/27/2023 | Bonus   | NaN |   70 |
| 11/27/2023 | Score   | NaN |   44 |
| 12/4/2023  | Target  |  40 |   65 |
| 12/4/2023  | Bonus   |  46 |   70 |
| 12/4/2023  | Score   |  42 |   66 |

The Python code to generate this table is:

# Data for the new table
data = {
    'week': ['11/6/2023', '11/6/2023', '11/6/2023', '11/13/2023', '11/13/2023', '11/13/2023',
             '11/20/2023', '11/20/2023', '11/20/2023', '11/27/2023', '11/27/2023', '11/27/2023',
             '12/4/2023', '12/4/2023', '12/4/2023'],
    'Metrics': ['Target', 'Bonus', 'Score', 'Target', 'Bonus', 'Score',
                'Target', 'Bonus', 'Score', 'Target', 'Bonus', 'Score',
                'Target', 'Bonus', 'Score'],
    'Joe': [40, 46, 33, 40, 46, 45, 40, 46, 35, None, None, None, 40, 46, 42],
    'Dean': [65, 70, 71, None, None, None, 65, 70, 68, 65, 70, 44, 65, 70, 66]
}

# Creating the DataFrame
df = pd.DataFrame(data)

As you can see Dean has a week where his Target, Bonus, and Score cells are blank. So does Joe in a later week. In these specific instances where the cell is NaN I want to populate them using the following rules:

  • Get Target and Bonus cell values for each person from the first reference table and populate the NaN cell accordingly.
  • Set the Score cell equal to the Target cell value for the person.

My desired output table would look like this:

| week       | Metrics | Joe | Dean |
|------------|---------|----:|-----:|
| 11/6/2023  | Target  |  40 |   65 |
| 11/6/2023  | Bonus   |  46 |   70 |
| 11/6/2023  | Score   |  33 |   71 |
| 11/13/2023 | Target  |  40 |   65 |
| 11/13/2023 | Bonus   |  46 |   70 |
| 11/13/2023 | Score   |  45 |   65 |
| 11/20/2023 | Target  |  40 |   65 |
| 11/20/2023 | Bonus   |  46 |   70 |
| 11/20/2023 | Score   |  35 |   68 |
| 11/27/2023 | Target  |  40 |   65 |
| 11/27/2023 | Bonus   |  46 |   70 |
| 11/27/2023 | Score   |  40 |   44 |
| 12/4/2023  | Target  |  40 |   65 |
| 12/4/2023  | Bonus   |  46 |   70 |
| 12/4/2023  | Score   |  42 |   66 |

Solution

  • Only one block of NaN per column at most

    Another possible solution, which loops through the df columns corresponding to each person and, for each block of NaN (identified by loc), assigns the corresponding block of values in ref (also identified by loc):

    names = ['Joe', 'Dean']
    
    d = ref.assign(Score = ref['Target'])
    
    for x in names:
        df.loc[df[x].isna(), x] = d.loc[d['Name'].eq(x), 'Target':'Score'].T.values
    

    General case

    In case there is more than a single block of NaN per person, we need to change the code slightly:

    names = ['Joe', 'Dean']
    
    d = ref.assign(Score = ref['Target'])
    
    for x in names:
        n_blocks = df[x].isna().sum() // 3
        df.loc[df[x].isna(), x] = np.tile(d.loc[d['Name'].eq(x), 'Target':'Score']
                                          .values.flatten(), n_blocks)
    

    Edit

    To satisfy the new requirement of the OP: Instead of order Target, Bonus and Score, it is needed the order Bonus, Target and Score. In such a case, we need to readjust the previous code:

    names = ['Joe', 'Dean']
    
    d = ref.assign(Score = ref['Target'])
    d = d[['Name', 'Bonus', 'Target', 'Score']]
    
    for x in names:
        n_blocks = df[x].isna().sum() // 3
        df.loc[df[x].isna(), x] = np.tile(d.loc[d['Name'].eq(x), 'Bonus':'Score']
                                          .values.flatten(), n_blocks)
    

    Output:

              week Metrics   Joe  Dean
    0    11/6/2023  Target  40.0  65.0
    1    11/6/2023   Bonus  46.0  70.0
    2    11/6/2023   Score  33.0  71.0
    3   11/13/2023  Target  40.0  65.0
    4   11/13/2023   Bonus  46.0  70.0
    5   11/13/2023   Score  45.0  65.0
    6   11/20/2023  Target  40.0  65.0
    7   11/20/2023   Bonus  46.0  70.0
    8   11/20/2023   Score  35.0  68.0
    9   11/27/2023  Target  40.0  65.0
    10  11/27/2023   Bonus  46.0  70.0
    11  11/27/2023   Score  40.0  44.0
    12   12/4/2023  Target  40.0  65.0
    13   12/4/2023   Bonus  46.0  70.0
    14   12/4/2023   Score  42.0  66.0