I have two tables. The first reference table is below:
| Name | Target | Bonus |
|------|--------:|------:|
| Joe | 40 | 46 |
| Phil | 38 | 42 |
| Dean | 65 | 70 |
The Python code to generate the table is:
# Data for the table
data = {
'Name': ['Joe', 'Phil', 'Dean'],
'Target': [40, 38, 65],
'Bonus': [46, 42, 70]
}
# Creating the DataFrame
ref = pd.DataFrame(data)
My second table is below:
| week | Metrics | Joe | Dean |
|------------|---------|----:|-----:|
| 11/6/2023 | Target | 40 | 65 |
| 11/6/2023 | Bonus | 46 | 70 |
| 11/6/2023 | Score | 33 | 71 |
| 11/13/2023 | Target | 40 | NaN |
| 11/13/2023 | Bonus | 46 | NaN |
| 11/13/2023 | Score | 45 | NaN |
| 11/20/2023 | Target | 40 | 65 |
| 11/20/2023 | Bonus | 46 | 70 |
| 11/20/2023 | Score | 35 | 68 |
| 11/27/2023 | Target | NaN | 65 |
| 11/27/2023 | Bonus | NaN | 70 |
| 11/27/2023 | Score | NaN | 44 |
| 12/4/2023 | Target | 40 | 65 |
| 12/4/2023 | Bonus | 46 | 70 |
| 12/4/2023 | Score | 42 | 66 |
The Python code to generate this table is:
# Data for the new table
data = {
'week': ['11/6/2023', '11/6/2023', '11/6/2023', '11/13/2023', '11/13/2023', '11/13/2023',
'11/20/2023', '11/20/2023', '11/20/2023', '11/27/2023', '11/27/2023', '11/27/2023',
'12/4/2023', '12/4/2023', '12/4/2023'],
'Metrics': ['Target', 'Bonus', 'Score', 'Target', 'Bonus', 'Score',
'Target', 'Bonus', 'Score', 'Target', 'Bonus', 'Score',
'Target', 'Bonus', 'Score'],
'Joe': [40, 46, 33, 40, 46, 45, 40, 46, 35, None, None, None, 40, 46, 42],
'Dean': [65, 70, 71, None, None, None, 65, 70, 68, 65, 70, 44, 65, 70, 66]
}
# Creating the DataFrame
df = pd.DataFrame(data)
As you can see Dean has a week where his Target, Bonus, and Score cells are blank. So does Joe in a later week. In these specific instances where the cell is NaN I want to populate them using the following rules:
My desired output table would look like this:
| week | Metrics | Joe | Dean |
|------------|---------|----:|-----:|
| 11/6/2023 | Target | 40 | 65 |
| 11/6/2023 | Bonus | 46 | 70 |
| 11/6/2023 | Score | 33 | 71 |
| 11/13/2023 | Target | 40 | 65 |
| 11/13/2023 | Bonus | 46 | 70 |
| 11/13/2023 | Score | 45 | 65 |
| 11/20/2023 | Target | 40 | 65 |
| 11/20/2023 | Bonus | 46 | 70 |
| 11/20/2023 | Score | 35 | 68 |
| 11/27/2023 | Target | 40 | 65 |
| 11/27/2023 | Bonus | 46 | 70 |
| 11/27/2023 | Score | 40 | 44 |
| 12/4/2023 | Target | 40 | 65 |
| 12/4/2023 | Bonus | 46 | 70 |
| 12/4/2023 | Score | 42 | 66 |
Only one block of NaN per column at most
Another possible solution, which loops through the df
columns corresponding to each person and, for each block of NaN
(identified by loc
), assigns the corresponding block of values in ref
(also identified by loc
):
names = ['Joe', 'Dean']
d = ref.assign(Score = ref['Target'])
for x in names:
df.loc[df[x].isna(), x] = d.loc[d['Name'].eq(x), 'Target':'Score'].T.values
General case
In case there is more than a single block of NaN
per person, we need to change the code slightly:
names = ['Joe', 'Dean']
d = ref.assign(Score = ref['Target'])
for x in names:
n_blocks = df[x].isna().sum() // 3
df.loc[df[x].isna(), x] = np.tile(d.loc[d['Name'].eq(x), 'Target':'Score']
.values.flatten(), n_blocks)
Edit
To satisfy the new requirement of the OP: Instead of order Target, Bonus and Score, it is needed the order Bonus, Target and Score. In such a case, we need to readjust the previous code:
names = ['Joe', 'Dean']
d = ref.assign(Score = ref['Target'])
d = d[['Name', 'Bonus', 'Target', 'Score']]
for x in names:
n_blocks = df[x].isna().sum() // 3
df.loc[df[x].isna(), x] = np.tile(d.loc[d['Name'].eq(x), 'Bonus':'Score']
.values.flatten(), n_blocks)
Output:
week Metrics Joe Dean
0 11/6/2023 Target 40.0 65.0
1 11/6/2023 Bonus 46.0 70.0
2 11/6/2023 Score 33.0 71.0
3 11/13/2023 Target 40.0 65.0
4 11/13/2023 Bonus 46.0 70.0
5 11/13/2023 Score 45.0 65.0
6 11/20/2023 Target 40.0 65.0
7 11/20/2023 Bonus 46.0 70.0
8 11/20/2023 Score 35.0 68.0
9 11/27/2023 Target 40.0 65.0
10 11/27/2023 Bonus 46.0 70.0
11 11/27/2023 Score 40.0 44.0
12 12/4/2023 Target 40.0 65.0
13 12/4/2023 Bonus 46.0 70.0
14 12/4/2023 Score 42.0 66.0