I have a dataframe that looks something like this:
ID | hiqual | Wave |
---|---|---|
1 | 1.0 | g |
1 | NaN | i |
1 | NaN | k |
2 | 1.0 | g |
2 | NaN | i |
2 | NaN | k |
3 | 1.0 | g |
3 | NaN | i |
4 | 5.0 | g |
4 | NaN | i |
This is a long-format dataframe, and I have my hiqual variable only for my first measurement wave (g). I would like to populate the NaN values for the subsequent measurement waves (i and k) with the value given in wave g for each ID.
I tried using fillna(), but I am not sure how to provide the two conditions of ID and Wave and how to populate based on that. I would be grateful for any help or suggestions.
If your dataframe is already ordered by the ID and Wave columns, you can simply forward-fill the values (the sort_values call below is only needed when it is not):
>>> df.sort_values(['ID', 'Wave']).ffill()
ID hiqual Wave
0 1 1.0 g
1 1 1.0 i
2 1 1.0 k
3 2 1.0 g
4 2 1.0 i
5 2 1.0 k
6 3 1.0 g
7 3 1.0 i
8 4 5.0 g
9 4 5.0 i
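One caveat with a plain ffill() is that it does not stop at ID boundaries: if some ID had no value in wave g, it would inherit the value of the previous ID. A minimal sketch of a per-ID forward fill is shown here, assuming the same column names as in the question. The extra ID 5 rows are hypothetical and are only added to make the difference visible.

```python
import numpy as np
import pandas as pd

# Data from the question, plus a hypothetical ID 5 whose wave g value is missing
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan,
               1.0, np.nan, 5.0, np.nan, np.nan, np.nan],
    "Wave": ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i", "g", "i"],
})

# Plain forward fill: ID 5 silently picks up the 5.0 from ID 4
plain = df.sort_values(["ID", "Wave"]).ffill()

# Forward fill within each ID: ID 5 stays NaN because it has no wave g value
grouped = df.sort_values(["ID", "Wave"]).copy()
grouped["hiqual"] = grouped.groupby("ID")["hiqual"].ffill()

print(plain.tail(2))
print(grouped.tail(2))
```

If every ID is guaranteed to have a wave g value, both variants give the same result, and the plain ffill() is enough.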
You can also explicitly use the wave g values, which does not depend on the row order:
g_vals = df[df['Wave']=='g'].set_index('ID')['hiqual']
df['hiqual'] = df['hiqual'].fillna(df['ID'].map(g_vals))
print(df)
print(g_vals)
# Output
ID hiqual Wave
0 1 1.0 g
1 1 1.0 i
2 1 1.0 k
3 2 1.0 g
4 2 1.0 i
5 2 1.0 k
6 3 1.0 g
7 3 1.0 i
8 4 5.0 g
9 4 5.0 i
# g_vals
ID
1 1.0
2 1.0
3 1.0
4 5.0
Name: hiqual, dtype: float64
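A third option, sketched here under the assumption that the first non-missing hiqual per ID is the one you want (in the question's data, that is the wave g value), is to fill from a group-wise transform. The "first" aggregation skips NaN values, so rows should be sorted by Wave if more than one wave can carry a value.

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan,
               1.0, np.nan, 5.0, np.nan],
    "Wave": ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})

# First non-null hiqual per ID, broadcast back to every row of that ID
first_per_id = df.groupby("ID")["hiqual"].transform("first")

# Only fill the gaps; existing values are left untouched
df["hiqual"] = df["hiqual"].fillna(first_per_id)

print(df)
```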