I have a time series data of about 4000 patients that has missing values and I want to impute NaN values using MissForest algorithm in Python on each patient file separately.
The data looks like this:
HR | Resp | P_ID |
---|---|---|
72.0 | 18.0 | 1 |
NaN | 15.0 | 1 |
80.0 | NaN | 1 |
NaN | 16.0 | 1 |
79.5 | NaN | 1 |
NaN | 19.0 | 2 |
79.5 | 22.5 | 2 |
NaN | NaN | 2 |
NaN | 16.0 | 2 |
85.0 | NaN | 3 |
NaN | 14.5 | 3 |
76.4 | NaN | 3 |
NaN | NaN | 4 |
80.5 | 19.5 | 4 |
75.3 | 18.0 | 4 |
NaN | 21.5 | 4 |
Now, I want to impute NaN values within the patients data in column based on P_ID. Like it will impute P_ID = 1, then P_ID = 2 and so on. Not the imputation on the whole column. The code I am using will impute NaN on whole column of all patients, not in individual Patients column, then the next patient.
imputer = MissForest(max_iter=12, n_jobs=-1)
X_imputed = imputer.fit_transform(df)
df1 = pd.DataFrame(X_imputed)
df1.head()
I did the Mean Imputation within patient itself using the following code, but can't figure out how I can use it for MissForest.
for i in ['HR','Resp']:
df[i] = df[i].fillna(df.groupby('P_ID')[i].transform('mean'))
One solution is I make 4000 data frames of each patient, impute them using MissForest, then combine them together. That will be a hectic task. So I want a solution with looping over the entire dataframe. Kindly help. Thanks.
You can use the following to go through the P_ID
s, then apply the MissForest
only on the filtered values:
for idx in df["ID"].unique():
# check if the column "Resp" is all nan
if not df[df.ID == idx].Resp.any():
df.loc[df.ID == idx, "Resp"] = df.loc[df.ID == idx, "Resp"].fillna(0)
imputer = MissForest(max_iter=12, n_jobs=-1)
x_imp = imputer.fit_transform(df[df.ID == idx])
df.loc[df.ID == idx, :] = x_imp
This gives you:
| | HR | Resp | ID |
|---:|--------:|--------:|-----:|
| 0 | 72 | 18 | 1 |
| 1 | 79.5942 | 15 | 1 |
| 2 | 80 | 15.4617 | 1 |
| 3 | 79.5942 | 16 | 1 |
| 4 | 79.5 | 15.4617 | 1 |
| 5 | 79.5 | 19 | 2 |
| 6 | 79.5 | 22.5 | 2 |
| 7 | 79.5 | 18.9217 | 2 |
| 8 | 79.5 | 16 | 2 |
| 9 | 85 | 14.5 | 3 |
| 10 | 80.786 | 14.5 | 3 |
| 11 | 76.4 | 14.5 | 3 |
| 12 | 79.148 | 20.885 | 4 |
| 13 | 80.5 | 19.5 | 4 |
| 14 | 75.3 | 18 | 4 |
| 15 | 79.148 | 21.5 | 4 |