Search code examples
pythonpandasdataframetime-seriesfillna

Fill missing values using MissForest algorithm on each group in column in python


I have a time series data of about 4000 patients that has missing values and I want to impute NaN values using MissForest algorithm in Python on each patient file separately.

The data looks like this:

HR Resp P_ID
72.0 18.0 1
NaN 15.0 1
80.0 NaN 1
NaN 16.0 1
79.5 NaN 1
NaN 19.0 2
79.5 22.5 2
NaN NaN 2
NaN 16.0 2
85.0 NaN 3
NaN 14.5 3
76.4 NaN 3
NaN NaN 4
80.5 19.5 4
75.3 18.0 4
NaN 21.5 4

Now, I want to impute NaN values within the patients data in column based on P_ID. Like it will impute P_ID = 1, then P_ID = 2 and so on. Not the imputation on the whole column. The code I am using will impute NaN on whole column of all patients, not in individual Patients column, then the next patient.

imputer = MissForest(max_iter=12, n_jobs=-1)
X_imputed = imputer.fit_transform(df)
df1 = pd.DataFrame(X_imputed)
df1.head()

I did the Mean Imputation within patient itself using the following code, but can't figure out how I can use it for MissForest.

for i in ['HR','Resp']:
    df[i] = df[i].fillna(df.groupby('P_ID')[i].transform('mean'))

One solution is I make 4000 data frames of each patient, impute them using MissForest, then combine them together. That will be a hectic task. So I want a solution with looping over the entire dataframe. Kindly help. Thanks.


Solution

  • You can use the following to go through the P_IDs, then apply the MissForest only on the filtered values:

    for idx in df["ID"].unique():
        # check if the column "Resp" is all nan
        if not df[df.ID == idx].Resp.any():
            df.loc[df.ID == idx, "Resp"] = df.loc[df.ID == idx, "Resp"].fillna(0)
        imputer = MissForest(max_iter=12, n_jobs=-1)
        x_imp = imputer.fit_transform(df[df.ID == idx])
        df.loc[df.ID == idx, :] = x_imp
    

    This gives you:

    |    |      HR |    Resp |   ID |
    |---:|--------:|--------:|-----:|
    |  0 | 72      | 18      |    1 |
    |  1 | 79.5942 | 15      |    1 |
    |  2 | 80      | 15.4617 |    1 |
    |  3 | 79.5942 | 16      |    1 |
    |  4 | 79.5    | 15.4617 |    1 |
    |  5 | 79.5    | 19      |    2 |
    |  6 | 79.5    | 22.5    |    2 |
    |  7 | 79.5    | 18.9217 |    2 |
    |  8 | 79.5    | 16      |    2 |
    |  9 | 85      | 14.5    |    3 |
    | 10 | 80.786  | 14.5    |    3 |
    | 11 | 76.4    | 14.5    |    3 |
    | 12 | 79.148  | 20.885  |    4 |
    | 13 | 80.5    | 19.5    |    4 |
    | 14 | 75.3    | 18      |    4 |
    | 15 | 79.148  | 21.5    |    4 |