Tags: python, pandas, numpy, missing-data, recode

Create a dummy variable for missing values in Python


I have the following dataframe:

  a
1 3
2 2
3 Nan
4 3
5 Nan

I need to recode this column so it looks like this:

  df_miss_a
1 0
2 0
3 1
4 0
5 1

I've tried:

df_miss_a = np.where(df['a'] == 'Nan', 1, 0)

and

df_miss_a = np.where(df['a'] == Nan, 1, 0)

The above outputs only 0s.

The format of the output is unimportant.


Solution

  • If the column contains real NaN values, you can use pd.Series.isna(), which returns True for each missing entry; .astype(int) then turns that into 1/0:

    df_miss_a = df["a"].isna().astype(int)
    print(df_miss_a)
    

    Prints:

    1    0
    2    0
    3    1
    4    0
    5    1
    Name: a, dtype: int64
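
For context on why the attempts in the question return only 0s: NaN never compares equal to anything, itself included, so an `==` test cannot find it. Below is a minimal sketch of that behaviour next to the isna() approach. The dataframes are reconstructed from the question and are assumptions, as is the second case, where the column is assumed to hold the literal text "Nan" instead of real missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question: real NaN values in column "a"
df = pd.DataFrame({"a": [3, 2, np.nan, 3, np.nan]}, index=[1, 2, 3, 4, 5])

# NaN is not equal to anything (not even NaN), so this yields all zeros
eq_attempt = np.where(df["a"] == np.nan, 1, 0)

# isna() tests for missingness directly
df_miss_a = df["a"].isna().astype(int)

# Assumption: the column holds the literal text "Nan" rather than real NaN.
# Coerce to numeric first so the text becomes a real missing value.
df_text = pd.DataFrame({"a": ["3", "2", "Nan", "3", "Nan"]}, index=[1, 2, 3, 4, 5])
df_text_miss_a = pd.to_numeric(df_text["a"], errors="coerce").isna().astype(int)

print(eq_attempt)
print(df_miss_a)
print(df_text_miss_a)
```

If the column really does store strings, comparing against the exact text (for example `df_text["a"] == "Nan"`) would also work, but it depends on the spelling being consistent, which is why the sketch converts to numeric first.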