Tags: python, pandas, nan, one-hot-encoding

One-hot vector in pandas to encode missing values


I am working with a large pandas dataframe in which a few columns have lots of missing data. I am not totally confident in my imputation, and I believe the presence or absence of data for these variables could be useful information, so I would like to add another column to the dataframe with 0 where the entry is missing and 1 otherwise. Is there a quick/efficient way to do this in pandas?


Solution

  • Try out the following:

    df['New_Col'] = df['Col'].notna().astype('uint8')
    

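    Here Col is your column containing np.nan values and New_Col is the new binary indicator column: 1 where Col has a value and 0 where it is missing. notna() also treats None and NaT as missing, and the uint8 dtype stores one byte per row, which keeps the indicator compact on a large dataframe.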
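Since several columns have missing data, the same one-liner extends to a whole subset of columns at once. The sketch below uses a made-up toy frame with hypothetical column names (age, income, city) and a hypothetical _present suffix. The indicators are computed before any fillna call, because imputing first would erase the missingness pattern the question wants to keep.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; column names are hypothetical.
df = pd.DataFrame({
    "age": [34.0, np.nan, 51.0, np.nan],
    "income": [np.nan, 42000.0, np.nan, 58000.0],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Columns assumed to have many missing values.
sparse_cols = ["age", "income", "city"]

# Build all indicator columns in one vectorized step: 1 = present, 0 = missing.
# This runs BEFORE imputation; after fillna the missingness would be lost.
indicators = df[sparse_cols].notna().astype("uint8").add_suffix("_present")
df = pd.concat([df, indicators], axis=1)

# Impute afterwards; the *_present columns keep the original pattern.
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())
df["city"] = df["city"].fillna("unknown")

print(df)
```

The imputation choices here (median for numbers, a placeholder string for text) are only illustrative; any imputer can be used once the indicator columns exist.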