I am working with a large pandas dataframe in which a few columns have a lot of missing data. I am not totally confident in my imputation, and I believe the presence or absence of data for these variables could itself be useful information, so I would like to add another column to the dataframe with 0 where the entry is missing and 1 otherwise. Is there a quick/efficient way to do this in pandas?
Try out the following:
df['New_Col'] = df['Col'].notna().astype('uint8')
Here Col is your column containing np.nan values, and New_Col is the binary indicator column: 1 where Col has a value and 0 where it is np.nan.
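Since the question mentions a few columns, the same idea can be applied to several of them at once. The following is only a sketch: the toy data, the column names (age, income, city) and the _present suffix are made up for illustration. It assumes the indicators are built before any imputation, since afterwards the missing entries would no longer be detectable.

```python
import numpy as np
import pandas as pd

# Toy frame; the column names and values are hypothetical.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0],
    "income": [np.nan, np.nan, 52000.0],
    "city": ["Oslo", None, "Lima"],
})

# Columns for which a presence indicator is wanted (assumed list).
cols_with_gaps = ["age", "income", "city"]

# notna() on a sub-frame gives a boolean frame of the same shape;
# casting to uint8 turns True/False into 1/0 using one byte per value.
indicators = (
    df[cols_with_gaps]
    .notna()
    .astype("uint8")
    .add_suffix("_present")
)

# Attach the indicator columns; the original columns are left untouched.
df = df.join(indicators)
print(df)
```

notna() treats both np.nan and None as missing, so this works for object columns as well as numeric ones.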