Python: generate a dummy variable in a DataFrame based on another column


I have a DataFrame with many variables. I would like to generate a dummy variable based on one column, say column 1. If column 1's observation is NaN, the dummy variable should be 0. If column 1's observation is not missing, the dummy variable should be 1. Any ideas? Thanks a lot.


Solution

  • One straightforward way is to combine np.where with isna():

    # sample data
    import pandas as pd 
    import numpy as np
    df = pd.DataFrame()
    df['sample'] = [1,2,np.nan,4,5,np.nan]
    
    # create dummy column: 0 where 'sample' is NaN, 1 otherwise
    df['dummy'] = np.where(df['sample'].isna(),0,1)
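An alternative that stays within pandas, as a minimal sketch: notna() returns a boolean Series that is True where a value is present, and casting it to int gives the same 1/0 dummy. The column names 'sample' and 'dummy' are just the ones from the sample data above, not anything from the asker's real DataFrame.

```python
import numpy as np
import pandas as pd

# same sample data as in the answer above
df = pd.DataFrame({"sample": [1, 2, np.nan, 4, 5, np.nan]})

# notna() is True for non-missing values; astype(int) maps True/False to 1/0
df["dummy"] = df["sample"].notna().astype(int)

print(df["dummy"].tolist())  # [1, 1, 0, 1, 1, 0]
```

Both approaches should give the same column here; np.where is handy if you later want values other than 0 and 1.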