pandas bitmask dummy-variable one-hot-encoding

How to encode integer masks as bits into dummy variables in pandas


I would like to encode integer masks stored in a pandas DataFrame column as binary features, one per bit position in those integers. For example, with 4-bit integers, the decimal value 11 (binary 1011) should yield 4 columns with the values 1, 0, 1, 1, and likewise for every row in the column.
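
To make the target concrete, here is the arithmetic for the single value 11 as a small sketch. The names `value` and `width` are illustrative only and are not part of the original post.

```python
value = 11   # example mask from the question
width = 4    # number of bits, assumed to be 4 as in the question

# Test each bit, from the most significant down to the least significant.
bits = [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]
print(bits)  # [1, 0, 1, 1]
```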


Solution

  • You can format each integer as a zero-padded binary string and split it into one character per column:

    df = pd.DataFrame([list('{0:04b}'.format(x)) for x in df['col']], index=df.index).astype(int)
    

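    Thanks to @pir for a Python 3.6+ solution using f-strings (`.astype(int)` is added so the result holds integers rather than strings):

    df = pd.DataFrame([list(f'{i:04b}') for i in df['col'].values], df.index).astype(int)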
    

    NumPy

    Convert the column to an array and test each bit with a bitwise AND (adapted from another answer). The `[:, ::-1]` slice reverses the columns in each row so that the most significant bit comes first:

    d = df['col'].values
    m = 4
    df = pd.DataFrame((((d[:,None] & (1 << np.arange(m)))) > 0)[:, ::-1].astype(int))
    # alternative: build the masks in descending bit order instead of reversing the columns
    #df = pd.DataFrame((((d[:,None] & (1 << np.arange(m-1,-1,-1)))) > 0).astype(int))
    

    Or, when every value fits in 8 bits (so m <= 8), with `np.unpackbits`:

    df = pd.DataFrame(np.unpackbits(d[:,None].astype(np.uint8), axis=1)[:,-m:])
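
As a quick check, the three approaches above can be run side by side on a small frame. This is a minimal sketch, not a definitive implementation: the sample data, the index labels, and the names `by_format`, `by_mask`, `by_unpack` and `masks` are assumptions made for illustration, and `index=df.index` is passed in each case so the original row labels are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'col' matches the column name used above.
df = pd.DataFrame({'col': [11, 0, 5, 15]}, index=list('abcd'))
m = 4  # number of bits per mask

# 1) String formatting: zero-padded binary string, one character per column.
by_format = pd.DataFrame(
    [list(f'{x:0{m}b}') for x in df['col']], index=df.index
).astype(int)

# 2) Bitwise AND against the masks 8, 4, 2, 1 (most significant bit first).
d = df['col'].to_numpy()
masks = 1 << np.arange(m - 1, -1, -1)
by_mask = pd.DataFrame(((d[:, None] & masks) > 0).astype(int), index=df.index)

# 3) np.unpackbits: only valid while every value fits in 8 bits.
by_unpack = pd.DataFrame(
    np.unpackbits(d[:, None].astype(np.uint8), axis=1)[:, -m:], index=df.index
)

print(by_format)
```

All three frames hold the same values here. The bitwise AND version is the one to generalize for masks wider than 8 bits, since the `np.uint8` cast in the third variant discards higher bits.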