Decode base64 in a pandas DataFrame column with missing values


I have a dataframe with a column val_string that is sometimes populated with a base64 encoded string, and other times is NaN.

df
    base_type   val_int     val_string
0     integer        34            NaN
1      string       NaN   c3RyaW5nMQ==
2     integer       108            NaN
3     integer      3586            NaN
4      string       NaN   c3RyaW5nMg==

How do I apply base64.b64decode to only the rows that have a val_string that is not NaN?

I tried this, only to get a strange OSError: could not get source code:

df['val_string'] = df['val_string'].apply(lambda x: df['val_string'] if pd.isna(df['val_string']) else base64.b64decode(x))

Any help would be much appreciated!


Solution

  • Use boolean indexing so that only the non-NaN rows are decoded:

    from base64 import b64decode
    
    m = df['val_string'].notna()
    df.loc[m, 'val_string'] = df.loc[m, 'val_string'].apply(b64decode)
    

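    Or with your approach, testing the element x that apply passes to the lambda rather than the whole column df['val_string']: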

    from base64 import b64decode
    import pandas as pd
    
    # Keep missing values as they are; decode everything else.
    df['val_string'] = df['val_string'].apply(lambda x: x if pd.isna(x)
                                              else b64decode(x))
    

    Output (b64decode returns bytes, hence the b'...' values):

      base_type  val_int  val_string
    0   integer     34.0         NaN
    1    string      NaN  b'string1'
    2   integer    108.0         NaN
    3   integer   3586.0         NaN
    4    string      NaN  b'string2'
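
If plain text is wanted instead of bytes, the same idea can be sketched end to end as below. This is a minimal sketch, not part of the original answer: the helper name decode_or_keep and the new column val_text are hypothetical, and it assumes the encoded payloads are UTF-8 text. The sample frame is rebuilt from the question so the snippet runs on its own.

```python
import base64

import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "base_type": ["integer", "string", "integer", "integer", "string"],
    "val_int": [34, np.nan, 108, 3586, np.nan],
    "val_string": [np.nan, "c3RyaW5nMQ==", np.nan, np.nan, "c3RyaW5nMg=="],
})


def decode_or_keep(value):
    """Hypothetical helper: decode a base64 string, pass missing values through."""
    if pd.isna(value):
        return value
    # b64decode returns bytes; .decode assumes the payload is UTF-8 text.
    return base64.b64decode(value).decode("utf-8")


# Write to a new (hypothetical) column so the encoded values stay available.
df["val_text"] = df["val_string"].apply(decode_or_keep)
print(df)
```

Writing to a separate column is a design choice for the sketch: it leaves val_string untouched, so a failed or partial decode can be checked against the original encoded values.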