Search code examples
pythonpandasdataframeexport-to-excelnonetype

How to separate between empty string, np.nan and None when using to_excel(...) function for pandas dataframe?


After constructing data frame df containing np.nan, None and empty string ('') values in Python 3.8.3 and Pandas 1.0.4

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'b':None,'c':''},index=[0])
df2 = pd.DataFrame({'a':1,'b':1,'c':None},index=[0])
df = pd.concat([df1,df2], axis=0, sort=True)
print(df)

data frame df looks like

     a     b     c
0  NaN  None      
0  1.0     1  None

Now I would like to store the values to Excel using to_excel() function. However, after running command

df.to_excel('nan_none_empty.xlsx')

the result rather looks like

excel_picture

with np.nan, None and empty string ('') not being separable from each other.

It is possible to separate empty string ('') from np.nan and None with option na_rep as follows

df.to_excel('nan_none_empty2.xlsx',na_rep='?')

giving result

enter image description here

But it seems like, for this problem, I am running out of options for to_excel(...) function in order to be able to separate np.nan from None in the Excel export.

How could one neatly separate between np.nan and None when exporting df to Excel?


Solution

  • If you'd like to distinguish between different null types, your best bet is to replace values before exporting to Excel. Converting to a string is one way to make sure you are not conflating None, np.NaN, pd.NaT, etc...

    df1 = pd.DataFrame({'b':None,'c':''},index=[0])
    df2 = pd.DataFrame({'a':1,'b':1,'c':None},index=[0])
    df = pd.concat([df1,df2], axis=0, sort=True)
    
    null_map = {'nan': '-',
                'None': '?'} # Add the string representation for other types you may need
    
    df = df.applymap(lambda x:
                     null_map[str(x)]
                     if str(x) in null_map
                     else x)
    df.to_excel('nan_none_empty2.xlsx')