python pandas ExcelWriter issue while formatting cells as text


I need to color a column in an Excel sheet that is written from a pandas DataFrame.

Here is a sample that illustrates the problem I am facing.

import pandas as pd

df=pd.DataFrame(columns=['A2','B18'],data=[[10,123415209123456789],[12,1223152012345678088],[10,1234152091234566666],[12,1223152012345678777]],dtype='object')
print(df)

   A2                  B18
0  10   123415209123456789
1  12  1223152012345678088
2  10  1234152091234566666
3  12  1223152012345678777

print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A2      4 non-null      object
 1   B18     4 non-null      object
dtypes: object(2)
memory usage: 192.0+ bytes
None

I have some complex formatting requirements: I need to color and align not only the data but also the header. That is why I decided to use ExcelWriter rather than the pandas Styler.

However, I am running into an issue with text formatting:

#Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter(dataFolder + 'tmp_12_test126.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer,sheet_name='Sheet1',index=False)

# Get the xlsxwriter workbook and worksheet objects.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

col_format=[{},{'bg_color': 'cyan', 'num_format': '@'}]

# Start iterating through the columns and the rows to apply the format
for col in range(df.shape[1]):
    for row in range(df.shape[0]):
        if len(col_format[col]) > 0 :
           worksheet.write(row+1,col,df.iloc[row,col],workbook.add_format(col_format[col]))

# Finally write the file
writer.close()  # writer.save() was removed in pandas 2.0

But this writes the B18 column in scientific notation, which loses the last 4 digits of each value.

Can anyone tell me what I am doing wrong? From what I have read, 'num_format': '@' is what forces a cell to text formatting, but for some reason it is not working.


Solution

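  • Using dtype=object does not convert numbers to strings. It only tells pandas to store each value as a generic Python object, so the integers stay int. XlsxWriter's write() then stores them as Excel numbers, which are double-precision floats with about 15 significant digits; 'num_format': '@' changes how a cell is displayed, not what type is stored. If you want strings instead of numbers, you can either set dtype=str (which converts all columns) or choose a type for each column using astype():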

    import pandas as pd
    
    df1=pd.DataFrame(columns=['A2','B18'],
                     data=[[10,123415209123456789],
                           [12,1223152012345678088],
                           [10,1234152091234566666],
                           [12,1223152012345678777]],
                     dtype=object)
    
    print('df1 types:')
    for idx,val in df1.iterrows():
        print(idx,type(val.iloc[0]),type(val.iloc[1]))
    
    df2=pd.DataFrame(columns=['A2','B18'],
                     data=[[10,123415209123456789],
                           [12,1223152012345678088],
                           [10,1234152091234566666],
                           [12,1223152012345678777]],
                     dtype=str)
    
    print('df2 types:')
    for idx,val in df2.iterrows():
        print(idx,type(val.iloc[0]),type(val.iloc[1]))
    
    df3=pd.DataFrame(columns=['A2','B18'],
                     data=[[10,123415209123456789],
                           [12,1223152012345678088],
                           [10,1234152091234566666],
                           [12,1223152012345678777]]).astype({'A2': int, 'B18':str})
    
    print('df3 types:')
    for idx,val in df3.iterrows():
        print(idx,type(val.iloc[0]),type(val.iloc[1]))
    

    Output:

    df1 types:
    0 <class 'int'> <class 'int'>
    1 <class 'int'> <class 'int'>
    2 <class 'int'> <class 'int'>
    3 <class 'int'> <class 'int'>
    df2 types:
    0 <class 'str'> <class 'str'>
    1 <class 'str'> <class 'str'>
    2 <class 'str'> <class 'str'>
    3 <class 'str'> <class 'str'>
    df3 types:
    0 <class 'int'> <class 'str'>
    1 <class 'int'> <class 'str'>
    2 <class 'int'> <class 'str'>
    3 <class 'int'> <class 'str'>
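
To tie this back to the question, here is a minimal sketch of writing the string column with XlsxWriter once B18 has been converted with astype(). The output path and the names text_format and all_digits_kept are made up for illustration, and the final check simply looks for the full digit strings in the workbook's shared-strings file; treat it as one possible approach rather than the only one.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Same data as the question; B18 is cast to str so each cell holds text.
df = pd.DataFrame(
    columns=['A2', 'B18'],
    data=[[10, 123415209123456789],
          [12, 1223152012345678088],
          [10, 1234152091234566666],
          [12, 1223152012345678777]],
).astype({'A2': int, 'B18': str})

# Hypothetical output location, chosen only for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), 'text_column_demo.xlsx')

with pd.ExcelWriter(out_path, engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    # Create the format once and reuse it for every cell in the column.
    text_format = workbook.add_format({'bg_color': 'cyan', 'num_format': '@'})

    col = df.columns.get_loc('B18')
    for row, value in enumerate(df['B18'], start=1):  # row 0 holds the header
        # write_string() stores the value as text, so no digits are rounded.
        worksheet.write_string(row, col, value, text_format)

# Cell strings are stored in xl/sharedStrings.xml inside the .xlsx archive.
with zipfile.ZipFile(out_path) as archive:
    shared_strings = archive.read('xl/sharedStrings.xml').decode('utf-8')

# True when every full-length value made it into the file unchanged.
all_digits_kept = all(value in shared_strings for value in df['B18'])
print(all_digits_kept)
```

Using write_string() instead of write() makes the intent explicit: write() picks a cell type from the Python type of the value, so it would also work here because the values are already str, but it would store a number again if an int slipped through.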