How to force pandas to evaluate formulas of xlsx and not read them as NaN?


Here is what I am trying to do:

  1. Read an Excel file with formulas into a pandas DataFrame
  2. Change the values in some cells
  3. Re-evaluate the cell values according to the formulas
  4. Finally, save the result as HTML

Here's my code:

import pandas as pd
from openpyxl import load_workbook

# Load the workbook; by default openpyxl returns formula cells as formula text.
wb = load_workbook(filename='initial.xlsx')
name = wb.sheetnames[0]
sheet_ranges = wb[name]
df = pd.DataFrame(sheet_ranges.values)
df.loc['C', 'x'] = 10  # set row 'C', column 'x' in a single .loc call
df.to_excel("processing.xlsx", header=False, index=False)

# Read the saved file back; this is where the formula cells turn into NaN.
df1 = pd.read_excel("processing.xlsx", sheet_name=0, header=0)
df1.to_html('output.html')

With this code I get a fully functioning processing.xlsx, but when I read it back and convert it to HTML, every formula cell in the DataFrame comes out as NaN.

Question: How to force pandas to evaluate formulas of xlsx and not read them as NaN?

PS: Converting initial.xlsx to HTML directly works fine, so something must be wrong with the to_excel output.


Solution

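  • I found the problem and a messy workaround. Problem: openpyxl writes the formulas but never evaluates them, so the saved file contains no cached results (presumably why processing.xlsx came out about 10 KB smaller). pandas reads the cached results rather than the formulas, finds none, and returns NaN. I therefore used xlwings, which opens the file in Excel itself; Excel calculates the formulas, and saving stores the results in the file. PS: This is not the best solution (it needs an Excel installation), so other answers are welcome.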

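    import pandas as pd
    import xlwings as xw

    def df_from_excel(path):
        # Open and re-save the workbook in Excel so the formula results are
        # calculated and cached in the file; pandas then reads those values.
        app = xw.App(visible=False)
        try:
            book = app.books.open(path)
            book.save()
            book.close()
        finally:
            app.quit()  # always shut down the hidden Excel instance
        return pd.read_excel(path, header=0)

    df1 = df_from_excel("1.xlsx")
    print(df1.head())
    df1.to_html("public/output.html")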