I've been working on creating a script which converts Excel files into csv for further processing in some of our tools.
For xls, xlsx and xlsm i've found solutions in opening the file as text object or using pandas. When encountering xlsb files i landed on the pyxlsb package, which works... refer to the code used here How can I convert a XLSB file to csv using python?
Sadly i noticed that int() values in the xlsb file seem to be converted to float and consequently written in my csv file as a float. This is obviously undesirable.
Basically i'm looking for a dtype=object type solution. I figured maybe i can ad a step of converting the values back to int with another function. However i think that's inefficient and prone to errors.
I've looked around on the pyxlsb page (https://pypi.org/project/pyxlsb/) without luck.
My code:
to_replace_list = ['\r', '\n', '\\r\\n', '\\' + str(out_del)] # values to replace in all rows
with open_xlsb(file_to_convert) as wb: # open xlsb file using pyxlsb package
for sheet_name in wb.sheets: # loop over all sheets in file
with wb.get_sheet(sheet_name) as sheet: # open xlsb sheet obj
out_file = out_filer(total_filename, sheet_name) # define output file name based on source and sheet
with open(out_file, 'a') as o: # open output csv obj
for row in sheet.rows(): # loop over rows in xlsb obj
print([re.sub(value, '', str(cell.v)) for value in to_replace_list for cell in row])
sidequest: empty values in the xlsb file will be referred to as None in the output. I would like this to be ''.
Since the release of Pandas 1.0.1, read_excel() now has xlsb support
pd.read_excel('path_to_file.xlsb', engine='pyxlsb')