Search code examples
pythonexcelcsvxlsb

Python XLSB to CSV conversion data types


I've been working on creating a script which converts Excel files into csv for further processing in some of our tools.

For xls, xlsx and xlsm i've found solutions in opening the file as text object or using pandas. When encountering xlsb files i landed on the pyxlsb package, which works... refer to the code used here How can I convert a XLSB file to csv using python?

Sadly i noticed that int() values in the xlsb file seem to be converted to float and consequently written in my csv file as a float. This is obviously undesirable.

Basically i'm looking for a dtype=object type solution. I figured maybe i can ad a step of converting the values back to int with another function. However i think that's inefficient and prone to errors.

I've looked around on the pyxlsb page (https://pypi.org/project/pyxlsb/) without luck.

My code:

    to_replace_list = ['\r', '\n', '\\r\\n', '\\' + str(out_del)]  # values to replace in all rows
    with open_xlsb(file_to_convert) as wb:  # open xlsb file using pyxlsb package
        for sheet_name in wb.sheets:  # loop over all sheets in file
            with wb.get_sheet(sheet_name) as sheet:  # open xlsb sheet obj
                out_file = out_filer(total_filename, sheet_name)  # define output file name based on source and sheet
                with open(out_file, 'a') as o:  # open output csv obj
                    for row in sheet.rows():  # loop over rows in xlsb obj
                        print([re.sub(value, '', str(cell.v)) for value in to_replace_list for cell in row])

sidequest: empty values in the xlsb file will be referred to as None in the output. I would like this to be ''.


Solution

  • Since the release of Pandas 1.0.1, read_excel() now has xlsb support

    pd.read_excel('path_to_file.xlsb', engine='pyxlsb')
    

    https://pandas.pydata.org/docs/user_guide/io.html#io-xlsb