Search code examples
pythoncastingxlrd

Convert Float to Integer During xcel to Tab Delimited File Conversion


I have written the following method that converts a .xlsx file to a .txt tab delimited file.

import sys
import xlrd
import csv

def xlsx_to_tab(self, inFile):
    ''' Convert an xlsx file to a tab delimited file. '''
    excel_file = xlrd.open_workbook(inFile)
    worksheet = excel_file.sheet_names()[0]
    sh = excel_file.sheet_by_name(worksheet)
    extentPos = inFile.rfind('.')
    tab_file = open(inFile[:extentPos] + '.txt', 'w')
    writetab = csv.writer(tab_file, delimiter='\t', quoting=csv.QUOTE_ALL)

    for row in range(sh.nrows):
        writetab.writerow(sh.row_values(row))

    tab_file.close()

This method successfully converts between the file formats, however, what appears to be integers in the original excel file are output as floats in the .txt file output.

I understand this may have to do with cell format settings in excel. That being said, I would like to handle this within this script, as I would like to explicitly define elements within the file as an integer with int(), as opposed to checking and altering all the excel files before hand which would be tedious.

This method processes the file row by row. Is there a way to retrieve individual elements within each row so as to type cast them as an integer?

This question kind of addresses the issue:

But shows how to do it for one value at a time as opposed to a whole column of values.

To be a little more clear, I am trying to cast individual columns as integers but not all of them. So only some elements of each row need to be converted.

The numeric columns are columns 1,3, and 5


Solution

  • Use list comprehension to convert the elements of the row to int before writing them to the new file:

    for row in range(sh.nrows):
        new_row = [int(x) for x in sh.row_values(row)]
        writetab.writerow(new_row)
    

    EDIT: with OP's new comments below, what you need to do (although it's not pretty) is:

    int_columns = [1, 3, 5]
    for row in range(sh.nrows):
        new_row = sh.row_values(row)
        for col in int_columns:
            new_row[col] = int(new_row[col])
        writetab.writerow(new_row)
    

    And I would suggest you change your naming a bit. In the outermost loop, you're iterating over integers, but you name your variable row. I'd write it like:

    int_columns = [1, 3, 5]
    for i in range(sh.nrows):
        row = sh.row_values(i)
        for col in int_columns:
            row[col] = int(new_row[col])
        writetab.writerow(row)