Search code examples
pythonexcelxlrdxlwt

how to extract data from one excel workbook and output to another using python xlrd/xlwt?


I am trying to write a script which will automate a copy/paste of employee time sheets from several files to one compiled file. Since they are time sheets with project codes some cells are left blank where an employee worked on a different project that day. Also the files have been converted from xlsx(2007) to .csv.xls which xlrd seems to open just fine.

I do know how to open and create a book object but my knowledge of this module is very limited so I thought maybe a general algorithm would be helpful:

import xlrd, xlwt

put all following in for or while loop to iterate through files:
book = xlrd.open_workbook('mybook.csv.xls')
extract data; store data for ouput
use for loop to iterate over data, output to final sheet
open next file, repeat process storing each output below the previous

I am looking for anything that will help me find the answers, not just code. Any help would be appreciated. Thanks.


Solution

  • This might help ... it reproduces your data as closely as possible (dates remain as dates, empty cells don't become text cells with 0-length contents, booleans and error cells don't become number cells).

    from xlrd import XL_CELL_EMPTY, XL_CELL_TEXT, XL_CELL_NUMBER,
        XL_CELL_DATE, XL_CELL_BOOLEAN, XL_CELL_ERROR, open_workbook
    from xlwt import Row, easyxf, Workbook
    
    method_for_type = {
        XL_CELL_TEXT:    Row.set_cell_text,
        XL_CELL_NUMBER:  Row.set_cell_number,
        XL_CELL_DATE:    Row.set_cell_number,
        XL_CELL_ERROR:   Row.set_cell_error,
        XL_CELL_BOOLEAN: Row.set_cell_boolean,
        }
    
    date_style = easyxf(num_format_str='yyyy-mm-dd')
    other_style = easyxf(num_format_str='General')
    
    def append_sheet(rsheet, wsheet, wrowx=0):
        for rrowx in xrange(rsheet.nrows):
            rrowvalues = rsheet.row_values(rrowx)
            wrow = wsheet.row(wrowx)
            for rcolx, rtype in enumerate(rsheet.row_types(rrowx)):
                if rtype == XL_CELL_EMPTY: continue
                wcolx = rcolx
                wmethod = method_for_type[rtype]
                wstyle = date_style if rtype == XL_CELL_DATE else other_style
                wmethod(wrow, wcolx, rrowvalues[rcolx], wstyle) 
            wrowx += 1
        return wrowx
    
    if __name__ == '__main__':
        import sys, xlrd, xlwt, glob
        rdpattern, wtfname = sys.argv[1:3]
        wtbook = Workbook()
        wtsheet = wtbook.add_sheet('guff')
        outrowx = 0
        for rdfname in glob.glob(rdpattern):
            rdbook = open_workbook(rdfname)
            rdsheet = rdbook.sheet_by_index(0)
            outrowx = append_sheet(rdsheet, wtsheet, outrowx)
            print outrowx
        wtbook.save(wtfname)