Search code examples
pythonxlrd

Python XLRD Get Column Values by Column Names into List of Dictionaries


I have a xlsx file in which the data does not start at first row or column. It looks like this.

enter image description here

Only the column names are known here. The data ends whenever there is "**********" in first column. I need the output in a list of dictionaries, like below.

'ListOfDict': [ { 'A':1, 'B':'ABC', 'C':'Very Good', 'D':'Hardware', 'E':200.2 }, { 'A':2, 'B':'DEF', 'C':'Not so good', 'D':'Software', 'E':100.1}]

I could figure out the column names. But could not get the values. Here is my code.

import xlrd
from itertools import product

wb = xlrd.open_workbook(filename)
ws = wb.sheet_by_index(0)

for row_index, col_index in product(xrange(ws.nrows), xrange(ws.ncols)):
    if ws.cell(row_index, col_index).value == 'A':
        print "({}, {})".format(row_index, col_index)
        break

key1 = [ws.cell(row_index, col_ind).value for col_ind in range(col_index, ws.ncols)]

val = [ws.cell(row_index + i, col_ind).value 
       for i in range(row_index + 1, ws.nrows) 
       for col_ind in range(col_index, ws.ncols)]

But that gives error "list index out of range"

Please help. Thank you.


Solution

  • Your problem is that the loop variable i is already the row_index, not the offset.

    So you simply need to change the row index of the cell to take i:

    val = [ws.cell(i, col_ind).value 
           for i in range(row_index + 1, ws.nrows) 
           for col_ind in range(col_index, ws.ncols)]
    

    Or alternatively fix the creation the offset:

    val = [ws.cell(row_index + i, col_ind).value 
           for i in range(1, ws.nrows - row_index) 
           for col_ind in range(col_index, ws.ncols)]
    

    What I would do is first find the last row according to your condition. Then, with a nested loop, create the dictionaries. Something like:

    import xlrd
    from itertools import product
    
    wb = xlrd.open_workbook(filename)
    ws = wb.sheet_by_index(0)
    
    for row_index in xrange(ws.nrows):
        if ws.cell(row_index, 0).value == '**********':
            last_row = row_index
            break
    
    for row_index, col_index in product(xrange(ws.nrows), xrange(ws.ncols)):
        if ws.cell(row_index, col_index).value == 'A':
            first_row, first_col = row_index, col_index
            print "({}, {})".format(row_index, col_index)
            break
    
    list_of_dicts = []
    for row in range(first_row+1, last_row):
        dict = {}
        for col in range(first_col, ws.ncols:
            key = ws.cell(first_row, col).value
            val = ws.cell(row, col).value
            dict[key] = val
        list_of_dicts.append(dict)
    

    And in a much shorter, unreadable way (just for fun...):

    list_of_dicts = [{ws.cell(first_row, col).value: ws.cell(row, col).value for col in range(first_col, ws.ncols} for row in range(first_row+1, last_row)]