I have a xlsx file in which the data does not start at first row or column. It looks like this.
Only the column names are known here. The data ends whenever there is "**********" in first column. I need the output in a list of dictionaries, like below.
'ListOfDict': [ { 'A':1, 'B':'ABC', 'C':'Very Good', 'D':'Hardware', 'E':200.2 }, { 'A':2, 'B':'DEF', 'C':'Not so good', 'D':'Software', 'E':100.1}]
I could figure out the column names. But could not get the values. Here is my code.
import xlrd
from itertools import product
wb = xlrd.open_workbook(filename)
ws = wb.sheet_by_index(0)
for row_index, col_index in product(xrange(ws.nrows), xrange(ws.ncols)):
if ws.cell(row_index, col_index).value == 'A':
print "({}, {})".format(row_index, col_index)
break
key1 = [ws.cell(row_index, col_ind).value for col_ind in range(col_index, ws.ncols)]
val = [ws.cell(row_index + i, col_ind).value
for i in range(row_index + 1, ws.nrows)
for col_ind in range(col_index, ws.ncols)]
But that gives error "list index out of range"
Please help. Thank you.
Your problem is that the loop variable i
is already the row_index, not the offset.
So you simply need to change the row index of the cell to take i
:
val = [ws.cell(i, col_ind).value
for i in range(row_index + 1, ws.nrows)
for col_ind in range(col_index, ws.ncols)]
Or alternatively fix the creation the offset:
val = [ws.cell(row_index + i, col_ind).value
for i in range(1, ws.nrows - row_index)
for col_ind in range(col_index, ws.ncols)]
What I would do is first find the last row according to your condition. Then, with a nested loop, create the dictionaries. Something like:
import xlrd
from itertools import product
wb = xlrd.open_workbook(filename)
ws = wb.sheet_by_index(0)
for row_index in xrange(ws.nrows):
if ws.cell(row_index, 0).value == '**********':
last_row = row_index
break
for row_index, col_index in product(xrange(ws.nrows), xrange(ws.ncols)):
if ws.cell(row_index, col_index).value == 'A':
first_row, first_col = row_index, col_index
print "({}, {})".format(row_index, col_index)
break
list_of_dicts = []
for row in range(first_row+1, last_row):
dict = {}
for col in range(first_col, ws.ncols:
key = ws.cell(first_row, col).value
val = ws.cell(row, col).value
dict[key] = val
list_of_dicts.append(dict)
And in a much shorter, unreadable way (just for fun...):
list_of_dicts = [{ws.cell(first_row, col).value: ws.cell(row, col).value for col in range(first_col, ws.ncols} for row in range(first_row+1, last_row)]