Search code examples
pythonxlrd

Parsing selective columns and rows using Python


I have xls type files that have 9 columns and varying number of rows. I would like to use xlrd or other modules so that I can extract all the values from the 1st and 2nd columns out of the nine columns, and then use the extracted values individually. So far my code looks like:

    import xlrd
    import openpyxl

    workbook = xlrd.open_workbook('C09.xls')

    sheet_names = workbook.sheet_names()


    sheet = workbook.sheet_by_name(sheet_names[0])

    num_rows = sheet.nrows
    num_cols = sheet.ncols

    plist = [[0 for x in range(3)] for x in range(num_rows)]

    for i in range(num_rows):
        for j in range(3):
           plist[i][j] = sheet.cell(i,j).value

and then use the values in [i] (such as doing multiplications and whatnot) and then pulling the corresponding values from [j].

The above code gives an output like:

    [['Col header 1', 'Col header 2', 'Col header 3'], [1.0, 1000, 2000], [2.0, 1001, 2001], ..... so on]

Is there a simple way of doing so? I am quite new to Python, so I would appreciate it if you could be a bit more specific. Thank you so much!


Solution

  • Some feedback/improvements:

    In your snippet, the initialization part is redundant. This avoids the double iteration:

    plist = [[sheet.cell(i,j).value for j in range(3)] for i in xrange(num_rows)]

    should you run into None values, you can normalize them with:

    plist = [[sheet.cell(i,j).value or 0 for j in range(3)] for i in xrange(num_rows)]
    

    Finally, here is a more Pythonic way of doing 0-initializations:

    plist = [x[:] for x in [[0] * 3] * sheet.nrows