Search code examples
pythonexcelprintingxlrd

Read Excel Files and Only Use Specific Files


in my past questions, I've stated that I'm new to python. I've only used it once for work. And yet again, there's a small project I have to do for work.

I have to read an excel file, and in that excel file, there are 3 columns (col1, col2, col3). There are about 100 rows.

col1 has 2 values A and B. col2 has values ranging from ONLY 1 - 10. col3, there are plenty of different values.

But I want my python program to look at each distinct value in col1, and then at each distinct value in col2, and then calculate the avg of all the corresponding values of col3.

Hopefully, the output can look something like this:

A - 1 = 2.3
A - 2 = 6.2
A - 3 = 5.7
etc. etc.
B - 1 = 3.5
B - 2 = 4.1
B - 3 = 8.1
etc. etc.

I know, it's a lot to ask, but I've done this so far:

import xlrd #import package

#opening workbook and reading first sheet
book = xlrd.open_workbook('trend.xls')
sheet = book.sheet_by_index(0)

#print sheet name, number of rows and columns
#print sheet.name #print sheet name
#print sheet.nrows #print number of rows
#print sheet.ncols #print number of colums

#print cellname along with value in for loop
for row_index in range(sheet.nrows):
    for col_index in range(sheet.ncols):
        print xlrd.cellname(row_index,col_index),'-',
        print sheet.cell(row_index,col_index).value

It started printing all the values within each cell, along with the name and such. But then I realized it's not doing what it's supposed to really be doing. And I can't find a proper tutorial on how to do this.

If any of you have any suggestions, I'll appreciate it very much. Thank you so much!


Solution

  • Try this:

    import xlrd
    
    book = xlrd.open_workbook('trend.xls')
    sheet = book.sheet_by_index(0)
    
    unique_combinations = {}
    
    for row_index in range(sheet.nrows):
        cell_1 = sheet.cell(row_index, 0)
        cell_2 = sheet.cell(row_index, 1)
        cell_3 = sheet.cell(row_index, 2)
        unique_combo = (cell_1.value, int(cell_2.value))
        if unique_combinations.has_key(unique_combo):
            unique_combinations[unique_combo].append(cell_3.value)
        else:
            unique_combinations[unique_combo] = [cell_3.value]
    
    for k in unique_combinations.keys():
        values = unique_combinations[k]
        average = sum(values ) / len(values )
        print '%s - %s = %s' % (k[0], k[1], average)