Search code examples
pythonexcelpython-3.xxlrd

How to read Excel data by column name in python using xlrd


I am trying to read the data of large excel file(almost 100000 row). I am using 'xlrd Module' in python to fetch the data from excel. I want to fetch data by column name(Cascade,Schedule Name,Market) instead of column number(0,1,2). Because my excel columns are not fixed. i know how to fetch data in case of fixed column.

enter image description here

here is the code by which i am fetching data from the excel for fixed column

import xlrd

file_location =r"C:\Users\Desktop\Vision.xlsx"
workbook=xlrd.open_workbook(file_location)
sheet= workbook.sheet_by_index(0)
print(sheet.ncols,sheet.nrows,sheet.name,sheet.number)

for i in range(sheet.nrows):
   flag = 0
   for j in range(sheet.ncols):
      value=sheet.cell(i,j).value

If anyone has any solution of this, kindly let me know

Thanks


Solution

  • Comment: still not working when header of
    fieldnames = ['Cascade', 'Market', 'Schedule', 'Name] and
    Sheet(['Cascade', 'Schedule', 'Name', 'Market']) are equal.

    Keep order of fieldnames in col_idx, was not my initial goal.


    Question: I want to fetch data by column name

    The following OOP solution will work:

    class OrderedByName():
        """
        Privides a generator method, to iterate in Column Name ordered sequence
        Provides subscription, to get columns index by name. using class[name]
        """
        def __init__(self, sheet, fieldnames, row=0):
            """
            Create a OrderedDict {name:index} from 'fieldnames'
            :param sheet: The Worksheet to use
            :param fieldnames: Ordered List of Column Names
            :param row: Default Row Index for the Header Row
            """
            from collections import OrderedDict
            self.columns = OrderedDict().fromkeys(fieldnames, None)
            for n in range(sheet.ncols):
                self.columns[sheet.cell(row, n).value] = n
    
        @property
        def ncols(self):
            """
            Generator, equal usage as range(xlrd.ncols), 
              to iterate columns in ordered sequence
            :return: yield Column index
            """
            for idx in self.columns.values():
                yield idx
    
        def __getitem__(self, item):
            """
            Make class object subscriptable
            :param item: Column Name
            :return: Columns index
            """
            return self.columns[item]
    

    Usage:

    # Worksheet Data
    sheet([['Schedule', 'Cascade', 'Market'],
           ['SF05UB0', 'DO Macro Upgrade', 'Upper Cnetral Valley'],
           ['DE03HO0', 'DO Macro Upgrade', 'Toledo'],
           ['SF73XC4', 'DO Macro Upgrade', 'SF Bay']]
          )
    
    # Instantiate with Ordered List of Column Names
    # NOTE the different Order of Column Names
    by_name = OrderedByName(sheet, ['Cascade', 'Market', 'Schedule'])
    
    # Iterate all Rows and all Columns Ordered as instantiated
    for row in range(sheet.nrows):
        for col in by_name.ncols:
            value = sheet.cell(row, col).value
            print("cell({}).value == {}".format((row,col), value))
    

    Output:

    cell((0, 1)).value == Cascade
    cell((0, 2)).value == Market
    cell((0, 0)).value == Schedule
    cell((1, 1)).value == DO Macro Upgrade
    cell((1, 2)).value == Upper Cnetral Valley
    cell((1, 0)).value == SF05UB0
    cell((2, 1)).value == DO Macro Upgrade
    cell((2, 2)).value == Toledo
    cell((2, 0)).value == DE03HO0
    cell((3, 1)).value == DO Macro Upgrade
    cell((3, 2)).value == SF Bay
    cell((3, 0)).value == SF73XC4
    

    Get Index of one Column by Name

    print("cell{}.value == {}".format((1, by_name['Schedule']),
                                        sheet.cell(1, by_name['Schedule']).value))
    #>>> cell(1, 0).value == SF05UB0
    

    Tested with Python: 3.5