Search code examples
pandasdataframefor-loopordereddictionaryordereddict

Ordereddict - How to iterate to read the same cell in all of the sheets within an Excel file?


In the context of ordereddict, I'd like to loop through all of the sheets in my Excel file, read the content of the B2 cell from each sheet, and print it out. Later, I plan to set conditions based on the string value of that cell.

  • The cell of interest is B2: I would like to print the three different student names, which are Roberto, Miquel Angelo and Leonardo.
  • My current piece of code is only printing 'Leonardo' in Sheet_3 and not considering the information in B2 on the rest of the sheets.
# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
import openpyxl
from openpyxl import Workbook, load_workbook

# Defining the file path
file_path = r'C:/Users/machukovich/Desktop/stackoverflow.xlsx'

# Load workbook as openpyxl
reference_workbook = load_workbook(file_path)

# We will mantain the workbook open
wb = reference_workbook.active

# Loading the file into a dictionary of Dataframes
dict_of_df = pd.read_excel(file_path, sheet_name=None, skiprows=2)

# Writting the loop itself (it only prints Leonardo)
for sheet_name, df in dict_of_df.items():
    student_name = wb['B2'].value
    print(f"Student Name: {student_name}")
  • Below, you can find snapshots of the Excel file to further clarification. enter image description here enter image description here enter image description here

Solution

  • From https://openpyxl.readthedocs.io/en/stable/tutorial.html#create-a-workbook :

    A workbook is always created with at least one worksheet. You can get it by using the Workbook.active property: ws = wb.active

    Info: This is set to 0 by default. Unless you modify its value, you will always get the first worksheet by using this method.

    You never change reference_workbook.active and wb is always the first worksheet of the workbook. Then, additionally, you use pandas to load all worksheets by (correctly) passing None as sheet_name. You also iterate over them correctly, but inside the loop you always access wb without actually using the pandas dataframe of the worksheet df. So you basically mixed two solutions.

    So you have two options:

    1: Remove pd.load_excel and use openpyxl by iterating over the worksheets with something like

    for sheet in reference_workbook:
        print(sheet["B2"])
    

    2: Use your OrderedDict with pandas dataframes by accessing df instead of wb. However, because you passed skiprows=2 you cut off B2 and the sheet will start at row 3.