Search code examples
pythonexcelpandasimportlistobject

Open and Fetch data from a ListObject of an Excel sheet with Python


The Problem: Open a ListObject (excel table) of an Excel file from y python environment.

The why: There are multiple solutions to open an excel file in python. Starting with pandas:

import pandas as pd
mysheetName="sheet1"
df = pd.read_excel(io=file_name, sheet_name=mysheetName)

This will pass the sheet1 into a pandas data frame. So far so good.

Other more detailed solution is using specific libraries. This one being a code of a stack overflow question.

from openpyxl import load_workbook
wb2 = load_workbook('test.xlsx')
print wb2.get_sheet_names()
['Sheet2', 'New Title', 'Sheet1']

worksheet1 = wb2['Sheet1'] # one way to load a worksheet
worksheet2 = wb2.get_sheet_by_name('Sheet2') # another way to load a worksheet
print(worksheet1['D18'].value)

So far so good as well.

BUT: If you have a ListObject (excel table) in a sheet I did not find any way to access the data of the Listobject. ListObjects are often used by a bit more advance users of Excel; above all when programming macros in VBA. There are very convenient and could be seen as the equivalent of a pandas dataframe in Excel. Having a bridge between Excel Listobject and a pandas data frame seems like super logical. Nevertheless I did not find so far any solution, library or workaround for doing that.

The question. Does anyone know about some python lybrary/solution to directly extract Listobjects form Excel sheets?.

NOTE1: Not nice solution Of course knowing the "placement" of the Listobject it is possible to refer to the start and last cell, but this is a really bad solution because does not allow you to modify the Listobject in the excel file (the python would have to be modified straight away). As soon as the placement of the ListObject changes, or the listobject itself gets bigger, the python code would be broken.

NOTE2: My current solution: I export the listObject from excel (with a macro) into a JSON file and read it from python. But the extra work is obvious. VBA code, extra file etc etc.

Last comment: If someone is interested about this issue but still don't have a clue what is a ListObject in excel here click and see here:

enter image description here


Solution

  • James is right:

    https://openpyxl.readthedocs.io/en/stable/worksheet_tables.html

    https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.table.html

    There is a class in openpyxl to read tables. Also by id:

    class openpyxl.worksheet.table.Table(id=1,...
    

    id=1 would mean the first table of the worksheet.

    Remember always that ListObjects in Excel are called Tables. Thats weird (as oft with VBA). If you work with VBA you might forget that the ListObject=Table.

    With xlwings is also possible. The API is a bit different:

    import xlwings as xw
    wb = xw.Workbook.active()
    xw.Range('TableName[ColumnName]').value
    

    Or to get the column including header and Total row, you could do:

    xw.Range('TableName[[#All], [ColumnName]]').value