Search code examples
python-3.xpandaspython-docxpandas.excelwriter

How to loop lists of list obtained from python-docx where each list is a table and write the tables into a seperate worksheets


I am using python-docx to extract two tables from a document. I have iterated over the tables and created a list of lists. Each individual list represents a table, and within that I have dictionaries per row. Each dictionary contains a key / value pair. The key is the column heading from the table and value is the cell contents for that row's data for that column. I am facing difficulty when creating a data frame for each table and writing each table on a seperate excel sheet.

from docx.api import Document
import pandas as pd
import csv
import json
import unicodedata

document = Document('Sampletable1.docx')
tables = document.tables
print (len(tables))
big_data = []


for table in document.tables:
    data = []
    Keys = None
    for i, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)
        if i == 0:
            keys = tuple(text)
            continue
        dic = dict(zip(keys, text))
        data.append(dic)
    big_data.append(data)
 print(big_data)

The output of the above code is:

2

[[{'Asset': 'Growth investments', 'Target investment mix': '66.50%', 'Actual investment mix': '66.30%', 'Variance': '-0.20%'}, {'Asset': 'Defensive investments', 'Target investment mix': '33.50%', 'Actual investment mix': '33.70%', 'Variance': '0.20%'}], [{'Owner': 'REST Super', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER TransPension', 'Product': 'TTR Pension', 'Type': 'New', 'Status': 'New', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing'}]]

How do I access the above lists??

Further I tried to create a pandas data frame

#write the data into a data frame
for thing in big_data:
    #print(thing)
    df = pd.DataFrame(thing)
    print(df)
    writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1')
    writer.save()

I got the first table on the excel but unable to work with second table. I am expecting both the table to be in the same excel workbook(dftable3.xlsx) but in different worksheets(Sheet1,Sheet2)

I have attached the images of the tables.

Thanks in advance

enter image description hereenter image description here


Solution

  • How do I access the above lists??

    You already did, by iterating over them, or printing them. Consider using the pretty-print library:

    import pprint
    pprint.pprint(big_data)
    

    I am expecting ... different worksheets(Sheet1,Sheet2)

    Well, that's unlikely, given the constant 'Sheet1' argument you supplied. Here is one way to accomplish that:

    writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
    for i, thing in enumerate(big_data):
        df = pd.DataFrame(thing)
        df.to_excel(writer, sheet_name=f'Sheet{i}')
    writer.save()
    

    Note the scope of writer -- it must be longer lived than each of the constituent dfs.