
Parsing table data into a dictionary using python-docx


I am trying to parse a particular set of table data using the python-docx module.

The table data looks something like this: [image]

I need to retrieve the "Authorities" and their respective "Versions" as key-value pairs, so that I can use that data for further processing.

I am unable to iterate over the dictionary if I build it like this:

d = OrderedDict(zip(table.cell(rowNo, 0).text, table.cell(rowNo, 2).text))

This gives me an OrderedDict, but I can't access the values using d['Juno'], which I expect to return 4.5.6.

from docx import Document

document = Document('myfile.docx')

for table in document.tables:
    printTable = False
    rowNo = 0
    for row in table.rows:
        # Stop printing once the "Table2" marker row is reached
        for cell in row.cells:
            if cell.text == "Table2":
                printTable = False
        if printTable:
            print(table.cell(rowNo, 0).text + '=' + table.cell(rowNo, 2).text)
        # Start printing from the row after the "Authorities" header row
        for cell in row.cells:
            if cell.text == "Authorities":
                printTable = True
        rowNo += 1

After parsing, I get the data in the following format:

Juno=4.5.6
Acrux=3.5.6
Mars=5.6.7

Solution
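
The OrderedDict(zip(...)) attempt in the question fails because zip() receives two strings and pairs them character by character, so the keys end up as single letters rather than 'Juno'. A minimal sketch of that behavior, using plain strings as stand-ins for the two cell.text values (the names `name`, `version`, and `char_pairs` are illustrative and not from the original post):

```python
from collections import OrderedDict

# Stand-ins for table.cell(rowNo, 0).text and table.cell(rowNo, 2).text
name, version = "Juno", "4.5.6"

# zip() walks both strings in parallel, one character at a time,
# and stops at the end of the shorter string.
char_pairs = OrderedDict(zip(name, version))
print(list(char_pairs.items()))
# [('J', '4'), ('u', '.'), ('n', '5'), ('o', '.')]

# Assigning the whole strings as one key/value pair gives the intended mapping.
d = OrderedDict()
d[name] = version
print(d["Juno"])  # 4.5.6
```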

  • Define a dictionary before the loop and fill it one row at a time:

    from docx import Document
    
    document = Document('myfile.docx')
    data = {}
    for table in document.tables:
        printTable = False
        rowNo = 0
        for row in table.rows:
            # Stop collecting once the "Table2" marker row is reached
            for cell in row.cells:
                if cell.text == "Table2":
                    printTable = False
            if printTable:
                # Column 0 holds the authority name, column 2 its version
                data[table.cell(rowNo, 0).text] = table.cell(rowNo, 2).text
            # Start collecting from the row after the "Authorities" header row
            for cell in row.cells:
                if cell.text == "Authorities":
                    printTable = True
            rowNo += 1
    print(data)
    

    This gives you the expected data as a dictionary, so data['Juno'] returns '4.5.6'.
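
The marker-based loop above can also be pulled into a small function that works on plain lists of cell text, which makes it easy to check without a .docx file. This is only a sketch: the function name `rows_to_dict`, its parameters, and the sample rows (including the empty middle column and the "Ignored" row) are hypothetical; only the three name/version pairs come from the question.

```python
def rows_to_dict(rows, start_marker="Authorities", stop_marker="Table2",
                 key_col=0, value_col=2):
    """Collect key/value pairs from the rows between two marker rows.

    `rows` is a list of rows, each row a list of cell strings.
    """
    data = {}
    collecting = False
    for cells in rows:
        # A row containing the stop marker ends collection
        if stop_marker in cells:
            collecting = False
        if collecting:
            data[cells[key_col]] = cells[value_col]
        # Collection starts on the row after the header row
        if start_marker in cells:
            collecting = True
    return data


# Hypothetical sample layout; the middle column's content is unknown
sample_rows = [
    ["Authorities", "", "Versions"],
    ["Juno", "", "4.5.6"],
    ["Acrux", "", "3.5.6"],
    ["Mars", "", "5.6.7"],
    ["Table2", "", ""],
    ["Ignored", "", "0.0.0"],
]

result = rows_to_dict(sample_rows)
print(result["Juno"])  # 4.5.6
```

With python-docx, the input could be built per table as `[[cell.text for cell in row.cells] for row in table.rows]`. Every collected row needs at least `value_col + 1` cells, otherwise the lookup raises an IndexError.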