I am trying to parse a particular set of table data from a Word document using the python-docx module.
The table data looks something like this
I need to retrieve the "Authorities" and their respective "Versions" as key-value pairs, so that I can use that data for further processing.
I tried building the dictionary like this:
d = OrderedDict(zip(table.cell(rowNo, 0).text, table.cell(rowNo, 2).text))
This gives me an OrderedDict, but I can't iterate over it as expected or access the values using d['Juno'], which I expect to return 4.5.6.
from docx import Document

document = Document('myfile.docx')

for table in document.tables:
    printTable = False
    rowNo = 0
    for row in table.rows:
        for cell in row.cells:
            if cell.text == "Table2":
                printTable = False
        if printTable:
            print(table.cell(rowNo, 0).text + '=' + table.cell(rowNo, 2).text)
        for cell in row.cells:
            if cell.text == "Authorities":
                printTable = True
        rowNo += 1
After parsing, I get the data in the following format:
Juno=4.5.6
Acrux=3.5.6
Mars=5.6.7
You can define a dictionary before the loop and fill it where you currently print:
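from docx import Document

document = Document('myfile.docx')
data = {}  # maps each authority name to its version

for table in document.tables:
    printTable = False
    rowNo = 0
    for row in table.rows:
        # Stop collecting once the "Table2" marker is reached.
        for cell in row.cells:
            if cell.text == "Table2":
                printTable = False
        if printTable:
            # Column 0 holds the authority, column 2 holds the version.
            data[table.cell(rowNo, 0).text] = table.cell(rowNo, 2).text
        # Start collecting from the row after the "Authorities" header.
        for cell in row.cells:
            if cell.text == "Authorities":
                printTable = True
        rowNo += 1

print(data)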
This gives you the expected data as a dictionary, so data['Juno'] returns '4.5.6'.
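As for why the OrderedDict(zip(...)) attempt in the question does not work: zip() iterates over its arguments, and iterating over a string yields single characters, so zipping two cell texts pairs them character by character instead of pairing name with version. The sketch below illustrates this without a .docx file; the string and list values are hypothetical stand-ins for the cell texts, taken from the sample output in the question.

```python
from collections import OrderedDict

# Hypothetical stand-ins for table.cell(rowNo, 0).text and
# table.cell(rowNo, 2).text from a single row.
name, version = "Juno", "4.5.6"

# zip() over two strings pairs them character by character and stops
# at the end of the shorter string.
per_char = OrderedDict(zip(name, version))
print(list(per_char.items()))
# [('J', '4'), ('u', '.'), ('n', '5'), ('o', '.')]

# Zipping lists of whole cell texts (one entry per row) gives the
# mapping the question is after.
names = ["Juno", "Acrux", "Mars"]
versions = ["4.5.6", "3.5.6", "5.6.7"]
data = OrderedDict(zip(names, versions))

print(data["Juno"])  # 4.5.6
for authority, ver in data.items():
    print(authority + "=" + ver)
```

So if you prefer zip(), collect the column 0 and column 2 texts into two lists first and zip those lists once, rather than zipping the two strings of each row.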