I'm struggling to find the name of the heading under which a table lies. I'm using the python-docx library, and I'd like to know how I can get each table together with the name of the heading it sits under.
from docx import Document
from docx.shared import Inches
document = Document('test.docx')
tabs = document.tables
You can extract the document structure from the docx file by walking its underlying XML. Try this:
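from docx import Document

doc = Document("file.docx")
headings = []  # fill with the heading texts extracted by your own code
tables = []    # fill with the tables extracted by your own code
tags = []
all_text = []
schema = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
for elem in doc.element.iter():
    if elem.tag == schema + 'body':
        # direct children of the body are paragraphs, tables and section properties
        for child in elem:
            if child.tag != schema + 'tbl':
                # join the w:t text nodes; child.text alone can be None on older python-docx versions
                node_text = ''.join(t.text or '' for t in child.iter(schema + 't'))
                if node_text:
                    if node_text in headings:
                        tags.append('heading')
                    else:
                        tags.append('text')
                    all_text.append(node_text)
            else:
                tags.append('table')
        break  # the body has been processed, no need to keep iterating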
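The snippet leaves headings empty, so it has to be filled before the loop runs. One possible way to do that is sketched below. It assumes the headings in your document use the built-in "Heading 1", "Heading 2", ... styles, and collect_heading_texts is a hypothetical helper name, not part of python-docx.

```python
def collect_heading_texts(paragraphs):
    """Return the text of every paragraph whose style name starts with 'Heading'.

    Assumption: headings use the built-in 'Heading N' styles. Custom heading
    styles with other names would need a different check.
    """
    return [
        p.text
        for p in paragraphs
        if p.style is not None
        and p.style.name
        and p.style.name.startswith("Heading")
    ]
```

With python-docx this would be called as headings = collect_heading_texts(doc.paragraphs) before running the loop.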
Once the loop has run, tags lists the structure of the document in order (heading, text and table), and you can map the corresponding data from the other lists onto it.
To get the heading of a table, walk back through tags from that table's entry to the nearest preceding 'heading' entry.
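As a rough sketch of that mapping step: the function below pairs every 'table' entry in tags with the most recent heading text seen before it. headings_for_tables is a hypothetical helper name, and it relies on the assumption that every non-table entry in tags has exactly one matching entry in all_text, which is how the loop above fills the two lists.

```python
def headings_for_tables(tags, all_text):
    """For each 'table' tag, return the text of the closest preceding heading.

    Assumption: tags and all_text come from the loop above, so every
    'heading' or 'text' tag corresponds to the next item in all_text.
    A table that appears before any heading is mapped to None.
    """
    result = []
    text_iter = iter(all_text)
    current_heading = None
    for tag in tags:
        if tag == 'table':
            result.append(current_heading)
        else:
            text = next(text_iter)
            if tag == 'heading':
                current_heading = text
    return result
```

The returned list is in document order, so it should line up with document.tables for top-level tables, e.g. zip(headings_for_tables(tags, all_text), doc.tables). Tables nested inside other tables are not direct children of the body and are not covered by this sketch.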