I would like to rebuild a table loaded from a Word document as HTML, but merged cells are a big problem. The best result I have so far returns the cell values without repeating the merged cells, but I'm stuck there and don't know how to proceed.
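from docx import Document

def iter_unique_cells(row):
    """Yield each cell in `row` once, skipping repeats of a merged cell."""
    prior_tc = None
    for cell in row.cells:
        this_tc = cell._tc
        # A merged cell appears once per grid column it spans; skip the repeats.
        if this_tc is prior_tc:
            continue
        prior_tc = this_tc
        yield cell

document = Document("document.docx")
for table in document.tables:
    for row in table.rows:
        for cell in iter_unique_cells(row):
            for paragraph in cell.paragraphs:
                print(paragraph.text)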
I would rewrite the iter_unique_cells function to also return whether the current cell is merged. You can then carry this information into the HTML by adding colspan="2" (or however many columns the cell spans) to the <td> elements. That should merge the cells horizontally. To build the HTML, I would declare a string outside all of the loops, then append each element's opening tag at the start of its iteration and the closing tag at the end.
from docx import Document

def iter_unique_cells(row):
    ...  # modify to return cell, is_merged

document = Document("document.docx")
html = ""
for table in document.tables:
    html += "<table>"
    for row in table.rows:
        html += "<tr>"
        for cell, is_merged in iter_unique_cells(row):
            html += "<td colspan='2'>" if is_merged else "<td>"
            for paragraph in cell.paragraphs:
                html += f"<p>{paragraph.text}</p>"
            html += "</td>"
        html += "</tr>"
    html += "</table>"
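
One possible way to fill in the modified generator is sketched below. It is not part of python-docx: the names iter_cells_with_colspan and table_to_html are hypothetical, and it yields a column count rather than an is_merged flag so that cells spanning more than two columns get the right colspan. It relies on the same assumption as the question's code, namely that row.cells lists a horizontally merged cell once per grid column, each time backed by the same _tc element. Vertical merges (rowspan) are not handled, and the escaping of the paragraph text is an addition.

```python
from html import escape
from itertools import groupby


def iter_cells_with_colspan(row):
    """Yield (cell, colspan) for each distinct cell in `row`.

    Assumption: a horizontally merged cell shows up in `row.cells` once per
    grid column it spans, and every repeat shares the same `_tc` object.
    """
    # Group consecutive entries by the identity of their underlying element;
    # the size of each group is the number of columns the cell spans.
    for _, group in groupby(row.cells, key=lambda cell: id(cell._tc)):
        cells = list(group)
        yield cells[0], len(cells)


def table_to_html(table):
    """Build an HTML string for one table (horizontal merges only)."""
    parts = ["<table>"]
    for row in table.rows:
        parts.append("<tr>")
        for cell, colspan in iter_cells_with_colspan(row):
            parts.append(f'<td colspan="{colspan}">' if colspan > 1 else "<td>")
            for paragraph in cell.paragraphs:
                # Escape the text so characters like < and & stay valid HTML.
                parts.append(f"<p>{escape(paragraph.text)}</p>")
            parts.append("</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


# Intended usage with python-docx (left as comments, since it needs a file):
# from docx import Document
# document = Document("document.docx")
# html = "".join(table_to_html(table) for table in document.tables)
```

Collecting the pieces in a list and joining them once does the same job as appending to a string inside the loops; either approach should work for tables of ordinary size.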