python-3.x, python-docx

Build HTML from a Word table with merged cells in Python


I would like to rebuild a table loaded from a Word document as HTML, but merged cells are a big problem. The best result I have so far returns the cell values without repeating the merged cells, but I am stuck there and do not know how to proceed.

from docx import Document

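def iter_unique_cells(row):
    """Yield each cell in the row once, skipping repeats of a merged cell."""
    prior_tc = None
    for cell in row.cells:
        this_tc = cell._tc
        if this_tc is prior_tc:
            continue  # same underlying <w:tc> as the previous entry: merged repeat
        prior_tc = this_tc
        yield cell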


document = Document("document.docx")
for table in document.tables:
    for row in table.rows:
        for cell in iter_unique_cells(row):
            for paragraph in cell.paragraphs:
                print(paragraph.text)

Solution

  • I would rewrite the iter_unique_cells function so that it also reports whether the current cell is merged. You can then carry that information into the HTML by adding a colspan attribute (for example colspan="2") to the corresponding <td> element, which merges the cells horizontally. Note that a fixed value of 2 is only right when the merge covers exactly two columns; in general colspan should equal the number of columns spanned. To build the HTML, I would declare a string outside all of the loops, then append each element's opening tag at the start of its iteration and its closing tag at the end.

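    from html import escape

    from docx import Document

    def iter_unique_cells(row):
        """Yield (cell, is_merged) once per distinct cell; one possible sketch."""
        cells = row.cells
        for i, cell in enumerate(cells):
            if i > 0 and cells[i - 1]._tc is cell._tc:
                continue  # repeat of a merged cell that was already yielded
            # Merged if the next entry refers to the same underlying <w:tc>.
            is_merged = i + 1 < len(cells) and cells[i + 1]._tc is cell._tc
            yield cell, is_merged

    document = Document("document.docx")
    html = ""
    for table in document.tables:
        html += "<table>"
        for row in table.rows:
            html += "<tr>"
            for cell, is_merged in iter_unique_cells(row):
                # colspan is fixed at 2 here; wider merges need the real span.
                html += "<td colspan='2'>" if is_merged else "<td>"
                for paragraph in cell.paragraphs:
                    # Escape the text so characters like < and & stay valid HTML.
                    html += f"<p>{escape(paragraph.text)}</p>"
                html += "</td>"
            html += "</tr>"
        html += "</table>"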
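
The fixed colspan='2' above only covers merges of exactly two columns. A minimal sketch of one way to compute the real span follows. It assumes, as the question's code does, that python-docx repeats the same `_tc` object in `row.cells` for every grid column a merged cell covers. The names `iter_cells_with_span` and `row_to_html` are hypothetical helpers, not part of python-docx, and vertical merges (rowspan) are not handled.

```python
from html import escape
from itertools import groupby


def iter_cells_with_span(cells):
    """Yield (cell, colspan) once per distinct cell in a row.

    `cells` is a sequence of cell-like objects exposing `_tc`,
    for example python-docx's `row.cells` (assumption, see lead-in).
    """
    # Consecutive entries sharing the same `_tc` object form one merged cell;
    # the length of that run is the number of columns it spans.
    for _, run in groupby(cells, key=lambda cell: id(cell._tc)):
        run = list(run)
        yield run[0], len(run)


def row_to_html(cells):
    """Render one row as a <tr> element, collapsing horizontal merges."""
    parts = ["<tr>"]
    for cell, span in iter_cells_with_span(cells):
        parts.append(f'<td colspan="{span}">' if span > 1 else "<td>")
        # Escape paragraph text so <, > and & do not break the markup.
        parts.extend(f"<p>{escape(p.text)}</p>" for p in cell.paragraphs)
        parts.append("</td>")
    parts.append("</tr>")
    return "".join(parts)
```

With python-docx, this would be called as `row_to_html(row.cells)` for each row in `table.rows`, with the results joined between `<table>` and `</table>`. Collecting the pieces in a list and joining once is a design choice that avoids repeated string concatenation; the string-accumulation approach above works just as well for small tables.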