
How to repeat the column headers on every page with python-docx?

I'm trying to write a pandas DataFrame to a .docx file in Python. The table usually spans more than one page, so I want the DataFrame's column names to be repeated at the top of every new page.

Currently my code just writes the whole DataFrame as is:

# add the header row (column names)
for j in range(t01.shape[1]):
    table.cell(0, j).text = str(t01.columns[j])
# add the rest of the DataFrame, one table row per DataFrame row
for i in range(t01.shape[0]):
    for j in range(t01.shape[1]):
        table.cell(i + 1, j).text = str(t01.values[i, j])


  • What you're probably looking for is the Repeat Header Rows feature, which can be found here: [screenshot: how to make a table row a header row in MS Word]

    Since python-docx doesn't have that functionality yet, you can add the flag yourself. First you need to look it up in the OOXML schema.
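
For orientation, the code further down sets the flag as a `w:tblHeader` element inside a row's properties element (`w:trPr`). A minimal sketch, using only the standard library, for checking which rows of an existing .docx carry that element. The function name `header_row_flags` is made up for this illustration, and the check only looks at whether the element is present, not at any `w:val` attribute on it:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, conventionally bound to the "w:" prefix
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def header_row_flags(docx_path):
    """Return one bool per table row: True if the row has a w:tblHeader element.

    Hypothetical helper for illustration. It only tests for presence of the
    element and ignores a possible w:val attribute that switches it off.
    """
    # A .docx file is a zip archive; the main body lives in word/document.xml
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    flags = []
    for row in root.iter(f"{{{W_NS}}}tr"):
        tr_pr = row.find(f"{{{W_NS}}}trPr")
        has_flag = tr_pr is not None and tr_pr.find(f"{{{W_NS}}}tblHeader") is not None
        flags.append(has_flag)
    return flags
```

Calling `header_row_flags('some.docx')` on a file where Repeat Header Rows was enabled in Word for the first table row should give a list whose first entry is `True`.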

    Note that rows declared as header rows are repeated at the top of every page when the table doesn't fit on a single page. So all you need to do is declare the first row as a header row. That can be done like this:

    from docx import Document
    from docx.oxml import OxmlElement

    doc = Document()
    t = doc.add_table(rows=50, cols=2)

    # set header values
    t.cell(0, 0).text = 'A'
    t.cell(0, 1).text = 'B'

    # create a new oxml element: the flag that marks a row as a header row
    tbl_header = OxmlElement('w:tblHeader')
    # get the table row properties element of the first row, creating it if missing
    first_row_props = t.rows[0]._element.get_or_add_trPr()
    # now the first row is a header row
    first_row_props.append(tbl_header)

    # fill the remaining rows with placeholder text
    for i in range(1, len(t.rows)):
        for j in range(len(t.columns)):
            t.cell(i, j).text = f'i:{i}, j:{j}'

    doc.save('t1.docx')

    The end result should look like this:

    [screenshot: generated MS Word document with a repeating header row]