
How to repeat the column headers on every page with python-docx?


I'm trying to write a pandas data frame to a .docx file in Python. The table usually spans more than one page, so I want the data frame's column names to be repeated at the top of every new page.

Currently my code just writes the whole data frame as is:

# add the header row (column labels may not be strings, so convert them)
for j in range(t01.shape[1]):
    table.cell(0, j).text = str(t01.columns[j])

# add the rest of the data frame
for i in range(t01.shape[0]):
    for j in range(t01.shape[1]):
        table.cell(i + 1, j).text = str(t01.values[i, j])

Solution

  • What you're probably looking for is Word's Repeat Header Rows feature, which marks a table row as a header row. [Screenshot: how to make a table row a header row in MS Word]

    Since python-docx doesn't have that functionality yet, you can add the flag yourself. First, look it up in the OOXML schema: http://www.datypic.com/sc/ooxml/e-w_tblHeader-1.html

    Note that rows declared as header rows repeat at the top of every page when the table doesn't fit on a single page. So all you need to do is declare the first row as a header row, which can be done like this:

    from docx import Document
    from docx.oxml import OxmlElement
    
    doc = Document()
    t = doc.add_table(rows=50, cols=2)
    
    # set header values
    t.cell(0, 0).text = 'A'
    t.cell(0, 1).text = 'B'
    
    tbl_header = OxmlElement('w:tblHeader') # create new oxml element flag which indicates that row is header row
    first_row_props = t.rows[0]._element.get_or_add_trPr() # get if exists or create new table row properties el
    first_row_props.append(tbl_header) # now first row is the header row
    
    for i in range(1, len(t.rows)):
        for j in range(len(t.columns)):
            t.cell(i, j).text = f'i:{i}, j:{j}'
    
    doc.save('t1.docx')
    

    The end result should look like this:

    [Image: generated MS Word document with repeating header]