I'm trying to write a pandas DataFrame to a .docx file in Python. Since the table will usually span more than one page, I want the DataFrame's column names to be repeated at the top of every new page.
Currently my code just writes the whole DataFrame as is:
# add the header row
for j in range(t01.shape[-1]):
    table.cell(0, j).text = t01.columns[j]
# add the rest of the data frame
for i in range(t01.shape[0]):
    for j in range(t01.shape[-1]):
        table.cell(i + 1, j).text = str(t01.values[i, j])
What you're probably looking for is Word's "Repeat Header Rows" functionality.
Since python-docx doesn't have that functionality yet, you can add the flag yourself. First, look it up in the OOXML schema: http://www.datypic.com/sc/ooxml/e-w_tblHeader-1.html
Note that rows declared as header rows repeat at the top of every page when the table doesn't fit on a single page. So all you need to do is declare the first row as a header row, which can be done like this:
from docx import Document
from docx.oxml import OxmlElement

doc = Document()
t = doc.add_table(rows=50, cols=2)

# set header values
t.cell(0, 0).text = 'A'
t.cell(0, 1).text = 'B'

# create a new oxml element: the flag that marks a row as a header row
tbl_header = OxmlElement('w:tblHeader')
# get the table row properties element, creating it if it doesn't exist
first_row_props = t.rows[0]._element.get_or_add_trPr()
# now the first row is the header row
first_row_props.append(tbl_header)

# fill the remaining rows with sample data
for i in range(1, len(t.rows)):
    for j in range(len(t.columns)):
        t.cell(i, j).text = f'i:{i}, j:{j}'

doc.save('t1.docx')
In the resulting document, the header row (A, B) is repeated at the top of every page the table spans.