I have to add a table from a CSV file around 1500 rows and 9 columns, (75 pages) in a docx word document. using python-docx.
I have tried differents approaches, reading ths csv with pandas or directly openning de csv file, It cost me around 150 minutes to finish the job independently the way I choose
My question is if this could be normal behavior or if exist any other way to improve this task.
I'm using this for loop to read several cvs files and parsing it in table format
for toTAB in listBRUTO:
df= pd.read_csv(toTAB)
# add a table to the end and create a reference variable
# extra row is so we can add the header row
t = doc.add_table(df.shape[0]+1, df.shape[1])
t.style = 'LightShading-Accent1' # border
# add the header rows.
for j in range(df.shape[-1]):
t.cell(0,j).text = df.columns[j]
# add the rest of the data frame
for i in range(df.shape[0]):
for j in range(df.shape[-1]):
t.cell(i+1,j).text = str(df.values[i,j])
#TABLE Format
for row in t.rows:
for cell in row.cells:
paragraphs = cell.paragraphs
for paragraph in paragraphs:
for run in paragraph.runs:
font = run.font
font.name = 'Calibri'
font.size= Pt(7)
doc.add_page_break()
doc.save('blabla.docx')
Thanks in advance
You'll want to minimize the number of calls to table.cell()
. Because of the way cell-merging works, these are expensive operations that really add up when performed in a tight loop.
I would start with refactoring this block and see how much improvement that yields:
# --- add the rest of the data frame ---
for i in range(df.shape[0]):
for j, cell in enumerate(table.rows[i + 1].cells):
cell.text = str(df.values[i, j])