We have an application that creates large .pptx files (over 1,000 slides) using the python-pptx library.
The problem is that as the presentation grows, adding shapes and/or charts becomes slower with each iteration.
from pptx import Presentation
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Inches

SLD_LAYOUT_TITLE_AND_CONTENT = 1

prs = Presentation()
slide_layout = prs.slide_layouts[SLD_LAYOUT_TITLE_AND_CONTENT]

# Add 2000 slides, each with a clustered-column chart.
for idx in range(2000):
    slide = prs.slides.add_slide(prs.slide_layouts[5])
    chart_data = CategoryChartData()
    chart_data.categories = ['East', 'West', 'Midwest']
    chart_data.add_series('Series 1', (19.2, 21.4, 16.7))
    x, y, cx, cy = Inches(2), Inches(2), Inches(6), Inches(4.5)
    slide.shapes.add_chart(
        XL_CHART_TYPE.COLUMN_CLUSTERED, x, y, cx, cy, chart_data
    )
    print(str(idx))

prs.save('test.pptx')
Has anyone come across this situation before? It seems that python-pptx has to look something up inside the growing Presentation, making each iteration slower. Or is it the way we are looping in Python and keeping all of these objects in memory?
This appears to be an O(N^2) behavior in the chart and slide partname assignment. More details in the GitHub issue thread here: https://github.com/scanny/python-pptx/issues/644#issuecomment-685056215
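One way to see the growth for yourself is to time the loop in batches and watch the per-batch cost climb as the presentation accumulates slides. Below is a minimal sketch, not from the original post; the batch size of 100 and the printout format are arbitrary choices, and the chart contents mirror the example above:

import time

from pptx import Presentation
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Inches

BATCH = 100  # arbitrary batch size for timing

prs = Presentation()
layout = prs.slide_layouts[5]

start = time.perf_counter()
for idx in range(2000):
    slide = prs.slides.add_slide(layout)
    chart_data = CategoryChartData()
    chart_data.categories = ['East', 'West', 'Midwest']
    chart_data.add_series('Series 1', (19.2, 21.4, 16.7))
    slide.shapes.add_chart(
        XL_CHART_TYPE.COLUMN_CLUSTERED,
        Inches(2), Inches(2), Inches(6), Inches(4.5),
        chart_data,
    )
    if (idx + 1) % BATCH == 0:
        elapsed = time.perf_counter() - start
        # With quadratic behavior, each successive batch takes
        # noticeably longer than the previous one.
        print(f'slides {idx + 2 - BATCH}-{idx + 1}: {elapsed:.2f}s')
        start = time.perf_counter()

Saving is omitted here since the slowdown shows up while adding slides, before prs.save() is ever called.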