
matplotlib PdfPages - storing a lossy copy of a plot with lots of data


I'm creating plots with matplotlib.pyplot and writing them to PDF. Some of these plots have a largish number of points (up to 100,000), many of which overlap, so certain parts of the chart are just a solid mass. (That's okay: I'm interested in what the sparser parts of the graph look like.)

When I save these plots to PDF, the file takes a long time to write, and opening it in a viewer is even slower. Is there a way to store a "lossy" copy of the plot in the PDF? For example, if I took a screenshot of the plot and embedded it in the PDF, it would load a lot faster.


Solution

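  • I recommend passing rasterized=True to the plotting call. The points are then embedded in the PDF as a bitmap, while the axes, ticks, and labels remain vector graphics: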

    import matplotlib.pyplot as plt
    import numpy as np

    pts = np.random.rand(2, 100000)  # 100,000 random (x, y) points
    plt.scatter(*pts, rasterized=True)
    plt.savefig('tmp_rast.pdf')
    

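    For comparison, the same plot without rasterization:

    plt.figure()  # fresh figure, so these points are not drawn over the rasterized ones
    plt.scatter(*pts)
    plt.savefig('tmp_reg.pdf')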
    

    Comparing the resulting file sizes:

    $ ls -lh tmp*.pdf
    177K Dec  9 22:03 tmp_rast.pdf
    1.5M Dec  9 22:02 tmp_reg.pdf
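
Since the question mentions PdfPages, here is a minimal sketch of the same idea for a multi-page PDF. The helper name save_scatter_pages, its parameters, and the default dpi of 150 are assumptions made for illustration, not part of the original answer. The dpi passed to savefig sets the resolution of the rasterized artists, so it is the knob for trading file size against sharpness.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for scripted use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def save_scatter_pages(path, n_pages=2, n_points=100_000, rasterized=True, dpi=150):
    """Write n_pages scatter plots to one PDF.

    Hypothetical helper: returns (page_count, file_size_in_bytes).
    """
    rng = np.random.default_rng(0)
    with PdfPages(path) as pdf:
        for _ in range(n_pages):
            fig, ax = plt.subplots()
            x, y = rng.random((2, n_points))
            # Only the scatter points are rasterized; axes and labels stay vector.
            ax.scatter(x, y, rasterized=rasterized)
            # dpi controls the resolution of the embedded bitmap.
            pdf.savefig(fig, dpi=dpi)
            plt.close(fig)  # release the figure's memory before the next page
        page_count = pdf.get_pagecount()
    return page_count, os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for flag in (True, False):
            pages, size = save_scatter_pages(
                os.path.join(tmp, f"rasterized_{flag}.pdf"), rasterized=flag
            )
            print(f"rasterized={flag}: {pages} pages, {size / 1024:.0f} KiB")
```

Running the script prints the size of each variant, so you can check the saving on your own data. The exact numbers depend on the matplotlib version, marker size, and dpi.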