pdfplumber memory hogging (crash with large pdf files)

Extracting text from large PDF files with pdfplumber uses more and more memory with each page until the process crashes.

import pdfplumber

with pdfplumber.open("data/my.pdf") as pdf:
    for page in pdf.pages:
        run_my_code()  # do something with the page

Solution

Solution found on: https://github.com/jsvine/pdfplumber/issues/193

New:

with pdfplumber.open("data/my.pdf") as pdf:
    for page in pdf.pages:
        run_my_code()
        page.flush_cache()

Old:

with pdfplumber.open("data/my.pdf") as pdf:
    for page in pdf.pages:
        run_my_code()
        del page._objects
        del page._layout

These two attributes (page._objects and page._layout) seem to be responsible for most of the memory that stays allocated after each loop iteration. Deleting them once you are done with a page helps keep memory usage down.
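If you are not sure which pdfplumber version will be installed, the two cleanup variants above could be wrapped in one small generator. This is only a sketch: iter_pages_low_memory is a made-up helper name, not part of pdfplumber, and it assumes the page either has flush_cache() or still has the _objects / _layout attributes.

```python
def iter_pages_low_memory(pdf):
    """Yield each page of an open PDF, then release that page's cached data.

    Hypothetical helper (not part of pdfplumber); `pdf` is assumed to be
    the object returned by pdfplumber.open().
    """
    for page in pdf.pages:
        try:
            yield page
        finally:
            if hasattr(page, "flush_cache"):
                # "New" variant: let pdfplumber drop its own caches.
                page.flush_cache()
            else:
                # "Old" variant: delete the cached attributes by hand,
                # skipping any that were never populated.
                for name in ("_objects", "_layout"):
                    if hasattr(page, name):
                        delattr(page, name)
```

The loop would then become "for page in iter_pages_low_memory(pdf): run_my_code()", and the cleanup runs even if the loop body raises or breaks out early.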

If this does not work, try forcing the garbage collector to reclaim them after each page.

import gc
import pdfplumber

with pdfplumber.open("data/my.pdf") as pdf:
    for page in pdf.pages:
        run_my_code()
        del page._objects
        del page._layout
        gc.collect()
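
One possible reason gc.collect() helps: in CPython, del only removes a name, and objects that are part of a reference cycle stay alive until the cycle collector runs. Whether the objects behind a pdfplumber page actually form such cycles is an assumption here. The snippet below only demonstrates the general Python behaviour with a made-up Node class and a made-up function name.

```python
import gc
import weakref


class Node:
    """Stand-in object (hypothetical) that holds a reference to itself."""

    def __init__(self):
        self.me = self  # creates a reference cycle


def cycle_survives_del():
    """Return (alive_after_del, alive_after_collect) for a cyclic object."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from interfering
    try:
        node = Node()
        ref = weakref.ref(node)
        del node  # the cycle keeps the object alive
        alive_after_del = ref() is not None
        gc.collect()  # the cycle collector frees it
        alive_after_collect = ref() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()
```

Calling gc.collect() on every page costs some time, so it may be worth doing it only every N pages if the loop gets noticeably slower.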