python, python-camelot

What is better, read all pages at once or page by page in python-camelot?


I will run camelot every day on a small DigitalOcean instance (1 vCPU, 1 GB RAM) to extract information from a PDF of roughly 150 pages and store it in a database. What would be the best practice for this:

a) read all pages at once with camelot.read_pdf('file.pdf', pages='all', flavor='stream')?

b) read page by page?

import camelot

for page in range(1, 151):  # camelot page numbers start at 1, not 0
    tables = camelot.read_pdf('file.pdf', pages=str(page), flavor='stream')

Thanks


Solution

  • You could read all pages at once if the instance had enough memory to hold the tables from all 150 pages, but with 1 GB of RAM it probably doesn't. Extracting page by page is therefore probably the better option for you: once a page's tables have been stored in the database, the next page's result replaces them, so the memory used for the previous page can be freed instead of piling up.

    Hope this helped somewhat. :)
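
As a rough illustration of the page-by-page idea, here is a minimal sketch that reads the PDF in small page ranges and hands each table off before moving on. The helper names page_chunks and extract_in_chunks, the store callback, and the chunk size of 10 are all made up for this example and are not part of camelot; the only camelot call assumed is camelot.read_pdf with its pages and flavor arguments, as used in the question. The total page count is passed in by hand, and whether the explicit gc.collect() helps noticeably on a 1 GB instance is something you would have to measure.

```python
import gc
from typing import Callable, Iterator, Optional


def page_chunks(total_pages: int, chunk_size: int = 10) -> Iterator[str]:
    """Yield 1-based page ranges ('1-10', '11-20', ...) for the `pages` argument."""
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        # A single-page chunk is written as '21' rather than '21-21'.
        yield f"{start}-{end}" if end > start else str(start)


def extract_in_chunks(
    path: str,
    total_pages: int,
    store: Callable[[object], None],
    chunk_size: int = 10,
    read_pdf: Optional[Callable] = None,
) -> int:
    """Read `path` chunk by chunk, pass each table's DataFrame to `store`,
    and return the number of tables handled."""
    if read_pdf is None:
        # Imported lazily so page_chunks can be used without camelot installed.
        import camelot
        read_pdf = camelot.read_pdf

    handled = 0
    for pages in page_chunks(total_pages, chunk_size):
        tables = read_pdf(path, pages=pages, flavor="stream")
        for table in tables:
            store(table.df)  # e.g. write the DataFrame to the database
            handled += 1
        # Drop the reference to this chunk's tables before reading the next one.
        del tables
        gc.collect()
    return handled
```

With a chunk size of 1 this behaves like the loop in the question; a call such as extract_in_chunks('file.pdf', 150, save_to_db), where save_to_db is your own function, would process ten pages at a time. A larger chunk means fewer read_pdf calls but more tables held in memory at once, so the right value depends on how dense the tables are.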