Tags: python, django, pdf, wkhtmltopdf, django-wkhtmltopdf

Saving PDFs to disk as they are generated with django-wkhtmltopdf


What I'm trying to implement is this:

  1. User sends query parameters from React FE microservice to the Django BE microservice.
    • URI is something like /api/reports?startingPage=12&dataView=Region
    • These PDFs are far too big to generate in the FE, so they are generated server side
  2. The request reaches views.py, where the data for dataView=Region is queried from the database; each row is iterated over and a PDF report is generated for each item
    • Each dataView=Region can consist of a few hundred items and each of those items is its own report that can be a page long or several pages long
  3. As the reports are generated, they should be saved to the server persistent volume claim and not be sent back to FE until they have all run.
  4. When they have all run, I plan to use PyPDF2 to combine all of the PDFs into one large file.
  5. At that point, the file is sent back to the FE to download.
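
For step 4, a minimal sketch of the merge is shown below. It assumes the per-item PDFs were saved with zero-padded names (for example report_0001.pdf) so that sorting by file name matches report order, and that PyPDF2 2.x or 3.x is installed, where the merger class is named PdfMerger. The helper names ordered_section_paths and merge_pdfs are hypothetical and not part of any library.

```python
from pathlib import Path


def ordered_section_paths(pdf_dir, pattern="*.pdf"):
    """Return the PDF files in pdf_dir, sorted by file name."""
    return sorted(Path(pdf_dir).glob(pattern))


def merge_pdfs(section_paths, output_path):
    """Append each section PDF, in the given order, into one output file."""
    # Imported lazily so the path helper above works without PyPDF2 installed.
    # Assumption: PyPDF2 2.x/3.x, where the class is PdfMerger
    # (PyPDF2 1.x named it PdfFileMerger).
    from PyPDF2 import PdfMerger

    merger = PdfMerger()
    try:
        for path in section_paths:
            merger.append(str(path))
        with open(output_path, "wb") as combined:
            merger.write(combined)
    finally:
        merger.close()
    return output_path
```

Something like merge_pdfs(ordered_section_paths(settings.PDF_DIR), combined_path) would then run once after the last report has been written, with combined_path being wherever the final download should live.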

I'm only working on 1. and 3. at this point and I'm unable to:

  1. Get the files to save to storage
  2. Prevent the default behavior of the PDF being sent back to the FE after it has been generated

The PDFs are being generated, so that is good.

I'm trying to implement the suggestions as found here, but I'm not getting the desired results:

Save pdf from django-wkhtmltopdf to server (instead of returning as a response)

This is what I currently have on the Django side:

# urls.py

from django.urls import path

from .views import GeneratePDFView

app_name = 'Reports'

urlpatterns = [
    path('/api/reports',
        GeneratePDFView.as_view(), name='generate_pdf'),
]

# views.py

from django.conf import settings
from django.views.generic.base import TemplateView

from rest_framework.permissions import IsAuthenticated

from wkhtmltopdf.views import PDFTemplateResponse

# Create your views here.

class GeneratePDFView(TemplateView):
    permission_classes = [IsAuthenticated]
    template_name = 'test.html'
    filename = 'test.pdf'

    def generate_pdf(self, request, **kwargs):
        context = {'key': 'value'}

        # generate response
        response = PDFTemplateResponse(
            request=self.request,
            template=self.template_name,
            filename=self.filename,
            context=context,
            cmd_options={'load-error-handling': 'ignore'})

        self.save_pdf(response.rendered_content, self.filename)

    # Handle saving the document
    # This is what I'm using elsewhere where files are saved and it works there
    def save_pdf(self, file, filename):
        with open(settings.PDF_DIR + '/' + filename, 'wb+') as destination:
            for chunk in file.chunks():
                destination.write(chunk)

# settings.py

...
DOWNLOAD_ROOT = '/mnt/files/client-downloads/'
MEDIA_ROOT = '/mnt/files/client-submissions/'
PDF_DIR = '/mnt/files/pdf-sections/'
...

I should note that DOWNLOAD_ROOT and MEDIA_ROOT work fine in the parts of the app that use them. I've even tried using settings.MEDIA_ROOT because I know it works, but still nothing is saved there. As you can see, I'm starting out very basic and haven't added a query, loops, etc.

My save_pdf() differs from the one in the SO question I linked to because it is what I use in other parts of my application, where it saves files fine. I did try what the SO question suggested, but got the same result: nothing was saved. That being:

with open("file.pdf", "wb") as f:
    f.write(response.rendered_content)

So what do I need to do to get these PDFs to save to disk?

Perhaps I need a different library for my needs, since django-wkhtmltopdf does a number of things out of the box that I don't want and that I'm not sure I can override.


Solution

  • OK, my smooth brain gained a few ripples overnight and figured it out this morning:

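    # views.py
    
    import os
    
    from django.conf import settings
    from django.http import HttpResponse
    from django.views.generic.base import TemplateView
    
    from rest_framework.permissions import IsAuthenticated
    
    from wkhtmltopdf.views import PDFTemplateResponse
    
    class GeneratePDFView(TemplateView):
        # Note: permission_classes is a DRF attribute; a plain Django
        # TemplateView does not enforce it on its own.
        permission_classes = [IsAuthenticated]
    
        # get() is what Django calls for a GET request to this view
        def get(self, request, *args, **kwargs):
            template_name = 'test.html'
            filename = 'test.pdf'
            context = {'key': 'value'}
    
            # generate response
            response = PDFTemplateResponse(
                request=request,
                template=template_name,
                filename=filename,
                context=context,
                cmd_options={'load-error-handling': 'ignore'})
    
            # write the rendered content (bytes) to a file
            with open(os.path.join(settings.PDF_DIR, filename), "wb") as f:
                f.write(response.rendered_content)
    
            # return a plain response instead of the PDF
            return HttpResponse('Hello, World!')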
    

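    This saved the PDF to disk and did not respond with the PDF. The original version never got that far because a class-based view dispatches a GET request to get(), so my custom generate_pdf() method was never called. Also, rendered_content is already bytes, so it can be written directly instead of being iterated with chunks(). This is obviously a minimal example that I can expand on, but at least those two issues are figured out.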
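
To expand this to many reports, one option is to keep the disk-writing logic separate from the view so it can be called once per item. The sketch below is only an illustration under assumptions: save_pdf_bytes and save_reports are hypothetical helper names, the item identifiers are placeholders, and the bytes passed in stand for whatever response.rendered_content returns for each item.

```python
from pathlib import Path


def save_pdf_bytes(pdf_bytes, pdf_dir, filename):
    """Write already-rendered PDF bytes to pdf_dir/filename and return the path."""
    target_dir = Path(pdf_dir)
    # Create the directory if it is missing so the write cannot fail on that.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(pdf_bytes)
    return target


def save_reports(rendered_reports, pdf_dir):
    """Save (item_id, pdf_bytes) pairs under zero-padded names, in order."""
    saved = []
    for index, (item_id, pdf_bytes) in enumerate(rendered_reports, start=1):
        # Zero padding keeps alphabetical order equal to generation order.
        filename = f"report_{index:04d}_{item_id}.pdf"
        saved.append(save_pdf_bytes(pdf_bytes, pdf_dir, filename))
    return saved
```

Inside get(), each loop iteration would build a PDFTemplateResponse for one item and pass its rendered_content to save_pdf_bytes along with settings.PDF_DIR. Using pathlib also avoids the doubled slash that comes from concatenating '/' onto a PDF_DIR that already ends in one.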