Tags: python, django-rest-framework, pypdf

Why do I have to convert a BytesIO to bytes and then back to a BytesIO before it can be served as a PDF file response?


I use PyPDF4 to merge PDF files and then return the merged PDF as an HttpResponse. I use BytesIO to capture the output of PdfFileMerger.

I got it working with this code:

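from io import BytesIO
from PyPDF4 import PdfFileMerger, PdfFileReader

def mergePDF(listOfPDFFile):
    merger = PdfFileMerger()
    for file in listOfPDFFile:
        merger.append(PdfFileReader(file))
    _byteIo = BytesIO()
    merger.write(_byteIo)
    # getvalue() returns the whole buffer regardless of the stream position
    return _byteIo.getvalue()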

Then I call the function in an APIView to return the merged PDF as an HttpResponse:

class DocumentBundlePDFView(APIView):
    def get(self, request, format=None):
        '''
        The code that builds the list of documents (documentList) goes here.
        '''
        pdfBytes = mergePDF(documentList)
        pdfFile = io.BytesIO(pdfBytes)
        response = HttpResponse(FileWrapper(pdfFile), content_type='application/pdf')
        return response

But why do I have to create a BytesIO object twice to get it working? Initially I returned the _byteIo instance and passed it directly to FileWrapper, but the response was a 0 KB file.

So I converted the _byteIo instance to bytes and then created another BytesIO instance in the APIView to get it working.

How can I simplify the code?


Solution

  • In your mergePDF function, instead of returning

    return _byteIo.getvalue()
    

    Do something to the effect of

    _byteIo.seek(0)
    return _byteIo
    

    You wrote: "Initially I return the _byteIO instance then directly pass the instance to FileWrapper but it output 0Kb file."

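    The problem is that after you write to a file-like object, the stream position is left at the end, just past the last byte written. Any read from there returns nothing, as if the file were empty, which is why the response was 0 KB. Move the position back to the beginning with seek(0) before handing the object to FileWrapper. Your workaround succeeded because getvalue() returns the entire buffer regardless of the position, and a new BytesIO built from those bytes starts at position 0.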
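
The effect is easy to reproduce without PyPDF4 or Django. The sketch below is only an illustration: write_payload is a hypothetical stand-in for merger.write, the payload is fake, and it assumes FileWrapper is the one from wsgiref.util (the question does not show its import).

```python
from io import BytesIO
from wsgiref.util import FileWrapper  # assumption: the question's FileWrapper comes from here

PAYLOAD = b"%PDF-1.4 fake payload"  # placeholder bytes, not a real PDF


def write_payload(rewind):
    """Hypothetical stand-in for merger.write(): fill a BytesIO, optionally rewind it."""
    buffer = BytesIO()
    buffer.write(PAYLOAD)  # the stream position is now at the end of the data
    if rewind:
        buffer.seek(0)  # move the position back to the start
    return buffer


not_rewound = write_payload(rewind=False)
rewound = write_payload(rewind=True)

# FileWrapper reads from the current position, so the first body is empty.
empty_body = b"".join(FileWrapper(not_rewound))
full_body = b"".join(FileWrapper(rewound))

# getvalue() ignores the position, which is why the bytes round trip worked.
all_bytes = not_rewound.getvalue()
```

With the seek(0) in mergePDF, the view should no longer need the second BytesIO: the returned object can be passed straight to FileWrapper.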