
Python UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c


I am trying to open a file with PyMuPDF, do some edits, and then return it to the frontend.

Here is the code:

@app.post('/return_pdf')
async def return_pdf(uploaded_pdf: UploadFile):
    print("Filetype: ", type(uploaded_pdf)) # <class 'starlette.datastructures.UploadFile'>
    document =  fitz.open(stream=BytesIO(uploaded_pdf.file.read()))
    for page in document:
        for area in page.get_text('blocks'):
            box = fitz.Rect(area[:4])
            if not box.is_empty:
                page.add_rect_annot(box)
    
    return {'file': document.tobytes()}

The error I get is the following: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 702: invalid start byte

How can I solve this problem? Thanks in advance

Regarding reading the file, I tried several methods, but apparently BytesIO(uploaded_pdf.file.read()) was the only one accepted by PyMuPDF.

Regarding returning the file, I tried to return it directly, without converting it to bytes, but I got a similar error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 10: invalid continuation byte

I thought about changing the encoding and tried to pass it to fitz.open(), but it is not a parameter.


Solution

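  • The error comes from the return value, not from reading the file. Returning {'file': document.tobytes()} makes FastAPI serialize the dict to JSON, and the JSON encoder tries to decode the bytes as UTF-8, which fails on binary PDF data. Return the PDF as a binary response instead, for example a 'StreamingResponse' from the 'starlette.responses' module. ('FileResponse' expects a path to a file on disk, so it cannot take an in-memory buffer.)

    from io import BytesIO

    import fitz  # PyMuPDF
    from fastapi import FastAPI, UploadFile
    from starlette.responses import StreamingResponse

    app = FastAPI()

    @app.post('/return_pdf')
    async def return_pdf(uploaded_pdf: UploadFile):
        # Open the upload from memory; filetype tells PyMuPDF how to parse the stream
        document = fitz.open(stream=BytesIO(uploaded_pdf.file.read()), filetype="pdf")
        for page in document:
            for area in page.get_text('blocks'):
                box = fitz.Rect(area[:4])
                if not box.is_empty:
                    page.add_rect_annot(box)

        # Save the edited PDF into an in-memory buffer and rewind it
        output_pdf = BytesIO()
        document.save(output_pdf)
        output_pdf.seek(0)

        # Send the raw bytes with a PDF content type instead of JSON
        headers = {"Content-Disposition": 'attachment; filename="edited.pdf"'}
        return StreamingResponse(output_pdf, media_type="application/pdf", headers=headers)


    We create a 'BytesIO' object to hold the edited PDF, write the document into it with 'document.save()', reset the buffer position with 'output_pdf.seek(0)', and return it as a 'StreamingResponse' with the media type 'application/pdf'.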
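
    To see why the original return value failed, here is a minimal sketch using only the standard library. The 'pdf_bytes' value is a made-up stand-in for the output of 'document.tobytes()', and 'try_utf8' is a hypothetical helper, not part of PyMuPDF or FastAPI. It shows that binary PDF data is generally not valid UTF-8, and that Base64 is one option if the endpoint really has to return JSON.

```python
import base64
import json

# Hypothetical stand-in for document.tobytes(): a PDF header followed by
# arbitrary binary bytes, including the 0x9c byte from the traceback.
pdf_bytes = b"%PDF-1.7\n\x9c\xc7\x00binary stream"


def try_utf8(data: bytes) -> str:
    """Return the decoded text, or a short message if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason}"


# Decoding raw PDF bytes as UTF-8 fails, which is what happens when the
# bytes are placed inside a JSON response.
decode_result = try_utf8(pdf_bytes)

# If JSON is required, Base64 turns the bytes into plain ASCII text that
# the client can decode back into the original bytes.
payload = json.dumps({"file": base64.b64encode(pdf_bytes).decode("ascii")})
restored = base64.b64decode(json.loads(payload)["file"])
```

    Base64 makes the payload roughly a third larger, so a binary response is usually the simpler choice when the client only needs the file itself.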

    I hope this might help you.