I am trying to open a file with PyMuPDF, do some edits, and then return it to the frontend.
Here is the code:
```python
from io import BytesIO

import fitz  # PyMuPDF
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post('/return_pdf')
async def return_pdf(uploaded_pdf: UploadFile):
    print("Filetype: ", type(uploaded_pdf))  # <class 'starlette.datastructures.UploadFile'>
    document = fitz.open(stream=BytesIO(uploaded_pdf.file.read()))
    for page in document:
        for area in page.get_text('blocks'):
            box = fitz.Rect(area[:4])
            if not box.is_empty:
                page.add_rect_annot(box)
    return {'file': document.tobytes()}
```
The error I get is the following:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 702: invalid start byte
How can I solve this problem? Thanks in advance.
Regarding reading the file, I tried several methods, but apparently BytesIO(uploaded_pdf.file.read()) was the only one accepted by PyMuPDF.
Regarding returning the file, I tried returning it directly, without converting it to bytes, but I got a similar error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 10: invalid continuation byte
I thought about changing the encoding and tried to pass it to fitz.open(), but it is not a parameter.
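The error can be reproduced with the standard library alone. A PDF is binary data: right after the %PDF header, many files contain a comment line of high-bit bytes, which are not valid UTF-8. When a bytes value is placed in a dict returned from a FastAPI endpoint, FastAPI's JSON encoding turns it into text by decoding it as UTF-8, which fails in the same way. The sample bytes below are made up for illustration, and the helper decode_like_json_encoder is hypothetical; it only mimics that decoding step. If the PDF really has to travel inside JSON, Base64-encoding it first avoids the problem.

```python
import base64

# Made-up sample: a PDF header followed by a "binary marker" comment line.
pdf_bytes = b"%PDF-1.4\n%\xc7\xec\x8f\xa2\n"


def decode_like_json_encoder(data: bytes) -> str:
    # Hypothetical helper: mimics turning a bytes value into JSON text
    # (assumption: a plain UTF-8 decode, as FastAPI does for bytes values).
    return data.decode("utf-8")


try:
    decode_like_json_encoder(pdf_bytes)
except UnicodeDecodeError as exc:
    # 0xc7 starts a two-byte sequence, but 0xec is not a valid continuation byte.
    print("Fails as in the question:", exc.reason, "at position", exc.start)

# Base64 turns arbitrary bytes into plain ASCII text that JSON can carry.
payload = {"file": base64.b64encode(pdf_bytes).decode("ascii")}

# The client reverses the step to recover the original PDF bytes.
restored = base64.b64decode(payload["file"])
print(restored == pdf_bytes)
```

The trade-off is size: Base64 text is about a third larger than the raw bytes, so a binary response is usually preferable when the frontend can handle one.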
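The UnicodeDecodeError is raised by the return statement, not by fitz.open(). Returning {'file': document.tobytes()} makes FastAPI serialize the result as JSON, and it converts the bytes value to a string by decoding it as UTF-8. A PDF is binary data, so that decoding fails, and changing the encoding on the PyMuPDF side would not help. Instead, return the PDF as a binary response. FileResponse from the 'starlette.responses' module expects a path to a file on disk, so for a document held in memory a StreamingResponse from the same module works instead:
```python
from io import BytesIO
import fitz  # PyMuPDF
from fastapi import FastAPI, UploadFile
from starlette.responses import StreamingResponse

app = FastAPI()

@app.post('/return_pdf')
async def return_pdf(uploaded_pdf: UploadFile):
    document = fitz.open(stream=BytesIO(uploaded_pdf.file.read()), filetype="pdf")
    for page in document:
        for area in page.get_text('blocks'):
            box = fitz.Rect(area[:4])
            if not box.is_empty:
                page.add_rect_annot(box)
    output_pdf = BytesIO()
    document.save(output_pdf)  # write the edited PDF into the in-memory buffer
    document.close()
    output_pdf.seek(0)  # rewind so the response starts at the first byte
    headers = {"Content-Disposition": 'attachment; filename="edited.pdf"'}
    return StreamingResponse(output_pdf, media_type="application/pdf", headers=headers)
```
We create a BytesIO object to hold the PDF, save the edited document into it with document.save(), reset the buffer position with output_pdf.seek(0), and return the buffer as a StreamingResponse with the application/pdf media type. The Content-Disposition header tells the browser to download it as edited.pdf.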
I hope this helps.