Tags: python, flask, pypdf, pymupdf

How can I edit/modify/replace text in an existing PDF file?


I am working on my final-year project, a website where users can read PDFs. I am adding features such as converting currency amounts into the user's local currency. I am using Flask and PyMuPDF, but I don't know how to modify the text in a PDF. Can anyone help me with this?

I heard here that PyMuPDF or pypdf can do this, but I couldn't find a solution for replacing text.


Solution

  • Using the redaction facility of PyMuPDF is probably the appropriate way to do this. The approach:

    1. Identify the location of the text to replace
    2. Erase the text and replace it using redactions

    Care must be taken to match the original font, and to account for the new text being longer or shorter than the original.

    import fitz  # PyMuPDF
    
    doc = fitz.open("myfile.pdf")
    number = 0  # 0-based page number (example value)
    page = doc[number]
    # suppose you want to replace all occurrences of some text
    disliked = "delete this"
    better = "better text"
    hits = page.search_for(disliked)  # list of rectangles where to replace
    
    for rect in hits:
        # each redaction erases the text under rect and writes the new text into it
        page.add_redact_annot(rect, better, fontname="helv", fontsize=11,
                              align=fitz.TEXT_ALIGN_CENTER)  # more parameters are available
    
    page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)  # don't touch images
    doc.save("replaced.pdf", garbage=3, deflate=True)
    

    This works well for short text and moderate quality expectations.

    With some more effort, the original font properties, color, font size, etc. can be identified to produce a close-to-perfect result.
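
As a rough sketch of that extra effort: the helpers below look up the font name, size, and color of the text under a hit rectangle, and pick a font size at which a replacement still fits. They are plain Python and assume the dictionary layout that `page.get_text("dict")` returns in PyMuPDF (blocks, then lines, then spans, with `font`, `size`, `color`, and `bbox` keys per span). All function names are hypothetical and not part of PyMuPDF.

```python
def srgb_to_rgb(color):
    """Convert an sRGB integer (0xRRGGBB) to an (r, g, b) tuple of floats in 0..1."""
    return ((color >> 16) & 255) / 255, ((color >> 8) & 255) / 255, (color & 255) / 255


def rects_intersect(a, b):
    """Return True if two (x0, y0, x1, y1) rectangles overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def span_style_in_rect(page_dict, rect):
    """Return font name, size and color of the first text span overlapping rect.

    page_dict is assumed to follow the layout of page.get_text("dict").
    Returns None if no text span overlaps rect.
    """
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks carry no "lines"
            for span in line.get("spans", []):
                if rects_intersect(span["bbox"], rect):
                    return {
                        "font": span["font"],
                        "size": span["size"],
                        "color": srgb_to_rgb(span["color"]),
                    }
    return None


def fitting_fontsize(unit_width, box_width, preferred_size, min_size=4.0):
    """Return the largest size <= preferred_size at which the text fits box_width.

    unit_width is the text width at font size 1; width grows linearly with
    font size. Returns None if the text would not fit even at min_size
    (min_size is an arbitrary readability floor chosen for this sketch).
    """
    if unit_width <= 0:
        return preferred_size
    size = min(preferred_size, box_width / unit_width)
    return size if size >= min_size else None
```

In the loop above, one would call `span_style_in_rect(page.get_text("dict"), rect)` for each hit and pass the resulting size and color to `add_redact_annot` through its `fontsize` and `text_color` parameters. The unit width can be measured with `fitz.get_text_length(better, fontname="helv", fontsize=1)`. Note that the span's font name is the name stored in the PDF, so it usually still has to be mapped by hand to a font that `add_redact_annot` accepts, such as `"helv"`.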