
Replacing words and images in a PDF file with Python: is this possible?


I need to replace K words with K other words in every PDF file within a certain folder, and on top of this I need to replace every logo with another logo. I have around 1,000 PDF files, so I do not want to use Adobe Acrobat and edit one file at a time. How can I get started?
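A minimal starting point for the batch part, sketched with only the standard library: walk a folder, call a per-file function on every PDF, and write the results to a separate output folder so the originals stay untouched. The names process_folder, in_dir, out_dir and process are hypothetical; process stands in for whatever word- or logo-replacement step you end up writing and is assumed to take a source path and a destination path.

```python
from pathlib import Path


def process_folder(in_dir, out_dir, process):
    """Call process(src, dst) for every PDF under in_dir.

    Output paths mirror the input folder layout under out_dir, so the
    originals are never overwritten. Returns a list of (src, error) pairs
    for the files that raised an exception.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    # Collect the list first so files written during the run are not picked up
    sources = sorted(
        p for p in in_dir.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf"
    )
    failures = []
    for src in sources:
        dst = out_dir / src.relative_to(in_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            process(src, dst)
        except Exception as exc:  # keep going; one broken file should not stop the batch
            failures.append((src, exc))
    return failures
```

For example, process_folder("input_pdfs", "output_pdfs", shutil.copyfile) (hypothetical folder names) just copies every PDF, which is a cheap way to check the folder walk before plugging in real editing code. Errors are collected rather than raised, so a single damaged file does not abort a 1,000-file run.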

Replacing words seems at least doable, as long as there is a decent PDF library I can use from Python (note that I want to do this task in Python). Replacing an image might be more difficult, however: I will most likely have to find the dimensions of the current image and resize the replacement image to match, dynamically, while the program runs through these PDF files.
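For the word part, one possible approach (a sketch, not something tested in this thread) is to parse each page's content stream with pikepdf, rewrite the string operands of the text-showing operators Tj, ', " and TJ, and write the stream back. The helper names and the old-bytes-to-new-bytes mapping are hypothetical. This only works when a word appears as literal bytes inside a single string operand, which is typical of simple fonts with a standard encoding. It will miss text in fonts that store glyph IDs (for example Identity-H), words split across TJ elements for kerning, and text inside form XObjects, and the new characters may be missing from an embedded font subset.

```python
def replace_bytes(data: bytes, mapping: dict) -> bytes:
    """Apply each old -> new byte replacement in insertion order.

    Replacements are chained, so {b"a": b"b", b"b": b"c"} turns "a" into "c".
    """
    for old, new in mapping.items():
        data = data.replace(old, new)
    return data


def replace_words_in_pdf(src, dst, mapping):
    """Rewrite the string operands of text-showing operators on every page."""
    import pikepdf  # third-party: pip install pikepdf

    show_last = [pikepdf.Operator(name) for name in ("Tj", "'", '"')]
    show_array = pikepdf.Operator("TJ")

    with pikepdf.open(src) as pdf:
        for page in pdf.pages:
            new_instructions = []
            for instr in pikepdf.parse_content_stream(page.obj):
                # Inline images and non-text operators pass through untouched
                if not isinstance(instr, pikepdf.ContentStreamInstruction):
                    new_instructions.append(instr)
                    continue
                operands = list(instr.operands)
                if instr.operator in show_last and isinstance(operands[-1], pikepdf.String):
                    # For Tj, ' and " the text is the last operand
                    operands[-1] = pikepdf.String(replace_bytes(bytes(operands[-1]), mapping))
                elif instr.operator == show_array:
                    # TJ takes one array mixing strings and kerning numbers
                    operands[0] = pikepdf.Array([
                        pikepdf.String(replace_bytes(bytes(item), mapping))
                        if isinstance(item, pikepdf.String) else item
                        for item in operands[0]
                    ])
                else:
                    new_instructions.append(instr)
                    continue
                new_instructions.append(
                    pikepdf.ContentStreamInstruction(operands, instr.operator))
            # Replace the page's content with the re-serialized instructions
            page.obj.Contents = pdf.make_stream(
                pikepdf.unparse_content_stream(new_instructions))
        pdf.save(dst)
```

A call would look like replace_words_in_pdf("in.pdf", "out.pdf", {b"old word": b"new word"}), with hypothetical file names. Checking the output visually on a few files is worthwhile before a full run, given the encoding caveats above.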

Hi, I've written some code for this:

from pikepdf import Pdf, PdfImage, Name
import os
import glob
from PIL import Image
import zlib

example = Pdf.open(r'...\Likelihood.pdf')
PagesWithImages = []
ImageCodesForPages = []  

# Grab all the pages and all the images in every page. 
for i in example.pages:
    if len(list(i.images.keys())) >= 1:
        PagesWithImages.append(i)
        ImageCodesForPages.append(list(i.images.keys()))

pdfImages = [] 

for i,j in zip(PagesWithImages, ImageCodesForPages):
    for x in j: 
        pdfImages.append(i.images[x])


# Replace every image with the new image, keeping the original dimensions (is this right?)
for i in pdfImages:
    pdfimage = PdfImage(i)
    rawimage = pdfimage.obj
    im = Image.open(r'...\panda.jpg')
    pillowimage = pdfimage.as_pil_image()
    print(pillowimage.height)
    print(pillowimage.width)
    im = im.resize((pillowimage.width, pillowimage.height))
    im.show()
    rawimage.write(zlib.compress(im.tobytes()), filter=Name("/FlateDecode"))
    rawimage.ColorSpace = Name("/DeviceRGB")

There's just one problem: it doesn't actually replace anything. In case you're wondering how I wrote this code, I adapted it from the pikepdf documentation, starting at page 53:

https://buildmedia.readthedocs.org/media/pdf/pikepdf/latest/pikepdf.pdf

I put all the image objects into a list (pdfImages), since one page can contain several images. The last for loop then tries to replace each of these images while keeping its original width and height. Also note that I changed the file paths for this post; they are definitely not the issue.
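Resizing straight to the old image's width and height, as the loop above does, distorts a logo whose aspect ratio differs from the one it replaces. One alternative, sketched here with hypothetical helper names, is to scale the new logo to fit inside the old box and pad the rest, so the pixel dimensions still match exactly. The white padding colour is an assumption; pick whatever matches the page background.

```python
def fit_within(src_w, src_h, box_w, box_h):
    """Return (w, h, x, y): the largest size with the source's aspect ratio
    that fits in the box, plus the offsets that centre it."""
    scale = min(box_w / src_w, box_h / src_h)
    w = max(1, round(src_w * scale))
    h = max(1, round(src_h * scale))
    return w, h, (box_w - w) // 2, (box_h - h) // 2


def letterbox(img, box_w, box_h, background=(255, 255, 255)):
    """Scale a Pillow image into an RGB canvas of exactly box_w x box_h."""
    from PIL import Image  # third-party: pip install pillow

    w, h, x, y = fit_within(img.width, img.height, box_w, box_h)
    canvas = Image.new("RGB", (box_w, box_h), background)
    logo = img.convert("RGBA").resize((w, h))
    # The alpha band is used as the paste mask, so transparent areas show the background
    canvas.paste(logo, (x, y), logo)
    return canvas
```

letterbox(image, width, height).tobytes() then yields exactly width * height * 3 bytes, the same amount of 8-bit RGB data the loop above writes, so it can stand in for the plain resize.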

Thanks again.


Solution

  • I have figured out what I was doing wrong. For anyone who wants to replace an image with another image in place in a PDF file, here is what to do:

    from pikepdf import Pdf, PdfImage, Name
    from PIL import Image
    import zlib
    
    # filepath is the PDF to edit; imagePath is the replacement image.
    example = Pdf.open(filepath, allow_overwriting_input=True)
    # The raw bytes written below are 8-bit RGB, so convert the new image once.
    newimage = Image.open(imagePath).convert("RGB")
    
    # Go through every image on every page and replace it in place.
    for page in example.pages:
        for rawimage in page.images.values():
            pdfimage = PdfImage(rawimage)
            # Resize the new image to the pixel size of the one it replaces.
            im = newimage.resize((pdfimage.width, pdfimage.height))
            rawimage.write(zlib.compress(im.tobytes()), filter=Name("/FlateDecode"))
            rawimage.ColorSpace = Name("/DeviceRGB")
            rawimage.BitsPerComponent = 8
    
    # Save the changes back over the input file.
    example.save()
    

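    Essentially, I changed the arguments of the Pdf.open call to pass allow_overwriting_input=True, which lets pikepdf write back to the file it opened, and I added example.save() at the end. My original code did modify the images, but only in memory; without a save call nothing was ever written to disk, so the PDF looked unchanged.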
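With around 1,000 files, overwriting each input in place means a bad run cannot be undone. A variant of the code above that saves to a new path instead might look like the sketch below (replace_images_in_pdf, src, dst and logo_path are hypothetical names). It also reads the size from each image's own /Width and /Height entries instead of decoding the image, rewrites an image shared by several pages only once, skips 1-bit stencil masks (/ImageMask), and drops /Decode, /SMask and /Mask, which describe the old pixel data; dropping /SMask means any transparency of the old logo is lost. Like the original, it replaces every image found in each page's resources, not only logos, and it does not look inside form XObjects.

```python
import zlib


def replace_images_in_pdf(src, dst, logo_path):
    """Replace every image XObject on every page of src with the image at
    logo_path, resized to each old image's pixel size, and save to dst."""
    # Third-party: pip install pikepdf pillow
    from pikepdf import Name, Pdf
    from PIL import Image

    with Image.open(logo_path) as f:
        logo = f.convert("RGB")  # the raw bytes written below are 8-bit RGB
    done = set()  # objgen of images already rewritten (logos are often shared)
    with Pdf.open(src) as pdf:
        for page in pdf.pages:
            for raw in page.images.values():
                if raw.objgen in done or raw.get("/ImageMask", False):
                    continue  # already handled, or a 1-bit stencil mask
                done.add(raw.objgen)
                size = (int(raw.Width), int(raw.Height))
                data = logo.resize(size).tobytes()
                raw.write(zlib.compress(data), filter=Name("/FlateDecode"))
                raw.ColorSpace = Name("/DeviceRGB")
                raw.BitsPerComponent = 8
                # These keys describe the old pixel data and no longer apply
                for key in ("/Decode", "/SMask", "/Mask"):
                    if key in raw:
                        del raw[key]
        pdf.save(dst)
```

A single call would be replace_images_in_pdf("in.pdf", "out.pdf", "new_logo.png"), again with hypothetical file names; because it takes a source and a destination path, it can be driven by a folder loop without touching the originals.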