
Slicing with PyMuPDF


I'd like to mark several keywords in a PDF document using Python and PyMuPDF.

The code looks as follows (source: original code):

import fitz

doc = fitz.open("test.pdf")

page = doc[0]

text = "result"

text_instances = page.searchFor(text)

for inst in text_instances:
    highlight = page.addHighlightAnnot(inst)
    highlight.setColors(colors='Red')
    highlight.update()


doc.save("output.pdf")

However, the text only gets marked on one page. I tried changing the code as described in the PyMuPDF documentation so that it iterates over a slice of the pages.

import fitz

doc = fitz.open("test.pdf")
for page in doc.pages(1, 3, 1):
    pass

text = "result"

text_instances = page.searchFor(text)

for inst in text_instances:
    highlight = page.addHighlightAnnot(inst)
    highlight.setColors(colors='Red')
    highlight.update()


doc.save("output.pdf")

Unfortunately, it still only marks the keywords on one page. What do I need to change so that the keywords are marked on all pages?


Solution

  • Your code has two main issues:

    1. Indentation: the search-and-highlight code is outside the page loop.
    2. Slice start: page indexes are zero-based, so a start of 1 skips the first page.

    Otherwise your understanding of the code seems fine.

    for page in doc.pages(1, 3, 1):
        pass
    

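    To loop over pages, the search-and-highlight code has to sit inside the page loop. With pass as the only loop body, the loop does nothing for each page, and the code after it runs once, on whichever page the loop visited last. In addition, doc.pages(1, 3, 1) starts on page 2, not page 1, because page 1 has index 0. The stop value is exclusive, as with range(), so doc.pages(0, 3, 1) covers only the first three pages; to mark every page, iterate with "for page in doc:".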
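
Both points can be checked without a PDF. The sketch below is only an illustration: it assumes a hypothetical five-page document modeled as a plain list, and uses range() as a stand-in for doc.pages(), whose parameters have the same meaning.

```python
# Stand-in for a 5-page document, labeled the way a reader counts pages.
pages = ["page 1", "page 2", "page 3", "page 4", "page 5"]

# A loop that does no per-page work, as in the question:
# range(1, 3, 1) plays the role of doc.pages(1, 3, 1).
for index in range(1, 3, 1):
    page = pages[index]

# Code placed after the loop only sees the last value assigned to `page`.
after_loop = page

# Zero-based start, exclusive stop: (0, 3, 1) selects the first three pages.
selected = [pages[index] for index in range(0, 3, 1)]

print(after_loop)
print(selected)
```

Here after_loop ends up as "page 3", which is why only one page was highlighted, and selected holds pages 1 to 3.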

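    #! /usr/bin/env python
    # -*- coding: utf-8 -*-

    import fitz  # PyMuPDF

    doc = fitz.open("test.pdf")

    text = "result"

    # doc.pages(start, stop, step) works like range(): start is zero-based and
    # stop is exclusive, so (0, 3, 1) visits the first three pages.
    # Use "for page in doc:" to visit every page.
    for page in doc.pages(0, 3, 1):
        # Search and highlight inside the loop so that each page is processed.
        text_instances = page.searchFor(text)

        for inst in text_instances:
            highlight = page.addHighlightAnnot(inst)
            # setColors expects a dict of RGB tuples (components 0..1), not a color name.
            highlight.setColors({"stroke": (1, 0, 0)})  # red
            highlight.update()

    doc.save("output.pdf")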
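
For marking every page rather than a fixed slice, the same loop can be wrapped in a small helper. This is a minimal sketch, not the answer's original code: the names highlight_keyword and main are hypothetical, and it assumes a newer PyMuPDF release that provides the snake_case method names (search_for, add_highlight_annot, set_colors) in place of the camelCase names used above.

```python
def highlight_keyword(pages, keyword, stroke=(1.0, 0.0, 0.0)):
    """Highlight every hit of `keyword` on every page and return the hit count.

    `pages` is any iterable of page objects; iterating a PyMuPDF Document
    yields all of its pages, starting with index 0.
    """
    count = 0
    for page in pages:
        # search_for returns one rectangle per occurrence on this page.
        for rect in page.search_for(keyword):
            annot = page.add_highlight_annot(rect)
            # Colors are passed as a dict of RGB tuples with components 0..1.
            annot.set_colors({"stroke": stroke})
            annot.update()
            count += 1
    return count


def main(src="test.pdf", dst="output.pdf", keyword="result"):
    # Imported here so the helper above can be exercised without PyMuPDF.
    import fitz  # PyMuPDF

    doc = fitz.open(src)
    total = highlight_keyword(doc, keyword)  # all pages, not just a slice
    doc.save(dst)
    return total
```

Calling main() with the file names from the question should write output.pdf with the keyword marked on all pages. On an older installation that only has the camelCase names, the three method calls in the helper would need to be renamed accordingly.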