I'm writing a script to highlight text from a list of quotes in a PDF. The quotes are in the list text_list. I use this code to highlight the text in the PDF:
    import fitz

    # Load document
    doc = fitz.open(filename)

    # Iterate over pages
    for page in doc:
        # Iterate through each text and annotate
        for i, text in enumerate(text_list):
            rl = page.search_for(text, quads=True)
            page.add_highlight_annot(rl)

        # Print how many results were found
        print(str(i) + " instances highlighted in pdf")
I now want to get a list of the quotes that were not found (and therefore not highlighted). Is there a simple way to get a list of the matches page.search_for found, or of the quotes it didn't find?
The list of hit rectangles / quads, rl, will be empty if nothing was found. I suggest you check if rl == [] and add the highlight only when it is not empty; when it is empty, append the respective text to some no_hit list.
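A minimal sketch of that check, not a definitive implementation. The function name highlight_and_log_misses is made up for illustration, and it records the page index together with each miss, because a quote missing on one page may still appear on another. It only assumes that each page offers search_for and add_highlight_annot, as PyMuPDF pages do.

```python
def highlight_and_log_misses(pages, text_list):
    """Highlight every quote found on every page.

    Returns a list of (page_index, quote) pairs for searches with no hit.
    `pages` is any iterable of page objects, e.g. an opened fitz document.
    """
    no_hit = []
    for page_index, page in enumerate(pages):
        for text in text_list:
            rl = page.search_for(text, quads=True)
            if not rl:  # empty list: nothing found on this page
                no_hit.append((page_index, text))
                continue
            page.add_highlight_annot(rl)
    return no_hit


# Possible usage with PyMuPDF (filename and text_list as in the question):
#   import fitz
#   doc = fitz.open(filename)
#   no_hit = highlight_and_log_misses(doc, text_list)
```

Note that no_hit is per page here, so a quote that occurs only on page 3 still shows up as a miss for every other page.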
Probably better the other way round: make your text list a Python set. Whenever a text is found, put it in another set, found_set. At the end of processing, subtract the found set from the text_list set (set difference).
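Roughly like this, as a sketch under the same assumptions as the question's code. The function name highlight_and_find_missing is hypothetical, and the pages only need to provide search_for and add_highlight_annot, as PyMuPDF pages do.

```python
def highlight_and_find_missing(pages, text_list):
    """Highlight every quote found anywhere in the document.

    Returns the set of quotes that were not found on any page.
    `pages` is any iterable of page objects, e.g. an opened fitz document.
    """
    wanted = set(text_list)  # duplicates in text_list collapse here
    found_set = set()
    for page in pages:
        for text in wanted:
            rl = page.search_for(text, quads=True)
            if rl:  # only annotate when there is at least one hit
                page.add_highlight_annot(rl)
                found_set.add(text)
    return wanted - found_set  # set difference: never found on any page


# Possible usage with PyMuPDF (filename and text_list as in the question):
#   import fitz
#   doc = fitz.open(filename)
#   missing = highlight_and_find_missing(doc, text_list)
#   print(sorted(missing))
```

Because a quote lands in found_set as soon as any page contains it, the result lists only quotes that are missing from the whole document.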