How could I search the contents of PDF files in a directory/subdirectory? I am looking for some command line tools. It seems that grep
can't search PDF files.
Your distribution should provide a utility called pdftotext
:
find /path -name '*.pdf' -exec sh -c 'pdftotext "{}" - | grep --with-filename --label="{}" --color "your pattern"' \;
The "-" is necessary to have pdftotext output to stdout, not to files.
The --with-filename
and --label=
options will put the file name in the output of grep.
The optional --color
flag is nice and tells grep to output using colors on the terminal.
(In Ubuntu, pdftotext
is provided by the package xpdf-utils
or poppler-utils
.)
This method, using pdftotext
and grep
, has an advantage over pdfgrep
if you want to use features of GNU grep
that pdfgrep
doesn't support. Note: pdfgrep-1.3.x supports -C
option for printing line of context.