string search grep cygwin

Why does grep report a match in a PDF when Ctrl+F finds nothing in the opened file?


I ran the following

grep -irln "mold" 

against a directory using Cygwin on my Windows 7 Enterprise machine at work, and it found a match in a particular PDF file. However, when I open the file in Adobe or Chrome, press Ctrl+F and search for mold, no results are found. This PDF has been through an OCR service. So my question is: how can grep return a match when Ctrl+F on the open file finds nothing?


Solution

  • You seem to be overlooking two things: grep searches the raw bytes of a file for every occurrence of the pattern, and a PDF file is written in a markup language that describes how text and images are rendered.
    Using a very simple text file as an example:

    $ cat << EOF > example.txt
    > one dog
    > two cats
    > three chickens
    > EOF
    

    we convert it to PostScript and then to PDF

    $ a2ps example.txt -o example.ps
    [example.txt (plain): 1 page on 1 sheet]
    [Total: 1 page on 1 sheet] saved into the file `example.ps'
    $ ps2pdf example.ps example.pdf
    

    so we have 3 files with the same text, but the PostScript and the PDF files have their specific markup language around the original text.
    Now if we ask grep to look for chicken

    $ grep chicken example.*
    example.ps:(three chickens) N
    example.txt:three chickens
    

    you can see that grep finds chicken in the text and PostScript files but not in the PDF. This is because the original text is stored compressed inside the PDF, so it does not appear there as plain text.
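
The same effect can be reproduced without a PDF. The sketch below is only an analogy: it assumes gzip as a stand-in for the Deflate compression that PDF streams typically use, and the file names sample.txt and sample.txt.gz are made up for the illustration.

```shell
# sample.txt and sample.txt.gz are throwaway names used only for this sketch.
# Repeat the three lines so the data compresses well.
for i in 1 2 3 4 5 6 7 8 9 10; do
    printf 'one dog\ntwo cats\nthree chickens\n'
done > sample.txt

# Compress with gzip (an assumption: used here as an analogy for
# the compression applied to text streams inside a PDF).
gzip -c < sample.txt > sample.txt.gz

# Count matching lines; "|| true" because grep exits 1 when the count is 0.
plain=$(grep -c chicken sample.txt || true)
packed=$(LC_ALL=C grep -a -c chicken sample.txt.gz || true)
unpacked=$(gzip -dc sample.txt.gz | grep -c chicken || true)

echo "plain text:   $plain"
echo "compressed:   $packed"
echo "decompressed: $unpacked"
```

The plain and decompressed counts agree, while the compressed file yields no match even though it carries the same text.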

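    Your match for mold is a false positive. The text inside the PDF is compressed, so grep cannot see it. What grep matched is most likely a chance sequence of bytes in the compressed or binary data that happens to spell mold, which the -i (case-insensitive) option makes more likely for a four-letter pattern.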
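
To check a suspected false positive, it can help to look at what grep actually matched and then search the extracted text instead of the raw bytes. The following is a minimal sketch, not a definitive recipe: sample.bin is a made-up stand-in for the PDF, pdf_grep is a hypothetical helper name, and the second half assumes pdftotext (from poppler-utils) is installed, which may not be the case on every Cygwin setup.

```shell
# Stand-in for a PDF: a few non-text bytes that happen to spell "MoLd".
# (sample.bin is a made-up file for illustration, not a real PDF.)
printf '\001\002xMoLd\377\n' > sample.bin

# -a treats binary data as text, -o prints only the matched part,
# -b prefixes it with its byte offset, -i ignores case.
# LC_ALL=C makes grep treat every byte as a valid character.
hit=$(LC_ALL=C grep -a -o -b -i 'mold' sample.bin)
echo "$hit"

# Hypothetical helper: search the text layer of a PDF rather than its bytes.
# Assumes pdftotext is available; "-" sends the extracted text to stdout.
pdf_grep() {
    # $1 = pattern, $2 = PDF file
    pdftotext "$2" - | grep -i -n -- "$1"
}
```

Running something like `pdf_grep mold yourfile.pdf` (file name is a placeholder) should then agree with what Ctrl+F reports in a viewer, since both work on the text layer rather than on the compressed bytes.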