pdf postscript eps

How to get bounding boxes of elements in EPS files


I need to check whether an EPS/PDF file contains any vector elements.

First I convert the PDF to EPS and remove all text elements and images from the file, like this:

pdftocairo -f $page_number -l $page_number -eps $input - | sed '/BT/,/ET/ d' | sed '/^8 dict dup begin$/,/^Q$/ c Q' > $output

But how can I then check if any elements are written to the canvas?


Solution

  • What do you mean, exactly, by 'vector elements'? Anything except an actual bitmap image? Why do you care? If you explained what you want to achieve, it would be easier to help you.

    Note that the approach you are using is by no means guaranteed to work. There can easily be 'elements' in the file which won't be removed by your rather basic approach to finding images.

    You could use Ghostscript: render the file to a bitmap and specify -dFILTERTEXT and -dFILTERIMAGE. Then examine the pixels of the bitmap to see if any are non-white. If they are, then there was vector content in the file. You could probably use something like ImageMagick to count the colours and see if there's more than one.

    Or render the file to a bitmap twice, once normally and once with -dFILTERVECTOR. Compare the two bitmaps (comparing their MD5 hashes would be sufficient). If there are no differences, then there was no vector content.
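
Both suggestions above can be sketched as small shell functions. This is only a rough sketch under some assumptions: the function names, the 72 dpi resolution and the ppmraw output device are choices made for illustration, not part of the answer. It assumes Ghostscript (gs) and ImageMagick (identify) are on the PATH, and it uses cmp for a byte-wise comparison instead of MD5.

```shell
#!/bin/sh
# Illustrative sketch: names, resolution and output device are assumptions.

# render_page INPUT PAGE OUTFILE [extra gs options...]
# Rasterise a single page to a raw PPM bitmap.
render_page() {
    rp_input=$1 rp_page=$2 rp_out=$3
    shift 3
    gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=ppmraw -r72 \
       -dFirstPage="$rp_page" -dLastPage="$rp_page" \
       "$@" -sOutputFile="$rp_out" "$rp_input"
}

# First suggestion: drop text and images, then count distinct colours.
# Returns 0 if more than one colour remains, i.e. something was painted.
has_marks_after_filtering() {
    hm_dir=$(mktemp -d) || return 2
    render_page "$1" "$2" "$hm_dir/page.ppm" -dFILTERTEXT -dFILTERIMAGE ||
        { rm -rf "$hm_dir"; return 2; }
    # %k is ImageMagick's "number of unique colours" format escape.
    hm_colours=$(identify -format '%k' "$hm_dir/page.ppm")
    rm -rf "$hm_dir"
    [ "${hm_colours:-0}" -gt 1 ]
}

# Second suggestion: render with and without vector content and compare.
# Returns 0 if the two bitmaps differ, i.e. vector content was present.
has_vector_content() {
    hv_dir=$(mktemp -d) || return 2
    render_page "$1" "$2" "$hv_dir/full.ppm" &&
        render_page "$1" "$2" "$hv_dir/novector.ppm" -dFILTERVECTOR ||
        { rm -rf "$hv_dir"; return 2; }
    if cmp -s "$hv_dir/full.ppm" "$hv_dir/novector.ppm"; then
        hv_result=1   # identical renderings: no visible vector content
    else
        hv_result=0
    fi
    rm -rf "$hv_dir"
    return "$hv_result"
}
```

A call might look like `has_vector_content "$input" "$page_number" && echo "vector content found"`. Note that both checks only see what ends up in the rendered pixels, so a white shape on a white page would go unnoticed by either of them.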