pdf ghostscript postscript

Get the PDF MediaBox from a PDF stream using PostScript


I want to get the MediaBox from a PDF stream (not from a file).

I currently have this PostScript file (script.ps):

() = File dup (r) file runpdfbegin
/PDFPageCount pdfpagecount def

% Print out the Page Size info for each page.
() = 1 1 PDFPageCount {
    dup (Page ) print =print
    pdfgetpage dup
    /MediaBox pget {
      aload pop exch 4 1 roll exch sub 3 1 roll sub
      ( ) print =print ( ) print =print
    } if
    () = flush
  } for
() = quit

If I run it on a PDF file, it works perfectly:

gs -sNODISPLAY -sFile=file.pdf script.ps

But I want to run it over a stream:

cat file.pdf | gs -sNODISPLAY script.ps -_

Is this possible?


Solution

  • You can't 'stream' a PDF file, because interpreting it requires random access to the file's contents. For example, the cross-reference table is normally stored towards the end of the file, and the offset to that table is stored at the very end.

    If you feed a PDF file to Ghostscript via stdin in the normal way (i.e. without your PostScript code), Ghostscript writes it to a temporary file on disk before it starts processing it.

    Note that your PostScript code is highly Ghostscript-specific (it uses PostScript extensions that exist only in Ghostscript) and won't work with any other interpreter.

    The code expects to read from a file:

    () = File dup (r) file runpdfbegin
    

    So that won't work with stdin. You would have to use the same trick as Ghostscript's PDF interpreter and write stdin to a file before running the interpreter. It hardly seems worth coding that in PostScript; it is probably easier to write the stream to a file and then invoke Ghostscript on that file.
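
The "write it to a file first" approach could be sketched in a small shell wrapper like the one below. This is only a sketch under assumptions: `spool_stdin` and `mediabox_from_stdin` are hypothetical helper names, `gs` is assumed to be on the PATH, and `script.ps` is the script from the question, assumed to be in the current directory. Recent Ghostscript releases run in SAFER mode by default, so the script may additionally need read permission for the temporary file (for example via `--permit-file-read`); that is not shown here.

```shell
#!/bin/sh
# Copy everything on stdin to a fresh temporary file and print its path.
# spool_stdin is a hypothetical helper name, not part of Ghostscript.
spool_stdin() {
    tmp=$(mktemp) || return 1
    cat > "$tmp" || { rm -f "$tmp"; return 1; }
    printf '%s\n' "$tmp"
}

# Spool stdin, run the question's script.ps on the copy, then clean up.
# Assumes gs is on PATH and script.ps is in the current directory.
mediabox_from_stdin() {
    pdf=$(spool_stdin) || return 1
    gs -q -dNODISPLAY -sFile="$pdf" script.ps
    status=$?
    rm -f "$pdf"
    return "$status"
}
```

With these functions defined in the current shell, `cat file.pdf | mediabox_from_stdin` would take the place of the piped command from the question. The temporary file gives Ghostscript the random access it needs, and it is removed once the script has finished.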