In order to compress uploaded PDF files before storing them in a database, I have this code in a Mojolicious controller:
    # if > 100k compress with gs
    my $pdf;
    if ($size > 100_000) {
        # create tmp-file to be read by gs
        my $tmp_fn = '/tmp/badb_pdf_input.pdf';
        $file->move_to($tmp_fn);

        use Capture::Tiny 'capture';
        my ($stdout, $stderr, $exit) = capture {
            my $cmd  = '/usr/local/bin/gs';
            my @args = qw(
                -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook
                -dNOPAUSE -dQUIET -dBATCH -sOutputFile=-
            );
            push @args, $tmp_fn;
            system($cmd, @args) == 0
                or die "system @args failed: $?";
        };
        die "ERROR compressing pdf: $stderr" if $stderr;

        unlink $tmp_fn;
        $pdf = $stdout;
    } else {
        $pdf = $file->slurp;
    }
Does anyone know a way to avoid the temporary file for input (/tmp/badb_pdf_input.pdf)?
OK, firstly: you aren't 'compressing the PDF file'. What you are doing is interpreting the original PDF file, creating a sequence of marking operations, and then creating a new PDF file from those marking operations. This is not the same thing, and it's important to appreciate the difference.
For example, one of the things that is then possible is colour converting the data, or reducing the resolution of images (both of which potentially take place when you select /ebook). If you merely 'compressed' the file you wouldn't be altering the data, so these sorts of changes wouldn't be possible.
However, you are also potentially losing information. The only target goal for Ghostscript's pdfwrite device is that the visual appearance should be unchanged (as far as is reasonable, if you change resolution and so on). Metadata may not be preserved. Indeed, the fact that the pdfwrite device doesn't preserve certain metadata (like embedded Illustrator files for example) is part of the reason that it can produce smaller PDF files.
I know nothing about Mojolicious, but you appear to be trying to send data to Ghostscript via stdin and read the resulting PDF back from stdout?
If so, then you will actually be creating a number of temporary files anyway. It isn't possible to process a PDF file from stdin, in general, as the PDF format requires random access to the file. So if you pipe a PDF file into stdin, the first thing Ghostscript will do is create a temporary file and copy the stdin data into it; only then can it interpret the file. Also, pdfwrite will create numerous temporary files as it goes about creating the output.
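If you still want to go that route in order to avoid managing the input file yourself, a rough Perl sketch of feeding the data through stdin might look like the following. It assumes IPC::Run3 is available (it is not used in the code in the question); the final '-' argument tells Ghostscript to read its input from stdin, and as described above it will still spool that data to its own temporary file internally.

    use IPC::Run3;               # exports run3(); assumed module, not part of the original code

    my $pdf_in = $file->slurp;   # raw bytes of the uploaded PDF (the $file from the question)

    my @cmd = (
        '/usr/local/bin/gs',
        qw( -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook
            -dNOPAUSE -dQUIET -dBATCH -sOutputFile=- ),
        '-',                     # read the input PDF from stdin
    );

    my ($pdf_out, $stderr);
    run3 \@cmd, \$pdf_in, \$pdf_out, \$stderr;   # feed stdin, capture stdout and stderr
    die "gs failed ($?): $stderr" if $?;

Both streams here carry binary PDF data, so -dQUIET (or -q) matters: anything else Ghostscript prints to stdout would corrupt the captured output.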
You 'can' select stdout as the destination for the PDF file, but...
As I mentioned, the PDF format is random-access, and it's common practice to write portions of the file, leaving space for the bits you don't know yet, then rewind the file and fill them in when you do. Obviously this won't work with a non-seekable stream. Currently the pdfwrite device only does this when creating a Linearized (optimized for fast web view) PDF file, but I won't guarantee that future versions of pdfwrite won't require the ability to seek in the output file.
So the short answer is you can set OutputFile to be stdout, but it is not guaranteed to work.
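If that caveat bothers you, the defensive approach is to give pdfwrite a seekable output file of your own and read it back afterwards, rather than relying on stdout. A minimal sketch, assuming File::Temp and Mojo::File (neither appears in the question's code) and reusing the question's $tmp_fn as the input path:

    use File::Temp ();            # assumed; provides a self-cleaning temporary output file
    use Mojo::File 'path';        # assumed; only used for the slurp at the end

    my $out_tmp = File::Temp->new(SUFFIX => '.pdf');   # unlinked automatically when it goes out of scope

    my @cmd = (
        '/usr/local/bin/gs',
        qw( -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook
            -dNOPAUSE -dQUIET -dBATCH ),
        '-sOutputFile=' . $out_tmp->filename,   # a real, seekable file for pdfwrite to write into
        $tmp_fn,                                # the input PDF (the question's temp file)
    );

    system(@cmd) == 0 or die "gs failed: $?";
    my $pdf = path($out_tmp->filename)->slurp;  # read the rewritten PDF back into memory

This trades one extra temporary file for output that works regardless of whether pdfwrite needs to seek.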