Tags: .net, pdf, ghostscript

GhostScript generates a blank PDF file for a specific PDF document


I'm using GhostScript (currently 9.27) to reduce the size of PDF files in my application before uploading them to a file server. The issue I'm facing is that some PDF files are converted to a blank PDF file. However, if I open the original PDF file with Adobe Acrobat, save it, and then run my GhostScript routine, it runs fine: the PDF is displayed and is correctly "compressed" (reduced quality).

I've tried different PDF settings, but the one I want is /ebook, so I would like to make it work with ebook quality. I'm using a GhostScript wrapper (code posted below), and the function I'm calling is:

RunGS("-dQUIET", "-dBATCH", "-dNOPAUSE", "-dNOGC", "-dPDFSETTINGS=/ebook", "-sDEVICE=pdfwrite", "-sOutputFile=" & OUTPUT_FILE, INPUT_FILE)

It takes longer than usual when the final result is a blank PDF file. I've just noticed that I was getting an error callback; it says:

GhostScriptUnrecoverable error, exit code -100
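For reference, -100 is the value of Ghostscript's `gs_error_Fatal`. The Ghostscript API documentation lists a few special return values for `gsapi_init_with_args`: 0 means no error, `gs_error_Quit` (-101) means `quit` was executed and is not an error, `gs_error_Info` (-110) means `gs -h` was executed and is not an error, and anything at or below `gs_error_Fatal` is a fatal error after which `gsapi_exit` must be called. The posted wrapper discards that return value. A minimal sketch of the check, assuming the wrapper is changed to hand the return code back; the module and function names are hypothetical:

```vbnet
' Hypothetical helpers for interpreting the value returned by gsapi_init_with_args.
' The constants are the special return codes described in the Ghostscript API docs.
Module GsReturnCodes

    Public Const GsErrorFatal As Integer = -100 ' gs_error_Fatal
    Public Const GsErrorQuit As Integer = -101  ' gs_error_Quit: "quit" was executed, not an error
    Public Const GsErrorInfo As Integer = -110  ' gs_error_Info: "gs -h" was executed, not an error

    ' True when the run finished without an error.
    Public Function IsGsSuccess(ByVal code As Integer) As Boolean
        Return code = 0 OrElse code = GsErrorQuit OrElse code = GsErrorInfo
    End Function

    ' True for fatal errors (gs_error_Fatal or lower), excluding the non-error codes above.
    Public Function IsGsFatal(ByVal code As Integer) As Boolean
        Return code <= GsErrorFatal AndAlso Not IsGsSuccess(code)
    End Function

End Module
```

A caller would then treat `Not IsGsSuccess(code)` as a failed conversion and skip the upload, instead of assuming the output file is usable.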

This is the original file, which doesn't work: https://docdro.id/YuZslRm

And this is the file after being saved with Acrobat, which works fine: https://docdro.id/cAoUCS5

Here is the wrapper, just in case:

Imports System.Runtime.InteropServices

Module GhostscriptDllLib

Private Declare Function gsapi_new_instance Lib "gsdll32.dll" _
  (ByRef instance As IntPtr, _
  ByVal caller_handle As IntPtr) As Integer

Private Declare Function gsapi_set_stdio Lib "gsdll32.dll" _
  (ByVal instance As IntPtr, _
  ByVal gsdll_stdin As StdIOCallBack, _
  ByVal gsdll_stdout As StdIOCallBack, _
  ByVal gsdll_stderr As StdIOCallBack) As Integer

Private Declare Function gsapi_init_with_args Lib "gsdll32.dll" _
  (ByVal instance As IntPtr, _
  ByVal argc As Integer, _
  <MarshalAs(UnmanagedType.LPArray, ArraySubType:=UnmanagedType.LPStr)> _
  ByVal argv() As String) As Integer

Private Declare Function gsapi_exit Lib "gsdll32.dll" _
  (ByVal instance As IntPtr) As Integer

Private Declare Sub gsapi_delete_instance Lib "gsdll32.dll" _
  (ByVal instance As IntPtr)

'--- Run Ghostscript with specified arguments

Public Function RunGS(ByVal ParamArray Args() As String) As Boolean

    Dim InstanceHndl As IntPtr
    Dim NumArgs As Integer
    Dim StdErrCallback As StdIOCallBack
    Dim StdInCallback As StdIOCallBack
    Dim StdOutCallback As StdIOCallBack

    NumArgs = Args.Length

    StdInCallback = AddressOf InOutErrCallBack
    StdOutCallback = AddressOf InOutErrCallBack
    StdErrCallback = AddressOf InOutErrCallBack

    '--- Shift arguments to begin at index 1 (Ghostscript requirement)

    ReDim Preserve Args(NumArgs)
    System.Array.Copy(Args, 0, Args, 1, NumArgs)

    '--- Start a new Ghostscript instance

    If gsapi_new_instance(InstanceHndl, IntPtr.Zero) <> 0 Then
        Return False
    End If

    '--- Set up dummy callbacks

    gsapi_set_stdio(InstanceHndl, StdInCallback, StdOutCallback, StdErrCallback)

    '--- Run Ghostscript using specified arguments

    gsapi_init_with_args(InstanceHndl, NumArgs + 1, Args)

    '--- Exit Ghostscript

    gsapi_exit(InstanceHndl)

    '--- Delete instance

    gsapi_delete_instance(InstanceHndl)

    Return True

End Function

'--- Delegate function for callbacks

Private Delegate Function StdIOCallBack(ByVal handle As IntPtr, _
  ByVal Strz As IntPtr, ByVal Bytes As Integer) As Integer

'--- Dummy callback for standard input, standard output, and errors

Private Function InOutErrCallBack(ByVal handle As IntPtr, _
  ByVal Strz As IntPtr, ByVal Bytes As Integer) As Integer

    Dim objString As String
    objString = Marshal.PtrToStringAnsi(Strz, Bytes)
    Return 0

End Function

End Module
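The wrapper above uses one callback for stdin, stdout and stderr, converts the text to a string and throws it away, and always returns True. The Ghostscript API documentation says the stdout and stderr callbacks should return the number of characters written (the stdin callback returns the number of characters read, with 0 meaning end of input), whereas InOutErrCallBack returns 0 for all three. A minimal sketch of a stdout/stderr collector that keeps the text, assuming the same callback signature as StdIOCallBack; the class name GsOutputCollector is hypothetical:

```vbnet
Imports System
Imports System.Runtime.InteropServices
Imports System.Text

' Hypothetical helper: accumulates everything Ghostscript writes through the
' stdout/stderr callbacks so it can be logged after the run.
Public Class GsOutputCollector

    Private ReadOnly _buffer As New StringBuilder()

    ' Same shape as StdIOCallBack. Use it for stdout and stderr only, not stdin.
    ' Ghostscript may deliver one message in several chunks, so text is appended as-is.
    Public Function Collect(ByVal handle As IntPtr, ByVal strz As IntPtr, ByVal bytes As Integer) As Integer
        If bytes > 0 AndAlso strz <> IntPtr.Zero Then
            _buffer.Append(Marshal.PtrToStringAnsi(strz, bytes))
        End If
        ' Report every byte as written, as the API docs ask of output callbacks.
        Return bytes
    End Function

    ' Everything collected so far.
    Public ReadOnly Property Text As String
        Get
            Return _buffer.ToString()
        End Get
    End Property

End Class
```

To use it, create one collector per run inside RunGS, pass `AddressOf collector.Collect` as the stdout and stderr callbacks (keeping the existing dummy callback for stdin, where returning 0 signals end of input), call `GC.KeepAlive` on the delegates after `gsapi_delete_instance` so they cannot be collected while Ghostscript still holds them, and log `collector.Text` together with the return value of `gsapi_init_with_args`.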

Any ideas about how to avoid this? I wouldn't mind adding an extra process or step, or using something else. As I said, this only happens with some specific files (we get them from our customers); roughly 98% of them are reduced in size correctly.


Solution

  • OK, so you say 'it doesn't prompt any error'; however, when I run your file here Ghostscript starts by saying:

    **** Warning: Discovered more entries in xref than declared in trailer /Size
       **** Warning:  File has an invalid xref entry:  2.  Rebuilding xref table.
    

    And then on every page it says:

       **** Error: stream operator isn't terminated by valid EOL.
                   Output may be incorrect.
       **** Error: stream operator isn't terminated by valid EOL.
                   Output may be incorrect.
    

    and ends up with:

       **** This file had errors that were repaired or ignored.
       **** The file was produced by:
       **** >>>>  <<<<
       **** Please notify the author of the software that produced this
       **** file that it does not conform to Adobe's published PDF
       **** specification.
    
       **** The rendered output from this file may be incorrect.
    

    Which I would have said was a fairly large number of errors. Note that when you save the file from Acrobat it will, naturally, fix these syntax problems, so of course Ghostscript will then not complain, as the saved file is valid.

    That said, using a command line based on yours:

    "c:\program files\gs\gs9.27\bin\gswin64c" -sDEVICE=pdfwrite -sOutputFile=out.pdf -dBATCH -dNOPAUSE -dNOGC -dPDFSETTINGS=/ebook 20194114_EXPORT_DOCS_Original.pdf
    

    Your own command line produces fewer warnings, because you've specified -dQUIET. If you're trying to investigate a problem, then suppressing warnings is probably not ideal. Are you seeing any of the back channel output from Ghostscript? If so, you should post it here as well. If not, then you need to implement code to capture it; it's important information.

    NB don't use -dNOGC, that's a debugging only switch. I know, people keep posting it as part of their command line, usually because they 'researched it' (found it on Google). Don't use it.

    Anyway, with that command line I get a PDF file which looks reasonable and is 20% the size of the original.

    Using your command line (or something as close to it as I can) doesn't reproduce the problem for me (either on 32-bit or 64-bit, using current code or the 9.27 release) so I can only speculate as to problems. If you had set -dPDFSTOPONERROR that would exit immediately on reading the file (with a lengthy error message), and would produce an empty PDF file. I can't think of any other way you could get that, especially 'with no error'.

    FWIW by default Ghostscript attempts to repair invalid PDF files, or at least ignore errors as far as possible. The PDFSTOPONERROR switch is intended for use in commercial environments where it's important that files which might not render correctly are flagged and checked/rejected/repaired rather than being wastefully printed.

    On which note, I notice that you appear to be using Ghostscript commercially and are linking to the DLL. I feel I should point you to the licence under which Ghostscript is supplied (AGPL v3); you should probably check that your usage is valid under the terms of that licence.
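Following the advice in the answer (drop -dNOGC, don't hide warnings with -dQUIET, and look at the back channel output), here is a minimal sketch of two helpers, assuming the Ghostscript output has already been captured as a string through the stdout/stderr callbacks. The names BuildEbookArgs and FindGsErrors are hypothetical, and FindGsErrors only looks for the "**** Error" prefix seen in the messages quoted above, which may differ between Ghostscript versions. It flags files in which Ghostscript repaired or ignored errors, rather than stopping on them as -dPDFSTOPONERROR would:

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

' Hypothetical helpers for the /ebook size-reduction step.
Module GsPdfHelpers

    ' Argument list for the /ebook conversion, without -dNOGC (a debugging-only
    ' switch) and without -dQUIET, so warnings reach the stdout/stderr callbacks.
    Public Function BuildEbookArgs(ByVal inputFile As String, ByVal outputFile As String) As String()
        Return New String() {
            "-dBATCH",
            "-dNOPAUSE",
            "-sDEVICE=pdfwrite",
            "-dPDFSETTINGS=/ebook",
            "-sOutputFile=" & outputFile,
            inputFile
        }
    End Function

    ' Returns the "**** Error" lines found in captured Ghostscript output.
    ' A non-empty result means the input had problems that Ghostscript repaired
    ' or ignored, so the output file should be checked before it is uploaded.
    Public Function FindGsErrors(ByVal gsOutput As String) As List(Of String)
        Dim errors As New List(Of String)()
        For Each rawLine As String In gsOutput.Split({vbCrLf, vbLf}, StringSplitOptions.None)
            Dim line As String = rawLine.Trim()
            If line.StartsWith("**** Error", StringComparison.Ordinal) Then
                errors.Add(line)
            End If
        Next
        Return errors
    End Function

End Module
```

The array returned by BuildEbookArgs can be passed straight to the question's RunGS, since a ParamArray parameter also accepts an existing String array; RunGS then shifts it so Ghostscript sees the first argument at index 1.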