
How to generate proper PDF/A with an embedded ICC profile using Ghostscript and Ghostscript.NET


Current scenario:

I'm trying to generate proper, conformant PDF/A files from normal PDF documents and, after spending some hours investigating, we've decided to use Ghostscript. This business requirement has been set for a bigger C# project I'm working on. To test the viability of this feature, I first ran Ghostscript commands directly on Windows, and I also created an isolated console application that uses Ghostscript.NET.

We concentrated our efforts on the PDF/A-1b format for this first test, and used VeraPDF and PDF-Tools to check the conformance of the generated files.

The following tests were completed with a few different PDF files, some of them actually generated by our project's application. For simplicity, and in case anyone wants to check, I provide a simple PDF (with only a few lines of text) that was tested in the same way and reproduces the same behavior.

Download PDF for testing

Ghostscript command testing

Execution

Using Ghostscript 9.52, I tried the following command:

gswin32c.exe -dNOSAFER -dPDFA=1 -sColorConversionStrategy=RGB -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -dNOPAUSE -dBATCH -o result.pdf "C:\GS_PDFA\PDFA_def.ps" WriterPDF.pdf

*Note: Although I have read that the -dNOSAFER parameter is not recommended, I wasn't able to generate the PDF without it because of /invalidfileaccess errors. I suspect that access permissions are the cause, based on what I found searching Stack Overflow (GhostScript: Error: /invalidfileaccess in --file--), but I still haven't found a solution that works for me.

I also tried the following command, but got the same error (the desired ICC profile is located in the same folder as the .ps template file):

gswin32c.exe --permit-file-read=c:/GS_PDFA/srgb.icc -dPDFA=1 -sColorConversionStrategy=RGB -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -dNOPAUSE -dBATCH -o result2.pdf C:/GS_PDFA/PDFA_def_FULL.ps WriterPDF.pdf 

For the PDF/A definition, I first tried the default PDFA_def.ps template found in /lib inside the Ghostscript installation directory. After that, I tried a copy of the PDFA_def.ps template with these lines updated:

/ICCProfile (C:/GS_PDFA/srgb.icc)

and

/OutputConditionIdentifier (sRGB)

Result and validation

Result: Download PDF generated by command line

VeraPDF says:

PDF file is compliant with Validation Profile requirements

PDF-Tools says:

The document does conform to the PDF/A-1b standard.

In addition, when the file is opened with Adobe Reader DC, the conformance tab shows all the detailed info for the selected format (PDF/A-1b) but does not display the OutputIntent, even though the PDFA_def.ps template was passed as a parameter and the sRGB ICC profile is referenced inside the template file. (Screenshot: Adobe conformance status with the OutputIntent missing.)

Ghostscript .NET console application:

Execution

I tried writing code based on same parameters used during Ghostscript testing:

string outputFile = @"C:\temp\output.pdf";
string inputFile = @"C:\temp\WriterPDF.pdf";

GhostscriptPipedOutput gsPipedOutput = new GhostscriptPipedOutput();

// pipe handle format: %handle%hexvalue
string outputPipeHandle = "%handle%" + int.Parse(gsPipedOutput.ClientHandle).ToString("X2");

using (GhostscriptProcessor processor = new GhostscriptProcessor())
{
    List<string> switches = new List<string>();
    switches.Add("-empty");
    switches.Add("-dPDFA=1");
    switches.Add("-sColorConversionStrategy=RGB");
    switches.Add("-dPDFACompatibilityPolicy=1");
    switches.Add("-dBATCH");
    switches.Add("-dNOPAUSE");
    switches.Add("-sDEVICE=pdfwrite");
    switches.Add("-o" + outputPipeHandle);
    //switches.Add("c:/GS_PDFA/PDFA_def.ps");
    switches.Add(inputFile);

    try
    {
        processor.StartProcessing(switches.ToArray(), null);

        byte[] rawDocumentData = gsPipedOutput.Data;
        
        File.WriteAllBytes(outputFile, rawDocumentData);

    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);

        Console.ReadLine();
    }
    finally
    {
        gsPipedOutput.Dispose();
        gsPipedOutput = null;
    }
}

*Note: The -dNOSAFER parameter is not used this time. If it is included, the result is the same, with no additional information or detailed error. If the commented-out line (switches.Add("c:/GS_PDFA/PDFA_def.ps");) is included, the application raises this error:

An error occured when call to 'gsapi_init_with_args' is made: -100

I tried to avoid the error by using another location for the template file, but without success. I also added switches.Add("-Ic:/GS_PDFA/"); at the top of the list, but got the same error.

Result and validation

Result: Download PDF generated by GS .NET DLL

VeraPDF says:

If no PDFA_def.ps template file is set, the resultant file does not pass the validation check.

PDF file is not compliant with Validation Profile requirements

PDF-Tools says:

The document does conform to the PDF/A-1b standard.

In addition, when the file is opened with Adobe Reader DC, the conformance tab shows all the detailed info for the selected format (PDF/A-1b), and now the OutputIntent is present, but its details are incomplete: the Identifier and Info values are not shown. (Screenshot: Adobe conformance status with an incomplete OutputIntent.)

Questions

  • Regarding the Ghostscript commands, is there a way to generate PDF/A with proper ICC information? From what I've seen, none of the results were really satisfactory, so what am I supposed to do to embed this info successfully in the generated PDF/A files?
  • Assuming that Ghostscript commands can produce a conformant PDF/A file with a proper ICC profile included, and since we plan to use Ghostscript.NET, how can I include the PDF/A template file as a parameter in C# code?

Thanks a lot in advance.

[EDIT]

I was not able to change permissions using --permit-file-read. I usually keep the .ps and .icc files in C:\GS_PDFA, and I also tried placing them in the local GS installation folder, but I always get the same error:

Error: /invalidfileaccess in --file--
Operand stack:
   --nostringval--   --nostringval--   (C:/GS_PDFA/srgb.icc)   (r)
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1990   1   3   %oparray_pop   1989   1   3   %oparray_pop   1977   1   3   %oparray_pop   1833   1   3   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--
Dictionary stack:
   --dict:741/1123(ro)(G)--   --dict:0/20(G)--   --dict:76/200(L)--
Current allocation mode is local
Last OS error: Permission denied
Current file position is 2118

I ran a lot of tests with the console application using Ghostscript.NET, even placing the PDFA_def.ps and srgb.icc files inside the solution folder, and got the same error. I also tried copying the main GS installation files to C:\GS_PDFA, together with the ICC profile (srgb.icc), then opened a command prompt and tested again using Ghostscript commands, but all attempts were unsuccessful.

Here are some examples of the commands I tried:

--permit-file-read=c:/GS_PDFA/srgb.icc
 --permit-file-read="c:/GS_PDFA/srgb.icc"
 --permit-file-read="c:/GS_PDFA/srgb.icc"
 --permit-file-read=srgb.icc
 --permit-file-read="c:\GS_PDFA\srgb.icc"
 --permit-file-read="/srgb.icc"
 --permit-file-read=/srgb.icc
 --permit-file-read="\srgb.icc"
 --permit-file-read=\srgb.icc
 --permit-file-read=c:/GS_PDFA/
 --permit-file-read="c:/GS_PDFA/"
 --permit-file-read=c:\GS_PDFA\
 --permit-file-read=c:/GS_PDFA/****.icc
 --permit-file-read=c:/GS_PDFA/*.icc
 --permit-file-read=c:/GS_PDFA/*

I tried moving the files and changing their location and folder. I tried changing the installation folder, and even tried the 64-bit Ghostscript build. Is there something I missed about the installation?

Please, does anybody have a working sample for Windows that could help me?


Solution

  • You should not use -dNOSAFER, you should instead add files/directories to the permitted file reading list using the --permit-file-read switch. The file which needs to be read is the OutputIntent profile which is one of the main ingredients of the pdfa_def.ps file. See below.

    If you do not include the pdfa_def.ps file then you will not get an OutputIntent in the final PDF/A file and it will not be PDF/A compliant (unless you specify UseDeviceIndependentColor as the ColorConversionStrategy). That's why your code example doesn't work. Noticing that PDF-Tools still says the file is valid, I would stop using that as a validator, it clearly isn't reliable. I've found VeraPDF to be the best validator personally (it's better than the Acrobat built-in verification).

    I'm surprised that the command line you have shown at the top of the question produces a valid PDF/A file, unless you have modified the pdfa_def.ps file? You are supposed to, and in particular you must modify the value associated with the /ICCProfile key. That value (a string inside parentheses) needs to be a fully qualified path to the ICC profile, and either the ICC profile file or the directory it resides in needs to be added to the permitted list of files to read; see the Ghostscript documentation under -dSAFER.
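One detail worth checking when editing the /ICCProfile value: inside a PostScript string a backslash is an escape character, so a Windows path is normally written with forward slashes (as in the question's C:/GS_PDFA/srgb.icc) or with doubled backslashes. The following is a minimal C# sketch of that conversion; PdfaDefHelper and ToPostScriptString are hypothetical names, not part of Ghostscript or Ghostscript.NET.

```csharp
using System;

static class PdfaDefHelper
{
    // Hypothetical helper: formats a Windows path as a PostScript string
    // literal suitable for the /ICCProfile entry in pdfa_def.ps.
    public static string ToPostScriptString(string path)
    {
        // Use forward slashes so no backslash is read as a PostScript escape.
        string s = path.Replace('\\', '/');

        // Escape parentheses, which delimit PostScript strings.
        s = s.Replace("(", "\\(").Replace(")", "\\)");

        return "(" + s + ")";
    }
}
```

For example, passing @"C:\GS_PDFA\srgb.icc" yields (C:/GS_PDFA/srgb.icc), which is the form the question already uses in its edited template.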

    Assuming you have done so, then the resulting PDF file should be a PDF/A-1b conformant file. And indeed according to your question, VeraPDF says it is conformant so I'm unclear on what your problem is there. It would be much more useful to share the input and output PDF files rather than a picture of (part of) the Acrobat display.

    So to answer your questions:

    1. Yes there is a way to generate a PDF/A file with ICC information (it isn't valid if it doesn't have an OutputIntent) and your command line does so. If you have not modified the pdfa_def.ps file appropriately you may still have a problem.

    2. As far as I know you run the pdfa_def.ps file using Ghostscript.NET in exactly the same way as you do on the command line: you just put it in the list of arguments. So you just need to uncomment the line you've commented out. Of course, you haven't included -dNOSAFER, nor added the ICC profile to the list of permitted files to read, so you will get an error. I am surprised you are getting a fatal error though; I'd expect an invalidfileaccess. The obvious thing to do is to add -dNOSAFER to the arguments. The back channel output might be useful, as it may have more information, and you haven't included that.

    Oh, and I would not write to a pipe either. The pdfwrite device expects to be writing to a file and it may try to seek within the file while writing it. If it does and you've written to a pipe (or other non-seekable output), then it's going to fail.

    You don't need to add -f to the argument list, and this:

    switches.Add("-dNOPAUSEgsArgs");
    

    looks suspicious to me, that looks like it ought to be -dNOPAUSE.
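Putting the points above together, the argument list could be built roughly as follows. This is only a sketch under assumptions: PdfaArgs.Build is a hypothetical helper, the switches are the ones already used in the question plus --permit-file-read, and the output goes to a file through -sOutputFile instead of a pipe handle. It does not explain why --permit-file-read failed in the question's tests.

```csharp
using System;
using System.Collections.Generic;

static class PdfaArgs
{
    // Hypothetical helper: builds the Ghostscript argument list for PDF/A-1b.
    // iccProfile should be the same fully qualified path that appears in the
    // /ICCProfile entry of the pdfa_def.ps copy being used.
    public static string[] Build(string pdfaDefPs, string iccProfile,
                                 string inputPdf, string outputPdf)
    {
        var args = new List<string>
        {
            "-empty",                             // placeholder kept from the question's code
            "--permit-file-read=" + iccProfile,   // allow reading the profile without -dNOSAFER
            "-dPDFA=1",
            "-sColorConversionStrategy=RGB",
            "-dPDFACompatibilityPolicy=1",
            "-dBATCH",
            "-dNOPAUSE",
            "-sDEVICE=pdfwrite",
            "-sOutputFile=" + outputPdf,          // a real file, so pdfwrite can seek in it
            pdfaDefPs,                            // the template runs before the input PDF
            inputPdf
        };
        return args.ToArray();
    }
}
```

The resulting array would then be passed to processor.StartProcessing(args, null) exactly as in the question's code, with no GhostscriptPipedOutput involved.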

    Finally, if you intend to distribute this application you should check the terms of the AGPL, I believe that Artifex will consider the use of Ghostscript.NET and the Ghostscript DLL to be a 'derivative work' and you may need a commercial license.

    Edit

    The output_gscommand.pdf has this:

    1 0 obj
    <</Type /Catalog /Pages 3 0 R
    /OutputIntents [ 5 0 R ]
    /Metadata 27 0 R
    >>
    
    5 0 obj
    <</OutputConditionIdentifier(sRGB)
    /DestOutputProfile  4 0 R 
    /S/GTS_PDFA1
    /Type/OutputIntent>>
    endobj
    

    So that's an OutputIntent specified in the Catalog, the only OutputIntent has a PDFA1 identifier, a valid OutputConditionIdentifier (which is only used for human-readable information), and an ICC profile. As far as I can see that's entirely valid.
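A quick way to repeat this inspection from code is to search the raw bytes for the two keys seen in the excerpt. The sketch below is a crude heuristic, not a validator: it assumes the Catalog and OutputIntent dictionaries are stored uncompressed, as they are in the excerpt above, and the class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class OutputIntentCheck
{
    // Hypothetical helper: returns true when the file contains both an
    // /OutputIntents entry and a /GTS_PDFA1 subtype somewhere in its bytes.
    public static bool LooksLikeItHasPdfaOutputIntent(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one character, so binary
        // stream data in the PDF cannot break the decoding step.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        return text.Contains("/OutputIntents") && text.Contains("/GTS_PDFA1");
    }
}
```

It could be called with File.ReadAllBytes on the generated file. A true result only means the keys are present; a tool such as VeraPDF is still needed to confirm conformance.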

    Both VeraPDF and the preflight tool in Adobe Acrobat X (Pro) verify the PDF file as conforming. So I think the file is a conforming PDF/A file (the Acrobat X preflight tool also lists the OutputIntent as sRGB(Custom) ICC OutputProfile: "Artifex Software sRGB ICC Profile").

    I've no idea why DC is not showing the OutputIntent, I can see no problems with the file.