Tags: java, validation, pdf-generation, pdfbox, pdfa

PdfBox: PDF/A-1A to PDF/A-3A


I have the following problem: I want to convert a PDF/A-1A document to PDF/A-3A. The original document validates in Acrobat Reader Pro, so I can assume it conforms to PDF/A-1A.

I try to convert the PDF metadata with the following code:

private PDDocumentCatalog makeA3compliant(PDDocument doc) throws IOException, TransformerException {
    PDDocumentCatalog cat = doc.getDocumentCatalog();
    PDMetadata metadata = new PDMetadata(doc);
    cat.setMetadata(metadata);

    XMPMetadata xmp = new XMPMetadata();
    XMPSchemaPDFAId pdfaid = new XMPSchemaPDFAId(xmp);
    xmp.addSchema(pdfaid);

    XMPSchemaDublinCore dc = xmp.addDublinCoreSchema();
    String creator = "TestCr";
    String producer = "testPr";
    dc.addCreator(creator);
    dc.setAbout("");

    XMPSchemaBasic xsb = xmp.addBasicSchema();
    xsb.setAbout("");
    xsb.setCreatorTool(creator);
    xsb.setCreateDate(GregorianCalendar.getInstance());

    PDDocumentInformation pdi = new PDDocumentInformation();
    pdi.setProducer(producer);
    pdi.setAuthor(creator);
    doc.setDocumentInformation(pdi);

    XMPSchemaPDF pdf = xmp.addPDFSchema();
    pdf.setProducer(producer);
    pdf.setAbout("");

    PDMarkInfo markinfo = new PDMarkInfo();
    markinfo.setMarked(true);
    doc.getDocumentCatalog().setMarkInfo(markinfo);

    pdfaid.setPart(3);
    pdfaid.setConformance("A");
    pdfaid.setAbout("");

    metadata.importXMPMetadata(xmp);

    return cat;
}

If I validate the new file with Acrobat again, I get a validation error:

CIDset in subset font is incomplete (font contains glyphs that are not listed)

If I validate the file with this online validator (http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx), it is reported as a valid PDF/A-3A.

Am I missing something?

Is nobody able to help?

EDIT: Here is the PDF file


Solution

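  • This worked for us to become fully PDF/A-3 compliant regarding the CIDSet issue. PDF/A-1 requires a CIDSet stream for subset CID fonts, whereas PDF/A-2 and PDF/A-3 make it optional but require that, if present, it lists every CID in the embedded font program. Removing the entry from the font descriptors therefore avoids the error: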

    // Uses the PDFBox 1.x API (getAllPages, PDCIDFontType2Font, PDFontDescriptorDictionary).
    private void removeCidSet(PDDocumentCatalog catalog) {

      COSName cidSet = COSName.getPDFName("CIDSet");

      // iterate over all PDF pages
      for (Object object : catalog.getAllPages()) {
        if (object instanceof PDPage) {

          PDPage page = (PDPage) object;
          if (page.getResources() == null) {
            continue; // page has no resources of its own, so no fonts to process
          }
          Map<String, PDFont> fonts = page.getResources().getFonts();

          // iterate over all fonts in the page resources
          for (PDFont pdFont : fonts.values()) {

            // only composite (Type 0) fonts have a CID descendant font
            if (pdFont instanceof PDType0Font) {
              PDType0Font typedFont = (PDType0Font) pdFont;

              if (typedFont.getDescendantFont() instanceof PDCIDFontType2Font) {
                PDCIDFontType2Font f = (PDCIDFontType2Font) typedFont.getDescendantFont();
                PDFontDescriptor fontDescriptor = f.getFontDescriptor();

                if (fontDescriptor instanceof PDFontDescriptorDictionary) {
                  PDFontDescriptorDictionary fontDict = (PDFontDescriptorDictionary) fontDescriptor;
                  // drop the CIDSet entry from the font descriptor dictionary
                  fontDict.getCOSDictionary().removeItem(cidSet);
                }
              }
            }
          }
        }
      }
    }
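
For background on the Acrobat message: a CIDSet stream is a table of bits indexed by CID, stored high-order bit first, so the first bit of the first byte stands for CID 0, the next for CID 1, and so on. The following is only a rough sketch of the kind of completeness check a validator might perform; it uses plain JDK classes rather than PDFBox, and the class name CidSetCheck, its methods, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CidSetCheck {

    /** Returns true if the bit for the given CID is set in the CIDSet bitmap. */
    public static boolean isListed(byte[] cidSet, int cid) {
        int byteIndex = cid / 8;
        if (cid < 0 || byteIndex >= cidSet.length) {
            return false; // CID lies outside the bitmap, so it cannot be listed
        }
        int mask = 0x80 >> (cid % 8); // high-order bit first
        return (cidSet[byteIndex] & mask) != 0;
    }

    /** Returns the CIDs that are in the font program but not listed in the CIDSet. */
    public static List<Integer> missingCids(byte[] cidSet, int[] cidsInFont) {
        List<Integer> missing = new ArrayList<>();
        for (int cid : cidsInFont) {
            if (!isListed(cidSet, cid)) {
                missing.add(cid);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample data: bits 1010 0000 0000 0001 list CIDs 0, 2 and 15.
        byte[] cidSet = {(byte) 0xA0, 0x01};
        // Made-up font program that also contains CID 3.
        int[] cidsInFont = {0, 2, 3, 15};
        System.out.println(missingCids(cidSet, cidsInFont)); // prints [3]
    }
}
```

A non-empty result corresponds to the "font contains glyphs that are not listed" situation. Removing the CIDSet entry, as in the method above, sidesteps the check entirely instead of repairing the bitmap.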