Tags: java, pdf, pdfbox

Unable to add valid PDF/A metadata


Java: 1.8

pdfbox: 2.0.18

preflight: 2.0.18

I can create a working PDF, but our requirement is that it must conform to the PDF/A standard. I've managed to fix all validation issues apart from the metadata.

Without adding any metadata I get the error:

The fileexample.pdf is not valid, error(s) :
7.1 : Error on MetaData, Metadata is not a stream

Following some examples (no documentation exists for this under v2+), I've come up with the following:

PDMetadata documentMetadata = new PDMetadata(document);
XMPMetadata xmpMetadata = XMPMetadata.createXMPMetadata();
xmpMetadata.createAndAddPFAIdentificationSchema();
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream out = new ByteArrayOutputStream();
serializer.serialize(xmpMetadata, out, false);
documentMetadata.importXMPMetadata(out.toByteArray());
catalog.setMetadata(documentMetadata);

Which gives me the error:

The fileexample.pdf is not valid, error(s) :
7.1 : Error on MetaData, xmp should start with a processing instruction

I then tried changing the serializer's withXpacket argument to true, and I get the following error:

org.apache.pdfbox.preflight.exception.ValidationException: Failed while validating
    at org.apache.pdfbox.preflight.process.MetadataValidationProcess.validate(MetadataValidationProcess.java:162)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:102)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:141)
    at org.apache.pdfbox.preflight.PreflightDocument.validate(PreflightDocument.java:166)
    at uk.ac.port.pdf.Main.validate(Main.java:53)
    at uk.ac.port.pdf.Main.main(Main.java:21)
Caused by: org.apache.pdfbox.preflight.exception.ValidationException: Schemas not found in the given metadata representation
    at org.apache.pdfbox.preflight.metadata.RDFAboutAttributeConcordanceValidation.validateRDFAboutAttributes(RDFAboutAttributeConcordanceValidation.java:51)
    at org.apache.pdfbox.preflight.process.MetadataValidationProcess.validate(MetadataValidationProcess.java:99)
    ... 5 more

At this point I clearly don't understand how this is meant to work. I've found a lot of different examples but they all appear to work for v1.8 and not v2+.

Could someone please provide me with a good working example of adding PDF/A metadata and schema to the PDF file? The website has no documentation, especially for PDF/A.


Solution

  • It turns out I was missing a simple step. Default values are set up, but you have to specify which PDF/A part and conformance level your document will follow, for example part 1 with conformance B (PDF/A-1b):

    XMPMetadata xmp = XMPMetadata.createXMPMetadata();

    // XMP Basic schema: creator tool and creation date
    XMPBasicSchema xmpschema = xmp.createAndAddXMPBasicSchema();
    xmpschema.setCreatorTool(creatorTool);
    xmpschema.setCreateDate(creationDate);

    // Dublin Core schema: document title
    DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
    dc.setTitle(title);

    // PDF/A identification schema: part 1, conformance level B (PDF/A-1b)
    PDFAIdentificationSchema id = xmp.createAndAddPFAIdentificationSchema();
    id.setPart(1);
    id.setConformance("B");

    // Serialize with the xpacket wrapper (third argument true)
    XmpSerializer serializer = new XmpSerializer();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    serializer.serialize(xmp, baos, true);

    // Attach the serialized XMP to the document catalog
    PDMetadata metadata = new PDMetadata(document);
    metadata.importXMPMetadata(baos.toByteArray());
    catalog.setMetadata(metadata);
    

    I now get:

    The file example.pdf is a valid PDF/A-1b file
    

    and when checked with an online validator:

    Compliance  pdfa-1b
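
To make the two validator complaints a little more concrete, here is a rough sketch using only the JDK (no PDFBox) that inspects an XMP packet for the two things the fix above supplies: the leading xpacket processing instruction (what withXpacket = true adds) and the pdfaid part and conformance properties. The packet in samplePacket is hand-written for illustration and is an assumption about the general shape, not the exact output of XmpSerializer; the class and method names are hypothetical, and this is not how preflight itself validates.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;

class XmpPacketCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    // Hand-written sample packet (illustrative only, not real XmpSerializer output).
    public static String samplePacket(int part, String conformance) {
        return "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
             + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
             + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
             + "<rdf:Description rdf:about=\"\" xmlns:pdfaid=\"" + PDFAID_NS + "\">"
             + "<pdfaid:part>" + part + "</pdfaid:part>"
             + "<pdfaid:conformance>" + conformance + "</pdfaid:conformance>"
             + "</rdf:Description></rdf:RDF></x:xmpmeta>"
             + "<?xpacket end=\"w\"?>";
    }

    // Parse the packet with a namespace-aware DOM parser.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
    }

    // True if the first node of the document is an <?xpacket ...?> processing instruction.
    public static boolean startsWithXpacket(Document doc) {
        Node first = doc.getFirstChild();
        return first instanceof ProcessingInstruction
            && "xpacket".equals(((ProcessingInstruction) first).getTarget());
    }

    // Text of the first pdfaid:<localName> element, or null if it is absent.
    public static String pdfaidValue(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS(PDFAID_NS, localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(samplePacket(1, "B"));
        System.out.println("starts with xpacket: " + startsWithXpacket(doc));
        System.out.println("pdfaid:part = " + pdfaidValue(doc, "part"));
        System.out.println("pdfaid:conformance = " + pdfaidValue(doc, "conformance"));
    }
}
```

To try this against a real file, the bytes in baos from the snippet above could be passed in as a UTF-8 string in place of samplePacket(1, "B").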