Java: 1.8
pdfbox: 2.0.18
preflight: 2.0.18
I can create a working PDF, but our requirement is that it must conform to the PDF/A standard. I've managed to fix all validation issues apart from the metadata.
Without adding any metadata I get the error:
The fileexample.pdf is not valid, error(s) :
7.1 : Error on MetaData, Metadata is not a stream
Following some examples (no documentation exists for this under v2+), I've come up with the following:
PDMetadata documentMetadata = new PDMetadata(document);
XMPMetadata xmpMetadata = XMPMetadata.createXMPMetadata();
xmpMetadata.createAndAddPFAIdentificationSchema();
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream out = new ByteArrayOutputStream();
serializer.serialize(xmpMetadata, out, false);
documentMetadata.importXMPMetadata(out.toByteArray());
catalog.setMetadata(documentMetadata);
Which gives me the error:
The fileexample.pdf is not valid, error(s) :
7.1 : Error on MetaData, xmp should start with a processing instruction
I then tried passing true for the serializer's withXpacket argument, and I get the following error:
org.apache.pdfbox.preflight.exception.ValidationException: Failed while validating
at org.apache.pdfbox.preflight.process.MetadataValidationProcess.validate(MetadataValidationProcess.java:162)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:102)
at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:141)
at org.apache.pdfbox.preflight.PreflightDocument.validate(PreflightDocument.java:166)
at uk.ac.port.pdf.Main.validate(Main.java:53)
at uk.ac.port.pdf.Main.main(Main.java:21)
Caused by: org.apache.pdfbox.preflight.exception.ValidationException: Schemas not found in the given metadata representation
at org.apache.pdfbox.preflight.metadata.RDFAboutAttributeConcordanceValidation.validateRDFAboutAttributes(RDFAboutAttributeConcordanceValidation.java:51)
at org.apache.pdfbox.preflight.process.MetadataValidationProcess.validate(MetadataValidationProcess.java:99)
... 5 more
At this point I clearly don't understand how this is meant to work. I've found a lot of different examples but they all appear to work for v1.8 and not v2+.
Could someone please provide me with a good working example of adding PDF/A metadata and schema to the PDF file? The website has no documentation, especially for PDF/A.
It turns out I was missing a simple step. Default values are set up, but you still have to tell it which part and conformance level your PDF/A document follows, for example 1B.
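import java.io.ByteArrayOutputStream;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.schema.PDFAIdentificationSchema;
import org.apache.xmpbox.schema.XMPBasicSchema;
import org.apache.xmpbox.xml.XmpSerializer;

// document is the PDDocument; catalog is assumed to be document.getDocumentCatalog()
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
XMPBasicSchema xmpschema = xmp.createAndAddXMPBasicSchema();
xmpschema.setCreatorTool(creatorTool);
xmpschema.setCreateDate(creationDate);
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setTitle(title);
// "PFA" is how the method is actually spelled in xmpbox
PDFAIdentificationSchema id = xmp.createAndAddPFAIdentificationSchema();
// the missing step: declare the PDF/A part and conformance level (here 1B)
id.setPart(1);
id.setConformance("B");
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// true = withXpacket, so the output starts with the xpacket processing instruction
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(document);
metadata.importXMPMetadata(baos.toByteArray());
catalog.setMetadata(metadata);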
I now get:
The file example.pdf is a valid PDF/A-1b file
and when checked with an online validator:
Compliance pdfa-1b
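If you want to see what the two fixes actually change in the serialized bytes, a small JDK-only check may help. This is a rough sketch and not part of PDFBox or preflight: the class XmpPacketCheck, its methods, and the SAMPLE packet are hypothetical and hand-written, and real xmpbox output will contain more schemas and may be laid out differently. It only looks for the xpacket processing instruction at the start and for the pdfaid:part and pdfaid:conformance values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for eyeballing an XMP packet; not a PDF/A validator.
class XmpPacketCheck {
    // Namespace of the PDF/A identification schema (prefix pdfaid).
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    // Hand-written sample packet (an assumption, not captured xmpbox output).
    static final String SAMPLE =
        "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
      + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
      + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
      + "<rdf:Description rdf:about=\"\" xmlns:pdfaid=\"" + PDFAID_NS + "\">"
      + "<pdfaid:part>1</pdfaid:part>"
      + "<pdfaid:conformance>B</pdfaid:conformance>"
      + "</rdf:Description></rdf:RDF></x:xmpmeta>"
      + "<?xpacket end=\"w\"?>";

    // True if the packet begins with the xpacket processing instruction,
    // which is what serializing with withXpacket = true should give you.
    static boolean hasXpacketHeader(String packet) {
        return packet.startsWith("<?xpacket begin=");
    }

    // Returns the text of the first pdfaid:<localName> element, or null if absent.
    static String pdfaidValue(String packet, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(packet)));
            NodeList nodes = doc.getElementsByTagNameNS(PDFAID_NS, localName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("packet is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasXpacketHeader(SAMPLE));               // true
        System.out.println(pdfaidValue(SAMPLE, "part")
                + pdfaidValue(SAMPLE, "conformance"));               // 1B
    }
}
```

To try it on your own output, pass new String(baos.toByteArray(), StandardCharsets.UTF_8) instead of SAMPLE. A null from pdfaidValue would suggest that setPart or setConformance was never called.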