Tags: pdf, metadata, apache-fop, pdfa

PDF/A conforming metadata with FOP


I can't get FOP 2.1 to produce a PDF/A-1a conforming PDF that contains metadata. According to PDFBox Preflight it isn't even PDF/A-1b.

Let's say I want to set the date, language, title and description:

<fo:declarations xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" 
  xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"  xml:lang="de">
    <x:xmpmeta xmlns:x="adobe:ns:meta/" id="hc_meta">
        <rdf:RDF>
            <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
                <xmp:CreatorTool>hx</xmp:CreatorTool>
                <dc:language>
                    <rdf:Bag>
                        <rdf:li>de</rdf:li>
                    </rdf:Bag>
                </dc:language>
                <dc:title>
                    <rdf:Alt>
                        <rdf:li xml:lang="de">Schrieb 2016-003 - Dings AG</rdf:li>
                    </rdf:Alt>
                </dc:title>
                <dc:creator>
                    <rdf:Seq>
                        <rdf:li>hxxxdingens Consulting GmbH, Rodger Moore</rdf:li>
                    </rdf:Seq>
                </dc:creator>
                <dc:description>
                    <rdf:Alt>
                        <rdf:li xml:lang="de">Schrieb 2016-003 - Dings AG XXX R 7 99 3 - 2016-06-30 (2016:06:30)</rdf:li>
                    </rdf:Alt>
                </dc:description>
                <dc:date>
                    <rdf:Seq>
                        <rdf:li>2016:06:30</rdf:li>
                    </rdf:Seq>
                </dc:date>
            </rdf:Description>
        </rdf:RDF>
    </x:xmpmeta>
</fo:declarations>

The resulting PDF does not conform:

$ java -jar ~/prog/hcbriefe/preflight-app-2.0.2.jar test_1.pdf
The file test_1.pdf is not valid, error(s) :
7.2 : Error on MetaData, Title present in the document catalog dictionary can't be found in XMP information (Property is not defined)
7.2 : Error on MetaData, Subject present in the document catalog dictionary can't be found in XMP information (Subject not found in XMP (dc:description["x-default"] not found))
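The second message spells out the lookup that fails: the validator wants the `x-default` alternative of `dc:description` (and, by analogy, of `dc:title`), and an entry tagged only `xml:lang="de"` does not count. Below is a minimal sketch, assuming Python 3.9+ and only the standard library, of a hypothetical pre-check (`missing_x_default` is a made-up name, not part of FOP or PDFBox) that you could run on the FO file before rendering. It only mimics this one aspect of Preflight, not its full rule set.

```python
import xml.etree.ElementTree as ET

# Namespaces used in the fo:declarations block above.
DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def missing_x_default(fo_xml: str) -> list[str]:
    """Hypothetical check: list each dc:title / dc:description element present
    in the document that has no rdf:li with xml:lang="x-default"."""
    root = ET.fromstring(fo_xml)
    missing = []
    for prop in ("title", "description"):
        for elem in root.iter(f"{{{DC}}}{prop}"):
            langs = [li.get(XML_LANG) for li in elem.iter(f"{{{RDF}}}li")]
            if "x-default" not in langs:
                missing.append(f"dc:{prop}")
    return missing


# Trimmed-down version of the question's metadata (title only, tagged "de").
sample = """<fo:declarations xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <x:xmpmeta xmlns:x="adobe:ns:meta/">
    <rdf:RDF>
      <rdf:Description rdf:about="">
        <dc:title><rdf:Alt>
          <rdf:li xml:lang="de">Schrieb 2016-003 - Dings AG</rdf:li>
        </rdf:Alt></dc:title>
      </rdf:Description>
    </rdf:RDF>
  </x:xmpmeta>
</fo:declarations>"""

print(missing_x_default(sample))  # ['dc:title']
```

Properties that are absent altogether are not reported; the sketch only looks at the language tags of the entries that exist.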

But when I use exiftool to set the title and description on the PDF afterwards, it passes the test:

$ cp test_1.pdf test_1mod.pdf
$ exiftool -title="Schrieb 2016-003 - Dings AG" \
  -description="Schrieb 2016-003 - Dings AG XXX R 7 99 3 - 2016-06-30 (2016:06:30)" \
   test_1mod.pdf
    1 image files updated

$ java -jar ~/prog/hcbriefe/preflight-app-2.0.2.jar test_1mod.pdf
The file test_1mod.pdf is a valid PDF/A-1b file

What do I have to put in the FO metadata so that the PDF conforms straight out of FOP?


Solution

  • After some comparing I found the cause: the xml:lang of the entries in the dc:title and dc:description elements must not be de; it must be x-default, like this:

          ...
          <dc:title>
              <rdf:Alt>
                  <rdf:li xml:lang="x-default">Schrieb 2016-003 - Dings AG</rdf:li>
              </rdf:Alt>
          </dc:title>
          ...
          <dc:description>
              <rdf:Alt>
                  <rdf:li xml:lang="x-default">Schrieb 2016-003 - Dings AG XXX R 7 99 3 - 2016-06-30 (2016:06:30)</rdf:li>
              </rdf:Alt>
          </dc:description>
          <dc:date>
              <!-- some validators complain if the date uses : instead of - -->
              <rdf:Seq>
                  <rdf:li>2016-06-30</rdf:li>
              </rdf:Seq>
          </dc:date>
          ...
    

    With these changes the PDF passes the PDFBox Preflight test.

    Additionally, dc:date must use - as the separator between year, month and day (2016-06-30, not 2016:06:30) to conform with the online pdf-tools.com validator.
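If the metadata is filled in from other data (as the 2016:06:30 in the question suggests), both fixes can be applied when the XMP is generated. The following is a minimal sketch, assuming Python 3.9+; `to_xmp_date` and `dc_fragment` are hypothetical helper names, and the output is just the dc: elements from the answer, to be placed inside the existing rdf:Description.

```python
import datetime
from xml.sax.saxutils import escape


def to_xmp_date(value: str) -> str:
    """Normalize a date given as 2016-06-30 or 2016:06:30 to 2016-06-30.

    Only plain dates are handled in this sketch; anything else raises ValueError.
    """
    for fmt in ("%Y-%m-%d", "%Y:%m:%d"):
        try:
            return datetime.datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unsupported date: {value!r}")


def dc_fragment(title: str, description: str, date: str) -> str:
    """Emit dc:title and dc:description with an x-default entry, plus dc:date."""
    def lang_alt(tag: str, text: str) -> str:
        # escape() protects &, < and > in the free text.
        return (f"<dc:{tag}><rdf:Alt>"
                f'<rdf:li xml:lang="x-default">{escape(text)}</rdf:li>'
                f"</rdf:Alt></dc:{tag}>")

    return "\n".join([
        lang_alt("title", title),
        lang_alt("description", description),
        f"<dc:date><rdf:Seq><rdf:li>{to_xmp_date(date)}</rdf:li></rdf:Seq></dc:date>",
    ])


print(to_xmp_date("2016:06:30"))  # 2016-06-30
print(dc_fragment("Schrieb 2016-003 - Dings AG",
                  "Schrieb 2016-003 - Dings AG XXX R 7 99 3 - 2016-06-30",
                  "2016:06:30"))
```

Using strptime with explicit formats, rather than blindly replacing every colon, keeps the helper from mangling values that contain a time part.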