
PDF/A XMP Data is not picked up


I'm trying to create a PDF/A file for invoices, so I'm setting the XMP headers for my file using the gofpdf library. Setting the headers seems to work fine, but the XMP data is not recognised by any of my validators, such as exiftool or a validation website. You can find a reproducible example here. I'm using the PDF library like this:

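    // docType stands in for the original argument name "type", which is a reserved word in Go.
    pdf, customerNumber, err := GeneratePDF(docType, id, user, nil)
    if err != nil {
        return err
    }

    // pdfVersion is an unexported field, so it is overwritten via reflect and unsafe.
    fpdfVal := reflect.Indirect(reflect.ValueOf(pdf.Fpdf))
    versionField := fpdfVal.FieldByName("pdfVersion")
    versionPtr := (*string)(unsafe.Pointer(versionField.UnsafeAddr()))
    *versionPtr = "1.4"

    // Attach the XMP packet as the document's metadata stream.
    pdf.SetXmpMetadata(XMP_HEADER)

    err = s.SavePDFAndRespondWith(docType, id, customerNumber, user, pdf)
    if err != nil {
        return err
    }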

The XMP content looks like this. It is taken from a working sample file that was not generated with Go and gofpdf.

    var XMP_HEADER = []byte(`
    <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
    <x:xmpmeta xmlns:x="adobe:ns:meta/">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title><rdf:Alt><rdf:li xml:lang="x-default" ></rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li></rdf:li></rdf:Seq></dc:creator><dc:subject><rdf:Bag><rdf:li></rdf:li></rdf:Bag></dc:subject><dc:format>application/pdf</dc:format><dc:description><rdf:Alt><rdf:li xml:lang="x-default" ></rdf:li></rdf:Alt></dc:description></rdf:Description>
    <rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/"><pdf:Producer>iTextSharp 4.1.0 (based on iText 2.1.0)</pdf:Producer><pdf:Keywords></pdf:Keywords></rdf:Description>
    <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/"><xmp:ModifyDate>2020-03-13T08:44:31+01:00</xmp:ModifyDate><xmp:CreatorTool>Symtrax - Compleo Suite</xmp:CreatorTool><xmp:CreateDate>2020-03-13T08:44:31+01:00</xmp:CreateDate></rdf:Description>
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"><pdfaid:part>3</pdfaid:part><pdfaid:conformance>A</pdfaid:conformance></rdf:Description>
    <rdf:Description xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/" xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#" xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#" rdf:about=""><pdfaExtension:schemas><rdf:Bag><rdf:li rdf:parseType="Resource"><pdfaSchema:schema>Factur-X PDFA Extension Schema</pdfaSchema:schema><pdfaSchema:namespaceURI>urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#</pdfaSchema:namespaceURI><pdfaSchema:prefix>fx</pdfaSchema:prefix><pdfaSchema:property><rdf:Seq><rdf:li rdf:parseType="Resource"><pdfaProperty:name>DocumentFileName</pdfaProperty:name><pdfaProperty:valueType>Text</pdfaProperty:valueType><pdfaProperty:category>external</pdfaProperty:category><pdfaProperty:description>name of the embedded XML invoice file</pdfaProperty:description></rdf:li><rdf:li rdf:parseType="Resource"><pdfaProperty:name>DocumentType</pdfaProperty:name><pdfaProperty:valueType>Text</pdfaProperty:valueType><pdfaProperty:category>external</pdfaProperty:category><pdfaProperty:description>INVOICE</pdfaProperty:description></rdf:li><rdf:li rdf:parseType="Resource"><pdfaProperty:name>Version</pdfaProperty:name><pdfaProperty:valueType>Text</pdfaProperty:valueType> <pdfaProperty:category>external</pdfaProperty:category><pdfaProperty:description>The actual version of the Factur-X XML schema</pdfaProperty:description></rdf:li><rdf:li rdf:parseType="Resource"><pdfaProperty:name>ConformanceLevel</pdfaProperty:name><pdfaProperty:valueType>Text</pdfaProperty:valueType><pdfaProperty:category>external</pdfaProperty:category><pdfaProperty:description>The conformance level of the embedded Factur-X data</pdfaProperty:description></rdf:li></rdf:Seq></pdfaSchema:property></rdf:li></rdf:Bag></pdfaExtension:schemas></rdf:Description><rdf:Description xmlns:fx="urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#" rdf:about="" fx:ConformanceLevel="EN 16931" fx:DocumentFileName="factur-x.xml" fx:DocumentType="INVOICE" fx:Version="1.0"/>
    </rdf:RDF></x:xmpmeta>
    <?xpacket end="w"?>`)

When opening the result file (example) you can see the embedded XMP Data as:

    << /Type /Metadata /Subtype /XML /Length 3286 >>
    stream
      <<< ... RDF ...  >>>
    endstream
    endobj
    6 0 obj
    <<
    /Producer (FPDF 1.7)
    /CreationDate (D:20200615175638)
    >>
    endobj
    7 0 obj

This XMP just doesn't seem to get picked up by any validator or by Adobe.

Any help is appreciated.


Solution

  • You can see the data, but as far as PDF is concerned, it's just rubbish in the file that is never used. Valid XMP metadata needs to be announced in the PDF structure, specifically in the Catalog object. Your catalog object looks like this:

    7 0 obj
    <<
    /Type /Catalog
    /Pages 1 0 R
    >>
    endobj
    

    A healthy PDF file looks like this:

    5 0 obj
    <<
    /Metadata 2 0 R
    /Pages 1 0 R
    /Type /Catalog
    >>
    endobj
    

    Indentation and object numbers are of course not important. What is important is that the Catalog contains a key named "Metadata" that points to your XMP stream. This is described in paragraph 7.7.2 of my version of the PDF specification.

    So you'll need to find out how to make that happen using the library you have.

    PS: Interestingly, an XMP scanner app written to be file-format agnostic (which was, at least originally, the idea behind XMP) would pick up your XMP, because it simply scans the file for the XMP signature :)
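
The difference between the two ways of finding XMP can be sketched in a few lines of Go. This is only a rough diagnostic, not part of gofpdf: the function names `hasXMPPacket` and `catalogReferencesMetadata` are made up for illustration, and the textual catalog match assumes an uncompressed PDF whose catalog is a flat dictionary (no nested `<< >>` and no object streams), as in the dumps above.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// xmpSignature is the fixed packet ID carried by the xpacket header
// (the same ID that appears in the XMP_HEADER from the question).
var xmpSignature = []byte("W5M0MpCehiHzreSzNTczkc9d")

// catalogRe matches a flat, uncompressed dictionary declaring /Type /Catalog.
// Assumption: the catalog holds no nested dictionaries.
var catalogRe = regexp.MustCompile(`<<[^<>]*/Type\s*/Catalog[^<>]*>>`)

// metadataRefRe matches an indirect reference such as "/Metadata 2 0 R".
var metadataRefRe = regexp.MustCompile(`/Metadata\s+\d+\s+\d+\s+R`)

// hasXMPPacket reports whether the raw bytes contain the XMP packet ID,
// a simplified version of what a format-agnostic scanner looks for.
func hasXMPPacket(pdf []byte) bool {
	return bytes.Contains(pdf, xmpSignature)
}

// catalogReferencesMetadata reports whether the first catalog dictionary
// found contains a /Metadata entry, which is what a PDF-aware tool follows.
func catalogReferencesMetadata(pdf []byte) bool {
	catalog := catalogRe.Find(pdf)
	return catalog != nil && metadataRefRe.Match(catalog)
}

func main() {
	// Hand-written fragments modelled on the two catalogs shown above;
	// they are illustrative byte snippets, not complete PDF files.
	packet := "<?xpacket begin=\"\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
	stream := "2 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n" +
		packet + "\nendstream\nendobj\n"
	broken := []byte(stream + "7 0 obj\n<<\n/Type /Catalog\n/Pages 1 0 R\n>>\nendobj\n")
	healthy := []byte(stream + "5 0 obj\n<<\n/Metadata 2 0 R\n/Pages 1 0 R\n/Type /Catalog\n>>\nendobj\n")

	fmt.Println(hasXMPPacket(broken), catalogReferencesMetadata(broken))   // true false
	fmt.Println(hasXMPPacket(healthy), catalogReferencesMetadata(healthy)) // true true
}
```

Running the same two checks against a generated file (for example after reading it with `os.ReadFile`) should show quickly whether the stream is present but unreferenced, which is the situation described in the question.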