Trouble converting PDF to PDF/A using JODConverter


I am trying to convert a PDF to PDF/A by following the guide at http://kapion.ru/convert-to-pdfa-with-jodconverter/.

After execution I get an encoded PDF file. Opened as text, it looks like this:

%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(de-CH) /StructTreeRoot 17 0 R/MarkInfo<</Marked 
true>>>>
endobj
2 0 obj

Code:

@Test
public void a() throws OfficeException {
    OfficeManager officeManager = LocalOfficeManager.make();
    DocumentConverter converter = LocalConverter.make(officeManager);
    try {
        officeManager.start();
        File inputFile = new File("C:/Users/user/Desktop/9.pdf");
        File pdfFile = new File("C:/Users/user/Desktop/Output/9.pdf");
        DocumentFormat pdfFormat = getDocumentFormatPDFA();
        converter.convert(inputFile).to(pdfFile).as(pdfFormat).execute();
    } catch (OfficeException e) {
        e.printStackTrace();
    } finally {
        if (officeManager.isRunning())
            officeManager.stop();
    }
}

private static DocumentFormat getDocumentFormatPDFA() {
    // SelectPdfVersion export option: 0 = regular PDF (default), 1 = PDF/A-1
    final int PDF_A_1 = 1;
    final Map<String, Integer> pdfOptions = new HashMap<>();
    pdfOptions.put("SelectPdfVersion", PDF_A_1);
    return DocumentFormat.builder()
            .inputFamily(DocumentFamily.TEXT)
            .name("PDF/A")
            .extension("pdf")
            .mediaType("application/pdf")
            .storeProperty(DocumentFamily.TEXT, "FilterData", pdfOptions)
            .storeProperty(DocumentFamily.TEXT, "FilterName", "writer_pdf_Export")
            .unmodifiable(false)
            .build();
}

Could you please help me with this?


Solution

  • OpenOffice does not support reading PDF files. One possible approach is to first convert the PDF to DOCX or HTML with an external library, and then convert that intermediate file to PDF/A with OpenOffice.
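
Since the problem comes from passing a PDF as the input, one way to catch it early is to check the input file before calling the converter. The following is a minimal sketch using only the Java standard library. The class name PdfInputGuard and the method looksLikePdf are hypothetical helpers, not part of JODConverter. It only inspects the leading "%PDF-" signature (the same header visible in the output above), so it is a heuristic rather than a full validation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not part of JODConverter): detects PDF input so the
// caller can route it through an external PDF-to-DOCX/HTML step first.
public class PdfInputGuard {

    // Every PDF file starts with the ASCII signature "%PDF-".
    private static final byte[] PDF_MAGIC =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the first bytes of the file match the PDF signature.
    static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[PDF_MAGIC.length];
            int read = in.readNBytes(head, 0, head.length);
            return read == PDF_MAGIC.length && Arrays.equals(head, PDF_MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build two small temporary files to demonstrate the check.
        Path pdf = Files.createTempFile("sample", ".pdf");
        Path txt = Files.createTempFile("sample", ".txt");
        try {
            Files.write(pdf, "%PDF-1.5\n".getBytes(StandardCharsets.US_ASCII));
            Files.write(txt, "plain text".getBytes(StandardCharsets.US_ASCII));

            for (Path p : new Path[] {pdf, txt}) {
                if (looksLikePdf(p)) {
                    System.out.println(p.getFileName()
                            + ": PDF input, convert to DOCX/HTML first");
                } else {
                    System.out.println(p.getFileName()
                            + ": can be passed to the office converter");
                }
            }
        } finally {
            Files.deleteIfExists(pdf);
            Files.deleteIfExists(txt);
        }
    }
}
```

In the test from the question, such a check could run on inputFile.toPath() before converter.convert(...), so that a PDF input is sent to the external conversion step instead of directly to OpenOffice.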