
Reading PDF metadata in Java using itextpdf 5.5.13.3 vs itext-core 8.0.3


I need to read some metadata from a PDF file. I have code based on the itextpdf library that does the job:

    static String getPdfFormVersion() throws IOException {

        InputStream inputStream = PDFVersionExtractor.class.getResourceAsStream("/test.pdf");
        final PdfReader pdfReader = new PdfReader(inputStream);
   
        final byte[] docMetaData = pdfReader.getMetadata();
        try (final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
            byteArrayOutputStream.write(docMetaData);
            final String fileXML = byteArrayOutputStream.toString(StandardCharsets.UTF_8.name());
            final String versionNode = fileXML.substring(fileXML.indexOf("<desc:version"), fileXML.indexOf("</desc:version>"));
            return versionNode;
        }
    }

The library I am using is:

        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itextpdf</artifactId>
            <version>5.5.13.3</version>
        </dependency>

I would like to migrate to the newest version of iText:

        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itext-core</artifactId>
            <version>8.0.3</version>
            <type>pom</type>
        </dependency>
        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>kernel</artifactId>
            <version>8.0.3</version>
            <type>pom</type>
        </dependency> 

Unfortunately, the getMetadata() method is no longer available there. I have tried to find an equivalent in the PdfReader class, without success. How can I extract PDF metadata using the newest version of iText?


Solution

  • Thanks to the comments from @K J, I have put together a solution that seems to work the same way as the previous one:

    
    import com.itextpdf.kernel.pdf.*;
    
    import java.io.*;
    import java.nio.charset.StandardCharsets;
    
    public class PDFVersionExtractor {
    
        public static void main(String[] args) throws IOException {
            String pdfFormVersion = getPdfFormVersion();
            System.out.println(pdfFormVersion);
        }
    
        static String getPdfFormVersion() throws IOException {
            // try-with-resources closes the stream and the document (which also closes its reader)
            try (InputStream inputStream = PDFVersionExtractor.class.getResourceAsStream("/test.pdf");
                 // create pdf document representation from the reader
                 PdfDocument pdfDoc = new PdfDocument(new PdfReader(inputStream))) {

                // read xmpMetadata from the document
                final byte[] docMetaData = pdfDoc.getXmpMetadata();

                // decode the raw bytes directly; no intermediate output stream is needed
                final String fileXML = new String(docMetaData, StandardCharsets.UTF_8);
                return fileXML.substring(fileXML.indexOf("<desc:version"), fileXML.indexOf("</desc:version>"));
            }
        }
    }
    

    The change is to wrap the PdfReader in a PdfDocument and call getXmpMetadata() on the document. The returned byte[] seems to contain the same information as the one returned by PdfReader.getMetadata() in iText 5.
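
As a possible refinement, the substring lookup throws a StringIndexOutOfBoundsException when the desc:version element is missing, so it may be safer to parse the XMP packet as XML. The following is only a sketch using the JDK's built-in DOM parser, independent of iText. The class name XmpVersionReader, the method extractVersion, the sample packet and the http://example.com/desc/ namespace are all made up for illustration, and it assumes the element holds plain text. If the real metadata nests child elements inside desc:version, getTextContent() would return their concatenated text instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XmpVersionReader {

    // Hypothetical helper: returns the trimmed text of the first <desc:version> element, if present.
    public static Optional<String> extractVersion(byte[] xmpBytes) throws Exception {
        if (xmpBytes == null) {
            // a document without XMP metadata has nothing to extract
            return Optional.empty();
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // not namespace-aware, so elements are matched by their qualified name ("desc:version")
        factory.setNamespaceAware(false);
        // the metadata may come from an untrusted file, so refuse DOCTYPE declarations
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new ByteArrayInputStream(xmpBytes));

        NodeList nodes = document.getElementsByTagName("desc:version");
        if (nodes.getLength() == 0) {
            return Optional.empty();
        }
        return Optional.of(nodes.item(0).getTextContent().trim());
    }

    public static void main(String[] args) throws Exception {
        // made-up XMP packet; the desc namespace URI is a placeholder, not taken from a real file
        String sample =
                "<?xpacket begin=\"\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
              + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
              + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
              + "<rdf:Description rdf:about=\"\" xmlns:desc=\"http://example.com/desc/\">"
              + "<desc:version>1.2</desc:version>"
              + "</rdf:Description></rdf:RDF></x:xmpmeta>"
              + "<?xpacket end=\"w\"?>";
        System.out.println(
                extractVersion(sample.getBytes(StandardCharsets.UTF_8)).orElse("no version found"));
    }
}
```

In the solution above, the byte[] returned by pdfDoc.getXmpMetadata() could be passed straight to such a helper, and the caller would then decide what to do when the Optional is empty rather than catching an index exception.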