c# pdf xmp

Get PDF XMP Metadata without loading the complete document


With libraries like iTextSharp or iText you can extract metadata from PDF documents via a PdfReader:

using (var reader = new PdfReader(pdfBytes))
{
    return reader.Metadata == null ? null : Encoding.UTF8.GetString(reader.Metadata);
}

Libraries of this kind parse the entire PDF document before they can extract the metadata. In my case this leads to high usage of system resources, since we get many requests per second with large PDFs.

Is there a way to extract the metadata from the PDF without completely loading it in memory first?


Solution

  • iText 5.x also allows partial reading of PDFs; it merely looks a bit more complicated.

    Instead of

    using (var reader = new PdfReader(pdfBytes))
    

    use

    using (var reader = new PdfReader(new RandomAccessFileOrArray(pdfBytes), null, true))
    

    where the final true requests partial reading.
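
Putting the two snippets together, a minimal sketch of the full extraction could look like the following. It assumes iTextSharp 5.x; the class name XmpExtractor and both method names are made up for illustration. The file-path overload additionally assumes that your 5.x version still exposes the RandomAccessFileOrArray(string) constructor (it may be flagged as obsolete in later 5.5.x releases), so treat that part as an assumption to verify against your version.

```csharp
using System.Text;
using iTextSharp.text.pdf;

// Hypothetical helper class; the names are illustrative, not part of iTextSharp.
public static class XmpExtractor
{
    // Returns the XMP packet as a string, or null if the PDF carries no XMP metadata.
    public static string FromBytes(byte[] pdfBytes)
    {
        // The final 'true' requests partial reading, as described above.
        using (var reader = new PdfReader(new RandomAccessFileOrArray(pdfBytes), null, true))
        {
            return reader.Metadata == null ? null : Encoding.UTF8.GetString(reader.Metadata);
        }
    }

    // Assumption: RandomAccessFileOrArray(string) is available in your iTextSharp 5.x version.
    // Reading from a path avoids first copying the whole file into a byte array.
    public static string FromFile(string path)
    {
        using (var reader = new PdfReader(new RandomAccessFileOrArray(path), null, true))
        {
            return reader.Metadata == null ? null : Encoding.UTF8.GetString(reader.Metadata);
        }
    }
}
```

If the PDF already sits in a byte array, partial reading mainly saves parsing work rather than memory; the path-based variant is likely the better fit when the documents are on disk.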