With libraries like iTextSharp or iText you can extract metadata from PDF documents via a PdfReader:
using (var reader = new PdfReader(pdfBytes))
{
return reader.Metadata == null ? null : Encoding.UTF8.GetString(reader.Metadata);
}
These kinds of libraries parse the entire PDF document before they can pull out the metadata. In my case this leads to high usage of system resources, since we get many requests per second with large PDFs.
Is there a way to extract the metadata from the PDF without completely loading it into memory first?
iText 5.x allows partial reading of PDFs, too; it merely looks a bit more complicated.
Instead of
using (var reader = new PdfReader(pdfBytes))
use
using (var reader = new PdfReader(new RandomAccessFileOrArray(pdfBytes), null, true))
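where the final true requests partial reading: the reader loads only the cross-reference table up front and reads other objects as they are needed.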
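Put together with the extraction code from the question, this could look roughly like the sketch below. The class and method names (PdfMetadataExtractor, ExtractXmpMetadata) are made up for illustration; the iTextSharp calls are the ones already shown above, and the sketch assumes an iTextSharp 5.x version that has this constructor overload.

```csharp
using System.Text;
using iTextSharp.text.pdf;

// Hypothetical helper class; the name is an assumption for this sketch.
public static class PdfMetadataExtractor
{
    // Returns the XMP metadata stream as a string, or null if the PDF has none.
    public static string ExtractXmpMetadata(byte[] pdfBytes)
    {
        // The final 'true' requests partial reading, so the reader does not
        // parse every object in the document when it is constructed.
        using (var reader = new PdfReader(new RandomAccessFileOrArray(pdfBytes), null, true))
        {
            byte[] xmp = reader.Metadata;
            return xmp == null ? null : Encoding.UTF8.GetString(xmp);
        }
    }
}
```

Note that the byte array itself is still held in memory here; partial reading only avoids parsing the whole document into PDF objects.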