
GemBox DocumentModel.Load() cannot read Pdf file


Currently I am unable to load the original PDF document using GemBox; loading it throws an error. I am using Acrobat 9.

I have also tried the 8/16/2018 fixes. Any suggestion would be highly appreciated.

The basic code I am using is:

using GemBox.Document;
using System;

namespace Pdf2Text
{
   class Program
   {

      [STAThread]
      static void Main(string[] args)
      {
          ComponentInfo.SetLicense("My-License");

          var document = DocumentModel.Load(@"E:\data\testing\HA021.pdf");
          document.Save(@"E:\data\testing\HA021.docx");
      }
    }
}

Solution

  • EDIT:

    Newer versions of GemBox.Document include another PDF reader that is intended for high-fidelity conversions; see the "Convert PDF to Word" example.

    Here is how to use it:

    using GemBox.Document;

    // Load the PDF with the high-fidelity reader, then save it as DOCX.
    var document = DocumentModel.Load("Sample.pdf",
        new PdfLoadOptions() { LoadType = PdfLoadType.HighFidelity });
    document.Save("Sample.docx");
    

    ORIGINAL:

    The current implementation of the PDF reader in GemBox.Document is still in beta and cannot handle a PDF feature that this file uses: "iref streams", which are cross-reference tables stored in streams.
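    If you want to check whether a given file uses this feature before loading it, note that the offset after the last `startxref` keyword points either at a classic `xref` table or at a cross-reference stream object whose dictionary contains `/Type /XRef`. Below is a minimal sketch of that check, assuming the file is small enough to read into memory. `XrefInspector` and `UsesCrossReferenceStream` are hypothetical names, not part of GemBox. The sketch only looks at the most recent cross-reference section, so a hybrid file (a classic table plus an `/XRefStm` entry in its trailer) is reported as using a table.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of GemBox: reports whether the most recent
// cross-reference section of a PDF is a cross-reference stream ("iref stream")
// rather than a classic "xref" table.
public static class XrefInspector
{
    public static bool UsesCrossReferenceStream(byte[] pdf)
    {
        // ISO-8859-1 maps each byte to exactly one char,
        // so string indexes equal byte offsets.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdf);

        // The last "startxref" keyword holds the byte offset of the newest section.
        int keyword = text.LastIndexOf("startxref", StringComparison.Ordinal);
        if (keyword < 0)
            throw new InvalidDataException("No startxref keyword found.");

        // Skip the whitespace after the keyword, then read the decimal offset.
        int pos = keyword + "startxref".Length;
        while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
        int digitsStart = pos;
        while (pos < text.Length && text[pos] >= '0' && text[pos] <= '9') pos++;
        if (!int.TryParse(text.Substring(digitsStart, pos - digitsStart), out int offset)
            || offset >= text.Length)
            throw new InvalidDataException("Invalid startxref offset.");

        // A classic table begins with the "xref" keyword at that offset.
        string section = text.Substring(offset).TrimStart();
        if (section.StartsWith("xref", StringComparison.Ordinal))
            return false;

        // Otherwise expect an indirect object ("N G obj << ... >> stream")
        // whose dictionary, before the "stream" keyword, contains /Type /XRef.
        int streamKeyword = section.IndexOf("stream", StringComparison.Ordinal);
        string dictionary = streamKeyword >= 0 ? section.Substring(0, streamKeyword) : section;
        return dictionary.IndexOf("/XRef", StringComparison.Ordinal) >= 0;
    }
}
```

    For example, `XrefInspector.UsesCrossReferenceStream(File.ReadAllBytes(@"E:\data\testing\HA021.pdf"))` returning `true` would suggest the file relies on a cross-reference stream, which the beta reader cannot handle.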

    However, GemBox.Pdf can handle cross-reference streams, so as a workaround you could re-save the file with a classic cross-reference table and then load that copy with GemBox.Document:

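    using System.IO;
    using GemBox.Document;
    using GemBox.Pdf;

    // Load PDF with GemBox.Pdf, which understands cross-reference streams.
    var pdfDocument = PdfDocument.Load("Sample.pdf");
    // Write a classic cross-reference table instead of a cross-reference stream.
    pdfDocument.SaveOptions.CrossReferenceType = PdfCrossReferenceType.Table;

    // Save PDF with GemBox.Pdf into memory.
    var pdfStream = new MemoryStream();
    pdfDocument.Save(pdfStream);
    // Rewind the stream so the next reader starts at the beginning.
    pdfStream.Position = 0;

    // Load PDF with GemBox.Document.
    var document = DocumentModel.Load(pdfStream, LoadOptions.PdfDefault);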
    

    Lastly, regarding the conversion of PDF to DOCX: GemBox.Document's PDF reader is currently intended for extracting text and tables from PDF files; it is not intended for high-fidelity requirements.