Tags: c#, pdf, pdfsharp

PdfSharp.SharpZipLib.SharpZipBaseException - Header checksum illegal


I'm really stuck. I'm trying to grab the first page of a PDF that isn't password protected (i.e. Adobe Reader opens it just fine without a prompt), but I get the error above when I call PdfReader.Open():

using (var pdfStream = new MemoryStream(_underlyingBytes))
{
    using (var allPages = PdfReader.Open(pdfStream, string.Empty, PdfDocumentOpenMode.ReadOnly))
    {
        if (allPages.PageCount < 1) throw new ArgumentException("PDF has no pages");

        using (var firstPage = new PdfDocument())
        {
            firstPage.AddPage(allPages.Pages[0]);

            using (var stream = new MemoryStream())
            {
                firstPage.Save(stream);
                _underlyingBytes = stream.ToArray();

                return this;
            }
        }
    }
}

EDIT

Here's the PDF I'm trying to open

And if anyone was wondering where _underlyingBytes is populated:

using (var stream = new MemoryStream())
{
    blob.DownloadToStream(stream);
    stream.Position = 0;
    _underlyingBytes = stream.ToArray();
}
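
Before blaming the PDF library, it can help to sanity-check the downloaded bytes themselves. The following is a minimal sketch that uses only the base class library; the class and method names (PdfBytesInspector, HasPdfHeader, MentionsEncryption) are hypothetical and not part of PDFsharp or any other library. It checks for the "%PDF-" signature and does a plain text scan for an "/Encrypt" key, which is a heuristic rather than a real parse.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of PDFsharp or iTextSharp.
public static class PdfBytesInspector
{
    // True when the buffer begins with the "%PDF-" signature.
    public static bool HasPdfHeader(byte[] bytes)
    {
        if (bytes == null || bytes.Length < 5) return false;
        return Encoding.ASCII.GetString(bytes, 0, 5) == "%PDF-";
    }

    // Heuristic: true when the raw bytes contain the "/Encrypt" key.
    // This is a text scan, not a parse, so false positives are possible
    // (for example, the same characters inside an uncompressed content stream).
    public static bool MentionsEncryption(byte[] bytes)
    {
        if (bytes == null) return false;
        // ASCII decoding turns bytes above 127 into '?', which cannot
        // affect a search for the pure-ASCII token "/Encrypt".
        string text = Encoding.ASCII.GetString(bytes);
        return text.IndexOf("/Encrypt", StringComparison.Ordinal) >= 0;
    }
}
```

If HasPdfHeader(_underlyingBytes) is true and MentionsEncryption(_underlyingBytes) is also true, the download itself is probably intact and encryption becomes the more likely suspect for a decompression error, even when a viewer opens the file without prompting.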

Solution

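  • Well, I didn't manage to get this working with PDFsharp, but I did finally manage it with iTextSharp. Its PdfReader class exposes a static flag, "unethicalreading", which lets you open password-protected PDFs without supplying the password. I'm still not 100% sure why this is needed when Adobe Reader can open the PDF without a password; a likely explanation is that the file is encrypted with only an owner (permissions) password and an empty user password, which viewers open without prompting.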

    Anyway, code now is:

    using (var pdfStream = new MemoryStream(_underlyingBytes))
    {
        PdfReader.unethicalreading = true;
        using (var reader = new PdfReader(pdfStream))
        {
            if (reader.NumberOfPages < 1) throw new ArgumentException("PDF has no pages");
    
            using (var document = new Document(reader.GetPageSizeWithRotation(1)))
            {
                using (var outputStream = new MemoryStream())
                {
                    using (var pdfCopyProvider = new PdfCopy(document, outputStream))
                    {
                        document.Open();
    
                        var importedPage = pdfCopyProvider.GetImportedPage(reader, 1);
                        pdfCopyProvider.AddPage(importedPage);
    
                        // Closing the document finalizes the PDF written to outputStream.
                        document.Close();
                        reader.Close();

                        _underlyingBytes = outputStream.ToArray();
                        return this;
                    }
                }
            }
        }
    }
    

    This works well, and the MemoryStream could easily be replaced with a FileStream if the intent was to write the first page to disk.

    Hopefully this helps someone in future.
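
The FileStream variant mentioned above could look roughly like the sketch below. It assumes iTextSharp 5.x (the same API used in the answer); the class name FirstPageExtractor, the method SaveFirstPage, and both path parameters are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical wrapper; only the iTextSharp calls come from the answer above.
public static class FirstPageExtractor
{
    // Copies page 1 of the PDF at sourcePath into a new PDF at targetPath.
    public static void SaveFirstPage(string sourcePath, string targetPath)
    {
        // Same flag as in the answer: skip the owner-password check.
        PdfReader.unethicalreading = true;

        using (var reader = new PdfReader(sourcePath))
        {
            if (reader.NumberOfPages < 1) throw new ArgumentException("PDF has no pages");

            // FileStream replaces the MemoryStream, so the page goes straight to disk.
            using (var output = new FileStream(targetPath, FileMode.Create, FileAccess.Write))
            using (var document = new Document(reader.GetPageSizeWithRotation(1)))
            using (var copy = new PdfCopy(document, output))
            {
                document.Open();
                copy.AddPage(copy.GetImportedPage(reader, 1));
                // Closing the document finalizes the PDF before the stream is disposed.
                document.Close();
            }
        }
    }
}
```

A call such as FirstPageExtractor.SaveFirstPage("input.pdf", "first-page.pdf") would then write the single-page file next to the executable; both file names are placeholders.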