PdfSharp, ASP.NET Web API, and PDFs created by Word 2010


This is a pretty specific scenario. I have the following code within a function to read PDF files that users upload to my ASP.NET Web API:

Using pd As PdfSharp.Pdf.PdfDocument = PdfSharp.Pdf.IO.PdfReader.Open(filePath, PdfSharp.Pdf.IO.PdfDocumentOpenMode.ReadOnly)
    For Each page As PdfSharp.Pdf.PdfPage In pd.Pages
        Dim seq As PdfSharp.Pdf.Content.Objects.CSequence = Nothing
        Try
            seq = PdfSharp.Pdf.Content.ContentReader.ReadContent(page)
        Catch ex As Exception
            ' log the exception
        End Try
        ' do stuff with seq
    Next
End Using

The code above works fine for a bunch of PDFs I have. However, when I test with a PDF created through Word's "Save As", the thread just disappears. Execution enters the Try block, calls ReadContent, and never comes back: the code below the call is never reached, no exception is thrown, and neither the enclosing function nor the API controller ever completes. The whole request thread just goes away.

I don't even know how this is possible.


Solution

  • With the sample PDF you supplied on the PDFsharp forum, the parser runs into an endless loop. ASP.NET is not my area of expertise, but I guess that ASP.NET will silently kill the thread after a time-out - no exception, no logging.

    A few lines of code will solve this problem.

    See your thread on the PDFsharp forum:
    http://forum.pdfsharp.net/viewtopic.php?p=7911#p7911
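
Independently of the fix discussed in the forum thread, the calling code can guard itself against a parser that never returns. The following is a minimal sketch, not the fix from the forum: `ContentReaderGuard` and `TryReadContent` are hypothetical names introduced here, and the timeout value is left to the caller. It runs `ReadContent` on a thread-pool task and waits for a bounded time. Note the limitation: the task cannot be aborted, so if the parser really is stuck in an endless loop, that background thread keeps spinning until the process is recycled. The guard only lets the request return an error instead of vanishing.

```vbnet
Imports System
Imports System.Threading.Tasks
Imports PdfSharp.Pdf
Imports PdfSharp.Pdf.Content
Imports PdfSharp.Pdf.Content.Objects

' Hypothetical helper, not part of PDFsharp.
Module ContentReaderGuard

    ' Returns True and sets seq if ReadContent finished within the timeout.
    ' Returns False if it threw or did not finish in time.
    Function TryReadContent(page As PdfPage, timeout As TimeSpan, ByRef seq As CSequence) As Boolean
        seq = Nothing

        ' Run the parser on a thread-pool thread so this thread can stop waiting.
        Dim work As Task(Of CSequence) = Task.Run(Function() ContentReader.ReadContent(page))

        Try
            If work.Wait(timeout) Then
                seq = work.Result
                Return True
            End If
        Catch ex As AggregateException
            ' ReadContent threw; the real error is in ex.InnerException. Log it here.
            Return False
        End Try

        ' Timed out: the task is still running and cannot be cancelled from here.
        Return False
    End Function

End Module
```

Inside the `For Each` loop from the question, the `Try` block could then be replaced by a call such as `If Not TryReadContent(page, TimeSpan.FromSeconds(10), seq) Then Continue For`, with the failure logged before moving on. The ten-second value is only an illustration.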