
iText 7 in C# locking a bad PDF


I'm using iText 7 to check a PDF that a user has downloaded from the internet, and I'm running into a problem.

For my test case, I created a text file containing garbage and saved it with a .pdf extension. I know it's not a valid PDF.

In the code, I try to open the PDF using PdfReader.

An exception is thrown, which is expected. When I debug the code, reader is still null by the time execution reaches the finally block, so reader.Close() never even runs. I am even copying the file to a temp directory to make sure nothing else is holding the file.

After the exception, I am unable to delete the PDF file, either in code or manually in File Explorer. Here is some of my code; I removed everything but the reader part. This code already includes a few things I have tried, so you will see my attempt to copy the file to a temp file. I try to delete the temp file in the finally block, and that fails for a corrupt file.

Here are both exceptions that are thrown when validating a bad PDF. The first comes from the PdfReader constructor:

2021-04-09 13:18:11,079 ERROR GUI.Form1 - PDF header not found.
iText.IO.IOException: PDF header not found.
   at iText.IO.Source.PdfTokenizer.GetHeaderOffset()
   at iText.Kernel.Pdf.PdfReader.GetOffsetTokeniser(IRandomAccessSource byteSource)
   at iText.Kernel.Pdf.PdfReader..ctor(String filename, ReaderProperties properties)
   at iText.Kernel.Pdf.PdfReader..ctor(FileInfo file)
   at GUI.Form1.validatePDF(FileInfo pdfFile, HashSet`1 tmpMd5s)

The second comes from the attempt to delete the temp file:

2021-04-09 13:18:11,116 ERROR GUI.Form1 - The process cannot access the file 'C:\Users\ret63\AppData\Local\Temp\tmp27DE.tmp' because it is being used by another process.
System.IO.IOException: The process cannot access the file 'C:\Users\ret63\AppData\Local\Temp\tmp27DE.tmp' because it is being used by another process.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileInfo.Delete()
   at GUI.Form1.validatePDF(FileInfo pdfFile, HashSet`1 tmpMd5s)

// testFile is the FileInfo for the temp copy of pdfFile (created earlier, not shown)
PdfDocument pdfDoc = null;
PdfReader reader = null;

try
{
    using (reader = new PdfReader(testFile))
    {
        //pdfDoc = new PdfDocument(reader);
        //pdfDoc = new PdfDocument(new PdfReader(pdfFile.FullName));
        //Console.WriteLine("Number of Pages: " + pdfDoc.GetNumberOfPages());
        //pdfDoc.Close();
    }
}
catch(Exception ex)
{
    log.Error(ex.Message, ex);
    // Pass ex as the inner exception so the iText details are not lost
    throw new Exception("Invalid PDF File: " + pdfFile.Name, ex);
}
finally
{
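    // reader is still null here if the PdfReader constructor threw,
    // so for a bad file this Close() never runs
    if (reader != null)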
    {
        reader.Close();
    }
    if (pdfDoc != null && !pdfDoc.IsClosed())
    {
        pdfDoc.Close();
    }

    try
    {
        if (testFile.Exists)
        {
            testFile.Delete();
        }
    }
    catch (Exception ee)
    {
        Console.WriteLine(ee.Message);
    }
}
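The null reader in the finally block is ordinary C# behaviour rather than anything iText-specific: if a constructor throws, the assignment in using (reader = new PdfReader(testFile)) never happens, so neither using nor finally has an object to close. Any stream that the constructor had already opened is then unreachable from the caller. This self-contained sketch reproduces the situation with a hypothetical LeakyReader class (an illustration only, not iText's code) that opens a file, finds no %PDF- marker at the start, and throws without closing its stream.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical class for illustration only; this is NOT iText code.
// Its constructor opens a file, then throws on a bad header without
// closing the stream it just opened.
public sealed class LeakyReader : IDisposable
{
    // Exposed only so the demo can inspect the stream after a failed constructor.
    public static FileStream LastOpened;

    private readonly FileStream stream;

    public LeakyReader(string path)
    {
        stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        LastOpened = stream;

        var header = new byte[5];
        int read = stream.Read(header, 0, header.Length);
        if (read < header.Length || Encoding.ASCII.GetString(header) != "%PDF-")
        {
            // Leak: this object is never returned, so nobody can Dispose it.
            throw new InvalidDataException("PDF header not found.");
        }
    }

    public void Dispose() => stream.Dispose();
}

public static class LeakDemo
{
    // Same shape as the code in the question.
    // Returns true only if the caller ever received a reader it could close.
    public static bool CallerGotReader(string path)
    {
        LeakyReader reader = null;
        try
        {
            using (reader = new LeakyReader(path))
            {
            }
        }
        catch (InvalidDataException)
        {
            // Expected for a bad file.
        }
        finally
        {
            // reader is still null: the constructor threw before the
            // assignment, so this Dispose call never happens.
            reader?.Dispose();
        }
        return reader != null;
    }
}
```

On Windows, a still-open FileStream like this is what makes File.Delete fail with "being used by another process"; the handle is released only when the stream is disposed or, eventually, finalized by the garbage collector. On Linux and macOS, deleting a file that has an open handle usually succeeds, so the symptom may not reproduce there.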

Solution

  • This looks like an iText bug. If you trace what the PdfReader constructor calls, you'll see that it creates a FileStream that is conditionally locked. The FileStream is wrapped in a RandomAccessSource, which GetOffsetTokeniser then wraps in a PdfTokenizer. If GetHeaderOffset throws (line 1433), the local tok is never closed, so the FileStream it wraps, and the lock on the file, stay open. Because the constructor never returns, your code has no reader to close either.
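Until the leak is fixed in iText, one way to keep the file unlocked is to never let iText open it: read the bytes yourself and parse from memory. The sketch below uses a hypothetical helper class PdfPreCheck. HasPdfHeader is a cheap heuristic that looks for the %PDF- marker near the start of the data (the 1024-byte window is an assumption, not a documented iText limit), and LoadDetached uses File.ReadAllBytes, which closes its file handle before returning.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper; names and the 1024-byte window are assumptions.
public static class PdfPreCheck
{
    private static readonly byte[] Marker = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true if "%PDF-" appears entirely within the first `window` bytes.
    // A heuristic pre-filter only: passing it does not make the file valid.
    public static bool HasPdfHeader(byte[] data, int window = 1024)
    {
        int limit = Math.Min(data.Length, window) - Marker.Length;
        for (int i = 0; i <= limit; i++)
        {
            bool match = true;
            for (int j = 0; j < Marker.Length; j++)
            {
                if (data[i + j] != Marker[j])
                {
                    match = false;
                    break;
                }
            }
            if (match)
            {
                return true;
            }
        }
        return false;
    }

    // Reads the whole file into memory. File.ReadAllBytes opens and closes
    // the file itself, so no handle is left open whatever happens next.
    public static byte[] LoadDetached(FileInfo file) => File.ReadAllBytes(file.FullName);
}
```

If the check passes, the same bytes can be handed to iText's stream-based constructor, for example new PdfDocument(new PdfReader(new MemoryStream(bytes))). Even if iText then leaks its tokenizer on a malformed file, only a MemoryStream is left open, so the temp file can still be deleted. Because the header check is only a pre-filter, keep the try/catch around the iText calls.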