I am running into a problem where I am using IText 7 to check a PDF that a user has downloaded off the internet.
For my test case I created a text file with garbage in it and saved it as a pdf. I know its not valid.
In the code I am trying to open the PDF using PDFReader.
An exception is being thrown, this is expected. When debugging the code the Reader object is null when it gets to the finally spot. So the reader.close() isn't even firing. I am even copying the file to a temp directory just to ensure nothing else is holding the file.
I am then unable to delete the PDF file either in code or manually in a file explorer after the exception. Here is some of my code. I removed everything but the Reader part. Also this code is after I have tried a few things, so you are seeing my attempt with the file being copied to a temp file. I am attempted to delete the temp file in the finally part. That is failing on a corrupt file.
Here are both the exceptions that are thrown when attempting to validate a bad PDF. The first is from the PDFReader call.
2021-04-09 13:18:11,079 ERROR GUI.Form1 - PDF header not found.
iText.IO.IOException: PDF header not found. at
iText.IO.Source.PdfTokenizer.GetHeaderOffset() at
iText.Kernel.Pdf.PdfReader.GetOffsetTokeniser(IRandomAccessSource> byteSource) at
iText.Kernel.Pdf.PdfReader..ctor(String filename, ReaderProperties properties) at
iText.Kernel.Pdf.PdfReader..ctor(FileInfo file) at
GUI.Form1.validatePDF(FileInfo pdfFile, HashSet`1 tmpMd5s)
The Second is from the attempt to delete the temp file
2021-04-09 13:18:11,116 ERROR GUI.Form1 - The process cannot access the file
'C:\Users\ret63\AppData\Local\Temp\tmp27DE.tmp' because it is being used by another process.
System.IO.IOException: The process cannot access the file 'C:\Users\ret63\AppData\Local\Temp\tmp27DE.tmp' because it is being used by another process. at
System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) at System.IO.FileInfo.Delete() at
GUI.Form1.validatePDF(FileInfo pdfFile, HashSet`1 tmpMd5s)
PdfDocument pdfDoc = null;
PdfReader reader = null;
try
{
using (reader = new PdfReader(testFile))
{
//pdfDoc = new PdfDocument(reader);
//pdfDoc = new PdfDocument(new PdfReader(pdfFile.FullName));
//Console.WriteLine("Number of Pages: " + pdfDoc.GetNumberOfPages());
//pdfDoc.Close();
}
}
catch(Exception ex)
{
log.Error(ex.Message, ex);
throw new Exception("Invalid PDF File: " + pdfFile.Name);
}
finally
{
if (reader != null)
{
reader.Close();
}
if (pdfDoc != null && !pdfDoc.IsClosed())
{
pdfDoc.Close();
}
try
{
if (testFile.Exists)
{
testFile.Delete();
}
}
catch (Exception ee)
{
Console.WriteLine(ee.Message);
}
}
Looks like an iText bug. If you trace out what gets called by the PdfReader
constructor, you see that it creates a FileStream
that is conditionally locked. The FileStream
gets wrapped in a RandomAccessSource
which is then wrapped in a PdfTokenizer
in GetOffsetTokeniser
. If GetHeaderOffset
throws on line 1433, that tok
local is never closed.