Tags: c#, sql, pdf, adobe, corrupt

PDF file cannot be opened in Adobe after download from website but can be opened in Edge


I have a web app running on ASPX; it is a bit of a legacy app. On submit, users upload supporting documents, usually pictures converted to PDF, to our SQL Server database, where they are stored as binary data and can be downloaded again at various times during approval.

However, we have just started getting an issue where our users cannot open the PDFs in Adobe and get the dreaded "The file is damaged and could not be repaired." error message. The files can still be opened in MS Edge, so they are not actually corrupted. I have verified that the PDFs open fine before being uploaded.

This is the upload code:

HttpPostedFile file = this.attachmentUploader.PostedFile;
if (file == null)
{
    file = Session["postedFile"] as HttpPostedFile;
}

if (file != null)
{
    var fileName = this.attachmentUploader.FileName;
    fileName = fileName.Length >= 100 ? string.Concat(fileName.Substring(0, 50).Trim(), ".pdf") : fileName;
    Attachment attachment = new Attachment()
    {
        FileName = fileName,
        File = this.attachmentUploader.FileBytes
    };

    db.Attachments.Add(attachment);
    db.SaveChanges();
}

This is the download code:

byte[] file = null;
// Code here to pull file from db
if (file != null)
{
    Response.Buffer = true;
    Response.ContentType = "application/pdf";
    Response.AddHeader("Content-Disposition", "attachment;filename=support_doc.pdf");
    Response.OutputStream.Write(file, 0, file.Length);
}

Any help appreciated!


Solution

  • The downloaded file actually consists of two concatenated files: the actual PDF followed by an HTML file.

    The HTML file is nearly 70 KB in size, and in the absence of external JavaScript and images it looks like this:

    [--- image removed for privacy reasons ---]

    I assume that after your "download code" some other code adds this HTML to the output.

    You might want to track down that code, or you might simply close Response.OutputStream and end the response right after Response.OutputStream.Write(file, 0, file.Length).
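
    A minimal sketch of the second option, assuming a classic ASP.NET (System.Web) page. The helper name SendPdf and its parameters are hypothetical and do not come from the question. It discards anything already buffered, writes only the PDF bytes, and ends the response so that nothing else can be appended:

    ```csharp
    using System.Web;

    public static class PdfDownload
    {
        // Hypothetical helper (illustrative name and signature).
        // Call it from the download handler with the page's Response
        // and the bytes loaded from the database.
        public static void SendPdf(HttpResponse response, byte[] file, string downloadName)
        {
            // Drop any output that is already buffered (for example page markup).
            response.Clear();

            response.ContentType = "application/pdf";
            response.AddHeader("Content-Disposition", "attachment; filename=" + downloadName);

            // Write the PDF bytes and nothing else.
            response.BinaryWrite(file);

            // Send the buffered output and stop further page processing,
            // so the page's own HTML is not appended after the PDF.
            response.End();
        }
    }
    ```

    Response.End() stops page execution by raising a ThreadAbortException, so it should not sit inside a try block whose catch swallows that exception.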


    According to the PDF specification, a PDF processor has to start reading a PDF from its end, where the cross-reference information is located. In the file at hand, however, the end consists of nearly 70 KB of data that is garbage as far as PDF syntax is concerned.

    Thus, it is OK for any PDF viewer to reject the file as invalid. Viewers that still open it, such as Edge, are simply more tolerant of this kind of error.
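
    To confirm this kind of damage on a downloaded file, one can count how many bytes follow the last %%EOF marker, which normally sits at the very end of a PDF. The following is a rough diagnostic sketch only. The class and method names are hypothetical, and it assumes the appended data does not itself contain the text %%EOF:

    ```csharp
    using System;
    using System.Text;

    public static class PdfTrailerCheck
    {
        // Returns the number of bytes after the last "%%EOF" marker,
        // or -1 if the marker does not occur at all.
        public static int BytesAfterLastEof(byte[] data)
        {
            byte[] marker = Encoding.ASCII.GetBytes("%%EOF");

            // Scan backwards so that the last occurrence is found first.
            for (int i = data.Length - marker.Length; i >= 0; i--)
            {
                bool match = true;
                for (int j = 0; j < marker.Length; j++)
                {
                    if (data[i + j] != marker[j])
                    {
                        match = false;
                        break;
                    }
                }

                if (match)
                {
                    return data.Length - (i + marker.Length);
                }
            }

            return -1;
        }
    }
    ```

    A well-formed file typically yields 0 or a very small number (a trailing line break), whereas a file with a page's HTML appended would yield a value in the tens of thousands. For example, calling BytesAfterLastEof(File.ReadAllBytes(path)) on a saved download, where path is a placeholder for that file's location.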