Search code examples
.netdatabasems-access

Extracting files from an Attachment field in an Access database


We are working on a project where we need to migrate data stored in an Access database to a cache database. The Access database contains columns with a data type of Attachment; some of the tuples contain multiple attachments. I am able to obtain the filenames of these files by using .FileName, but I'm unsure how to determine when one file ends and another starts in .FileData.

I am using the following to obtain this data:

System.Data.OleDb.OleDbCommand command= new System.Data.OleDb.OleDbCommand();
command.CommandText = "select [Sheet1].[pdf].FileData,* from [Sheet1]";
command.Connection = conn;
System.Data.OleDb.OleDbDataReader rdr = command.ExecuteReader();

Solution

  • (My original answer to this question was misleading. It worked okay for PDF files that were subsequently opened with Adobe Reader, but it did not always work properly for other types of files. The following is the corrected version.)

    Unfortunately we cannot directly retrieve the contents of a file in an Access Attachment field using OleDb. The Access Database Engine prepends some metadata to the binary contents of the file, and that metadata is included if we retrieve the .FileData via OleDb.

    To illustrate, a document named "Document1.pdf" is saved to an Attachment field using the Access UI. The beginning of that PDF file looks like this:

    Original.png

    If we use the following code to try and extract the PDF file to disk

    using (OleDbCommand cmd = new OleDbCommand())
    {
        cmd.Connection = con;
        cmd.CommandText = 
                "SELECT Attachments.FileData " +
                "FROM AttachTest " +
                "WHERE Attachments.FileName='Document1.pdf'";
        using (OleDbDataReader rdr = cmd.ExecuteReader())
        {
            rdr.Read();
            byte[] fileData = (byte[])rdr[0];
            using (var fs = new FileStream(
                    @"C:\Users\Gord\Desktop\FromFileData.pdf", 
                    FileMode.Create, FileAccess.Write))
            {
                fs.Write(fileData, 0, fileData.Length);
                fs.Close();
            }
        }
    }
    

    then the resulting file will include the metadata at the beginning of the file (20 bytes in this case)

    FromFileData.png

    Adobe Reader is able to open this file because it is robust enough to ignore any "junk" that may appear in the file before the '%PDF-1.4' signature. Unfortunately not all file formats and applications are so forgiving of extraneous bytes at the beginning of the file.

    The only Official™ way of extracting files from an Attachment field in Access is to use the .SaveToFile method of an ACE DAO Field2 object, like so:

    // required COM reference: Microsoft Office 14.0 Access Database Engine Object Library
    //
    // using Microsoft.Office.Interop.Access.Dao; ...
    var dbe = new DBEngine();
    Database db = dbe.OpenDatabase(@"C:\Users\Public\Database1.accdb");
    Recordset rstMain = db.OpenRecordset(
            "SELECT Attachments FROM AttachTest WHERE ID=1",
            RecordsetTypeEnum.dbOpenSnapshot);
    Recordset2 rstAttach = rstMain.Fields["Attachments"].Value;
    while ((!"Document1.pdf".Equals(rstAttach.Fields["FileName"].Value)) && (!rstAttach.EOF))
    {
        rstAttach.MoveNext();
    }
    if (rstAttach.EOF)
    {
        Console.WriteLine("Not found.");
    }
    else
    {
        Field2 fld = (Field2)rstAttach.Fields["FileData"];
        fld.SaveToFile(@"C:\Users\Gord\Desktop\FromSaveToFile.pdf");
    }
    db.Close();
    

    Note that if you try to use the .Value of the Field2 object you will still get the metadata at the beginning of the byte sequence; the .SaveToFile process is what strips it out.