Search code examples
c#asp.netcsvimportoledb

Unable to detect bad data for bulk import from CSV using Microsoft.ACE.OLEDB.12


For instance, if a date column includes an added letter, it gets treated as a null and I don't get any warning.

I exhausted all of Microsoft's documentation and there is no indication this behavior can be changed. Only found one article in all of google that delt with this and it said can't be changed.

The schema.ini is created through code, but this is what it looks like.

[NewEmployees.csv]
ColNameHeader=True
Format=CSVDelimited
DateTimeFormat=dd-MMM-yy
Col1=FirstName Text
Col2=LastName Text
Col3="Hire Date" Date

Below is the most relevant lines of code

string strSql = "SELECT * FROM [" + FileUpload1.FileName + "]";
                string strCSVConnString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + targetFolder + ";" + "Extended Properties='text;HDR=YES;'";
                OleDbDataAdapter oleda = new OleDbDataAdapter(strSql, strCSVConnString);
                DataTable importData = new DataTable();
                oleda.Fill(importData);

                GridView1.DataSource = importData;
                GridView1.DataBind();

If anyone wants the entire ASP.Net code than it is shown below. It will allow a user to select a file on their computer, create a folder whose name is based on the current date and time, create a schema.ini and save it to the folder, save the uploaded csv file to the folder, than query the csv file and bind it to a gridview. It's good code but useless if it can't detect bad data.

Code Behind

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

using System.IO;
using System.Data;
using System.Data.OleDb;

using System.Data.SqlClient;
using System.Data;

namespace WebApplication1
{
    public partial class EmployeeImport : System.Web.UI.Page
    {
        public string GetDateTimeStampedFolderName()
        {
            return string.Format("{0:yyyy-MM-dd_hh-mm-ss-tt}", DateTime.Now);
        }

        public void CreateSchemIni(string targetFolder, string fileName)
        {
            using (FileStream filestr = new FileStream(targetFolder + "/schema.ini", FileMode.Create, FileAccess.Write))
            {
                using (StreamWriter writer = new StreamWriter(filestr))
                {
                    writer.WriteLine("[" + FileUpload1.FileName + "]");
                    writer.WriteLine("ColNameHeader=True");
                    writer.WriteLine("Format=CSVDelimited");
                    writer.WriteLine("DateTimeFormat=dd-MMM-yy");
                    writer.WriteLine("Col1=FirstName Text");
                    writer.WriteLine("Col2=LastName Text");
                    writer.WriteLine("Col3=\"Hire Date\" Date");
                    writer.Close();
                    writer.Dispose();
                }
                filestr.Close();
                filestr.Dispose();
            }
        }

        private void UploadAndImport()
        {
            if (FileUpload1.HasFile)
            {
                string targetFolder = Server.MapPath("~/Uploads/Employees/" + GetDateTimeStampedFolderName());

                if (System.IO.Directory.Exists(targetFolder) == false)
                {
                    System.IO.Directory.CreateDirectory(targetFolder);
                }

                FileUpload1.SaveAs(Path.Combine(targetFolder, FileUpload1.FileName));

                CreateSchemIni(targetFolder, FileUpload1.FileName);

                string strSql = "SELECT * FROM [" + FileUpload1.FileName + "]";
                string strCSVConnString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + targetFolder + ";" + "Extended Properties='text;HDR=YES;'";
                OleDbDataAdapter oleda = new OleDbDataAdapter(strSql, strCSVConnString);
                DataTable importData = new DataTable();
                oleda.Fill(importData);

                GridView1.DataSource = importData;
                GridView1.DataBind();
            }
        }

        protected void UploadButton_Click(object sender, EventArgs e)
        {
            if (FileUpload1.HasFile)
            {
                UploadAndImport();
            }
        }
    }
}

ASPX

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="EmployeeImport.aspx.cs" Inherits="WebApplication1.EmployeeImport" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>

         <asp:FileUpload ID="FileUpload1" runat="server" />

        <br />
        <asp:Button ID="UploadButton" runat="server" Text="Upload" 
            onclick="UploadButton_Click" />
        <asp:GridView ID="GridView1" runat="server">
        </asp:GridView>

    </div>
    </form>
</body>
</html>

Solution

  • There are 2 significant silent but deadly problems that can arise from using the Microsoft.ACE.OLEDB.12.0 along with schema.ini to import data. I am posting a solution to both. Although one of them only works with SQL Server, a similar solution may work for other databases.

    1. Bad data such as a date with a letter such as "5/20/2016a" will be converted to null and it does not throw an exception or warning when that happens. It will merrily go about its way and corrupt your data.
    2. Specifying column type in the schema.ini is done by it's ordinal position and will completely ignore the headers in the CSV. If columns are out of sequence in the CSV you will not get an exception or warning. And your data will be corrupted.

    For example, if the schema.ini contains:

    Col1=FirstName Text
    Col2=LastName Text
    Col3="Hire Date" Date
    

    And the CSV has FirstName,LastName in the wrong order:

    LastName,FirstName,HireDate
    Smith,Jon,5/1/2016
    Moore,Larry,5/15/2016
    

    The ACE driver is not smart enough to recognize that the headers are out of order and the data will be imported incorrectly.

    Solution to problem 1 - Bad Data

    The solution I came up with is to use the schema.ini to specify all columns as text fields and use the System.Data.SqlClient.SqlBulkCopy to import the data to SQL Server. When SQLBulkCopy finds bad data, it is smart enough to throw an exception and block the import of the entire CSV, even if only the last record is bad.

    Solution to problem 2 - CSV columns are out of order, or contain missing/extra columns

    To solve this problem I created 2 DataTables, one is filled with schema and no data. The one that is filled with only schema must be done prior to creating the schema.ini, as once the schema.ini is created, the headers in the CSV will be ignored.

    DataTable importData = new DataTable();
    DataTable importDataSourceSchema = new DataTable();
    
    // Fill the schema prior to creating the schema.ini, as this is the only way to get the headers from the CSV
    oleda.FillSchema(importDataSourceSchema, System.Data.SchemaType.Source);
    CreateSchemIni(targetFolder, FileUpload1.FileName);
    oleda.Fill(importData);
    

    I then created a function that validates the headers in the CSV are in the correct order and that the CSV contains the correct number of columns:

    private bool ValidateHeaders(DataTable importData, DataTable importDataSourceSchema)
    {
        bool isValid = true;
    
        if (importData.Columns.Count != importDataSourceSchema.Columns.Count)
        {
            isValid = false;
            ValidationLabel.Text = ValidationLabel.Text + "<br />Wrong number of columns";
        }
    
        for (int i = 0; i < importData.Columns.Count; i++)
        {
            if (importData.Columns[i].ColumnName != importDataSourceSchema.Columns[i].ColumnName)
            {
                ValidationLabel.Text = ValidationLabel.Text + "<br />Error finding column " + importData.Columns[i].ColumnName;
                isValid = false;
            }
        }
        return isValid;
    }
    

    I then call ValidateHeaders prior to performing a the bulk import

    if (ValidateHeaders(importData, importDataSourceSchema))
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy([Add your ConnectionString here]))
        {
            bulkCopy.DestinationTableName = "dbo.EmployeeImport";
            bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("FirstName", "FirstName"));
            bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("LastName", "LastName"));
            bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Hire Date", "HireDate"));
            try
            {
                bulkCopy.WriteToServer(importData);
                ValidationLabel.Text = "Success";
                GridView1.DataSource = importData;
                GridView1.DataBind();
            }
            catch (Exception e)
            {
                ValidationLabel.Text = e.Message;
            }
        }
    }
    

    Below is full proof of concept code written for ASP.NET WebForms

    ASPX

    <%@ Page Language="C#" AutoEventWireup="true" CodeBehind="EmployeeImport.aspx.cs" Inherits="WebApplication1.EmployeeImport" %>
    
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head runat="server">
        <title></title>
    </head>
    <body>
        <form id="form1" runat="server">
        <div>
    
             <asp:FileUpload ID="FileUpload1" runat="server" />
    
            <br />
            <asp:Button ID="UploadButton" runat="server" Text="Upload" 
                onclick="UploadButton_Click" />
    
            <br />
            Data Imported: <asp:Label ID="ValidationLabel" runat="server" ForeColor="Red"></asp:Label>
            <asp:GridView ID="GridView1" runat="server">
            </asp:GridView>
    
        </div>
        </form>
    </body>
    </html>
    

    Code Behind

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Web.UI;
    using System.Web.UI.WebControls;
    
    using System.IO;
    using System.Data;
    using System.Data.OleDb;
    
    using System.Data.SqlClient;
    using System.Data;
    
    namespace WebApplication1
    {
        public partial class EmployeeImport : System.Web.UI.Page
        {
            public string GetDateTimeStampedFolderName()
            {
                return string.Format("{0:yyyy-MM-dd_hh-mm-ss-tt}", DateTime.Now);
            }
    
            public void CreateSchemIni(string targetFolder, string fileName)
            {
                using (FileStream filestr = new FileStream(targetFolder + "/schema.ini", FileMode.Create, FileAccess.Write))
                {
                    using (StreamWriter writer = new StreamWriter(filestr))
                    {
                        writer.WriteLine("[" + FileUpload1.FileName + "]");
                        writer.WriteLine("ColNameHeader=True");
                        writer.WriteLine("Format=CSVDelimited");
                        writer.WriteLine("Col1=FirstName Text");
                        writer.WriteLine("Col2=LastName Text");
                        writer.WriteLine("Col3=\"Hire Date\" Text");
                        writer.Close();
                        writer.Dispose();
                    }
                    filestr.Close();
                    filestr.Dispose();
                }
            }
    
            private bool ValidateHeaders(DataTable importData, DataTable importDataSourceSchema)
            {
    
                bool isValid = true;
    
                if (importData.Columns.Count != importDataSourceSchema.Columns.Count)
                {
                    isValid = false;
                    ValidationLabel.Text = ValidationLabel.Text + "<br />Wrong number of columns";
                }
    
                for (int i = 0; i < importData.Columns.Count; i++)
                {
                    if (importData.Columns[i].ColumnName != importDataSourceSchema.Columns[i].ColumnName)
                    {
                        ValidationLabel.Text = ValidationLabel.Text + "<br />Error finding column " + importData.Columns[i].ColumnName;
                        isValid = false;
                    }
                }
    
                return isValid;
            }
    
            private void UploadAndImport()
            {
                if (FileUpload1.HasFile)
                {
                    string targetFolder = Server.MapPath("~/Uploads/Employees/" + GetDateTimeStampedFolderName());
    
                    if (System.IO.Directory.Exists(targetFolder) == false)
                    {
                        System.IO.Directory.CreateDirectory(targetFolder);
                    }
    
                    FileUpload1.SaveAs(Path.Combine(targetFolder, FileUpload1.FileName));
    
    
    
                    string strSql = "SELECT * FROM [" + FileUpload1.FileName + "]";
                    string strCSVConnString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + targetFolder + ";" + "Extended Properties='text;HDR=YES;'";
                    using (OleDbDataAdapter oleda = new OleDbDataAdapter(strSql, strCSVConnString))
                    {
                        DataTable importData = new DataTable();
                        DataTable importDataSourceSchema = new DataTable();
    
                        // Fill the schema prior to creating the schema.ini, as this is the only way to get the headers from the CSV
                        oleda.FillSchema(importDataSourceSchema, System.Data.SchemaType.Source);
                        CreateSchemIni(targetFolder, FileUpload1.FileName);
                        oleda.Fill(importData);
    
                        if (ValidateHeaders(importData, importDataSourceSchema))
                        {
                            using (SqlBulkCopy bulkCopy = new SqlBulkCopy([Add your ConnectionString here], SqlBulkCopyOptions.TableLock))
                            {
                                bulkCopy.DestinationTableName = "dbo.EmployeeImport";
                                bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("FirstName", "FirstName"));
                                bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("LastName", "LastName"));
                                bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Hire Date", "HireDate"));
                                try
                                {
                                    bulkCopy.WriteToServer(importData);
                                    ValidationLabel.Text = "Success";
                                    GridView1.DataSource = importData;
                                    GridView1.DataBind();
                                }
                                catch (Exception e)
                                {
                                    ValidationLabel.Text = e.Message;
                                }
                            }
    
    
                        }
                    }
                }
            }
    
            protected void UploadButton_Click(object sender, EventArgs e)
            {
                if (FileUpload1.HasFile)
                {
                    ValidationLabel.Text = "";
                    UploadAndImport();
                }
            }
        }
    }