Search code examples
c#asp.netcsvoledb

OleDB, Misses the first character of data


I have a CSV Reading code for ASP.NET application I maintain. This ASP.NET website is running fine from 3 yrs now, and CSV reading code that use Ole.JetDB.4.0 is doing its work fine, except that once in a while some CSV with more than 4K-5K records create a problem. Usually the problem is that a record at random position [random row] miss the first character of it.

CSV File is just bunch of name and addresses per row, and they are in ASNI Format. CSV is comma seperate, no data have "comma" in data and now enclosing of field in Single or Double quote. Also, it doesn't happen often, We use the same code for say 70K record upload they works fine, but some time say in 3 yrs about 3-4 files have this problem only, we upload about one file daily.

For those who need what I did

using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection
    ("Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties='text;HDR=Yes;FMT=Delimited';Data Source=" + HttpContext.Current.Server.MapPath("/System/SaleList/"))
{
   string sql_select = "select * from [" + this.FileName + "]";
   System.Data.OleDb.OleDbDataAdapter da = new System.Data.OleDb.OleDbDataAdapter();
   da.SelectCommand = new System.Data.OleDb.OleDbCommand(sql_select, conn);
   DataSet ds = new DataSet();
   // Read the First line of File to know the header
   string[] lines = System.IO.File.ReadAllLines(HttpContext.Current.Server.MapPath("/System/SaleList/") + FileName);
   string header = "";
   if (lines.Length > 0)
      header = lines[0];

   string[] headers = header.Split(',');
   CreateSchema(headers, FileName);
   da.Fill(ds, "ListData");
   DataTable dt = ds.Tables["ListData"];
}

And this code is working fine except the mention thing. I cut some unrelated part so, might not work by copy paste.

EDIT: More information

  1. I try to use ODBC with Microsoft Text Driver, then I use ACE Driver with OleDB. the result is same with all three drive.

  2. If I swap the problem record, with the preceding Row those rows are read quite well, until the next problem row [if more than one row is having problem in original file], if those are only problem row it works fine.

So from above it looks like that something is there that distract character counter, but how I can ensure it working smooth is still a quiz.

EDIT 2: I have submitted it as bug to Microsoft here : https://connect.microsoft.com/VisualStudio/feedback/details/811869/oledb-ace-driver-12-jet-4-0-or-odbc-text-driver-all-fail-to-read-data-properly-from-csv-text-file


Solution

  • I would suggest you examine a problem file with a hex editor - inspect the line that causes the problem and the line immediately preceding it.

    In particular look at the line terminators (CR/LF? CR only? LF only?) and look for any non-printable characters.