Search code examples
c#excelssisoledb

OleDbConnection only finds cell value when workbook is also open in Excel


I have a program (actually SSIS script task, but I don't suppose that matters) that creates an OLE DB connection to an Excel workbook, and reads the cell values in each worksheet, storing them in a SQL Server table.

Each worksheet has several sections of rows, each section being for a separate product. The first two rows of each product section are a quarter row, and a year row. Here is a screen shot:

enter image description here

I use an OleDbDataReader with a "Select *" command to read the data in each sheet into a DataTable. I have a column called "YearQuarter" in my SQL database, where I store a concatenation of the year row value and the preceding quarter row value, with a hyphen between the two strings:

enter image description here

My code is like this:

  OleDbConnection oleExcelConnection = new OleDbConnection(
        "Provider=Microsoft.ACE.OLEDB.12.0;" +
        "Data Source=" + strWkbkFilePath + ";" +
        "Mode=Read;" +
        "Extended Properties=\"Excel 8.0;HDR=No;IMEX=1\"");

   oleExcelConnection.Open();

   DataTable dtCurrSheet = new DataTable();

   // Name of table is in strLoadTblNm.

    OleDbCommand oleExcelCommand;
    OleDbDataReader oleExcelReader;

    oleExcelCommand = excel_conn.CreateCommand();
    oleExcelCommand.CommandText = "Select * From [" + strLoadTblNm + "]";
    oleExcelCommand.CommandType = CommandType.Text;
    oleExcelReader = oleExcelCommand.ExecuteReader();

    // Load worksheet into data table
    dtSheet.Load(oleExcelReader);

    oleExcelReader.Close();

Looking at the output data, I noticed that I was getting inconsistent results. Some rows would have a YearQuarter column value that would have only the Year row value in them, while others would have the cell values from both rows. For example, I'd have "2009 - Year End" followed by just "2010", with no " - 1st Qtr." appended to it.

This is because that quarter cell valued is never loaded into the data reader, as the Dataset Visualizer shows:

enter image description here

Notice also that, in the Dataset, the column that is missing the Quarter cell value also has other numeric values missing their formatting (no commas).

If I save the file as a .csv, all cell values are preserved.

However, I noticed that it wasn't consistent. Sometimes I'd run my package and the same row would now have the full value. So, in the above example, I'd get "2010 - 1st Qtr."

I finally realized that it was working as expected only if I happened to have the workbook open in Excel at the same time that the program was running!

Why would this make a difference? Could it be that there is a macro or something in the workbook that is executed by Excel, but not when the workbook is accessed only via an OLE DB connection? Would the fact that it had been executed in Excel then affect the data obtained by OLE DB? If that's the case, how do I get around this? The spreadsheets are provided to me. So I can't modify them.


Solution

  • I think you're having issues with the auto-formatting thing Excel tries to apply. With an OLEDB connection, I can't see how having the sheet open fixes your problem (obviously very strange).

    Try Adding IMEX = 1 to your connection options to treat the entire sheet as text to see if this is your issue. Pulled from OLEDB connection does not read data from excel sheet Also another good post from an external site: Tips for reading Excel spreadsheets using ADO.NET

    Also, you're pulling data from an excel sheet and writing it to another excel sheet... Same workbook? I have a couple more ideas for ya though depending on your situation.