Search code examples
c#sql-serverssisoledbvisual-studio-2019

Get an SSIS C# Script component to read from Excel via OleDb:


I'm trying to build a SSIS package to read from Excel. I've found an excellent example here that is very close to what I need.

My question here is: Starting with a blank solution, what are the bare minimum steps required to get a C# Script component to read from an .xlsx file?

Here is my code:

using System;
using System.Data;
using System.Data.OleDb;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{    
    public override void CreateNewOutputRows()
    {
        string fileName = @"E:\SFTP\RSS\Results.xlsx";    
        string cstr = "Provider.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1\"";

        using (OleDbConnection xlConn = new OleDbConnection(cstr))
        {
            xlConn.Open();
            OleDbCommand xlCmd = xlConn.CreateCommand();
            xlCmd.CommandText = "Select top 10 * from Responses";
            xlCmd.CommandType = CommandType.Text;
            using (OleDbDataReader rdr = xlCmd.ExecuteReader())
            {
                while (rdr.Read())
                {
                    for (int i = 2; i < rdr.FieldCount; i++) //loop from 3 column to last
                    {
                        Output0Buffer.AddRow();
                        Output0Buffer.Question = rdr.GetName(i);
                        Output0Buffer.Response = rdr.ToString();
                    }

                }
            }
            xlConn.Close();
        }
    }

}

These are the kind of issues I'm getting & questions I have: Do I need to configure the connection manager at the solution level? On the script component or does the code contain everything I need?

What datatype/function/method do I use to read a cell as a number, a date or a string?

Here are some of the errors I'm getting:

(2,14): error CS0234: The type or namespace name 'Data' does not exist in the namespace 'System' (are you missing an assembly reference?)
(3,14): error CS0234: The type or namespace name 'Data' does not exist in the namespace 'System' (are you missing an assembly reference?)
(4,17): error CS0234: The type or namespace name 'SqlServer' does not exist in the namespace 'Microsoft' (are you missing an assembly reference?)
(5,17): error CS0234: The type or namespace name 'SqlServer' does not exist in the namespace 'Microsoft' (are you missing an assembly reference?)
(8,27): error CS0246: The type or namespace name 'UserComponent' could not be found (are you missing a using directive or an assembly reference?)
 + additional 10 errors

I'm using: VS2019, SQL Server 2016 SP2, Office 2016.

Here's a screen shot of my Excel Sheet:

Responses

It contains responses to surveys provided by a 3rd party call centre. There's a few columns to identify the call, And then each column header contains the question and the clumn data is the response. It currently has 189 Columns and will grow/change over time. My idea is to select the first 4 columns which reliably identify each row plus 1 additional column For each Q&A, and iterate through the sheet picking up each column header and contents so that can be inserted into a table. As more columns get added they'd just get appended as additional rows. The example code I pasted just shows 2 columns as I was trying to keep the example simple.

This is what I've tried. 1. Add a DataFlow task to Control Flow. ControlFlow

  1. Add Script Component to DataFlow.

When prompted select Source.

Double click and open the editor.

Select Connection Manager. Click Add

On new row select New connection,

Add new SSIS Connection Manager appears.

This is about where I get stuck. I've been picking OLEDB Connection.

  1. In the OLEDB Connection Manager what should I select?

I've tried Access and MS Oledb Simple Provider. And things start to unravel about here. Given that the filename, path and driver details are provided in the script what details do I need for the connection manager?

I started playing with the code, but I have no idea if what I have done upstream is working or is correct. So i'm trying to start from the beginning. I'm asking for help walking through the setup steps to get to the C# point.

Thanks to all for thier responses.

Cheers

Pete


Solution

  • There's nothing that needs to be added to the script component GUI except for the outputs (remember to get the data types correct.)

    Do not set up a connection manager, you are doing that in code because you do not want SSIS trying to figure out your column names and frankly, we don't know the column names after column 4.

    Here is your code. I only changed a few things.

    public override void CreateNewOutputRows()
        {
            //Change this to your filename you do not need a connection manager
            string fileName = @"E:\SFTP\RSS\Results.xlsx";  
            string SheetName = "Sheet1";  
            string cstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1\"";
    
            using (System.Data.OleDb.OleDbConnection xlConn = new System.Data.OleDb.OleDbConnection(cstr))
            {
               xlConn.Open();
               System.Data.OleDb.OleDbCommand xlCmd = xlConn.CreateCommand();
               xlCmd.CommandText = "Select * from [" + SheetName + "$]"; //I assume this is the data you want
               xlCmd.CommandType = CommandType.Text;
               using (System.Data.OleDb.OleDbDataReader rdr = xlCmd.ExecuteReader())
               {
                  while (rdr.Read())
                  {
                     for (int i = 4; i < rdr.FieldCount; i++) //loop from 5th column to last
                     {
                        //The first 4 columns are static and added to every row
                        Output0Buffer.AddRow();
                        Output0Buffer.UniqueID = Int32.Parse(rdr[0].ToString());
                        Output0Buffer.Year = Int32.Parse(rdr[1].ToString());
                        Output0Buffer.ReportingWave = rdr.GetString(2);
                        Output0Buffer.SubmissionDate = rdr.GetString(3);
                        Output0Buffer.Question = rdr.GetName(i);
                        Output0Buffer.Answer = rdr.GetString(i);
                     }
    
                }
              }
              xlConn.Close();
           }
        }
    

    This code successfully imported a file that looks like this:

    enter image description here