Search code examples
sql-serverexcelssisetlsql-server-data-tools

load multiple data from excel to sql SSIS


I am working with SSIS and I need to load multiple files with the following (Yellos) format to SQL using SSIS

enter image description here

The problem as you can see is that the files has an horrible format only process / consume records if the column A is populated (e.g: ignoring rows# 14 - X ) and I need to insert the value in D1 into the Date column.

any suggestion?

Regards!


Solution

  • Lets divide this problem into 3 Sub problems:

    1. Get the date value from D1
    2. Start Reading from Row number 4
    3. Ignore all Rows where Column1 is NULL

    Solution

    1. Get the date value from D1

    1. Create 2 SSIS variables, @[User::FilePath] (of type string) that contains the excel file path, @[User::FileDate] (of type string) that we will use it to store the date value
    2. Add a script Task, choose the script language as Visual Basic
    3. Select @[User::FilePath] as a ReadOnly variable and @[User::FileDate] as a ReadWrite variable
    4. Open the Script Editor and use the following code to retrieve the Date Value and store it into @[User::FileDate]

    This will search for the sheet named Refunds and extract the date value from it and store this value into @[User::FileDate]

        m_strExcelPath = Dts.Variables.Item("FilePath").Value.ToString
    
        Dim strSheetname As String = String.Empty
        Dim strDate as String = String.Empty
    
        m_strExcelConnectionString = Me.BuildConnectionString()
    
        Try
    
    
            Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)
    
                If OleDBCon.State <> ConnectionState.Open Then
                    OleDBCon.Open()
                End If
    
                'Get all WorkSheets
                m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
                                                                   New Object() {Nothing, Nothing, Nothing, "TABLE"})
    
                'Loop over work sheet to get the first one (the excel may contains temporary sheets or deleted ones
    
                For Each schRow As DataRow In m_dtschemaTable.Rows
                    strSheetname = schRow("TABLE_NAME").ToString
    
                    If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then
    
                    If Not strSheetname.Tolower.Contains("refunds") Then Continue For
    
                        Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "A1:D1]", OleDBCon)
    
                            Dim dtTable As New DataTable("Table1")
    
    
                            cmd.CommandType = CommandType.Text
    
                            Using daGetDataFromSheet As New OleDbDataAdapter(cmd)
    
                                daGetDataFromSheet.Fill(dtTable)
    
                                'Get Value from column 4 (3 because it is a zero-based index
                                strDate = dtTable.Rows(0).Item(3).ToString
    
                            End Using
    
                        End Using
    
                        'when the first correct sheet is found there is no need to check others
                        Exit For
    
                    End If
                Next
    
                OleDBCon.Close()
    
            End Using
    
        Catch ex As Exception
            Throw New Exception(ex.Message, ex)
        End Tr
    
        Dts.Variables.Item("FileDate").Value = strDate
    
        Dts.TaskResult = ScriptResults.Success
    End Sub
    
    1. In the DataFlow Task add a Derived Column Transformation, add a derived column with the following expression

      @[User::FileDate]
      

    2. Start Reading from Row Number 4

    As we assumed that the Excel File Path is stored in @[User::FilePath]

    1. First open the Excel Connection Manager and uncheck the box First row has column names
    2. In the DataFlow Task, double click on the excel source
    3. Set the source to SQL Command
    4. Use the following command: SELECT * FROM [Refunds$A4:D] , so it will start reading from the row number 4
    5. Columns names will be as the following F1 ... F4 , in the excel source you can go to the Columns Tab and give alias to the columns names, so in the data flow task they will be showed with their aliases

    3. Ignore all Rows Where Column1 is NULL

    1. Add a conditional split after the Excel Source
    2. Split the Flow based on the following expression

      ISNULL([F1]) == False
      

    If you didn't give an alias to F1 otherwise use the alias

    Finally, remember that you must add a derived column (as we said in the first sub-problem) that contains the date value