Search code examples
sql-serverexcelssisetlxlsx

How can I transform data in xlsx file removing merge in cells and transposing some columns to ingest data in SQL Server using SSIS?


I have an As-Is spreadsheet data source with merge applied in some columns and for weekly data columns are used in incremental format, e.g. for the 2019W12 the next column will be populated (R column).

As-Is Spreadsheet Data Source

As-Is Spreadsheet Data Source

I need parse the spreadsheet content and load into SQL Server table using SSIS and proposed format is:

Proposed Spreadsheet Data Transformation

Proposed Spreadsheet Data Transformation

I've tried some alternatives such as apply transformation in SSIS, but column increment case exception in next week load job, I tried to parse and split spreadsheet data with Python (xlrd) but without success to transpose and associate data from columns F to 'N' with columns from A to E. Does anybody faced this type of problem to ingest spreadsheet data using SSIS into SQL Server or have another logical way to transform data before ingestion?


Solution

  • Splitting Question into sub tasks

    Based on your question there are three main functionalities that you are looking to achieve:

    1. Finding an efficient way to manipulate Excel files
    2. Unmerge Cells and fill duplicates values
    3. Transpose Rows into Columns

    Possible solution

    In order to perform a complex transformation you have to do this using one of the following approaches because provides all functionalities that can be done in Microsoft Excel:

    1. .Net Microsoft.Office.Interop.Excel library (C# or VB.NET)
    2. Excel VBA

    The solution you are looking for is complex and very specific to the issue, you have to implement the logic at your own. I will provide some links that can helps you to achieve that:

    Helpful Links

    Unmerge Cells and fill duplicates values

    Manipulating Excel files using C#

    Transpose Excel Rows