Search code examples
c#sql-serverssisssis-2012

Error adding duplicate key to Dictionary in script task


The script has been running well for ages, but has suddenly started falling over stating

Error: 0x0 at (SCR) GetLineageIDs, ProcessDataFlowTask error:: An item with the same key has already been added.
   at System.ThrowHelper.ThrowArgumentException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at ST_b90e02c5aa5e4a7a992a1a75c6255cfa.ScriptMain.ProcessDataFlowTask(TaskHost currentDataFlowTask)

The objective is to get lineageIDs mapped to the actual column name in my SSIS package (2012,so no 2016 lineage functionality).

I get that I am trying to add a key that has already been added to my dictionary in the following script, I'm not sure how and why it has suddenly started to error, full script below. I think I need some sort of if block in my ProcessDataFlowTask method, any pointers would be gratefully received and an explanation as to why the duplicate key error is suddenly appearing?

namespace ST_b90e02c5aa5e4a7a992a1a75c6255cfa
{
    /// <summary>
    /// ScriptMain is the entry point class of the script.  Do not change the name, attributes,
    /// or parent of this class.
    /// </summary>
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {


        Dictionary<int, string> lineageId = null;

        public void Main()
        {

            try
            {
                // Grab the executables so we have to something to iterate over, and initialize our lineageIDs list
                // Why the executables?  Well, SSIS won't let us store a reference to the Package itself...
                Dts.Variables["User::execsObj"].Value = ((Package)Dts.Variables["User::execsObj"].Parent).Executables;
                Dts.Variables["User::lineageIds"].Value = new Dictionary<int, string>();
                lineageId = (Dictionary<int, string>)Dts.Variables["User::lineageIds"].Value;

                Executables execs = (Executables)Dts.Variables["User::execsObj"].Value;

                ReadExecutables(execs);

                Dts.TaskResult = (int)ScriptResults.Success;

            }
            catch (Exception ex)
            {
                //An error occurred.  
                Dts.Events.FireError(0, "SSIS variable read error:", ex.Message + "\r" + ex.StackTrace, String.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }  

        }

        private void ReadExecutables(Executables executables)
        {

            try
            {

                foreach (Executable pkgExecutable in executables)
                {
                    if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.TaskHost)))
                    {
                        TaskHost pkgExecTaskHost = (TaskHost)pkgExecutable;
                        if (pkgExecTaskHost.CreationName.StartsWith("SSIS.Pipeline"))
                        {
                            ProcessDataFlowTask(pkgExecTaskHost);
                        }
                    }
                    else if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.ForEachLoop)))
                    {
                        // Recurse into FELCs
                        ReadExecutables(((ForEachLoop)pkgExecutable).Executables);
                    }
                }

            }
            catch (Exception ex)
            {
                //An error occurred.  
                Dts.Events.FireError(0, "ReadExecutables error:", ex.Message + "\r" + ex.StackTrace, String.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }  

        }

        private void ProcessDataFlowTask(TaskHost currentDataFlowTask)
        {

            try
            {

                MainPipe currentDataFlow = (MainPipe)currentDataFlowTask.InnerObject;
                foreach (IDTSComponentMetaData100 currentComponent in currentDataFlow.ComponentMetaDataCollection)
                {
                    // Get the inputs in the component.
                    foreach (IDTSInput100 currentInput in currentComponent.InputCollection)
                        foreach (IDTSInputColumn100 currentInputColumn in currentInput.InputColumnCollection)
                           lineageId.Add(currentInputColumn.ID, currentInputColumn.Name);


                    // Get the outputs in the component.
                    foreach (IDTSOutput100 currentOutput in currentComponent.OutputCollection)
                        foreach (IDTSOutputColumn100 currentoutputColumn in currentOutput.OutputColumnCollection)
                            lineageId.Add(currentoutputColumn.ID, currentoutputColumn.Name);

                }

            }
            catch (Exception ex)
            {
                //An error occurred.  
                Dts.Events.FireError(0, "ProcessDataFlowTask error:", ex.Message + "\r" + ex.StackTrace, String.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }  


        }
        #region ScriptResults declaration
        /// <summary>
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        /// 
        /// This code was generated automatically.
        /// </summary>
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

    }


}

Solution

  • The 'duplicate key' error occurs if you have a duplicate column name in any of the data flows in the package. I had this problem before and I solved it by adding the following 2 steps.

    Step 1
    Make the following changes to the ProcessDataFlowTask method at both instances of lineageId.Add(...)

    Remember to use currentOutputColumn.Name instead of currentInputColumn.Name at the second instance

    Change from

    lineageId.Add(currentInputColumn.ID, currentInputColumn.Name);
    

    To

    strNewID = currentDataFlowTask.Name + "_" + currentInputColumn.ID.ToString();
    lineageIDs.Add(strNewID, currentInputColumn.Name);
    

    Basically giving a unique name for the Column Name by adding the Data Flow Name plus the '_' (Underscore) as a prefix.


    Step 2
    Since we have modified the Column names while adding to the LineageIDs collection, we need to use the modified column name, (i.e. Prefix Data Flow Name plus the '_' (Underscore)) when comparing column names in Input0_ProcessInputRow method in another script task, which is required and you have not copied above.

        string newColNum = "DataFlowTaskName_" + Row.ErrorColumn.Value.ToString();
        if (lineageIDs.ContainsKey(newColNum))
            Row.ErrorColumnName = lineageIDs[newColNum];
    


    Note: In above code DataFlowTaskName_ is hard coded value and needs to be replaced with the Data Flow Task name in which your second script task exists, since it is not available in the Input0_ProcessInputRow method we need to hard code it.

    This is just one way of doing it, May be you can find another way of handling the duplicates, this is how I did and it worked.


    Hope this helps.