Search code examples
sql-servervb.netssisssis-2012isql

SSIS Script Component - Iterate Through Columns with NULL Values


I am using the following .NET code to iterate through all input columns to create a HashByte value of the columns to then determine if the row has been updated.

The script works fine until a column has a NULL value. How can I code around this without using a TRY CATCH block?

Imports System.Text
Imports System.Reflection
Imports System.Security.Cryptography

Private sha As SHA256 = New SHA256CryptoServiceProvider() 'used To create the SHA256 hash

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Try

        ' create hashbyte column to identify if row has changed

        Dim concat As StringBuilder = New StringBuilder("")

        ' loop over all the input columns of the buffer row 

        For Each col As PropertyInfo In Row.GetType().GetProperties()

            If Not col.Name.ToLower.EndsWith("isnull") AndAlso Not col.Name.ToLower.Equals("hash") Then
                MsgBox(col.Name.ToLower)
                If col.GetValue(Row, Nothing) Is Nothing Then
                    concat.Append("")
                Else
                    concat.Append(Convert.ToString(col.GetValue(Row).ToString()))
                End If

                concat.Append("|")
            End If

        Next

        Dim sHashBytes As String = concat.ToString().Remove(concat.ToString().Length - 1) ' remove the last character of the string
        Dim bHashBytes As Byte() = sha.ComputeHash(System.Text.UnicodeEncoding.Unicode.GetBytes(sHashBytes))
        Dim sb As StringBuilder = New StringBuilder("")

        For i As Integer = 0 To bHashBytes.Length - 1
            sb.Append(bHashBytes(i).ToString("X2"))
        Next

        Row.Hash = "0x" & sb.ToString()

    Catch ex As Exception
        MsgBox(ex.ToString)
    End Try
End Sub

Thanks, Greg


Solution

  • Well, assume that all input columns are of type String. I post a fragment of code which concatenates all string columns into a string (in C#) for omitted calculations.

    public override void ProcessInput(int InputID, PipelineBuffer Buffer)
    {
      bool fireAgain = true;
      string columnData = null;
    
      while (Buffer.NextRow())
      {
        for (int columnIndex = 0;
          columnIndex < Buffer.ColumnCount;
          columnIndex++)
        {
    
          if (!Buffer.IsNull(columnIndex))
          {
            BufferColumn columnInfo = Buffer.GetColumnInfo(columnIndex);
            switch (columnInfo.DataType)
            {
    
              // check column data type
              case DataType.DT_WSTR:
                columnData += Buffer.GetString(columnIndex);
                break;
    
              // add code to support more data types here
    
              default:
                columnData = "";
                break;
            }
          }
        }
      }
      // Do some calculations...
      base.ProcessInput(InputID, Buffer);
    }
    

    Reason why calling ProcessInput and not Input0_ProcessInputRow - in the latter you have all columns with column named properties, so cannot loop over them. This method is called per each buffer, so mode data (and buffers) you will have, the more calls you will receive.
    Calling base.ProcessInput(InputID, Buffer) is essential for correct buffer handling, marking it processed and being available to further data flow transformations.

    Inspired by Todd McDermid's Blog post on this.