Search code examples
csvssisintegerflat-filessis-data-types

Flat file destination columns data types validation


A source database field of type INT is read through an OLE DB Source. It is eventually written to a Flat File Destination. The destination Flat File Connection Manager > Advanced page reports it as a four-byte signed integer [DT_I4].

This data type made me think it indicated binary. Clearly, it does not. I was surprised that it was not the more generic numeric [DT_NUMERIC].

I changed this type setting to single-byte signed integer [DT_I1]. I expected this to fail, but it did not. The process produced the same result, even though the value of the field was always > 127. Why did this not fail?

Some of the values that are produced are

1679576722
1588667638
1588667638
1497758544
1306849450
1215930367
1215930367
1023011178
1932102084 

Clearly, outside the range of a single-byte signed integer [DT_I1].

As a related question, is it possible to output binary data to a flat file? If so, what settings and where should be used?


Solution

  • After re-reading the question to make sure it matched my proof-edits, I realized that it doesn't appear that I answered your question - sorry about that. I have left the first answer in case it is helpful.

    SSIS does not appear to enforce destination metadata; however, it will enforce source metadata. I created a test file with ranges -127 to 400. I tested this with the following scenarios:

    • Test 1: Source and destination flat file connection managers with signed 1 byte data type.
    • Result 1: Failed
    • Test 2: Source is 4 byte signed and destination is 1 byte signed.
    • Result 2: Pass

    SSIS's pipeline metadata validation only cares about the metadata of the input matching the width of the pipeline. It appears to not care what the output is. Though, it offers you the ability to set the destination to whatever the downstream source is so that it can check and provide a warning if the destination's (i.e., SQL Server) metadata matches or not.

    This was an unexpected result - I expected it to fail as you did. Intuitively, the fact that it did not fail still makes sense. Since we are writing to a CSV file, then there is no way to control what the required metadata is. But, if we hook this to a SQL Server destination and the metadata doesn't match, then SQL Server will frown upon the out of bounds data (see my other answer).

    Now, I would still set the metadata of the output to match what it is in the pipeline as this has important considerations with distinguishing string versus numeric data types. So, if you try to set a datetime as integer then there will be no text qualifier, which may cause an error on the next input process. Conversely, you could have the same problem of setting an integer to a varchar and having, which means it would get a text qualifier.

    I think the fact that destination metadata is not enforced is a bit of a weak link in SSIS. But, it can be negated by just setting it to match the pipeline buffer, which is done automatically assuming it is the last task that is dropped to the design. With that being said, if you update the metadata on the pipeline after development is complete then you are in for a real treat with getting the metadata updated throughout the entire pipeline because some tasks have to be opened and closed while others have to be deleted and re-created in order to update the metadata.

    Additional Information

    TL DR: TinyInt is stored as an unsigned data type in SQL Server, which means it supports values between 0 and 255. So a value greater than 127 is acceptable - up to 255. Anything over will result in an error.

    The byte size indicates the maximum number of possible combinations where the signed/unsigned indicates whether or not the range is split between positive and negative values.

    • 1 byte = TinyInt in SQL Server
    • 1 byte is 8 bits = 256 combinations
    • Signed Range: -128 to 127
    • Unsigned Range: 0 to 255

    It is important to note that SQL Server does not support signing the data types directly. What I mean here is that there is no way to set the integer data types (i.e., TinyInt, Int, and BigInt) as signed or unsigned.

    • TinyInt it is unsigned
    • Int and BigInt are signed

    See reference below: Max Size of SQL Server Auto-Identity Field

    If we attempt to set a TinyInt to any value that is outside of the Unsigned Range (e.g., -1 or 256), then we get the following error message:

    TinyInt Error Message

    This is why you were able to set a value greater than 127.

    Int Error Message:

    Int Error Message

    BigInt Error Message:

    BigInt Error Message

    With respect to Identity columns, if we declare an Identity column as Int (i.e., 32 bit ~= 4.3 billion combinations) and set the seed to 0 with an increment of 1, then SQL Server will only go to 2,147,483,647 rows before it stops, which is the maximum signed value. But, we are short by half the range. If we set the seed to -2,147,483,648 (don't forget to include 0 in the range) then SQL Server will increment through the full range of combinations before stopping.

    References:

    SSIS Data Types and Limitations

    Max Size of SQL Server Auto-Identity Field