sql-server, performance, blob, denormalization, varbinary

SQL Server performance: 50 columns vs single binary/varbinary


Is it possible to improve SQL Server 2008 R2 (and newer) insert performance by replacing (say) 50 float columns with a single binary(n) (n being 50 x 4)?

I would presume that using a fixed-size binary(n) should improve performance (the amount of data is the same, with less work needed to handle all the columns and shorter SQL queries), but many sites recommend against using binary columns, so I would like to know whether there are real issues with this approach.

Also, the table is rather denormalized, and usually not all of the columns have values, so varbinary(n) would let me reduce the row size in many cases. Sometimes only a single column is filled, but about 10 on average.

And then the third question is, how about going a step further and replacing (say) 5 rows x 50 float32 columns with a single varbinary(5*50*4)?

So it would be cool to get some insights into:

  1. Replacing 1 row of 50 float columns with a single binary(200);
  2. Replacing 1 row of 50 x float with a single varbinary(204) (a few extra bytes for flags/length info) - to save space when columns are unused;
  3. Replacing 5 rows of 50 x float with a single varbinary(1024) (a few extra bytes for flags/length info).

The entire row is always read at once in all cases.
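
A rough T-SQL sketch of the layouts I have in mind (the table names, column names and datetime2 precision are just placeholders; a 4-byte float maps to real in SQL Server):

    -- Option 0: the current wide table, 50 float columns per rounded timestamp.
    CREATE TABLE dbo.Samples_Wide (
        Timestamp_rounded datetime2(0) NOT NULL PRIMARY KEY CLUSTERED,
        Value_0ms   real NULL,
        Value_20ms  real NULL,
        -- ... 47 more columns ...
        Value_980ms real NULL
    );

    -- Option 1: one fixed-size blob, 50 x 4 bytes = binary(200).
    CREATE TABLE dbo.Samples_Binary (
        Timestamp_rounded datetime2(0) NOT NULL PRIMARY KEY CLUSTERED,
        Payload binary(200) NOT NULL
    );

    -- Option 2: variable-size blob with a few leading flag/length bytes,
    -- so rows with only a few populated values stay short.
    CREATE TABLE dbo.Samples_Varbinary (
        Timestamp_rounded datetime2(0) NOT NULL PRIMARY KEY CLUSTERED,
        Payload varbinary(204) NOT NULL
    );

    -- Option 3: five logical rows (5 x 50 floats) packed into one physical row.
    CREATE TABLE dbo.Samples_Packed (
        Timestamp_rounded datetime2(0) NOT NULL PRIMARY KEY CLUSTERED,
        Payload varbinary(1024) NOT NULL
    );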

(Update)

To clarify, the data being stored is:

 Timestamp_rounded    Value_0ms  Value_20ms  Value_40ms ... Value_980ms
 2016-01-10 10:00:00    10.0       11.1        10.5     ...    10.5

I always read the entire row, the clustered primary key is the first column (Timestamp), and I will never have to query the table by any of the other columns.

Normalized data would obviously have a Timestamp/Value pair, where Timestamp would then have millisecond precision. But then I would have to store 50 rows of two columns, instead of 1 row (Timestamp + BLOB).
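
For comparison, the normalized layout would look roughly like this (names are placeholders); reading one second of data would then mean reading 50 rows instead of 1:

    CREATE TABLE dbo.Samples_Normalized (
        Timestamp datetime2(3) NOT NULL PRIMARY KEY CLUSTERED,  -- millisecond precision
        Value     real NOT NULL
    );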


Solution

  • As an experiment, I tried out the two approaches and compared them.

    I found that, after some tuning, the binary version was about 3x faster than the 50-column version.

    This scenario is very specific, and my test covered only that specific case; any deviation from my test setup will affect the result.

    How the test was done

    For the 50-column version I had 50 nullable float columns, all of which I populated with float.MaxValue.

    For the binary version I had a single column. Its value was built by concatenating 50 copies of float.MaxValue + "|" into one long string, which was then converted to byte[] and stored in the table.

    Both tables were heaps with no indexes or constraints.
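
    The exact code is in the repository linked below; as a rough illustration of the setup (table and column names here are illustrative, not copied from the repo):

        -- Both test tables were heaps with no keys, indexes or constraints.
        CREATE TABLE dbo.Test50Cols (
            Value00 float NULL, Value01 float NULL, /* ... 47 more ... */ Value49 float NULL
        );

        CREATE TABLE dbo.TestBinary (
            Payload varbinary(8000) NULL
        );

        -- The payload was built client-side: 50 copies of float.MaxValue
        -- followed by '|', concatenated into one string and converted to bytes.
        -- A T-SQL approximation of that payload:
        DECLARE @payload varbinary(8000) =
            CAST(REPLICATE('3.402823E+38|', 50) AS varbinary(8000));
        INSERT INTO dbo.TestBinary (Payload) VALUES (@payload);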

    My test code can be found here https://github.com/PeterHenell/binaryBulkInsertComparison

    I ran the tests on SQL Server 2014 Developer Edition on a 6-core workstation with SSD drives.