Search code examples
c#sql.netsql-serverclr

SQL CLR Function to decompress GZip data


I have two CLR functions that I use to compress/decompress NVARCHAR data.

[SqlFunction(IsDeterministic = true, IsPrecise = true, DataAccess = DataAccessKind.None)]
public static SqlBytes ufn_GZipCompress(SqlString input) {
    if (input.IsNull || input.Value.Length == 0)
        return SqlBytes.Null;

    using (MemoryStream msInput = new MemoryStream(input.GetUnicodeBytes())) {
        using (MemoryStream msOutput = new MemoryStream()) {
            using (GZipStream deflateStream = new GZipStream(msOutput, CompressionMode.Compress, true)) {
                byte[] buffer = new byte[32768];
                int read;
                while ((read = msInput.Read(buffer, 0, buffer.Length)) > 0)
                    msOutput.Write(buffer, 0, read);
            }

            return new SqlBytes(msOutput.ToArray());
        }
    }
}

[SqlFunction(IsDeterministic = true, IsPrecise = true, DataAccess = DataAccessKind.None)]
public static SqlString ufn_GZipDecompress(SqlBytes input) {
    if (input.IsNull || input.IsNull)
        return SqlString.Null;

    byte[] buf = new byte[32768];

    using (MemoryStream msOutput = new MemoryStream()) {
        using (GZipStream deflateStream = new GZipStream(input.Stream, CompressionMode.Decompress, true)) {
            int bytesRead;
            while ((bytesRead = deflateStream.Read(buf, 0, 32768)) > 0)
                msOutput.Write(buf, 0, bytesRead);
        }

        return new SqlString(Encoding.UTF8.GetString(msOutput.ToArray()));
    }
}

The problem is when I try to decompress binary data, I do not get the the output that is expected, for example:

SELECT dbo.[ufn_GZipDecompress](dbo.[ufn_GZipCompress](N'Hello World'))

Returns

H

Solution

  • I was working on encryption CLRs at some point and remember something similar to this and it turned out to be encoding issue. sql's default SQL_Latin1_General_CP1_CI_AS is Windows-1252 encoded NOT UTF-8!

    I am not positive if this is going to be a problem with your GetUnicodeBytes as well as your returned encoding or not. You should test to make sure input.GetUnicodeBytes() gives you the desired result, I do that by recompiling and throwing custom exception with the data but I am sure others have other methods.

    Then for your Decompressing you might try something like:

    Encoding enc = Encoding.GetCoding(1252);
    ecn.GetString(yourbytearray)