Compressing and decompressing a string yields only the first letter of the original string?

I'm compressing a string with Gzip using this code:

    public static String Compress(String decompressed)
    {
        byte[] data = Encoding.Unicode.GetBytes(decompressed);
        using (var input = new MemoryStream(data))
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
                input.CopyTo(gzip); // gzip is flushed when it is disposed
            return Convert.ToBase64String(output.ToArray());
        }
    }

and decompressing it with this code:

    public static String Decompress(String compressed)
    {
        byte[] data = Convert.FromBase64String(compressed);
        using (MemoryStream input = new MemoryStream(data))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            gzip.CopyTo(output);
            StringBuilder sb = new StringBuilder();
            foreach (byte b in output.ToArray())
                sb.Append((char)b);
            return sb.ToString();
        }
    }

When I use these functions in this sample code, the result is only the letter S:

String test = "SELECT * FROM foods f WHERE f.name = 'chicken';";
String com = Compress(test);
String decom = Decompress(com);

If I debug the code, I see that the value of decom is

S\0E\0L\0E\0C\0T\0 \0*\0 \0F\0R\0O\0M\0 \0f\0o\0o\0d\0s\0 \0f\0 \0W\0H\0E\0R\0E\0 \0f\0.\0n\0a\0m\0e\0 \0=\0 \0'\0c\0h\0i\0c\0k\0e\0n\0'\0;\0

but the value displayed is only the letter S.


  • These lines are the problem:

    foreach (byte b in output.ToArray())
        sb.Append((char)b);

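    You are interpreting each byte as its own character, but Encoding.Unicode (UTF-16) stores each character as two bytes. For ASCII text the second byte is always 0, so every letter in the result is followed by a \0 character, which is likely why only the S gets displayed. Instead, you need the line: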

    string decoded = Encoding.Unicode.GetString(output.ToArray());

    which will convert the byte array to a string, based on the encoding.

    The basic problem is that you convert the string to a byte array using an encoding, but then ignore that encoding when you turn the bytes back into a string. You may also want to use Encoding.UTF8 instead of Encoding.Unicode, since UTF-8 needs only one byte per ASCII character; either works as long as both sides use the same encoding.
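
Putting the fix together, a round trip might look like the sketch below. It is not a drop-in for the asker's project: the class name GzipText and the TextEncoding field are made up for illustration, and keeping the encoding in one shared field is just one way to make sure Compress and Decompress cannot drift apart.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class; the name is an assumption, not from the question.
public static class GzipText
{
    // One shared encoding, so both directions always agree.
    // Encoding.UTF8 would work here as well.
    private static readonly Encoding TextEncoding = Encoding.Unicode;

    public static string Compress(string text)
    {
        byte[] data = TextEncoding.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps `output` usable after gzip is disposed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing gzip writes out the remaining compressed data

            return Convert.ToBase64String(output.ToArray());
        }
    }

    public static string Decompress(string compressed)
    {
        byte[] data = Convert.FromBase64String(compressed);
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            // Decode the whole buffer with the same encoding used in Compress,
            // instead of casting each byte to a char.
            return TextEncoding.GetString(output.ToArray());
        }
    }
}
```

With this version, GzipText.Decompress(GzipText.Compress(test)) should give back the original query string with no \0 characters in it.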