Search code examples
pythonc#gzipmemorystream

How can I add header to compressed string with gzip in python?


I'm trying to compress string by python like a specific C# code but I'm getting a different result. It seems I have to add a header to the compressed result but I don't know how can I add a header to a compressed string in python. This is the C# line which I don't know how would be in python:

memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

This is the whole runable C# code

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace Rextester
{
    /// <summary>Handles compressing and decompressing API requests and responses.</summary>
    public class Compression
    {
        #region Member Variables
        /// <summary>The compressed message header length.</summary>
        private const int CompressedMessageHeaderLength = 4;
        #endregion

        #region Methods
        /// <summary>Compresses the XML string.</summary>
        /// <param name="documentToCompress">The XML string to compress.</param>
        public static string CompressData(string data)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                byte[] plainBytes = Encoding.UTF8.GetBytes(data);

                using (GZipStream zipStream = new GZipStream(memoryStream, CompressionMode.Compress, leaveOpen: true))
                {
                    zipStream.Write(plainBytes, 0, plainBytes.Length);
                }

                memoryStream.Position = 0;

                byte[] compressedBytes = new byte[memoryStream.Length + CompressedMessageHeaderLength];

                Buffer.BlockCopy(
                    BitConverter.GetBytes(plainBytes.Length),
                    0,
                    compressedBytes,
                    0,
                    CompressedMessageHeaderLength
                );

                // Add the header, which is the length of the compressed message.
                memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

                string compressedXml = Convert.ToBase64String(compressedBytes);

                return compressedXml;
            }
        }
        
 
        #endregion
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            //Your code goes here
            string data = "Hello World!";
            Console.WriteLine(  Compression.CompressData(data) );
            // result would be DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA

        }
    }
}

and this is the Python code I wrote:

data = 'Hello World!'

import gzip
import base64
print(base64.b64encode(gzip.compress(data.encode('utf-8'))))

# I expect DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA 
# but I get H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=

Solution

  • You can use to_bytes to convert length of encoded string:

    enc = data.encode('utf-8')
    zipped = gzip.compress(enc)
    print(base64.b64encode((len(enc)).to_bytes(4, sys.byteorder) + zipped)) # sys.byteorder can be set to concrete fixed value
    

    Also it seems that gzip.compress(enc) produces slightly different result than C# counterpart (so the overall result will also differ) but this should not be an issue so decompress should handle everything correctly.