I'm trying to compress a string in Python the same way a specific piece of C# code does, but I'm getting a different result. It seems I have to add a header to the compressed result, but I don't know how to add a header to a compressed string in Python. This is the C# line that I don't know how to write in Python:
memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);
This is the whole runnable C# code:
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
namespace Rextester
{
/// <summary>Handles compressing and decompressing API requests and responses.</summary>
public class Compression
{
#region Member Variables
/// <summary>The compressed message header length.</summary>
private const int CompressedMessageHeaderLength = 4;
#endregion
#region Methods
/// <summary>Compresses the XML string.</summary>
/// <param name="data">The XML string to compress.</param>
public static string CompressData(string data)
{
using (MemoryStream memoryStream = new MemoryStream())
{
byte[] plainBytes = Encoding.UTF8.GetBytes(data);
using (GZipStream zipStream = new GZipStream(memoryStream, CompressionMode.Compress, leaveOpen: true))
{
zipStream.Write(plainBytes, 0, plainBytes.Length);
}
memoryStream.Position = 0;
byte[] compressedBytes = new byte[memoryStream.Length + CompressedMessageHeaderLength];
Buffer.BlockCopy(
BitConverter.GetBytes(plainBytes.Length),
0,
compressedBytes,
0,
CompressedMessageHeaderLength
);
// Copy the compressed bytes in after the header, which holds the length of the uncompressed message.
memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);
string compressedXml = Convert.ToBase64String(compressedBytes);
return compressedXml;
}
}
#endregion
}
public class Program
{
public static void Main(string[] args)
{
//Your code goes here
string data = "Hello World!";
Console.WriteLine( Compression.CompressData(data) );
// result would be DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA
}
}
}
And this is the Python code I wrote:
data = 'Hello World!'
import gzip
import base64
print(base64.b64encode(gzip.compress(data.encode('utf-8'))))
# I expect DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA
# but I get H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=
You can use int.to_bytes to convert the length of the encoded (uncompressed) string into a 4-byte header and prepend it to the compressed data:
import sys
enc = data.encode('utf-8')
zipped = gzip.compress(enc)
# Prepend the uncompressed length as a 4-byte header; replace sys.byteorder with 'little' or 'big' to pin the byte order
print(base64.b64encode(len(enc).to_bytes(4, sys.byteorder) + zipped))
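As a rough sketch of the full round trip, the header logic can be wrapped in two helpers. This assumes the C# side runs on a little-endian machine (BitConverter.GetBytes follows the platform's byte order, and the leading 0C 00 00 00 in the expected output is consistent with that). The names compress_data and decompress_data are illustrative only, not part of any library.

```python
import base64
import gzip

HEADER_LENGTH = 4  # mirrors CompressedMessageHeaderLength in the C# code


def compress_data(data: str) -> str:
    """Gzip a string and prepend its uncompressed length as a 4-byte header."""
    plain = data.encode("utf-8")
    # Assumption: little-endian, matching BitConverter on common platforms.
    header = len(plain).to_bytes(HEADER_LENGTH, "little")
    return base64.b64encode(header + gzip.compress(plain)).decode("ascii")


def decompress_data(encoded: str) -> str:
    """Reverse compress_data; the C# output has the same layout."""
    raw = base64.b64decode(encoded)
    expected_length = int.from_bytes(raw[:HEADER_LENGTH], "little")
    plain = gzip.decompress(raw[HEADER_LENGTH:])
    if len(plain) != expected_length:
        raise ValueError("length header does not match decompressed size")
    return plain.decode("utf-8")


if __name__ == "__main__":
    print(decompress_data(compress_data("Hello World!")))
    # The string produced by the C# program in the question:
    print(decompress_data("DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA"))
```

Checking the length header on the way back is optional, but it catches a byte-order mismatch between the two sides early.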
Also, gzip.compress(enc) seems to produce a slightly different result than its C# counterpart (so the overall result will also differ), but this should not be an issue: both are valid gzip streams, so decompression on either side should handle them correctly.
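For the two strings in the question, decoding the Base64 shows that the difference sits in the 10-byte gzip header: the C# output has a zero MTIME field and XFL/OS bytes of 04 00, while the Python output carries a timestamp and 02 FF. If a byte-identical result is wanted, a sketch like the following pins those fields. It assumes Python 3.8+ (for the mtime argument of gzip.compress) and a little-endian length header. The name compress_like_csharp is hypothetical.

```python
import base64
import gzip


def compress_like_csharp(data: str) -> str:
    """Gzip with header fields pinned to the values seen in the C# output."""
    plain = data.encode("utf-8")
    # mtime=0 writes a zero MTIME field instead of the current time.
    zipped = bytearray(gzip.compress(plain, mtime=0))
    # Bytes 8 and 9 of a gzip stream are XFL and OS (RFC 1952). They are
    # informational, so overwriting them does not affect decompression.
    zipped[8] = 0x04  # XFL value observed in the C# output
    zipped[9] = 0x00  # OS value observed in the C# output
    header = len(plain).to_bytes(4, "little")
    return base64.b64encode(header + bytes(zipped)).decode("ascii")


print(compress_like_csharp("Hello World!"))
```

This only aligns the header. The deflate payload happens to match for the input in the question, but two deflate implementations may legitimately emit different bytes for the same data, so an exact match is not guaranteed in general.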