We need to decompress data produced by a Java system using the DEFLATE algorithm. We have no control over that system.
While we don't know the exact variant, we are able to decompress data sent to us using the following Java code:
    public static String inflateBase64(String base64)
    {
        try (Reader reader = new InputStreamReader(
                new InflaterInputStream(
                    new ByteArrayInputStream(
                        Base64.getDecoder().decode(base64)))))
        {
            StringWriter sw = new StringWriter();
            char[] chars = new char[1024];
            for (int len; (len = reader.read(chars)) > 0; )
                sw.write(chars, 0, len);
            return sw.toString();
        }
        catch (IOException e)
        {
            System.err.println(e.getMessage());
            return "";
        }
    }
Unfortunately, our ecosystem is C#-based. At the moment we shell out to the Java program using the Process object, but this is clearly sub-optimal for performance, so we'd like to port the above code to C# if at all possible.
Some sample input and output:
>java -cp . Deflate -c "Pack my box with five dozen liquor jugs."
eJwLSEzOVsitVEjKr1AozyzJUEjLLEtVSMmvSs1TyMksLM0vUsgqTS/WAwAm/w6Y
>java -cp . Deflate -d eJwLSEzOVsitVEjKr1AozyzJUEjLLEtVSMmvSs1TyMksLM0vUsgqTS/WAwAm/w6Y
Pack my box with five dozen liquor jugs.
>
We're told the Java system conforms to RFC 1951, so we've tried quite a few libraries, but none of them seems to decompress the data correctly (if at all). One example is DotNetZip:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using Ionic.Zlib;

    namespace Decomp
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Deflate
                String start = "Pack my box with five dozen liquor jugs.";
                var x = DeflateStream.CompressString(start);
                var res1 = Convert.ToBase64String(x, 0, x.Length);

                // Inflate
                //String source = "eJwLSEzOVsitVEjKr1AozyzJUEjLLEtVSMmvSs1TyMksLM0vUsgqTS/WAwAm/w6Y"; // *** FAILS ***
                String source = "C0hMzlbIrVRIyq9QKM8syVBIyyxLVUjJr0rNU8jJLCzNL1LIKk0v1gMA";
                var part1 = Convert.FromBase64String(source);
                var res2 = DeflateStream.UncompressString(part1);
            }
        }
    }
According to its documentation this implements RFC 1951, but it does not decompress the Java-produced string correctly (presumably due to subtle differences between implementations).
From a development point of view we could do with understanding the exact variant we need to write. Is there any header information or online tools we could use to provide an initial steer? It feels like we're shooting in the dark a little bit here.
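For an initial steer, the first two bytes of the Base64-decoded payload are usually enough to tell the common containers apart. The sketch below is a best-effort heuristic, not a validator: it assumes the only candidates are gzip (RFC 1952), zlib (RFC 1950) and raw DEFLATE (RFC 1951), and the class and method names (`DeflateSniffer`, `classify`) are made up for illustration. It is written in Java only because that is where the data originates; the byte checks port directly to C#.

```java
import java.util.Base64;

final class DeflateSniffer {

    /** Best-effort guess at the container format from the first two bytes. */
    static String classify(byte[] data) {
        if (data.length < 2) {
            return "too short to tell";
        }
        int b0 = data[0] & 0xFF;
        int b1 = data[1] & 0xFF;

        // gzip streams start with the magic bytes 1F 8B.
        if (b0 == 0x1F && b1 == 0x8B) {
            return "gzip (RFC 1952)";
        }
        // zlib: compression method 8 in the low nibble, window size code <= 7
        // in the high nibble, and the two header bytes read as a big-endian
        // 16-bit number must be a multiple of 31.
        boolean methodIsDeflate = (b0 & 0x0F) == 8;
        boolean windowOk = (b0 >> 4) <= 7;
        boolean checkOk = ((b0 << 8) | b1) % 31 == 0;
        if (methodIsDeflate && windowOk && checkOk) {
            return "zlib (RFC 1950)";
        }
        // No recognisable wrapper: assume a bare DEFLATE stream.
        return "possibly raw DEFLATE (RFC 1951)";
    }

    public static void main(String[] args) {
        String[] samples = {
            // Produced by the Java system in the question
            "eJwLSEzOVsitVEjKr1AozyzJUEjLLEtVSMmvSs1TyMksLM0vUsgqTS/WAwAm/w6Y",
            // Produced by DotNetZip's DeflateStream in the question
            "C0hMzlbIrVRIyq9QKM8syVBIyyxLVUjJr0rNU8jJLCzNL1LIKk0v1gMA"
        };
        for (String sample : samples) {
            byte[] data = Base64.getDecoder().decode(sample);
            System.out.printf("%02x %02x -> %s%n",
                    data[0] & 0xFF, data[1] & 0xFF, classify(data));
        }
        // Prints:
        // 78 9c -> zlib (RFC 1950)
        // 0b 48 -> possibly raw DEFLATE (RFC 1951)
    }
}
```

On these two samples the Java output carries the `78 9c` zlib header while the DotNetZip `DeflateStream` output does not, which suggests the mismatch is in the wrapper around the compressed data rather than in the DEFLATE algorithm itself.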
One option is SharpZipLib (https://www.nuget.org/packages/ICSharpCode.SharpZipLib.dll/):
    using ICSharpCode.SharpZipLib.Zip.Compression.Streams;
    using System;
    using System.IO;
    using System.Text;

    namespace ConsoleApp1
    {
        class Program
        {
            static void Main(string[] args)
            {
                string input = "Pack my box with five dozen liquor jugs.";
                string encoded = Encode(input);
                string decoded = Decode(encoded);
                Console.WriteLine($"Input: {input}");
                Console.WriteLine($"Encoded: {encoded}");
                Console.WriteLine($"Decoded: {decoded}");
                Console.ReadKey(true);
            }

            static string Encode(string text)
            {
                byte[] bytes = Encoding.UTF8.GetBytes(text);
                using (MemoryStream inms = new MemoryStream(bytes))
                {
                    using (MemoryStream outms = new MemoryStream())
                    {
                        using (DeflaterOutputStream dos = new DeflaterOutputStream(outms))
                        {
                            inms.CopyTo(dos);
                            dos.Finish();
                            byte[] encoded = outms.ToArray();
                            return Convert.ToBase64String(encoded);
                        }
                    }
                }
            }

            static string Decode(string base64)
            {
                byte[] bytes = Convert.FromBase64String(base64);
                using (MemoryStream ms = new MemoryStream(bytes))
                {
                    using (InflaterInputStream iis = new InflaterInputStream(ms))
                    {
                        using (StreamReader sr = new StreamReader(iis))
                        {
                            return sr.ReadToEnd();
                        }
                    }
                }
            }
        }
    }
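As for why one library copes and another does not, the two Base64 strings in the question appear to differ only in framing. The Java output looks like a zlib stream (RFC 1950): a 2-byte header, then the raw DEFLATE data (RFC 1951), then a 4-byte big-endian Adler-32 of the uncompressed text. The sketch below checks that reading against the question's own samples. It is in Java only to stay close to the originating system, the helper names (`ZlibFraming`, `stripZlibWrapper`, `storedAdler32`, `adler32Of`) are hypothetical, and it assumes a stream without a preset dictionary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.Adler32;

final class ZlibFraming {

    /** Drops the 2-byte zlib header and the 4-byte Adler-32 trailer. */
    static byte[] stripZlibWrapper(byte[] zlib) {
        return Arrays.copyOfRange(zlib, 2, zlib.length - 4);
    }

    /** Reads the big-endian Adler-32 stored in the last four bytes. */
    static long storedAdler32(byte[] zlib) {
        long value = 0;
        for (int i = zlib.length - 4; i < zlib.length; i++) {
            value = (value << 8) | (zlib[i] & 0xFF);
        }
        return value;
    }

    /** Computes the Adler-32 of the UTF-8 bytes of the given text. */
    static long adler32Of(String text) {
        Adler32 adler = new Adler32();
        adler.update(text.getBytes(StandardCharsets.UTF_8));
        return adler.getValue();
    }

    public static void main(String[] args) {
        String text = "Pack my box with five dozen liquor jugs.";
        // Output of the Java system (fails in DotNetZip's DeflateStream)
        byte[] javaOutput = Base64.getDecoder().decode(
                "eJwLSEzOVsitVEjKr1AozyzJUEjLLEtVSMmvSs1TyMksLM0vUsgqTS/WAwAm/w6Y");
        // Output of DotNetZip's DeflateStream.CompressString
        byte[] dotNetZipOutput = Base64.getDecoder().decode(
                "C0hMzlbIrVRIyq9QKM8syVBIyyxLVUjJr0rNU8jJLCzNL1LIKk0v1gMA");

        boolean sameBody =
                Arrays.equals(stripZlibWrapper(javaOutput), dotNetZipOutput);
        boolean checksumOk = storedAdler32(javaOutput) == adler32Of(text);

        System.out.println("payload matches raw stream: " + sameBody);   // true
        System.out.println("trailer is Adler-32 of text: " + checksumOk); // true
    }
}
```

If that reading is right, any C# library that understands the zlib wrapper should work, which would explain the SharpZipLib result above, since its `InflaterInputStream` mirrors the Java class of the same name. DotNetZip's `Ionic.Zlib` namespace also has a `ZlibStream` class aimed at this wrapped format, which may be worth trying in place of `DeflateStream`.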