java arrays string compression byte

Java String read as a byte array


I have a string in the following format:

A|B|A_VERY_LONG_STRING_THAT_WILL_BE_COMPRESSED|C|D.

The above string will be split on the pipe delimiter and stored in an array, let's say result[].

result[0]=A;
result[1]=B;
result[2]=A_VERY_LONG_STRING_THAT_WILL_BE_COMPRESSED;
result[3]=C;
result[4]=D;

Now result[2] will be compressed using the following method:

public static byte[] compressUsingStream(String payload) {

        try (ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
             GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream)) {

            // UTF_8 is the statically imported StandardCharsets.UTF_8, as in deCompressUsingStream
            gzipOutputStream.write(payload.getBytes(UTF_8));

            // finish() writes the remaining compressed data and the GZIP trailer;
            // try-with-resources closes both streams afterwards
            gzipOutputStream.finish();

            return byteArrayOutputStream.toByteArray();

        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

So something like this:

byte[] compressedPayloadAsBytes = PayloadCompressionDecompression.compressUsingStream(result[2]);

Next I intend to convert the rest of the elements in the result[] array to bytes as well and concatenate everything into another array:

byte[] finalArray = concatAll(result[0].getBytes(), 
"|".getBytes(), 
result[1].getBytes(), 
"|".getBytes(), 
compressedPayloadAsBytes, 
"|".getBytes(), 
result[3].getBytes(), 
"|".getBytes(), 
result[4].getBytes());
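
The question does not show concatAll. A minimal sketch, assuming it simply joins its arguments in order (the class name ByteArrays is hypothetical and the real helper may differ):

```java
import java.io.ByteArrayOutputStream;

final class ByteArrays {
    private ByteArrays() {}

    // Hypothetical helper: joins the given arrays, in order, into one new array.
    static byte[] concatAll(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            // This overload of write() does not declare IOException.
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }
}
```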

And then write the finalArray[] to a file:

Path path = Path.of(file);
Files.write(path, finalArray);

I want to read the same data from the file which I will do as follows:

byte[] allBytesFromFile = Files.readAllBytes(path);
String recordWithCompressedPayload = new String(allBytesFromFile);

I separate the compressed payload as follows:

int payloadStart = StringUtils.ordinalIndexOf(recordWithCompressedPayload, "|", 2);
int payloadEnd = StringUtils.lastOrdinalIndexOf(recordWithCompressedPayload, "|", 2);

String compressedPayloadAsStr = recordWithCompressedPayload.substring(payloadStart+1, payloadEnd);

Now when I pass the compressedPayloadAsStr to a decompression method I get java.lang.RuntimeException: java.util.zip.ZipException: Not in GZIP format

My decompression method is as follows:

public static String deCompressUsingStream(byte[] compressedPayload) {

        try (GZIPInputStream gzipInputStream = new GZIPInputStream(new ByteArrayInputStream(compressedPayload))) {

            final StringWriter stringWriter = new StringWriter();
            IOUtils.copy(gzipInputStream, stringWriter, UTF_8);
            // No explicit close() is needed: try-with-resources closes the stream.
            return stringWriter.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

The call to the above method is PayloadCompressionDecompression.deCompressUsingStream(compressedPayloadAsStr.getBytes()).

Can someone help me retrieve my compressed payload from the file and pass it correctly to the deCompressUsingStream() method?


Solution

  • A generous thanks to @g00se and @Robert for helping me understand my problem.

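    Basically, once you compress a string it is no longer text; it is binary data. You cannot safely store arbitrary binary data in a String. Decoding the bytes with new String(bytes) can replace byte sequences that are not valid in the charset, so calling getBytes() afterwards does not give back the original bytes and the GZIP data ends up corrupted.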
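
    To illustrate the data loss, here is a minimal sketch. It assumes the bytes are decoded as UTF-8 (the question relies on the platform default charset, which may differ), and the class and method names are hypothetical.

    ```java
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    final class LossyRoundTrip {
        private LossyRoundTrip() {}

        // Decodes raw bytes as UTF-8 text, then encodes that text back to bytes.
        static byte[] throughString(byte[] raw) {
            return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            // Every GZIP stream starts with the magic bytes 0x1f 0x8b.
            byte[] gzipMagic = {(byte) 0x1f, (byte) 0x8b};
            byte[] roundTripped = throughString(gzipMagic);
            // 0x8b alone is not valid UTF-8, so it is replaced by U+FFFD (bytes EF BF BD).
            System.out.println(Arrays.equals(gzipMagic, roundTripped)); // false
        }
    }
    ```

    Once the magic bytes are altered like this, GZIPInputStream no longer recognizes the header, which matches the "Not in GZIP format" error in the question.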

    To overcome this, Base64-encode the compressed bytes before placing them in the text record, and decode them again before decompressing, so that no data is lost. This was explained to me by both @g00se and @Robert.
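
    A minimal sketch of the Base64 approach, assuming the record layout from the question (five fields, the third one compressed) and that no field contains a pipe. The class and method names are hypothetical, and readAllBytes() needs Java 9 or later.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    final class Base64Record {
        private Base64Record() {}

        static byte[] gzip(String text) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Read the buffer only after the GZIP stream has been closed.
            return bytes.toByteArray();
        }

        static String gunzip(byte[] compressed) {
            try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
                return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Builds "A|B|<Base64 of the gzipped third field>|C|D".
        static String encodeRecord(String[] fields) {
            String[] copy = fields.clone();
            copy[2] = Base64.getEncoder().encodeToString(gzip(fields[2]));
            return String.join("|", copy);
        }

        // Splits the record and restores the original third field.
        static String[] decodeRecord(String record) {
            String[] fields = record.split("\\|", -1);
            fields[2] = gunzip(Base64.getDecoder().decode(fields[2]));
            return fields;
        }
    }
    ```

    The standard Base64 alphabet contains no pipe character, so the encoded payload cannot be confused with a delimiter, and the whole record is plain text that survives new String(...) and getBytes().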

    In my case I was already encrypting the payload with AES, and my encryption step produces a Base64-encoded string. So I merely had to compress the payload first and then encrypt it. This gave me the result I intended.
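
    The AES routine itself is not shown here, so the following is only a sketch of the compress-then-encrypt order. The GCM mode, the 12-byte IV prepended to the ciphertext, the 128-bit key, and all class and method names are assumptions, not the code actually used.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.security.GeneralSecurityException;
    import java.security.SecureRandom;
    import java.util.Base64;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;

    final class CompressThenEncrypt {
        private static final int IV_BYTES = 12;   // assumed IV length for GCM
        private static final int TAG_BITS = 128;  // assumed authentication tag length

        private CompressThenEncrypt() {}

        static SecretKey newKey() {
            try {
                KeyGenerator generator = KeyGenerator.getInstance("AES");
                generator.init(128);
                return generator.generateKey();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }

        // gzip -> AES-GCM -> Base64 text, with the IV stored in front of the ciphertext.
        static String seal(String payload, SecretKey key) {
            try {
                ByteArrayOutputStream compressed = new ByteArrayOutputStream();
                try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
                    gz.write(payload.getBytes(StandardCharsets.UTF_8));
                }
                byte[] iv = new byte[IV_BYTES];
                new SecureRandom().nextBytes(iv);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
                byte[] cipherText = cipher.doFinal(compressed.toByteArray());
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                out.write(iv, 0, iv.length);
                out.write(cipherText, 0, cipherText.length);
                return Base64.getEncoder().encodeToString(out.toByteArray());
            } catch (IOException | GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }

        // Base64 text -> AES-GCM decrypt -> gunzip.
        static String open(String sealed, SecretKey key) {
            try {
                byte[] all = Base64.getDecoder().decode(sealed);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, all, 0, IV_BYTES));
                byte[] compressed = cipher.doFinal(all, IV_BYTES, all.length - IV_BYTES);
                try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
                    return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
                }
            } catch (IOException | GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
    }
    ```

    The order matters: ciphertext looks random and compresses poorly, so compression has to happen before encryption, and the Base64 step comes last so that the result is safe to store as text.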

    I also experimented with different payload sizes (image not reproduced here): for small strings compression gives no benefit, while larger strings compress well.
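
    The size effect can be reproduced with a small sketch like the one below. The test strings and the class name are hypothetical; exact sizes depend on the input and the JDK, so none are quoted here.

    ```java
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    final class GzipSizes {
        private GzipSizes() {}

        // Returns the number of bytes the GZIP-compressed UTF-8 form of text occupies.
        static int gzippedLength(String text) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bytes.size();
        }

        public static void main(String[] args) {
            String small = "A";
            String large = "A_VERY_LONG_STRING_THAT_WILL_BE_COMPRESSED|".repeat(1_000);
            System.out.println("small: " + small.length() + " -> " + gzippedLength(small));
            System.out.println("large: " + large.length() + " -> " + gzippedLength(large));
        }
    }
    ```

    A GZIP stream always carries a 10-byte header and an 8-byte trailer, so very short inputs come out larger than they went in, while long, repetitive inputs shrink considerably.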