Java java.io.IOException: Not in GZIP format


I searched for an example of how to compress a string in Java.

I have one method to compress and one to uncompress. The compression seems to work fine:

   public static String encStage1(String str)
   {
      String format1 = "ISO-8859-1";
      String format2 = "UTF-8";
      if (str == null || str.length() == 0)
      {
         return str;
      }
      System.out.println("String length : " + str.length());
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      String outStr = null;
      try
      {
         GZIPOutputStream gzip = new GZIPOutputStream(out);
         gzip.write(str.getBytes());
         gzip.close();
         outStr = out.toString(format2);
         System.out.println("Output String lenght : " + outStr.length());
      } catch (Exception e)
      {
                  e.printStackTrace();

      }
      return outStr;
   }

But the reverse complains that the input is not in GZIP format, even when I pass the return value of encStage1 straight back into decStage3:

   public static String decStage3(String str)
   {
      if (str == null || str.length() == 0)
      {
         return str;
      }
      System.out.println("Input String length : " + str.length());
      String outStr = "";
      try
      {
         String format1 = "ISO-8859-1";
         String format2 = "UTF-8";
         GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes(format2)));
         BufferedReader bf = new BufferedReader(new InputStreamReader(gis, format2));
         String line;
         while ((line = bf.readLine()) != null)
         {
            outStr += line;
         }
         System.out.println("Output String lenght : " + outStr.length());
      } catch (Exception e)
      {
         e.printStackTrace();
      }
      return outStr;
   }

I get this error when I call it with the string returned from encStage1:

   public String encIDData(String idData)
   {
      String tst = "A simple test string";
      System.out.println("Enc 0: " + tst);
      String stg1 = encStage1(tst);
      System.out.println("Enc 1: " + toHex(stg1));
      String dec1 = decStage3(stg1);
      System.out.println("unzip: " + toHex(dec1));
      return dec1;
   }

Output/Error:

Enc 0: A simple test string
String length : 20
Output String lenght : 40
Enc 1: 1fefbfbd0800000000000000735428efbfbdefbfbd2defbfbd495528492d2e51282e29efbfbdefbfbd4b07005aefbfbd21efbfbd14000000
Input String length : 40
java.io.IOException: Not in GZIP format
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:137)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)

Solution

  • A small error is:

         gzip.write(str.getBytes());
    

    uses the platform default encoding, which on Windows will never be ISO-8859-1. Better:

         gzip.write(str.getBytes(format1));
    

    You could consider using "Cp1252" (Windows Latin-1, for some European languages) instead of "ISO-8859-1" (Latin-1). Cp1252 adds characters such as the comma-like (curly) quotes.

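    The major error is converting the compressed bytes to a String. Java separates binary data (byte[], InputStream, OutputStream) from text (String, char, Reader, Writer), and text is always held internally as Unicode. An arbitrary byte sequence does not need to be valid UTF-8, so decoding it as UTF-8 replaces the invalid sequences with U+FFFD; those are the efbfbd runs in your hex dump, and the second GZIP magic byte 0x8b is one of the bytes that got replaced. You might get away with converting the bytes using a single-byte encoding (ISO-8859-1 for instance).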
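
    To make the point about lossy conversion concrete, here is a minimal sketch. The class and helper names (LossyStringDemo, gzip, roundTrip) are made up for illustration; only the JDK calls are real. It compresses the test string, pushes the bytes through a String and back with two different charsets, and compares the result with the original bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the original question or answer.
class LossyStringDemo {

    // Compress text to raw GZIP bytes.
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decode bytes into a String and encode them back with the same charset.
    static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("A simple test string");

        byte[] viaUtf8 = roundTrip(compressed, StandardCharsets.UTF_8);
        byte[] viaLatin1 = roundTrip(compressed, StandardCharsets.ISO_8859_1);

        // The GZIP header starts with 0x1f 0x8b; a lone 0x8b is not valid UTF-8.
        System.out.println("UTF-8 intact:      " + Arrays.equals(compressed, viaUtf8));   // false
        System.out.println("ISO-8859-1 intact: " + Arrays.equals(compressed, viaLatin1)); // true
    }
}
```

    ISO-8859-1 survives only because it maps every byte value 0 to 255 onto exactly one char. That makes it a workaround, not a reason to keep binary data in a String.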

    The best way would be

         gzip.write(str.getBytes(StandardCharsets.UTF_8));
    

    That gives you full Unicode, so text in any mix of scripts can be compressed.

    For decompression, read into a ByteArrayOutputStream and then build the text with new String(baos.toByteArray(), StandardCharsets.UTF_8). Using a BufferedReader on an InputStreamReader with UTF-8 is okay too, but readLine throws away the newline characters, so they have to be added back:

    outStr += line + "\r\n"; // Or so.
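
    The readLine behaviour is easy to check in isolation. The sketch below uses a hypothetical helper, joinLines, that reads from a StringReader in place of the decompressed stream and appends a caller-chosen separator after every line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; joinLines is an illustrative helper, not a JDK method.
class ReadLineDemo {

    // Read text line by line and append the given separator after each line.
    static String joinLines(String text, String separator) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append(separator); // readLine() returned the line without its terminator
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "first\nsecond\n";
        System.out.println(joinLines(text, "").equals("firstsecond")); // true: newlines are gone
        System.out.println(joinLines(text, "\n").equals(text));        // true: newlines restored
    }
}
```

    Note that this cannot tell "\n" from "\r\n" in the original, and it adds a terminator after the last line even if the input had none. Reading raw bytes avoids both problems.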
    

    Clean answer:

    public static byte[] encStage1(String str) throws IOException
    {
       try (ByteArrayOutputStream out = new ByteArrayOutputStream())
       {
           try (GZIPOutputStream gzip = new GZIPOutputStream(out))
           {
               gzip.write(str.getBytes(StandardCharsets.UTF_8));
           }
           return out.toByteArray();
           //return out.toString(StandardCharsets.ISO_8859_1);
           // Some single byte encoding
       }
    }
    
    public static String decStage3(byte[] str) throws IOException
    {
       ByteArrayOutputStream baos = new ByteArrayOutputStream();
       try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str)))
       {
           int b;
           while ((b = gis.read()) != -1) {
               baos.write((byte) b);
           }
       }
       return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }
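
    For completeness, a self-contained round trip in the same spirit might look like the sketch below. The names GzipRoundTrip, compress and decompress are illustrative. It differs from the methods above in two ways, both my own choices: it reads in 4 KB chunks instead of one byte at a time, and it wraps IOException in UncheckedIOException to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class showing the byte[]-based round trip end to end.
class GzipRoundTrip {

    // Text -> UTF-8 bytes -> GZIP bytes. The result stays a byte[].
    static byte[] compress(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The GZIP stream is closed here, so the trailer has been written.
        return out.toByteArray();
    }

    // GZIP bytes -> UTF-8 bytes -> text.
    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gis.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "A simple test string\nwith a second line";
        String restored = decompress(compress(original));
        System.out.println(original.equals(restored)); // true
    }
}
```

    Because nothing is decoded until after decompression, the newline in the test string comes back unchanged.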