java zip java-io zipinputstream

Java program ignoring all the files inside the zip file


I have a program that takes a zip folder path from the console. It goes through each item inside that folder (every child item, children of children, etc.). But if it encounters a zip folder, it ignores everything inside that zip folder. I need to read everything, including the files inside zip folders.

Here is the method that goes through each item:

    public static String[] getLogBuffers(String path) throws IOException//path is given via console
  {
    String zipFileName = path;
    String destDirectory = path;
    BufferedInputStream errorLogBuffer = null;
    BufferedInputStream windowLogBuffer = null;
    String strErrorLogFileContents="";
    String strWindowLogFileContents="";
    String[] errorString=new String[2];



    byte[] buffer = new byte[1024];
    ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFileName));
    ZipEntry zipEntry = zis.getNextEntry();
    while (zipEntry != null)
    {
      String filePath = destDirectory + "/" + zipEntry.getName();
      System.out.println("unzipping" + filePath);
      if (!zipEntry.isDirectory())
      {
                if (zipEntry.getName().endsWith("errorlog.txt"))
                {
                  ZipFile zipFile = new ZipFile(path);
                  InputStream errorStream = zipFile.getInputStream(zipEntry);
                  BufferedInputStream bufferedInputStream=new BufferedInputStream(errorStream);
                  byte[] contents = new byte[1024];
                  System.out.println("ERRORLOG NAME"+zipEntry.getName());
                  int bytesRead = 0;
                  while((bytesRead = bufferedInputStream.read(contents)) != -1) {
                    strErrorLogFileContents += new String(contents, 0, bytesRead);
                  }

                }
                if (zipEntry.getName().endsWith("windowlog.txt"))
                { ZipFile zipFile = new ZipFile(path);
                  InputStream windowStream = zipFile.getInputStream(zipEntry);
                  BufferedInputStream bufferedInputStream=new BufferedInputStream(windowStream);
                  byte[] contents = new byte[1024];
                  System.out.println("WINDOWLOG NAME"+zipEntry.getName());

                  int bytesRead = 0;
                  while((bytesRead = bufferedInputStream.read(contents)) != -1) {
                    strWindowLogFileContents += new String(contents, 0, bytesRead);
                  }

                }

      }
    
      zis.closeEntry();
      zipEntry = zis.getNextEntry();

    }
    errorString[0]=strErrorLogFileContents;
    errorString[1]=strWindowLogFileContents;
    zis.closeEntry();
    zis.close();
    System.out.println("Buffers ready");
    return errorString;
  }

Items accessed inside the parent zip folder (my console output):

unzippingC:logFolders/logX3.zip/logX3/
unzippingC:logFolders/logX3.zip/logX3/Anan/
unzippingC:logFolders/logX3.zip/logX3/Anan/errorreports/
unzippingC:logFolders/logX3.zip/logX3/Anan/errorreports/2021-11-23_103518.zip
unzippingC:logFolders/logX3.zip/logX3/Anan/errorreports/errorlog.txt
unzippingC:logX3.zip/logX3/Anan/errorreports/version.txt
unzippingC:logFolders/logX3.zip/logX3/Anan/errorreports/windowlog.txt

As you can see, the program only goes as far as 2021-11-23_103518.zip and then continues along another path, but 2021-11-23_103518.zip has child items (files) that I need to access. I'd appreciate any help, thanks.


Solution

  • A zip file is not a folder. Although Windows treats a zip file as if it’s a folder,* it is not a folder. A .zip file is a single file with an internal table of entries, each containing compressed data.

    Each inner .zip file you read requires a new ZipFile or ZipInputStream. There is no way around that.

    You should not create new ZipFile instances to read the same .zip file’s entries. You only need one ZipFile object. You can go through its entries with its entries() method, and you can read each entry with the ZipFile’s getInputStream method.

    (I wouldn’t be surprised if using multiple objects to read the same zip file were to run into file locking problems on Windows.)

    try (ZipFile zipFile = new ZipFile(path))
    {
        Enumeration<? extends ZipEntry> entries = zipFile.entries();
        while (entries.hasMoreElements())
        {
            ZipEntry zipEntry = entries.nextElement();
    
            if (zipEntry.getName().endsWith("errorlog.txt"))
            {
                try (InputStream errorStream = zipFile.getInputStream(zipEntry))
                {
                    // ...
                }
            }
        }
    }
    

    Notice that no other ZipFile or ZipInputStream objects are created. Only zipFile reads and traverses the file. Also notice the use of a try-with-resources statement to implicitly close the ZipFile and the InputStream.

    You should not use += to build a String. Doing so creates a lot of intermediate String objects which will have to be garbage collected, which can hurt your program’s performance. You should wrap each zip entry’s InputStream in an InputStreamReader, then use that Reader’s transferTo method to append to a single StringWriter that holds your combined log.

    StringWriter strErrorLogFileContents = new StringWriter();
    StringWriter strWindowLogFileContents = new StringWriter();
    
    try (ZipFile zipFile = new ZipFile(path))
    {
        Enumeration<? extends ZipEntry> entries = zipFile.entries();
        while (entries.hasMoreElements())
        {
            ZipEntry zipEntry = entries.nextElement();
    
            if (zipEntry.getName().endsWith("errorlog.txt"))
            {
                try (Reader entryReader = new InputStreamReader(
                    zipFile.getInputStream(zipEntry),
                    StandardCharsets.UTF_8))
                {
                    entryReader.transferTo(strErrorLogFileContents);
                }
            }
        }
    }
    

    Notice the use of StandardCharsets.UTF_8. It is almost never correct to create a String from bytes without specifying the Charset. If you don’t provide the Charset, Java will use the default Charset, which before Java 18 was taken from the system’s settings, so your program could behave differently on Windows than on other operating systems.
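
To make the charset point concrete, here is a small sketch that decodes the same two bytes with two explicit charsets. The class name CharsetDemo and its methods are made up for illustration. The bytes are the UTF-8 encoding of a single accented character; decoded as ISO-8859-1 they turn into two unrelated characters, which is the kind of corruption an unexpected default charset can cause.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original program.
final class CharsetDemo {
    /** Decodes the bytes as UTF-8, regardless of platform settings. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Decodes the same bytes as ISO-8859-1, one character per byte. */
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00e9" is e with an acute accent; UTF-8 encodes it as two bytes.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println("UTF-8 length:      " + decodeUtf8(bytes).length());
        System.out.println("ISO-8859-1 length: " + decodeLatin1(bytes).length());
    }
}
```

The UTF-8 decode gives back the original one-character string, while the ISO-8859-1 decode yields a two-character string.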

    Reader’s transferTo method was added in Java 10. If you are stuck with Java 8, you won’t have it, so you will have to do the work yourself:

            if (zipEntry.getName().endsWith("errorlog.txt"))
            {
                try (Reader entryReader = new BufferedReader(
                    new InputStreamReader(
                        zipFile.getInputStream(zipEntry),
                        StandardCharsets.UTF_8)))
                {
                    int c;
                    while ((c = entryReader.read()) >= 0)
                    {
                        strErrorLogFileContents.write(c);
                    }
                }
            }
    

    The use of BufferedReader means you don’t need to create your own array and implement bulk reads yourself. BufferedReader already does that for you.

    As mentioned above, a zip entry which is itself an inner zip file requires a brand new ZipFile or ZipInputStream object to read it. I recommend copying the entry to a temporary file and opening that file as a ZipFile, since reading from a ZipInputStream made from another ZipInputStream is known to be slow. Delete the temporary file after you’re done reading it.

    try (ZipFile zipFile = new ZipFile(path))
    {
        Enumeration<? extends ZipEntry> entries = zipFile.entries();
        while (entries.hasMoreElements())
        {
            ZipEntry zipEntry = entries.nextElement();
    
            if (zipEntry.getName().endsWith(".zip"))
            {
                Path tempZipFile = Files.createTempFile(null, ".zip");
                try
                {
                    // Copy the inner zip out so it can be opened as a ZipFile.
                    try (InputStream zipStream = zipFile.getInputStream(zipEntry))
                    {
                        Files.copy(zipStream, tempZipFile,
                            StandardCopyOption.REPLACE_EXISTING);
                    }
    
                    // Recurse into the inner zip and merge its logs.
                    String[] logsFromZip = getLogBuffers(tempZipFile.toString());
    
                    strErrorLogFileContents.write(logsFromZip[0]);
                    strWindowLogFileContents.write(logsFromZip[1]);
                }
                finally
                {
                    // Remove the temporary file even if reading it failed.
                    Files.deleteIfExists(tempZipFile);
                }
            }
        }
    }
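
For comparison, here is a rough sketch of the other option named above: wrapping the outer ZipInputStream in a second ZipInputStream instead of using a temporary file. The class NestedZipStreams, its methods, and the "!/" separator in the reported names are all invented for this illustration. It only lists entry names, and as noted above this approach may be slow for large archives, so treat it as an alternative for small inputs rather than a recommendation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original program.
final class NestedZipStreams {
    /** Lists all file entries, descending into entries whose names end with ".zip". */
    static List<String> listEntries(InputStream in, String prefix) throws IOException {
        List<String> names = new ArrayList<>();
        // Not closed here on purpose: closing a ZipInputStream also closes the
        // stream it wraps, which for an inner zip is the outer ZipInputStream.
        // The caller closes the outermost stream.
        ZipInputStream zis = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String name = prefix + entry.getName();
            if (name.endsWith(".zip")) {
                // The current entry's bytes are themselves a zip archive.
                names.addAll(listEntries(zis, name + "!/"));
            } else {
                names.add(name);
            }
        }
        return names;
    }

    /** Builds a one-entry zip in memory, only to keep the demo self-contained. */
    static byte[] zipOf(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] inner = zipOf("errorlog.txt", "inner error".getBytes(StandardCharsets.UTF_8));
        byte[] outer = zipOf("errorreports/2021-11-23_103518.zip", inner);
        try (InputStream in = new ByteArrayInputStream(outer)) {
            System.out.println(listEntries(in, ""));
        }
    }
}
```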
    

    Finally, consider creating a meaningful class for your return value. An array of Strings is difficult to understand. A caller won’t know that it always contains exactly two elements and won’t know what those two elements are. A custom return type would be pretty short:

    public class Logs {
        private final String errorLog;
    
        private final String windowLog;
    
        public Logs(String errorLog,
                    String windowLog)
        {
            this.errorLog = errorLog;
            this.windowLog = windowLog;
        }
    
        public String getErrorLog()
        {
            return errorLog;
        }
    
        public String getWindowLog()
        {
            return windowLog;
        }
    }
    

    As of Java 16, you can use a record to make the declaration much shorter:

    public record Logs(String errorLog,
                       String windowLog)
    { }
    

    Whether you use a record or write out the class, you can use it as a return type in your method:

    public static Logs getLogBuffers(String path) throws IOException
    {
        // ...
    
        return new Logs(
            strErrorLogFileContents.toString(),
            strWindowLogFileContents.toString());
    }
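
Putting the pieces above together, the whole method might look roughly like the sketch below. The names LogCollector, append, and writeSample are invented for this example, and a small nested class stands in for Logs so the code also compiles before Java 16. It uses a manual read loop instead of transferTo so it should work on Java 8, and it assumes the logs are UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical wrapper class for the sketch.
final class LogCollector {
    /** Stand-in for the Logs type described above. */
    static final class Logs {
        final String errorLog;
        final String windowLog;
        Logs(String errorLog, String windowLog) {
            this.errorLog = errorLog;
            this.windowLog = windowLog;
        }
    }

    static Logs getLogBuffers(String path) throws IOException {
        StringWriter errorLog = new StringWriter();
        StringWriter windowLog = new StringWriter();
        try (ZipFile zipFile = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory()) {
                    continue;
                } else if (name.endsWith("errorlog.txt")) {
                    append(zipFile, entry, errorLog);
                } else if (name.endsWith("windowlog.txt")) {
                    append(zipFile, entry, windowLog);
                } else if (name.endsWith(".zip")) {
                    // Copy the inner zip to a temp file, recurse, then clean up.
                    Path temp = Files.createTempFile(null, ".zip");
                    try {
                        try (InputStream in = zipFile.getInputStream(entry)) {
                            Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
                        }
                        Logs inner = getLogBuffers(temp.toString());
                        errorLog.write(inner.errorLog);
                        windowLog.write(inner.windowLog);
                    } finally {
                        Files.deleteIfExists(temp);
                    }
                }
            }
        }
        return new Logs(errorLog.toString(), windowLog.toString());
    }

    /** Appends one entry's text (assumed UTF-8) to the target writer. */
    private static void append(ZipFile zipFile, ZipEntry entry, StringWriter target)
            throws IOException {
        try (Reader reader = new InputStreamReader(
                zipFile.getInputStream(entry), StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                target.write(buffer, 0, count);
            }
        }
    }

    /** Writes a sample zip containing a log and a nested zip; for the demo only. */
    static Path writeSample() throws IOException {
        ByteArrayOutputStream inner = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(inner)) {
            zos.putNextEntry(new ZipEntry("errorlog.txt"));
            zos.write("inner error\n".getBytes(StandardCharsets.UTF_8));
            zos.putNextEntry(new ZipEntry("windowlog.txt"));
            zos.write("inner window\n".getBytes(StandardCharsets.UTF_8));
        }
        Path outer = Files.createTempFile(null, ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(outer))) {
            zos.putNextEntry(new ZipEntry("logX3/errorlog.txt"));
            zos.write("outer error\n".getBytes(StandardCharsets.UTF_8));
            zos.putNextEntry(new ZipEntry("logX3/2021-11-23_103518.zip"));
            zos.write(inner.toByteArray());
        }
        return outer;
    }
}
```

Calling getLogBuffers on the file produced by writeSample should return the outer error log followed by the inner one, plus the inner window log. The try/finally around the recursive call is a design choice so the temporary file is removed even when the inner zip is unreadable.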
    

    * The Windows explorer shell’s practice of treating zip files as folders is a pretty bad user interface. I know I’m not the only one who thinks so. It often ends up making things more difficult for users instead of easier.