Tags: java, file, fileinputstream, epub

Can I use FileReader to read a file that contains images, video, and text (say, an EPUB file), and is that advisable with respect to performance?


I need to parse the contents of an EPUB file and am trying to find the most efficient way to do it. The file may contain images, a lot of text, and occasionally videos. Should I use a FileInputStream or a FileReader?


Solution

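  • An EPUB file is a ZIP archive, so I would suggest handling it as one. FileReader is meant for character data and would decode the compressed bytes as text, so it is not suitable here; the archive has to be read as bytes. The snippet below lists the contents of an EPUB file.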

    import java.io.IOException;
    import java.net.URI;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.Collections;
    import java.util.Date;

    Path path = Paths.get("foobar.epub");
    URI uri = URI.create("jar:" + path.toUri());
    // No "create" option is set: the epub must already exist and is only read.
    // try-with-resources closes the zip file system when the walk is done.
    try (FileSystem zipFs = FileSystems.newFileSystem(uri,
            Collections.<String, Object>emptyMap())) {
        Path root = zipFs.getPath("/");
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file,
                    BasicFileAttributes attrs) throws IOException {
                print(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult preVisitDirectory(Path dir,
                    BasicFileAttributes attrs) throws IOException {
                print(dir);
                return FileVisitResult.CONTINUE;
            }

            // prints modification time, size and path of one entry
            private void print(Path file) throws IOException {
                Date lastModifiedTime = new Date(Files.getLastModifiedTime(file).toMillis());
                System.out.printf("%td.%<tm.%<tY %<tH:%<tM:%<tS %9d %s%n",
                        lastModifiedTime, Files.size(file), file);
            }
        });
    }
    

    Sample output:

    01.01.1970 00:59:59         0 /META-INF/
    11.02.2015 16:33:44       244 /META-INF/container.xml
    11.02.2015 16:33:44      3437 /logo.jpg
    ...
    

    Edit: If you only want to extract files based on their names, you can do that in the visitFile(...) method, as shown in this snippet.

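    @Override
    public FileVisitResult visitFile(Path file,
            BasicFileAttributes attrs) throws IOException {
        // Path.endsWith compares whole name elements, so this matches
        // files named exactly "logo.jpg" in any directory of the epub
        if (file.endsWith("logo.jpg")) {
            // extract the file into the directory /tmp/,
            // overwriting a copy left over from an earlier run
            Files.copy(file, Paths.get("/tmp/", file.getFileName().toString()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return FileVisitResult.CONTINUE;
    }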
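
If the goal is to parse the entries rather than extract them, they can also be read in place through the same zip file system. The following is only a minimal sketch under a few assumptions: the class name EpubEntryReader and the helpers open, readText and readBytes are hypothetical names chosen for illustration, and text entries are assumed to be UTF-8 encoded.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

// Hypothetical helper class, not part of any library.
public class EpubEntryReader {

    // Opens the epub as a read-only style zip file system (no "create" option).
    private static FileSystem open(Path epub) throws IOException {
        URI uri = URI.create("jar:" + epub.toUri());
        return FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap());
    }

    // Reads a text entry such as "/META-INF/container.xml".
    // Assumption: the entry is encoded as UTF-8.
    public static String readText(Path epub, String entry) throws IOException {
        try (FileSystem zipFs = open(epub)) {
            byte[] raw = Files.readAllBytes(zipFs.getPath(entry));
            return new String(raw, StandardCharsets.UTF_8);
        }
    }

    // Reads a binary entry such as "/logo.jpg" without any character decoding.
    public static byte[] readBytes(Path epub, String entry) throws IOException {
        try (FileSystem zipFs = open(epub)) {
            return Files.readAllBytes(zipFs.getPath(entry));
        }
    }
}
```

A call such as EpubEntryReader.readText(Paths.get("foobar.epub"), "/META-INF/container.xml") would return the XML as a String. Opening the file system once per call keeps the sketch short; when many entries are read, it is probably better to open it once and reuse it.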
    

    Depending on how you want to process the files inside the EPUB, you might also have a look at ZipInputStream, which reads the entries sequentially.

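    try (ZipInputStream in = new ZipInputStream(new FileInputStream("foobar.epub"))) {
        for (ZipEntry entry = in.getNextEntry(); entry != null;
                entry = in.getNextEntry()) {
            // getSize() returns -1 if the size is not known before the entry data is read
            System.out.printf("%td.%<tm.%<tY %<tH:%<tM:%<tS %9d %s%n",
                    new Date(entry.getTime()), entry.getSize(), entry.getName());
            if (entry.getName().endsWith("logo.jpg")) {
                // entry names may contain directories, so use only the file name
                String name = Paths.get(entry.getName()).getFileName().toString();
                try (FileOutputStream out = new FileOutputStream(name)) {
                    // process the file, here: copy the entry's bytes to disk
                    byte[] buffer = new byte[8192];
                    for (int n; (n = in.read(buffer)) != -1; ) {
                        out.write(buffer, 0, n);
                    }
                }
            }
        }
    }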
    

    Sample output:

    11.02.2013 16:33:44       244 META-INF/container.xml
    11.02.2013 16:33:44      3437 logo.jpg
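
To come back to the original FileInputStream versus FileReader question: one possible approach is to read the archive itself as bytes and to wrap only the text entries in a Reader. The sketch below illustrates this with java.util.zip.ZipFile. The class name EpubContentScanner, the methods isText and scan, the list of text extensions, and the UTF-8 encoding are all assumptions made for this example, not something prescribed by the answer above.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical example class, not part of any library.
public class EpubContentScanner {

    // Assumption: these extensions mark text entries; everything else is binary.
    static boolean isText(String name) {
        String lower = name.toLowerCase();
        return lower.endsWith(".xhtml") || lower.endsWith(".html")
                || lower.endsWith(".xml") || lower.endsWith(".opf")
                || lower.endsWith(".ncx") || lower.endsWith(".css");
    }

    // Returns, per entry, the number of characters (text entries)
    // or the number of bytes (binary entries such as images and video).
    public static Map<String, Long> scan(File epub) throws IOException {
        Map<String, Long> counts = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(epub)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                try (InputStream in = new BufferedInputStream(zip.getInputStream(entry))) {
                    long count = 0;
                    if (isText(entry.getName())) {
                        // character data: decode the bytes, assuming UTF-8
                        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
                        char[] buffer = new char[8192];
                        for (int n; (n = reader.read(buffer)) != -1; ) {
                            count += n;
                        }
                    } else {
                        // binary data: never pass it through a Reader
                        byte[] buffer = new byte[8192];
                        for (int n; (n = in.read(buffer)) != -1; ) {
                            count += n;
                        }
                    }
                    counts.put(entry.getName(), count);
                }
            }
        }
        return counts;
    }
}
```

Calling EpubContentScanner.scan(new File("foobar.epub")) would return one count per entry. In a real parser the counting loops would be replaced by an XML parser for the text entries and by whatever handles the images and videos.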