
Java I/O operating on every file in a path


I have an HTML help system that I need to convert to SharePoint. The two most time-consuming tasks are changing the document links and gathering metadata. Fortunately, this data is easily accessible. Each file is an HTML document, oversimplified as below:

 <body>
   <!--- Metadata follows
   Procedure Name: my document
   Procedure Number: xxxxx
   Use: freeform text explaining when procedure is used
   Revision Date: xx/xx/xx
   By: responsible party for revision
   end metadata --->

   <h1>Procedure Name</h1>
   <p>procedure background and narrative, with links, as needed, to other documents at \\documentation-server\path\document-name.html</p>
 </body>

I can successfully extract and manipulate the right Strings, and I'm trying to incorporate that process into an automated solution. Since this is my first venture into file I/O, however, I'm a little fuzzy on what to do next.

In a perfect world, given a path, I would like to step through each *.html file in that path. I cannot seem to find a class/method to do that. newInputStream and newOutputStream give me the file access, but I need to provide a path and file parameter. The FileVisitor interface appears to only interact with file attributes and perform delete/copy/rename type functions.

Is there something that would combine these into a single function that would step through each file in a path, open it and allow my line-by-line parse, then close the file and move to the next one to repeat?

My other thought was to create an array of filenames, then feed that array into the filename parameter of newInputStream.

Suggestions?


Solution

  • If you use Java 7, the FileVisitor interface enables you to walk a file tree very easily. See for example the Java Tutorial.

    You can override the visitFile method (most simply by extending SimpleFileVisitor<Path>) to do what you want with each file, for example (not tested):

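    // Needs java.io.*, java.nio.charset.Charset, java.nio.file.*,
    // and java.nio.file.attribute.BasicFileAttributes.
    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {
        if (attr.isRegularFile() && file.getFileName().toString().endsWith(".html")) {
            // Use whatever encoding the files are actually saved in.
            Charset charset = Charset.forName("UTF-16");
            try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line); // do what you need to do here
                }
            } catch (IOException x) {
                // Report the error and carry on with the next file
                System.err.println("Could not read " + file + ": " + x);
            }
        }
        return FileVisitResult.CONTINUE;
    }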
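
To put the visitFile override above in context, here is a minimal sketch of how it might be wired into Files.walkFileTree so that every *.html file under a starting directory is opened, read line by line, closed, and followed by the next one. The class name HtmlTreeWalker, the LineHandler callback, and the processHtmlFiles method are hypothetical names made up for illustration, and the UTF-8 charset in main is an assumption; substitute whatever encoding your files actually use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class HtmlTreeWalker {

    /** Hypothetical callback that receives each line of each .html file. */
    public interface LineHandler {
        void handle(Path file, String line);
    }

    /**
     * Walks the tree under root, feeds every line of every *.html file to the
     * handler, and returns the files that were read successfully.
     */
    public static List<Path> processHtmlFiles(Path root, final Charset charset,
                                              final LineHandler handler) throws IOException {
        final List<Path> processed = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile() && file.getFileName().toString().endsWith(".html")) {
                    // try-with-resources closes the reader before the next file is visited
                    try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            handler.handle(file, line);
                        }
                        processed.add(file);
                    } catch (IOException x) {
                        // Report the problem and keep walking
                        System.err.println("Could not read " + file + ": " + x);
                    }
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return processed;
    }

    public static void main(String[] args) throws IOException {
        // Starting directory: first argument, or the current directory
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Assumption: the files are UTF-8 encoded
        List<Path> done = processHtmlFiles(root, StandardCharsets.UTF_8, new LineHandler() {
            @Override
            public void handle(Path file, String line) {
                System.out.println(file.getFileName() + ": " + line); // parse the line here
            }
        });
        System.out.println("Processed " + done.size() + " file(s)");
    }
}
```

The anonymous classes keep the sketch compatible with Java 7. walkFileTree descends into subdirectories as well, so the separate array of filenames mentioned in the question should not be needed.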