I have an HTML help system that I need to convert to SharePoint. The two most time-consuming tasks are changing the document links and gathering metadata. Luckily, this data is easily accessible. Each file is an HTML document, oversimplified as below:
```html
<body>
<!--- Metadata follows
Procedure Name: my document
Procedure Number: xxxxx
Use: freeform text explaining when procedure is used
Revision Date: xx/xx/xx
By: responsible party for revision
<!--- end metadata
<h1>Procedure Name</h1>
<p>procedure background and narrative, with links, as needed, to other documents at \\documentation-server\path\document-name.html
</body>
```
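Since every metadata entry in the sample is a `Key: value` line between the `Metadata follows` and `end metadata` markers, gathering it amounts to splitting those lines on the first colon. A minimal sketch, assuming that layout holds in every file; `MetadataParser` and `parse` are hypothetical names, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; assumes one "Key: value" entry per line
// between the "Metadata follows" and "end metadata" marker lines.
class MetadataParser {
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> metadata = new LinkedHashMap<>();
        boolean inBlock = false;
        for (String line : lines) {
            if (line.contains("Metadata follows")) {
                inBlock = true;              // start of the metadata comment
            } else if (line.contains("end metadata")) {
                break;                       // nothing of interest after this marker
            } else if (inBlock) {
                int colon = line.indexOf(':');
                if (colon > 0) {             // skip lines without a "Key:" prefix
                    // Split on the first colon only, so values may contain colons
                    metadata.put(line.substring(0, colon).trim(),
                                 line.substring(colon + 1).trim());
                }
            }
        }
        return metadata;
    }
}
```

Each file's lines could come from `Files.readAllLines(file, StandardCharsets.UTF_8)` (again assuming UTF-8), and the returned `LinkedHashMap` keeps the entries in the order they appear in the file.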
I can successfully extract and manipulate the right Strings, and I'm trying to incorporate that process into an automated solution. Since this is my first venture into file I/O, however, I'm a little fuzzy on what to do next.
In a perfect world, given a path, I would like to step through each *.html file in that path, but I cannot seem to find a class/method to do that. `newInputStream` and `newOutputStream` give me file access, but I need to provide a path and file parameter. The `FileVisitor` interface appears only to interact with file attributes and perform delete/copy/rename-type functions.
Is there something that combines these into a single function that steps through each file in a path, opens it so I can parse it line by line, then closes the file and moves on to the next one?
My other thought was to create an array of filenames, then feed that array into the filename parameter of `newInputStream`.
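The array-of-filenames idea works too, and Java 7 can build the list for you: `Files.newDirectoryStream(dir, "*.html")` yields the entries of one directory that match the glob. A minimal sketch, assuming all the help files sit directly in one folder (it does not descend into subfolders); `HtmlLister` and `listHtml` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists the *.html files directly inside one directory.
class HtmlLister {
    static List<Path> listHtml(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        // The glob is matched against each entry's file name; subfolders are not entered
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.html")) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    result.add(entry);
                }
            }
        }
        Collections.sort(result); // directory order is unspecified; Path is Comparable
        return result;
    }
}
```

Each returned `Path` can then be opened in turn with `Files.newBufferedReader(path, StandardCharsets.UTF_8)` inside a try-with-resources block, which closes the file before the loop moves on to the next one.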
Suggestions?
If you use Java 7, the `FileVisitor` interface lets you walk a file tree very easily. See, for example, the Java Tutorial.
You can override the `visitFile` method to do whatever you need with each file, for example (not tested):
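```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical class name; any SimpleFileVisitor<Path> subclass can hold this override
class HtmlFileVisitor extends SimpleFileVisitor<Path> {
    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {
        if (attr.isRegularFile() && file.getFileName().toString().endsWith(".html")) {
            // UTF-8 is an assumption; use the encoding your help files were saved in
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line); // do what you need to do here
                }
            } catch (IOException x) {
                System.err.println("Could not read " + file + ": " + x); // or log the error
            }
        }
        return FileVisitResult.CONTINUE;
    }
}
```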
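To start the walk, hand the root folder and a visitor to `Files.walkFileTree`, which calls `visitFile` once per file in the tree, subfolders included. A self-contained sketch of the whole loop, assuming UTF-8 files; it only counts lines per file where the real parsing would go, and `HtmlWalker`/`walkHtml` are hypothetical names. Unlike a visitor that catches `IOException` inside `visitFile`, this one lets it propagate, so a single unreadable file stops the walk; catching it inside `visitFile` skips the file instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reads every *.html file under root, line by line.
class HtmlWalker {
    static Map<Path, Integer> walkHtml(final Path root) throws IOException {
        final Map<Path, Integer> lineCounts = new TreeMap<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                if (attrs.isRegularFile() && file.getFileName().toString().endsWith(".html")) {
                    int count = 0;
                    // Assumes UTF-8; try-with-resources closes the file before the walk continues
                    try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                        while (reader.readLine() != null) {
                            count++; // replace with the real per-line parsing
                        }
                    }
                    lineCounts.put(root.relativize(file), count);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return lineCounts;
    }
}
```

Calling `HtmlWalker.walkHtml(Paths.get(...))` with the help system's top-level folder returns each HTML file's path relative to that folder, mapped to its line count.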