I am writing a routine that reads a list of URLs from a file, fetches the content of each URL with JSoup, searches it for certain patterns, and writes the findings to output files (one per analyzed URL).
I have a WebPageAnalyzerTask (which implements Callable). For now it returns null, but it will eventually return an object holding the results of the processing (still to be done):
public WebPageAnalyzerTask(String targetUrl, Pattern searchPattern) {
    this.targetUrl = targetUrl;
    this.searchPattern = searchPattern;
}

@Override
public WebPageAnalysisTaskResult call() throws Exception {
    long startTime = System.nanoTime();
    String htmlContent = this.getHtmlContentFromUrl();
    List<String> resultContent = this.getAnalysisResults(htmlContent);
    try (BufferedWriter bw = Files.newBufferedWriter(Paths.get("c:/output", UUID.randomUUID().toString() + ".txt"),
            StandardCharsets.UTF_8, StandardOpenOption.WRITE)) {
        bw.write(parseListToLine(resultContent));
    }
    long endTime = System.nanoTime();
    return null;
}
I am writing the file using NIO and try-with-resources.
The code that will use the task is the following:
/**
 * Starts the analysis of the Web Pages retrieved from the input text file using the provided pattern.
 */
public void startAnalysis() {
    List<String> urlsToBeProcessed = null;
    try (Stream<String> stream = Files.lines(Paths.get(this.inputPath))) {
        urlsToBeProcessed = stream.collect(Collectors.toList());
        if (urlsToBeProcessed != null && urlsToBeProcessed.size() > 0) {
            List<Callable<WebPageAnalysisTaskResult>> pageAnalysisTasks = this
                    .buildPageAnalysisTasksList(urlsToBeProcessed);
            ExecutorService executor = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
            List<Future<WebPageAnalysisTaskResult>> results = executor.invokeAll(pageAnalysisTasks);
            executor.shutdown();
        } else {
            throw new NoContentToProcessException();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

/**
 * Builds a list of tasks in which each task will be filled with data required for the analysis processing.
 * @param urlsToBeProcessed The list of URLs to be processed.
 * @return A list of tasks that must be handled by an executor service for asynchronous processing.
 */
private List<Callable<WebPageAnalysisTaskResult>> buildPageAnalysisTasksList(List<String> urlsToBeProcessed) {
    List<Callable<WebPageAnalysisTaskResult>> tasks = new ArrayList<>();
    UrlValidator urlValidator = new UrlValidator(ALLOWED_URL_SCHEMES);
    urlsToBeProcessed.forEach(urlAddress -> {
        if (urlValidator.isValid(urlAddress)) {
            tasks.add(new WebPageAnalyzerTask(urlAddress, this.targetPattern));
        }
    });
    return tasks;
}
The file holding the list of URLs is read once. The ExecutorService runs one task per URL, and each task analyzes its page and writes a results file asynchronously.
So far the input file is read, and the HTML content of each URL is analyzed and saved in a String. However, the task is not writing the output file, and I cannot tell what is going wrong there.
Can somebody tell me if I am missing something?
Thanks in advance.
You are probably getting an exception in the following try block:
try (BufferedWriter bw = Files.newBufferedWriter(Paths.get("c:/output", UUID.randomUUID().toString() + ".txt"),
        StandardCharsets.UTF_8, StandardOpenOption.WRITE)) {
    bw.write(parseListToLine(resultContent));
}
Add a catch block to that try statement and print the exception, if one is actually thrown, to see what causes it:
catch (IOException e) {
    // Replace with a logger or some kind of error handling in production code
    e.printStackTrace();
}
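To illustrate why the failure may be invisible, here is a minimal sketch under stated assumptions. An exception thrown inside call() is stored in the Future returned by invokeAll, and it only surfaces when Future.get() is called, which startAnalysis() never does. One plausible cause (an assumption on my part, not something confirmed from your run) is that StandardOpenOption.WRITE on its own does not create a missing file. The class SwallowedWriteFailureDemo and its helpers writeTask and runAndGetFailure are hypothetical names used only for this demonstration, and the example writes to a temporary directory instead of c:/output.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SwallowedWriteFailureDemo {

    /** Hypothetical stand-in for the real task: writes one line to target using the given open options. */
    static Callable<Path> writeTask(Path target, OpenOption... options) {
        return () -> {
            try (BufferedWriter bw = Files.newBufferedWriter(target, StandardCharsets.UTF_8, options)) {
                bw.write("analysis result");
            }
            return target;
        };
    }

    /** Runs the task through invokeAll and returns the cause of its failure, or null if it succeeded. */
    static Throwable runAndGetFailure(Callable<Path> task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Path> future = executor.invokeAll(Collections.singletonList(task)).get(0);
            // get() rethrows whatever call() threw, wrapped in an ExecutionException.
            future.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("analysis-output");

        // WRITE alone requires the file to exist already, so this task fails.
        System.out.println(runAndGetFailure(
                writeTask(dir.resolve("a.txt"), StandardOpenOption.WRITE)));

        // CREATE + WRITE creates the file when it is missing, so this task succeeds.
        System.out.println(runAndGetFailure(
                writeTask(dir.resolve("b.txt"), StandardOpenOption.CREATE, StandardOpenOption.WRITE)));
    }
}
```

If this matches what you see, either pass StandardOpenOption.CREATE together with WRITE, or omit the options entirely (newBufferedWriter then behaves as if CREATE, TRUNCATE_EXISTING and WRITE were given). Independently of the cause, iterating over the results list and calling get() on each Future makes such failures visible.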