My aim is to load a document from a web server and then parse its DOM for specific content. Loading the DOM is my problem.
I am trying to use a javafx.scene.web.WebEngine, as it seems it should be able to handle all the necessary mechanics, including JavaScript execution, which may affect the final DOM.
When loading a document, it appears to get stuck in the RUNNING state and never reaches the SUCCEEDED state, which I believe is required before accessing the DOM from WebEngine.getDocument().
This occurs whether loading from a URL or literal content (as used in this minimal example).
Can anyone see what I’m doing wrong, or misunderstanding?
Thanks in advance for any help.
import java.util.concurrent.ExecutionException;
import org.w3c.dom.Document;
import javafx.application.Platform;
import javafx.concurrent.Task;
import javafx.concurrent.Worker;
import javafx.concurrent.Worker.State;
import javafx.embed.swing.JFXPanel;
import javafx.scene.web.WebEngine;

public class WebEngineProblem {

    private static Task<WebEngine> getEngineTask() {
        Task<WebEngine> task = new Task<>() {
            @Override
            protected WebEngine call() throws Exception {
                WebEngine webEngine = new WebEngine();
                final Worker<Void> loadWorker = webEngine.getLoadWorker();
                loadWorker.stateProperty().addListener((obs, oldValue, newValue) -> {
                    System.out.println("state:" + newValue);
                    if (newValue == State.SUCCEEDED) {
                        System.out.println("finished loading");
                    }
                });
                webEngine.loadContent("<!DOCTYPE html>\r\n" + "<html>\r\n" + "<head>\r\n" + "<meta charset=\"UTF-8\">\r\n"
                        + "<title>Content Title</title>\r\n" + "</head>\r\n" + "<body>\r\n" + "<p>Body</p>\r\n" + "</body>\r\n"
                        + "</html>\r\n");
                State priorState = State.CANCELLED; // should never be CANCELLED
                double priorWork = Double.NaN;
                while (loadWorker.isRunning()) {
                    final double workDone = loadWorker.getWorkDone();
                    if (loadWorker.getState() != priorState || priorWork != workDone) {
                        priorState = loadWorker.stateProperty().getValue();
                        priorWork = workDone;
                        System.out.println(priorState + " " + priorWork + "/" + loadWorker.getTotalWork());
                    }
                    Thread.sleep(1000);
                }
                return webEngine;
            }
        };
        return task;
    }

    public static void main(String[] args) {
        new JFXPanel(); // Initialise the JavaFX Platform
        WebEngine engine = null;
        Task<WebEngine> task = getEngineTask();
        try {
            Platform.runLater(task);
            Thread.sleep(1000);
            engine = task.get(); // Never completes as always RUNNING
        }
        catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }
        // This code is never reached as the content never completes loading
        // It would fail as it's not on the FX thread.
        Document doc = engine.getDocument();
        String content = doc.getTextContent();
        System.out.println(content);
    }
}
The change to a Worker's state property will occur on the FX Application Thread, even though that worker is running on a background thread. (JavaFX properties are essentially single-threaded.) Somewhere in the implementation of the thread that loads the web engine's content, there is a call to Platform.runLater(...) that changes the state of the worker.
Since your task blocks until the state of the worker has changed, and since you make your task run on the FX Application Thread, you have essentially deadlocked the FX Application Thread: the change to the load worker's state can't occur until your task completes (because it is running on the same thread), and your task can't complete until the state changes (as that's what you programmed the task to do).
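To make the deadlock concrete without needing JavaFX, here is a rough plain-Java analogy. It is only a sketch: a single-threaded executor stands in for the FX Application Thread, and the class and method names (SingleThreadDeadlockDemo, stateChangeSeenWhileBlocking) are made up for illustration. The wait uses a timeout so the demo terminates instead of hanging the way your task does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Plain-Java analogy (not JavaFX): one executor thread plays the FX Application Thread.
class SingleThreadDeadlockDemo {

    // Returns true only if the queued "state change" ran while the task was still waiting.
    static boolean stateChangeSeenWhileBlocking(long timeoutMillis) {
        ExecutorService fxLikeThread = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch stateChanged = new CountDownLatch(1);
            Future<Boolean> task = fxLikeThread.submit(() -> {
                // Analogous to the loader calling Platform.runLater(...) to update the state:
                // the update is queued behind the task that is currently running.
                fxLikeThread.execute(stateChanged::countDown);
                // Analogous to the while (loadWorker.isRunning()) loop, but with a timeout.
                return stateChanged.await(timeoutMillis, TimeUnit.MILLISECONDS);
            });
            return task.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            fxLikeThread.shutdown();
        }
    }

    public static void main(String[] args) {
        // Prints "seen while blocking: false"
        System.out.println("seen while blocking: " + stateChangeSeenWhileBlocking(500));
    }
}
```

The queued update cannot run until the waiting task returns, so the wait always times out. Your version has no timeout, so it never returns.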
It is basically always an error to block the FX Application Thread. Instead, you should block another thread until the conditions you want are true (web engine is created and loading thread completes), and then execute the next thing you want to do when that occurs (using Platform.runLater(...) again if it needs to be executed on the FX Application Thread).
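The same idea can be sketched in plain Java, again with a single-threaded executor standing in for the FX Application Thread. The names here (WaitOffThreadDemo, loadThenRead) are hypothetical, and the StringBuilder is just a stand-in for the loaded document. The calling thread does the waiting, and the follow-up work is handed back to the executor thread once the latch opens.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Plain-Java analogy (not JavaFX): wait on the calling thread, work on the FX-like thread.
class WaitOffThreadDemo {

    static String loadThenRead(long timeoutMillis) {
        ExecutorService fxLikeThread = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch loaded = new CountDownLatch(1);
            StringBuilder document = new StringBuilder(); // only touched on fxLikeThread

            // Start the "load": this task returns at once, and completion is
            // queued as a later task on the same thread.
            fxLikeThread.execute(() ->
                fxLikeThread.execute(() -> {
                    document.append("Body");
                    loaded.countDown();
                }));

            // Block the calling thread, not the FX-like thread.
            if (!loaded.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("load did not finish in time");
            }

            // Hand the follow-up work back to the FX-like thread and collect its result.
            Callable<String> readDocument = () -> document.toString();
            return fxLikeThread.submit(readDocument).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            fxLikeThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(loadThenRead(2000)); // prints "Body"
    }
}
```

Because the executor thread is never blocked, the queued completion task gets its turn and the latch opens.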
Here is an example doing what I think you are trying to do:
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import org.w3c.dom.Document;
import javafx.application.Platform;
import javafx.concurrent.Worker;
import javafx.concurrent.Worker.State;
import javafx.embed.swing.JFXPanel;
import javafx.scene.web.WebEngine;

public class WebEngineProblem {

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        new JFXPanel(); // Initialise the JavaFX Platform

        // Opened by the state listener once loading has succeeded
        CountDownLatch loaded = new CountDownLatch(1);

        // Creates the engine and starts loading; returns without waiting for the load
        FutureTask<WebEngine> createEngineTask = new FutureTask<>(() -> {
            WebEngine webEngine = new WebEngine();
            final Worker<Void> loadWorker = webEngine.getLoadWorker();
            loadWorker.stateProperty().addListener((obs, oldValue, newValue) -> {
                System.out.println("state:" + newValue);
                if (newValue == State.SUCCEEDED) {
                    System.out.println("finished loading");
                    loaded.countDown();
                }
            });
            webEngine.loadContent("<!DOCTYPE html>\r\n" + "<html>\r\n" + "<head>\r\n" + "<meta charset=\"UTF-8\">\r\n"
                    + "<title>Content Title</title>\r\n" + "</head>\r\n" + "<body>\r\n" + "<p>Body</p>\r\n" + "</body>\r\n"
                    + "</html>\r\n");
            return webEngine;
        });

        // The engine must be created on the FX Application Thread
        Platform.runLater(createEngineTask);

        // Block the main thread (not the FX Application Thread) until the load completes
        WebEngine engine = createEngineTask.get();
        loaded.await();

        // The document must be accessed on the FX Application Thread
        Platform.runLater(() -> {
            Document doc = engine.getDocument();
            String content = doc.getDocumentElement().getTextContent();
            System.out.println(content);
        });
    }
}