I am working on a project that sends a URL to multiple websites to scan it for categorisation and any security risks, using Java and HtmlUnit.
www.virustotal.com is the last website I have to configure, and I am having trouble progressing through the site because an href is empty.
The site works by entering a URL on the first page and clicking submit. A popup then appears and the user has to choose whether to re-analyse or use the last scan results (in this case we always want to re-analyse). It is the re-analyse anchor that has the empty href. My guess is that this is a JavaScript issue, where the script is not generating the URL to the results page. Unfortunately I am unsure of where to go next :/
Project code (apologies for how scruffy it is!):
//turn off htmlunit logging//
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);
java.util.logging.Logger.getLogger("org.apache.http").setLevel(java.util.logging.Level.OFF);
java.util.logging.Logger.getLogger("org.apache.http.client.protocol.ResponseProcessCookies").setLevel(java.util.logging.Level.OFF);
//initialise url and obtain users selection//
System.out.println("Please select the url you would like to review:");
Scanner sc = new Scanner(System.in);
String startPath = sc.nextLine();
//enable javascript and use engine to initialise and parse websites code//
String url = "https://www.virustotal.com/#url";
System.out.println("Connecting to Virus Total...");
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setJavaScriptEnabled(true);
HtmlPage page = webClient.getPage(url);
//wait for background javascript only after the page has been loaded//
webClient.waitForBackgroundJavaScript(8000);
//fill in form
HtmlForm form = page.getFirstByXPath("//form[@action='/en/url/submission/']");
HtmlTextInput textField = form.getInputByName("url");
textField.setValueAttribute(startPath);
HtmlButton button1 = page.getFirstByXPath("//button[@id='btn-scan-url']");
HtmlPage page1 = button1.click();
//waiting and dealing with popup
webClient.waitForBackgroundJavaScript(8000);
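//note: getContentAsString() returns the raw server response, not the DOM after javascript has run (page1.asXml() shows the current DOM)
String page1String = page1.getWebResponse().getContentAsString();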
System.out.println(page1String);
//the re-analyse element is an <a>, so select an anchor rather than a button
HtmlAnchor htmlAnchor = page1.getFirstByXPath("//a[@id='btn-url-reanalyse']");
System.out.println(htmlAnchor); //testing what I can see in the anchor
HtmlPage page2 = htmlAnchor.click();
//progressing to next screen
String output = page2.asText();
System.out.println(output);
HTML I receive when I print out string page1String:
<div class="modal-footer">
<a id="btn-url-reanalyse" class="btn" href="">
Reanalyse
</a>
HTML when manually progressing through site:
<a id="btn-url-reanalyse" class="btn" href="/en/url/submission/?force=1&url=http%3A//www.facebook.com/&token=415eda59daae48938b1dcc64f3152ed5ee9ac27d485348d55c87e9da7e714605">
Reanalyse
</a>
Any help or advice would be greatly appreciated! I am also happy to work with any module recommendations that are provided; I am only using HtmlUnit as it was one of the first I found that actually worked with the other sites.
Thanks in advance.
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);
I think disabling the logging is a bad idea while hunting for a problem. If you enable logging, you will see that there is a JS error.
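As a rough sketch of what re-enabling the logging could look like: this assumes HtmlUnit's log output ends up in java.util.logging, which the setLevel(OFF) calls in the question suggest. The class name HtmlUnitLogCapture and the CapturingHandler helper are made up for illustration, and the warning logged in main only stands in for a real script error.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class HtmlUnitLogCapture {
    // Logger name taken from the question's code.
    public static final String HTMLUNIT_LOGGER = "com.gargoylesoftware.htmlunit";

    // Keep a strong reference so the logger's configuration is not lost to garbage collection.
    private static final Logger HTMLUNIT_LOG = Logger.getLogger(HTMLUNIT_LOGGER);

    /** Hypothetical helper: stores every record it is allowed to see. */
    public static final class CapturingHandler extends Handler {
        public final List<LogRecord> records = new CopyOnWriteArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                records.add(record);
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Sets the HtmlUnit logger to WARNING (instead of OFF) and attaches a capturing handler. */
    public static CapturingHandler enableWarnings() {
        HTMLUNIT_LOG.setLevel(Level.WARNING);
        CapturingHandler handler = new CapturingHandler();
        handler.setLevel(Level.WARNING);
        HTMLUNIT_LOG.addHandler(handler);
        return handler;
    }

    public static void main(String[] args) {
        CapturingHandler handler = enableWarnings();

        // ... run the HtmlUnit steps from the question here ...
        // The next line only simulates a script error being logged.
        HTMLUNIT_LOG.warning("simulated script error");

        // Print whatever was captured at WARNING or above.
        for (LogRecord r : handler.records) {
            System.out.println(r.getLevel() + ": " + r.getMessage());
        }
    }
}
```

Loggers below com.gargoylesoftware.htmlunit inherit the level and pass their records up to this handler, so script errors logged by child loggers should show up in the list as well.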
webClient.getOptions().setThrowExceptionOnScriptError(false);
Because of this setting, the program continues after the error, but parts of the JavaScript are not executed. I guess that's the reason why your link does not get updated.
The JavaScript error looks like an HtmlUnit bug. Please open an issue and isolate a minimal test case as described here.