javascript download href attachment htmlunit

HtmlUnit to invoke javascript from href to download a file


I am trying to download a file that apparently has to be clicked via a browser. The site uses a form that contains several hrefs to a JavaScript function named downloadFile. In this function, the form element (id poslimit) is obtained with document.getElementById, its action is set to the given URL, and the form is submitted:

function downloadFile(actionUrl, formId)
{
    document.getElementById(formId).action=actionUrl;
    document.getElementById(formId).submit();
}

The HTML source snippet:

<form method="post" name="commandForm" action="position-limits" id="poslimit">
    <div id="content">
        <li><a href="javascript:downloadFile('position-limits?fileName=20130711&positionLimit=CURRENT_POSITION_LIMIT_', 'poslimit');" > July 11, 2013 </a></li>

So clicking the link above invokes the JavaScript in its href. The downloadFile function itself is defined in another file.
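To make the effect of downloadFile concrete, here is a minimal sketch of the URL the form should end up being submitted to, assuming the relative action is resolved against the page URL the way a browser would resolve it. The class and method names (FormActionSketch, resolveAction) are made up for illustration. Only java.net.URI is used, and no request is sent.

```java
import java.net.URI;

// Hypothetical helper, not part of HtmlUnit or of the site's code.
class FormActionSketch {
    // Resolves a (possibly relative) form action against the URL of the page
    // that contains the form.
    static URI resolveAction(String pageUrl, String action) {
        return URI.create(pageUrl).resolve(action);
    }

    public static void main(String[] args) {
        URI target = resolveAction(
            "http://www.theocc.com/webapps/position-limits",
            "position-limits?fileName=20130711&positionLimit=CURRENT_POSITION_LIMIT_");
        // Print the absolute URL that the POST from the form would go to.
        System.out.println(target);
    }
}
```

If this reading is right, the file is the response to a POST of the form to that URL, so whatever triggers the submit has to return that response rather than the page the form lives on.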

I've tried:

WebClient webClient = new WebClient(BrowserVersion.CHROME_16);
HtmlPage page = webClient.getPage("http://www.theocc.com/webapps/position-limits");
HtmlForm elt = page.getHtmlElementById("poslimit");
elt.setAttribute("action", "position-limits?fileName=20130709&positionLimit=POSITIONLIMITCHANGE_");
InputStream is = elt.click().getWebResponse().getContentAsStream();
int b = 0;
while ((b = is.read()) != -1)
{
    System.out.print((char)b);
}
webClient.closeAllWindows();

I also tried the same thing using HtmlElement instead of HtmlForm. I also tried executing the JavaScript directly:

WebClient webClient = new WebClient(BrowserVersion.CHROME_16);
HtmlPage page = webClient.getPage("http://www.theocc.com/webapps/position-limits");
ScriptResult sr = page.executeJavaScript("downloadFile('position-limits?fileName=20130709&positionLimit=POSITIONLIMITCHANGE_', 'poslimit')");
InputStream is = sr.getNewPage().getWebResponse().getContentAsStream();
int b = 0;
while ((b = is.read()) != -1)
{
    System.out.print((char)b);
}
webClient.closeAllWindows();

Both of these come from examples on this and other boards, but I keep getting the original page back instead of the attached file. I am also wondering whether I need to look at the window history for the proper page response, as the window/document I need may be the previous one. Links to full explanations or well-exampled documentation, as well as source I could try, are appreciated.


Solution

  • I think this may be helpful to others, as I have not seen a working example.

    WebClient webClient = new WebClient(BrowserVersion.CHROME_16);
    HtmlPage page = webClient.getPage("http://www.theocc.com/webapps/position-limits");
    // Find the anchor whose visible text is the wanted date.
    // This date should come in from args.
    HtmlAnchor anchor = null;
    for (HtmlAnchor candidate : page.getAnchors())
    {
        if (candidate.asText().equals("July 9, 2013"))
        {
            anchor = candidate;
            break;
        }
    }
    // Without this check a missing date would click nothing useful (or throw).
    if (anchor == null)
    {
        webClient.closeAllWindows();
        throw new IllegalStateException("No link found for the requested date");
    }
    // Clicking the anchor runs its javascript: href, which submits the form.
    Page p = anchor.click();
    InputStream is = p.getWebResponse().getContentAsStream();
    int b = 0;
    while ((b = is.read()) != -1)
    {
        System.out.print((char)b);
    }
    webClient.closeAllWindows();
    

    This question helped me a bit, as I tried the anchor approach from it and it worked: "struggling to click on link within htmlunit".
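Two loose ends in the code above are that the date is hard-coded and that the bytes are printed as chars, which only looks right if the attachment is plain text. The following is a rough sketch of how both could be handled with the standard library alone. The class and method names (DownloadHelpers, linkLabel, saveTo) are made up for illustration. The label format "July 9, 2013" is assumed from the link text shown in the question, and the ByteArrayInputStream stands in for the stream returned by p.getWebResponse().getContentAsStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helpers, not part of HtmlUnit.
class DownloadHelpers {
    // Assumed label format of the links, e.g. "July 9, 2013".
    private static final DateTimeFormatter LINK_LABEL =
        DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.ENGLISH);

    // Builds the anchor text to search for from a date.
    static String linkLabel(LocalDate date) {
        return date.format(LINK_LABEL);
    }

    // Copies the response body to a file byte for byte and returns the byte count.
    static long saveTo(InputStream in, Path target) throws IOException {
        try (InputStream body = in) {
            return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Date from the command line in yyyy-MM-dd form, with a fallback for the demo.
        LocalDate date = args.length > 0 ? LocalDate.parse(args[0]) : LocalDate.of(2013, 7, 9);
        System.out.println(linkLabel(date));

        // Stand-in for the real response stream.
        InputStream fake = new ByteArrayInputStream(new byte[] {1, 2, 3});
        Path out = Files.createTempFile("position-limits", ".bin");
        System.out.println(saveTo(fake, out) + " bytes written to " + out);
    }
}
```

In the anchor loop, the hard-coded string would then become linkLabel(date), and the read loop would become a single saveTo call on the response stream.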