java, web-scraping, salesforce, htmlunit

Salesforce not working with HTMLUnit


I'm trying to use HTMLUnit to scrape Salesforce for the organization's license information. It works when I access Salesforce through the regular login/test URL, but I want to be able to log in with a session ID using the /secur/frontdoor.jsp?sid= method.

When I try that, Salesforce complains that JavaScript is not enabled, even though I have it enabled in HTMLUnit.

        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

        final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED);
        HtmlPage page;

        webClient.waitForBackgroundJavaScript(10000);
        webClient.waitForBackgroundJavaScriptStartingBefore(10000);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setRedirectEnabled(true);
        webClient.getOptions().setCssEnabled(true);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setAppletEnabled(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setActiveXNative(true);
        webClient.getOptions().setAppletEnabled(true);

        page = webClient.getPage("https://salesforce--domain/secur/frontdoor.jsp?sid=SessionId");

Solution

  • Figured it out. For some reason, you're not getting redirected automatically, so you need to take the URL from the first getPage call and load that URL with a second getPage call.

        // Silence HtmlUnit's verbose logging.
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

        final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED);
        HtmlPage page;

        // Kept from the original; with no page loaded yet there is probably nothing to wait for.
        webClient.waitForBackgroundJavaScript(10000);
        webClient.waitForBackgroundJavaScriptStartingBefore(10000);

        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setRedirectEnabled(true);
        webClient.getOptions().setCssEnabled(true);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setActiveXNative(true);
        // The original also called setAppletEnabled(false) earlier; this later call overrode it.
        webClient.getOptions().setAppletEnabled(true);

        // First request: log in through frontdoor.jsp with the session ID.
        page = webClient.getPage("https://salesforce--domain/secur/frontdoor.jsp?sid=SessionId");

        // The redirect is not followed automatically, so load the resulting URL explicitly.
        page = webClient.getPage(page.getURL());
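
If the session ID comes from code rather than being pasted in by hand, it may be worth URL-encoding it before appending it to the frontdoor URL, since Salesforce session IDs commonly contain a `!` character. The sketch below uses only the JDK (Java 10 or later). The class name `FrontdoorUrl`, the method `build`, the host name, and the session ID are all made up for illustration, and whether encoding is strictly required here is an assumption, not something confirmed above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FrontdoorUrl {

    // Hypothetical helper: builds the frontdoor login URL for an instance host
    // and a session ID, percent-encoding the session ID as a query value.
    static String build(String instanceHost, String sessionId) {
        String encodedSid = URLEncoder.encode(sessionId, StandardCharsets.UTF_8);
        return "https://" + instanceHost + "/secur/frontdoor.jsp?sid=" + encodedSid;
    }

    public static void main(String[] args) {
        // Placeholder host and session ID, not real values.
        System.out.println(build("example.my.salesforce.com", "00Dxx0000000001!fakeToken"));
    }
}
```

The resulting string can then be passed to the first `webClient.getPage(...)` call in place of the hard-coded URL.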