How to print external script inside iframe using htmlunit?


import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.ThreadedRefreshHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.io.IOException;
import java.net.URL;

public class ReadHtml {
    public static void main(String[] args) throws Exception {
        // Silence HtmlUnit's logging.
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setActiveXNative(true);
        webClient.getOptions().setAppletEnabled(false);
        webClient.getOptions().setCssEnabled(true);
        webClient.getOptions().setDoNotTrackEnabled(true);
        webClient.getOptions().setGeolocationEnabled(false);
        webClient.getOptions().setPopupBlockerEnabled(false);
        webClient.getOptions().setPrintContentOnFailingStatusCode(true);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(true);
        webClient.getOptions().setThrowExceptionOnScriptError(true);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.setCssErrorHandler(new SilentCssErrorHandler());
        webClient.setRefreshHandler(new ThreadedRefreshHandler());
        webClient.getCookieManager().setCookiesEnabled(true);
        WebRequest request = new WebRequest(new URL("some url containing javascript to load html elements"));
        try {
            Page page = webClient.getPage(request);
            //System.out.println(page.getWebResponse().getContentAsString());
            System.out.println(((HtmlPage) page).asXml());
        } catch (FailingHttpStatusCodeException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

I want to print all HTML elements (not just the page source), including HTML produced by JavaScript, iframes, and nested iframes. I tried the code above, and I also tried selecting elements by id and name, but I would rather not print anything specific; I want the entire HTML content. The HTML loaded by JavaScript is not printed to the console. Can someone point out what needs to change? Thanks in advance.


Solution

  • Try using page.asXml().

    HtmlPage itself is a DOM node, so you can iterate through its children recursively. The frames can be accessed (recursively) via the DOM or via page.getFrames().

    If you need to print all the responses from the server, you can use WebConnectionWrapper as an interceptor. This gives you access to all the responses (including script ones).
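
The recursive walk over child nodes mentioned above can be sketched with the JDK's org.w3c.dom types alone. This is a minimal illustration, not HtmlUnit code: the class name DomWalker, the helper parse, and the sample markup are made up for the example. It assumes that HtmlUnit's nodes implement org.w3c.dom.Node (as DomNode does in the versions I am aware of), in which case the same walk method should accept an HtmlPage. In a real page, a frame's document is a separate page reached through page.getFrames(), so the sample nests content inside iframe purely to show the recursion.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of HtmlUnit.
final class DomWalker {

    /** Appends one indented line per element, visiting the tree depth first. */
    static void walk(Node node, int depth, StringBuilder out) {
        boolean isElement = node.getNodeType() == Node.ELEMENT_NODE;
        if (isElement) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        // Only elements add an indentation level; the document node does not.
        int childDepth = isElement ? depth + 1 : depth;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), childDepth, out);
        }
    }

    /** Parses well-formed markup into a DOM document (stand-in for a loaded page). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<html><body><iframe><p>inner</p></iframe></body></html>";
        StringBuilder out = new StringBuilder();
        walk(parse(sample), 0, out);
        System.out.print(out);
    }
}
```

With HtmlUnit, one would presumably call walk on the loaded page and then repeat it for the enclosed page of each frame returned by page.getFrames().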


    July 9

    Frames are part of the DOM. But if some of the content is loaded asynchronously (Ajax), HtmlUnit might not have waited for it to load. Try adding an AjaxController to your WebClient. Here is an example.

    For WebConnectionWrapper, use the example below. But again, if there is some asynchronous processing, HtmlUnit may exit before all the processing is done, so an AjaxController might be your best bet.

    // browser is the WebClient instance
    browser.setWebConnection(new WebConnectionWrapper(browser) {
      @Override
      public WebResponse getResponse(final WebRequest request) throws IOException {
        WebResponse response = super.getResponse(request);
        // process the response here, e.g. print response.getContentAsString()
        return response;
      }
    });
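
The caveat above, that processing may still be running when the program reads the page, is commonly handled by polling for a condition with a timeout. The following is a generic sketch using only the JDK; the class WaitUntil and its parameters are hypothetical names for illustration and are not part of HtmlUnit. As far as I know, WebClient also has its own waitForBackgroundJavaScript(timeoutMillis) method, which may be the simpler option when it fits.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper class; not part of HtmlUnit.
final class WaitUntil {

    /**
     * Re-checks the condition every pollMillis until it holds or timeoutMillis elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 200 ms after start.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2000, 20);
        System.out.println("condition met: " + ready);
    }
}
```

In the scenario from the question, the condition could be something like checking whether page.asXml() already contains an element the script is expected to add; that check is an assumption about the target page, not something the original code provides.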
    

    July 10

    NicelyResynchronizingAjaxController works for user-initiated Ajax. For "self-loading" requests, try something like this.

    public class AlwaysSynchronizingAjaxController extends NicelyResynchronizingAjaxController {
        // Treat every Ajax request as synchronous, whoever initiated it.
        @Override
        public boolean processSynchron(HtmlPage page, WebRequest settings, boolean async) {
            return true;
        }
    }
    

    If you are using Fiddler (or Wireshark, or any other sniffing or intercepting tool), check whether the traffic for the dynamically loaded requests shows up.