I intend to create an Android application that performs a headless login to a website and then scrape some content from the subsequent page while maintaining the logged-in session.
I first used HtmlUnit in a normal Java project and it worked just fine. But later found that HtmlUnit is not compatible with Android.
Then I tried JSoup library by sending HTTP “POST” request to the login form. But the resulting page does not load up completely since JSoup won't support JavaScript.
I was then suggested to have a look on Selendroid which actually is an android test automation framework. But what I actually need is an Html parser that supports both JavaScript and Android. I find Selendroid quite difficult to understand which I can't even figure out which dependencies to use.
With Selenium WebDriver, the code would be as simple as the following. But can somebody show me a similar code example for Selendroid as well?
WebDriver driver = new FirefoxDriver();
driver.get("https://mail.google.com/");
driver.findElement(By.id("email")).sendKeys(myEmail);
driver.findElement(By.id("pass")).sendKeys(pass);
// Click on 'Sign In' button
driver.findElement(By.id("signIn")).click();
And also,
Unfortunately I didn't get Selendroid to work. But I find a workaround to scrape dynamic content by using just Android's built in WebView with JavaScript enabled.
mWebView = new WebView();
mWebView.getSettings().setJavaScriptEnabled(true);
mWebView.addJavascriptInterface(new HtmlHandler(), "HtmlHandler");
mWebView.setWebViewClient(new WebViewClient() {
@Override
public void onPageFinished(WebView view, String url) {
super.onPageFinished(view, url);
if (url == urlToLoad) {
// Pass html source to the HtmlHandler
WebView.loadUrl("javascript:HtmlHandler.handleHtml(document.documentElement.outerHTML);");
}
});
The JS method document.documentElement.outerHTML
will retrieve the full html contained in the loaded url. Then the retrived html string is sent to handleHtml method in HtmlHandler class.
class HtmlHandler {
@JavascriptInterface
@SuppressWarnings("unused")
public void handleHtml(String html) {
// scrape the content here
}
}
You may use a library like Jsoup to scrape the necessary content from the html String.