android selenium web-scraping selendroid

Selendroid as a web scraper

I intend to create an Android application that performs a headless login to a website and then scrape some content from the subsequent page while maintaining the logged-in session.

I first used HtmlUnit in a normal Java project and it worked just fine. But later found that HtmlUnit is not compatible with Android.

Then I tried JSoup library by sending HTTP “POST” request to the login form. But the resulting page does not load up completely since JSoup won't support JavaScript.

I was then suggested to have a look on Selendroid which actually is an android test automation framework. But what I actually need is an Html parser that supports both JavaScript and Android. I find Selendroid quite difficult to understand which I can't even figure out which dependencies to use.

selendroid-client
selendroid-standalone
selendroid-server

With Selenium WebDriver, the code would be as simple as the following. But can somebody show me a similar code example for Selendroid as well?

    WebDriver driver = new FirefoxDriver();
    driver.get("https://mail.google.com/");

    driver.findElement(By.id("email")).sendKeys(myEmail);
    driver.findElement(By.id("pass")).sendKeys(pass);

    // Click on 'Sign In' button
    driver.findElement(By.id("signIn")).click();

And also,

What dependencies to add to my Gradle.Build file?
Which Selendroid libraries to import?

Solution

Unfortunately I didn't get Selendroid to work. But I find a workaround to scrape dynamic content by using just Android's built in WebView with JavaScript enabled.

mWebView = new WebView();
mWebView.getSettings().setJavaScriptEnabled(true);
mWebView.addJavascriptInterface(new HtmlHandler(), "HtmlHandler");

mWebView.setWebViewClient(new WebViewClient() {
   @Override
   public void onPageFinished(WebView view, String url) {
       super.onPageFinished(view, url);

       if (url == urlToLoad) {
       // Pass html source to the HtmlHandler
       WebView.loadUrl("javascript:HtmlHandler.handleHtml(document.documentElement.outerHTML);");

   }
});

The JS method document.documentElement.outerHTML will retrieve the full html contained in the loaded url. Then the retrived html string is sent to handleHtml method in HtmlHandler class.

class HtmlHandler {
        @JavascriptInterface
        @SuppressWarnings("unused")
        public void handleHtml(String html) {
            // scrape the content here

        }
    }

You may use a library like Jsoup to scrape the necessary content from the html String.