javascript, android, web-scraping, android-webview

WebView crawler: navigate to a URL based on the page result


I'm trying to build a web crawler based on the requirements that were described here, and I figured WebView would be the most suitable way to implement this.

The problem seems to emerge when the next URL I need to visit is based on the HTML contents of the current page.
I am using view.evaluateJavascript to get the current page HTML and parse the URL part inside onReceiveValue, but then there is no way for me to navigate to the URL because onReceiveValue cannot access the view.

Calling loadUrl in onPageFinished right after evaluateJavascript does not work either: it runs before the HTML content has been retrieved, so the WebView navigates to a URL containing a null value.

WebView myWebView = new WebView(this);
setContentView(myWebView);

myWebView.getSettings().setJavaScriptEnabled(true);
MyJavaScriptInterface jInterface = new MyJavaScriptInterface(this);
myWebView.addJavascriptInterface(jInterface, "HTMLOUT");

myWebView.setWebViewClient(new WebViewClient() {
 @Override
 public void onPageFinished(WebView view, String url) {
  super.onPageFinished(view, url);
  if (url.equals("http://url.com")) {
   final String[] versionString = {
    null
   };
   view.evaluateJavascript("(function(){return window.document.body.outerHTML})();",
    new ValueCallback<String>() {
     @Override
     public void onReceiveValue(String html) {
      String result = removeUTFCharacters(html).toString();
      Matcher m = r.matcher(result);
      versionString[0] = m.group(1);
     }
    });
   String getFullUrl = String.format("https://url.com/getData?v=%s", versionString[0]);
   view.loadUrl(getFullUrl);
  }
 }
});
myWebView.loadUrl("http://url.com");

Solution

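  • Load the next URL from inside onReceiveValue. evaluateJavascript is asynchronous, so the HTML is only available in the callback. The view parameter of onPageFinished can be used there because the anonymous class captures it (declare it final if you compile below Java 8).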

     myWebView.setWebViewClient(new WebViewClient() {
            @Override
            public void onPageFinished(final WebView view, String url) {
                super.onPageFinished(view, url);
                if (url.contains("https://www.google.com")) {
                    view.evaluateJavascript("(function(){return window.document.body.outerHTML})();",
                            new ValueCallback<String>() {
                                @Override
                                public void onReceiveValue(String html) {
                                    // The HTML is available here: parse it and build the next URL.
                                    // A fixed URL stands in for the parsed one in this demo.
                                    String getFullUrl = "https://cchat.in";
                                    view.loadUrl(getFullUrl);
                                }
                            });
                }
            }
        });
        myWebView.loadUrl("https://www.google.com");
    

    I used two websites to demonstrate: the second URL loads successfully from inside onReceiveValue.
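
To see why moving loadUrl into the callback matters, here is a minimal plain-Java sketch of the ordering. Nothing in it is the Android API: `AsyncOrderDemo`, `evaluate`, `drain`, and the value `"42"` are hypothetical stand-ins, with a simple task queue playing the role of the UI thread that delivers the evaluateJavascript result later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

class AsyncOrderDemo {
    // Stand-in for the UI thread's message queue: posted tasks run later, in order.
    private final Queue<Runnable> pending = new ArrayDeque<>();
    // Records every URL passed to loadUrl so the order of events can be inspected.
    final List<String> loaded = new ArrayList<>();

    // Hypothetical stand-in for evaluateJavascript: returns at once, delivers the result later.
    void evaluate(String result, Consumer<String> callback) {
        pending.add(() -> callback.accept(result));
    }

    void loadUrl(String url) {
        loaded.add(url);
    }

    // Runs every queued callback, as the UI thread would once the current method returns.
    void drain() {
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
        }
    }

    // Mirrors the question: loadUrl runs before the callback has filled in the value.
    void brokenFlow() {
        String[] version = {null};
        evaluate("42", v -> version[0] = v);
        loadUrl(String.format("https://url.com/getData?v=%s", version[0]));
        drain();
    }

    // Mirrors the answer: loadUrl moves inside the callback.
    void fixedFlow() {
        evaluate("42", v -> loadUrl(String.format("https://url.com/getData?v=%s", v)));
        drain();
    }

    public static void main(String[] args) {
        AsyncOrderDemo broken = new AsyncOrderDemo();
        broken.brokenFlow();
        AsyncOrderDemo fixed = new AsyncOrderDemo();
        fixed.fixedFlow();
        System.out.println(broken.loaded + " vs " + fixed.loaded);
    }
}
```

In the broken flow the URL is formatted while the array slot is still null, which matches the behaviour described in the question. In the fixed flow the URL is built only once the value has arrived.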

    You can try this.
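
For the parsing step inside onReceiveValue, a rough sketch in plain Java might look like the following. Two things are worth knowing: the string passed to onReceiveValue is JSON-encoded (quoted, with escapes such as \u003C), and Matcher.group throws IllegalStateException unless find() or matches() has succeeded first. The class name `NextUrlBuilder`, the `data-version` pattern, and the hand-rolled unescaping are assumptions for illustration only; the question's own regex `r` and `removeUTFCharacters` helper are not shown, and a real JSON parser is the safer choice on a device.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NextUrlBuilder {
    // Hypothetical pattern: assumes the page embeds something like data-version="1.2.3".
    private static final Pattern VERSION = Pattern.compile("data-version=\"([^\"]+)\"");

    // Undoes the common JSON escapes in the string handed to onReceiveValue.
    static String unescapeJsResult(String raw) {
        if (raw == null || raw.equals("null")) {
            return "";
        }
        String s = raw;
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1); // drop the surrounding quotes
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
                continue;
            }
            char next = s.charAt(++i);
            if (next == 'u' && i + 4 < s.length()) {
                // \uXXXX: decode the four hex digits that follow
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                i += 4;
            } else if (next == 'n') {
                out.append('\n');
            } else if (next == 't') {
                out.append('\t');
            } else if (next == 'r') {
                out.append('\r');
            } else {
                out.append(next); // covers \" \\ and \/
            }
        }
        return out.toString();
    }

    // Calls find() before group(1); returns empty when the page has no version.
    static Optional<String> extractVersion(String html) {
        Matcher m = VERSION.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Builds the follow-up URL from the raw callback value, if a version was found.
    static Optional<String> buildNextUrl(String rawJsResult) {
        return extractVersion(unescapeJsResult(rawJsResult))
                .map(v -> String.format("https://url.com/getData?v=%s", v));
    }

    public static void main(String[] args) {
        String raw = "\"\\u003Cbody data-version=\\\"1.2.3\\\">hi\\u003C/body>\"";
        System.out.println(buildNextUrl(raw).orElse("no version found"));
    }
}
```

Inside onReceiveValue you would then call something like buildNextUrl(html) and pass the result to view.loadUrl only when a value is present, so a page without a version never triggers navigation to a URL ending in null.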