javascript node.js web-crawler nightmare

Nightmare conditional wait()


I'm trying to crawl a webpage using Nightmare, and I want to wait for #someelem to be present, but only if it actually exists on the page. Otherwise, I want Nightmare to move on. How can this be done using .wait()?

I can't use .wait(ms). Using .wait(selector) means Nightmare will keep waiting until the element is present, but if the page will never have this element, Nightmare will keep waiting forever.

The last option is to use .wait(fn), and I've tried something like this:

.wait(function(cheerio) {
            var $ = cheerio.load(document.body.outerHTML);
            var attempt = 0;

            function doEval() {
                if ( $('#elem').length > 0 ) {
                    return true;
                }
                else {
                    attempt++;

                    if ( attempt < 10 ) {
                        setTimeout(doEval,2000); //This seems iffy.
                    }
                    else {
                        return true;
                    }
                }
            }

            return doEval();
        },cheerio)

So, wait and attempt again (up to a threshold), and if the element is still not found, just move on. The code seems wrong around setTimeout, because the function passed to .wait() runs in the browser scope.

Thanks in advance!


Solution

  • I don't think passing the cheerio library as you have it is going to work very well. The arguments get serialized (more or less) to be passed to the child Electron process, so passing an entire library probably won't work.
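    As a rough illustration of why a library does not survive that trip, here is a minimal sketch that uses a JSON round trip as a stand-in for the serialization step. Nightmare's actual mechanism may differ in detail, and fakeLibrary is a hypothetical object, not cheerio itself:

    ```javascript
    // Hypothetical stand-in for a library object: one plain value, one function.
    var fakeLibrary = {
      version: '1.0.0',
      load: function (html) { return html; }
    };

    // JSON round trip, used here only as an approximation of
    // what happens when arguments cross a process boundary.
    var roundTripped = JSON.parse(JSON.stringify(fakeLibrary));

    console.log(typeof fakeLibrary.load);   // "function"
    console.log(typeof roundTripped.load);  // "undefined": the method is gone
    console.log(roundTripped.version);      // "1.0.0": plain data survives
    ```

    Plain values such as selectors, numbers, and simple objects make the trip intact, which is why passing a selector string as an argument is the safer route.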

    On the upside, the fn part of .wait(fn) is executed in the page context, meaning you have full access to document and its methods (e.g., querySelector). You also have access to the page's jQuery if it exists, or you could use .inject() to inject it if not.
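    For instance, a synchronous predicate can do the whole check with document.querySelector and no extra library. The sketch below also gives up once a deadline passes, so the wait cannot hang on a page that never has the element. presentOrExpired and the 20-second budget are hypothetical, and the usage in the comment assumes that .wait(fn, ...args) forwards the extra arguments to fn in the page:

    ```javascript
    // Meant to run in the page context, where `document` is the page's DOM.
    // Returns true once the element exists OR the deadline
    // (milliseconds since the epoch) has passed.
    function presentOrExpired(selector, deadline) {
      return document.querySelector(selector) !== null || Date.now() > deadline;
    }

    // Possible usage (assumption: extra .wait() arguments reach the predicate):
    // nightmare
    //   .goto('http://example.com')
    //   .wait(presentOrExpired, '#elem', Date.now() + 20000)
    ```

    Note that a true result here does not tell you which condition fired, so a later step would still need to check whether the element is actually there. If your Nightmare version enforces its own wait timeout, the budget should stay below it.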

    Setting that aside, you're right insofar as .wait() (and .evaluate(), for that matter) expect a synchronous method, at least until something like promises could be used directly in .evaluate().

    Until that is available, you could use .action() to mimic the behavior you want:

    var Nightmare = require('nightmare');
    
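    // Custom action: poll for '#elem' up to 10 times, 2 seconds apart, then move on.
    Nightmare.action('deferredWait', function(done) {
      var attempt = 0;
      var self = this;
    
      function doEval() {
        // The first function runs in the page; the trailing '#elem' is passed in as `selector`.
        self.evaluate_now(function(selector) {
          return (document.querySelector(selector) !== null);
        }, function(err, result) {
          // The callback is Node-style: the error comes first, then the result.
          if (err) {
            return done(err);
          }
          if (result) {
            done(null, true);
          } else {
            attempt++;
            if (attempt < 10) {
              // This timer runs in the Node process, not in the page.
              setTimeout(doEval, 2000);
            } else {
              done(null, false);
            }
          }
        }, '#elem');
      }
      doEval();
      return this;
    });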
    
    var nightmare = Nightmare();
    nightmare.goto('http://example.com')
      .deferredWait()
      .then(function(result) {
        console.log(result);
      });
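
    The retry logic above does not depend on Nightmare. A minimal sketch of the same bounded-polling pattern in plain Node follows. pollUntil and its parameters are hypothetical names, and the counter-based check stands in for whatever actually probes the page:

    ```javascript
    // Hypothetical helper: call `check` up to `maxAttempts` times, `intervalMs` apart.
    // Resolves true as soon as check() returns (or resolves to) a truthy value,
    // and false once the attempts are used up. Rejects if check() throws.
    function pollUntil(check, maxAttempts, intervalMs) {
      return new Promise(function (resolve, reject) {
        var attempt = 0;

        function tryOnce() {
          Promise.resolve()
            .then(check)
            .then(function (found) {
              if (found) {
                resolve(true);
                return;
              }
              attempt++;
              if (attempt < maxAttempts) {
                setTimeout(tryOnce, intervalMs);
              } else {
                resolve(false);
              }
            })
            .catch(reject);
        }

        tryOnce();
      });
    }

    // Usage: the check succeeds on its third call.
    var calls = 0;
    pollUntil(function () { calls++; return calls >= 3; }, 10, 20)
      .then(function (found) {
        console.log(found, calls); // true 3
      });
    ```

    Resolving with false instead of rejecting mirrors done(null, false) in the action: a missing element is treated as a normal outcome, so the chain moves on rather than failing.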