javascript, screen-scraping, casperjs

Can't get links with CasperJS


I run this code and get "undefined". Does anyone have an idea what is wrong with it?

var casper = require('casper').create();

casper.start('http://casperjs.org/', function() {
    this.echo(document.querySelector('a'));
});

casper.run();

Solution

  • CasperJS is built on top of PhantomJS, which has two contexts. The inner page context, reached through casper.evaluate(), is sandboxed and is the only one that has access to the DOM.

    DOM nodes cannot be passed to the outside context, so you need to return some representation of the element that you can work with:

    this.echo(this.evaluate(function(){
        return document.querySelector('a').href;
    }));
    

    I suggest that you look into the CasperJS functions that abstract this away, such as getElementInfo() and getElementAttribute().
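As a rough sketch of how those two helpers could be used: the function name echoFirstLink is hypothetical, and the casper argument is assumed to be an instance created with require('casper').create(). Both helpers return plain data (a string and a serializable object), so no DOM node has to leave the page context.

```javascript
// Hypothetical helper: registers a start step that prints data about the
// first <a> element. `casper` is assumed to be a CasperJS instance.
function echoFirstLink(casper, url) {
    casper.start(url, function() {
        // getElementAttribute(selector, attribute) returns the attribute as a string.
        this.echo(this.getElementAttribute('a', 'href'));

        // getElementInfo(selector) returns a plain object describing the element;
        // `text` and `attributes` are two of its fields.
        var info = this.getElementInfo('a');
        this.echo(info.text + ' -> ' + info.attributes.href);
    });
}
```

In a CasperJS script this would be called as echoFirstLink(casper, 'http://casperjs.org/') after creating the instance, followed by casper.run().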

    The PhantomJS documentation says:

    Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine.

    Closures, functions, DOM nodes, etc. will not work!
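To get a feel for that rule of thumb, here is a small sketch in plain JavaScript that approximates the boundary with a JSON round trip. The crossBoundary function is a hypothetical stand-in, not a CasperJS or PhantomJS API, and the real serialization may differ in its details; it only illustrates which kinds of values can be expected to survive.

```javascript
// Hypothetical stand-in for the evaluate() boundary: anything that does not
// survive a JSON round trip should not be expected to cross it.
function crossBoundary(value) {
    var json = JSON.stringify(value);
    // JSON.stringify returns undefined for functions and for undefined itself.
    return json === undefined ? undefined : JSON.parse(json);
}

var link = { href: 'http://casperjs.org/', text: 'CasperJS' };

// Plain data survives unchanged.
console.log(crossBoundary(link));

// A function cannot be serialized at all.
console.log(crossBoundary(function() {}));

// A function stored as a property is silently dropped; only `id` remains.
console.log(crossBoundary({ onClick: function() {}, id: 1 }));
```

This is why the answer's snippet returns the href string from evaluate() rather than the element itself.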