
PhantomJS: waitFor() does not execute the function I pass in, and the program is stuck


I'm having trouble using the waitFor() method in PhantomJS.

This is what I want to do:

  • Load multiple web pages from generated URLs
  • Use jQuery to parse some links from those pages
  • Store each parsed link in the same array (in this example I'll just log them)

I'm using the waitFor() method so that I can wait until a page has been evaluated. As I understand it, this method keeps the program from continuing until the function I pass as a parameter has returned something.

My problem: the program does not continue after it calls waitFor(). It is just stuck, and there is no error whatsoever. The function I passed as a parameter is apparently never executed; at least nothing is logged to the console.

When I remove the waitFor() call, the code executes properly, but then I cannot run handleSeriesPageListPage() multiple times. I'm not very familiar with JavaScript, callbacks, or asynchronous method handling. I guess I made some serious mistakes, and a JavaScript expert will be able to help me quickly :).

"use strict";
var page = require('webpage').create();
page.onConsoleMessage = function (msg) {
    console.log(msg);
};
var seriesPageBaseUrl = "https://www.example.com?pageid=";
var simpleBaseUrl = "https://www.example.com/";
var seriesPageIds = [0xx, 1xx];
var allSeriesUrls = [];


function handleSeriesPageListPage(url) {
    console.log("Open url: " + url);
    page.open(url, function (status) {
        console.log("status: " + status);
        if (status === "success") {
            waitFor(
                function () {
                    return page.includeJs("https://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
                        console.log("Included JS");
                        return page.evaluate(function () {
                            console.log("evaluate result...");
                            $('.list_item').each(function () {
                                var seriesLink = jQuery(this).find("a").first().attr("href");
                                var seriesUrl = simpleBaseUrl + seriesLink;
                                console.log(seriesUrl);
                                return true;
                            });
                        });
                    });
                }
            );
        } else {
            phantom.exit(1);
        }
    });
}

function nextSeriesListPage() {
    var seriesPageId = seriesPageIds.shift();
    if (typeof seriesPageId === "undefined") {
        console.log(allSeriesUrls);
        phantom.exit(0);
    }
    var targetURL = seriesPageBaseUrl + seriesPageId;
    handleSeriesPageListPage(targetURL);
}

nextSeriesListPage();

Solution

  • The waitFor() function you used is not a suitable way to handle asynchronous tasks, and you misunderstood what it does:

    waitFor(testFx, onReady, timeOutMillis)
    

    takes three parameters (the third one is optional). The first parameter is a test function. It is executed repeatedly, but each time synchronously, until it returns a true value. Then the function given as the second parameter is executed. If no true value is returned within the period given by the third parameter (3 seconds by default), the function gives up with the log message 'waitFor()' timeout.

    You only provided a single parameter: a function that finishes without a return value (basically just the call to page.includeJs()). Accordingly, waitFor() should give up after 3 seconds with the timeout message.
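To make the polling behaviour described above concrete, here is a minimal sketch of such a helper, loosely modelled on the waitfor.js example that ships with PhantomJS. It is not the original code: the fourth parameter onTimeout is a hypothetical addition so that the sketch runs under plain Node.js, and the 250 ms poll interval is an assumption.

```javascript
// Sketch of a polling helper in the spirit of PhantomJS's waitfor.js example.
// Assumption: onTimeout is a hypothetical extra parameter added for this sketch.
function waitFor(testFx, onReady, timeOutMillis, onTimeout) {
    var maxTime = timeOutMillis || 3000; // default: give up after 3 seconds
    var start = Date.now();
    var interval = setInterval(function () {
        if (testFx()) {
            // The test function returned a true value: stop polling, run onReady.
            clearInterval(interval);
            if (typeof onReady === "function") { onReady(); }
        } else if (Date.now() - start >= maxTime) {
            // Still no true value after the allowed time: give up.
            clearInterval(interval);
            console.log("'waitFor()' timeout");
            if (typeof onTimeout === "function") { onTimeout(); }
        }
    }, 250); // assumed poll interval
}

// Usage: the condition becomes true after 200 ms, so onReady runs.
var ready = false;
setTimeout(function () { ready = true; }, 200);
waitFor(function () { return ready; }, function () {
    console.log("condition met");
}, 1000);

// A test function without a return value never counts as true,
// so this call can only end in the timeout branch.
waitFor(function () { /* no return value */ }, function () {
    console.log("never printed");
}, 300);
```

The second call mirrors the situation in the question: the test function returns nothing, so the ready callback can never run.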

    What you really wanted to achieve was

    • wait for the page to load,
    • then inject the jQuery script, and wait for that,
    • then evaluate code in the page, and wait for that,
    • then extract the information

    These are four asynchronous tasks. The basic approach prescribed by PhantomJS is to start each next step inside the callback function of the previous one, resulting in four nested callbacks.
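The nesting could look roughly like the sketch below. To keep it runnable under plain Node.js, page is a stand-in object that only imitates the callback shape of page.open and page.includeJs; the page IDs and the returned links are made up. It also assumes that page.evaluate hands its result back directly, as it does in PhantomJS, so only the first two steps need a callback of their own here.

```javascript
// Stand-in for a PhantomJS page so the sketch runs under plain Node.js.
// Assumption: it only mimics the callback shape of open/includeJs and calls
// the evaluate function directly instead of inside a real page.
var page = {
    open: function (url, callback) {
        setTimeout(function () { callback("success"); }, 10);
    },
    includeJs: function (scriptUrl, callback) {
        setTimeout(callback, 10);
    },
    evaluate: function (fn) {
        return fn.apply(null, Array.prototype.slice.call(arguments, 1));
    }
};

var simpleBaseUrl = "https://www.example.com/";
var seriesPageBaseUrl = "https://www.example.com?pageid=";
var jQueryUrl = "https://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js";

// Processes the page IDs one after another; done(urls) fires when none are left.
function collectSeriesUrls(pageIds, done) {
    var queue = pageIds.slice(); // work on a copy of the ID list
    var allSeriesUrls = [];
    function next() {
        if (queue.length === 0) { return done(allSeriesUrls); }
        var url = seriesPageBaseUrl + queue.shift();
        page.open(url, function (status) {              // step 1: page loaded
            if (status !== "success") { return done(allSeriesUrls); }
            page.includeJs(jQueryUrl, function () {     // step 2: jQuery injected
                // Steps 3 and 4: run code in the page and return data from it.
                // Outer variables are not visible in there, so pass them in.
                var links = page.evaluate(function (baseUrl) {
                    // Real version: read the hrefs with jQuery. Canned data here.
                    return [baseUrl + "series/a", baseUrl + "series/b"];
                }, simpleBaseUrl);
                allSeriesUrls = allSeriesUrls.concat(links);
                next();                                 // only now start the next page
            });
        });
    }
    next();
}

// Hypothetical page IDs, for illustration only.
collectSeriesUrls([1, 2], function (urls) {
    console.log(urls.length + " links collected");
});
```

The important detail is that the next page is requested from inside the innermost callback, so the pages are processed strictly one after another. In a real script, phantom.exit() would go into the done callback.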

    As this is not a nice pattern (it is commonly called callback hell), the Promise pattern has been introduced as a JavaScript feature (and is also available in several libraries).

    To find out how to reformulate callback APIs as Promises, take a look at "How do I convert an existing callback API to promises?"
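As a rough illustration of that conversion, the sketch below wraps two callback-style methods in Promises and chains them. Again, page is a stand-in object so that the code runs under plain Node.js; the helper names openPage, includeJs, scrapePage and scrapeAll are hypothetical, and so is the rule that URLs containing "missing" fail to load. Inside PhantomJS itself, a Promise library may be needed if the runtime does not provide Promise natively.

```javascript
// Stand-in callback API (assumption: mimics the shape of a PhantomJS page).
var page = {
    open: function (url, callback) {
        // Hypothetical rule: URLs containing "missing" fail to load.
        var status = url.indexOf("missing") === -1 ? "success" : "fail";
        setTimeout(function () { callback(status); }, 10);
    },
    includeJs: function (scriptUrl, callback) {
        setTimeout(callback, 10);
    },
    evaluate: function (fn) {
        return fn.apply(null, Array.prototype.slice.call(arguments, 1));
    }
};

var jQueryUrl = "https://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js";

// Wrap each callback-style call so that it returns a Promise instead.
function openPage(url) {
    return new Promise(function (resolve, reject) {
        page.open(url, function (status) {
            if (status === "success") {
                resolve();
            } else {
                reject(new Error("Failed to load " + url));
            }
        });
    });
}

function includeJs(scriptUrl) {
    return new Promise(function (resolve) {
        page.includeJs(scriptUrl, resolve);
    });
}

// The steps now read top to bottom instead of nesting deeper each time.
function scrapePage(url) {
    return openPage(url)
        .then(function () { return includeJs(jQueryUrl); })
        .then(function () {
            return page.evaluate(function (baseUrl) {
                return [baseUrl + "series/a"]; // canned data for the sketch
            }, "https://www.example.com/");
        });
}

// Handle several pages strictly one after another by extending one chain.
function scrapeAll(urls) {
    return urls.reduce(function (chain, url) {
        return chain.then(function (collected) {
            return scrapePage(url).then(function (links) {
                return collected.concat(links);
            });
        });
    }, Promise.resolve([]));
}

scrapeAll(["https://www.example.com?pageid=1", "https://www.example.com?pageid=2"])
    .then(function (all) { console.log(all.length + " links collected"); })
    .catch(function (err) { console.log("Error: " + err.message); });
```

A failed page load rejects the chain, so a single catch at the end takes over the error handling that each nested callback would otherwise need.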