
PhantomJS hanging when called from CLI or Web


I'm trying to use PhantomJS to capture a screenshot of a URL. However, when I call PhantomJS (from either the command line or a web app), it hangs and seems never to execute the phantom.exit() call. I can't find any error messages, and it keeps running until I kill it. This is the JS file that is passed to the phantomjs command:

var page = require('webpage').create();
var system = require('system');
var script_address = '';
var page_to_load = '';
var members_id = '';
var activities_id = '';
var folder_path = '';

if (system.args.length < 5) 
{
    console.log('Usage: phantom_activity_fax.js script_address page_to_load members_id activities_id folder_path');
    console.log('#Args: '+system.args.length);
    phantom.exit();
}//END IF SYSTEM.ARGS.LENGTH < 5

//ASSIGN OUR ARGUMENTS RECEIVED
script_address = system.args[0];
page_to_load = system.args[1];
members_id = system.args[2];
activities_id = system.args[3];
folder_path = system.args[4];

console.log(system.args[0]);
console.log(system.args[1]);
console.log(system.args[2]);
console.log(system.args[3]);
console.log(system.args[4]);

//OPEN OUR PAGE WITH THE VALUES PROVIDED
page.open(page_to_load, function () {
    console.log("Entering Anonymous Function, Beginning RENDER:\n");
    page.render(folder_path+members_id+'_'+activities_id+'.png');
    phantom.exit();
});

I see the values pushed to the console, but after that it just hangs. I've tried the web inspector, but could not work out where to execute the __run() call, and I saw no change when I added --remote-debugger-autorun=yes to the call.

This is the output I get from the command line when it hangs (as root user):

[root@wv-wellvibe2 faxes]# phantomjs /var/www/wv-wellvibe2-test/javascripts/phantom_activity_fax.js https://wv-wellvibe2-test/manual_scripts/phantom_js_test_page.php 397 0 /var/www/wv-wellvibe2-test/uploads/images/faxes/
/var/www/wv-wellvibe2-test/javascripts/phantom_activity_fax.js
https://wv-wellvibe2-test/manual_scripts/phantom_js_test_page.php
397
0
/var/www/wv-wellvibe2-test/uploads/images/faxes/

And this is the output I get when running it as my own user, but I don't see the image file in the designated folder (faxes):

[user@wv-wellvibe2 ~]$ phantomjs /var/www/wv-wellvibe2-test/javascripts/phantom_activity_fax.js https://wv-wellvibe2-test/manual_scripts/phantom_js_test_page.php 397 0 /var/www/wv-wellvibe2-test/uploads/images/faxes/
/var/www/wv-wellvibe2-test/javascripts/phantom_activity_fax.js
https://wv-wellvibe2-test/manual_scripts/phantom_js_test_page.php
397
0
/var/www/wv-wellvibe2-test/uploads/images/faxes/
Entering Anonymous Function, Beginning RENDER:
[user@wv-wellvibe2 ~]$ 

Unfortunately, as I said, the command completed but did not save a .png in the faxes folder. Here are the permissions for that folder:

[root@wv-wellvibe2 faxes]# ls -la
total 12
drwxr-xr-x 3 root   apache 4096 May 16 15:31 .
drwxr-xr-x 5 apache apache 4096 May 16 14:14 ..
drwxr-xr-x 6 apache apache 4096 May 20 15:05 .svn

Please let me know if there is anything else I can provide! Thank you!

(As requested, here is the PHP script that calls the PhantomJS process)

header("Date: " . date('Y-m-d H:i:s'));
//GET THE SMARTY CONFIG
include_once $_SERVER['DOCUMENT_ROOT'] . "/smarty/configs/config.php";

//VARS USED LATER
$process_script = $_SERVER['DOCUMENT_ROOT'] . '/javascripts/phantom_activity_fax.js';
$page_to_load = 'https://' . $_SERVER['HTTP_HOST'] . '/manual_scripts/phantom_js_test_page.php';
$members_id = $_SESSION['members_id'];
$activities_id = 0;
$folder_path = $_SERVER['DOCUMENT_ROOT'] . 'uploads/images/faxes/';
$system_response = '';


$call = "phantomjs --remote-debugger-port=65534 --remote-debugger-autorun=yes " .  $process_script . " " . $page_to_load . " " . $members_id . " " . $activities_id . " " . $folder_path;

echo 'CallingSystemWith: ' . $call . '<br />';

try 
{
    $system_response = system($call);

    echo '<br />SystemResponse: ' . $system_response . '<hr />';
} catch (Exception $exc) {
    echo $exc->getTraceAsString();
}//END TRY / CATCH

(The page it tells PhantomJS to "scrape" is a simple PHP script that outputs a print_r() of $_SESSION and $_REQUEST)


Solution

  • If something goes wrong in your script (such as in page.render), phantom.exit() will never be called. That's why PhantomJS seems to hang.

    There may be an issue in page.render, but I don't think so. The most common cause of hangs is an unhandled exception.

    I suggest four things to investigate the issue:

    • Add a handler to phantom.onError and/or page.onError.
    • Wrap your code in try/catch blocks (for example, around page.render).
    • Once the page is loaded, the callback never tests the load status. It is better to check it.
    • The script seems to freeze when calling page.render. Have you tried a simpler filename in the current directory? The freeze may be caused by a security restriction or an invalid filename (invalid characters?).
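
The first three suggestions could be combined along these lines. This is only a minimal sketch, not a drop-in replacement for the script in the question: the function name captureScreenshot, the file name capture.js in the usage message, and the two-argument command line are made up for illustration. It relies only on phantom.onError, page.onError, the status argument of the page.open callback, and phantom.exit with a return code.

```javascript
// Sketch only: phantomRef and pageRef are passed in so the function can be
// exercised with stub objects; under PhantomJS they are the real objects.
function captureScreenshot(phantomRef, pageRef, url, outputPath) {
    // Errors in this script itself: report them and exit instead of hanging.
    phantomRef.onError = function (msg, trace) {
        console.log('PHANTOM ERROR: ' + msg);
        phantomRef.exit(1);
    };

    // JavaScript errors raised inside the loaded page: log them and carry on.
    pageRef.onError = function (msg, trace) {
        console.log('PAGE ERROR: ' + msg);
    };

    pageRef.open(url, function (status) {
        var exitCode = 0;
        try {
            if (status !== 'success') {
                // The load failed, so there is nothing useful to render.
                console.log('Failed to load ' + url + ' (status: ' + status + ')');
                exitCode = 1;
            } else {
                pageRef.render(outputPath);
                console.log('Render requested for ' + outputPath);
            }
        } catch (e) {
            console.log('Exception while rendering: ' + e);
            exitCode = 1;
        }
        // Reached on every path, so the process always terminates.
        phantomRef.exit(exitCode);
    });
}

// Run automatically only inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    if (system.args.length < 3) {
        console.log('Usage: phantomjs capture.js <url> <output.png>');
        phantom.exit(1);
    } else {
        captureScreenshot(phantom, require('webpage').create(),
                          system.args[1], system.args[2]);
    }
}
```

For the fourth suggestion, it may help to run this first with a plain output name such as test.png in the current directory, and only then switch to the real folder path.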

    Hope this helps.