I threw together a small Lambda function together to crawl a website using the SpookyJS, CasperJS, and PhantomJS toolchain for headless browsing. The task is quite simple, and at some point a few months ago it was working in Lambda. I recently had to change a few things around and wanted to work on the project again, but started fresh and had trouble getting Lambda to run without erroring in any capacity. My question is how can I run phantomjs in Lambda?
The example code I am running is:
spooky.start('http://en.wikipedia.org/wiki/Spooky_the_Tuff_Little_Ghost');
spooky.then(function () {
this.emit('hello', 'Hello, from ' + this.evaluate(function () {
return document.title;
}));
});
spooky.run();
The error I am getting in Lambda is:
{ [Error: Child terminated with non-zero exit code 1] details: { code: 1, signal: null } }
I have followed a variety of procedures to ensure everything is able to run on Lambda. Below is a long list of things I've attempted to diagnose:
node index.js
and confirm it is workingnpm install
on the ec2 instance, and again run node index.js
to ensure the correct outputMy package.json is:
{
"name": "lambda-spooky-test",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "",
"license": "ISC",
"dependencies": {
"casperjs": "^1.1.3",
"phantomjs-prebuilt": "^2.1.10",
"spooky": "^0.2.5"
}
}
I have also attempted the following (most also working locally, and on the AWS EC2 instance, but with the same error on Lambda:
process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT'] + ':' + process.env['LAMBDA_TASK_ROOT'] + '/node_modules/.bin';
console.log( 'PATH: ' + process.env.PATH );
Inspecting spawn calls by wrapping child_process's .spawn()
call, and got the following:
{ '0': 'casperjs',
'1':
[ '/var/task/node_modules/spooky/lib/bootstrap.js',
'--transport=http',
'--command=casperjs',
'--port=8081',
'--spooky_lib=/var/task/node_modules/spooky/lib/../',
'--spawnOptions=[object Object]' ],
'2': {} }
Calling .exec('casperjs')
and .exec('phantomjs --version')
directly, confirming it works locally and on EC2, but gets the following error in Lambda. The command:
`require('child_process').exec('casperjs', (error, stdout, stderr) => {
if (error) { console.error('error: ' + error); }
console.log('out: ' + stdout);
console.log('err: ' + stderr);
});
both with the following result:
err: Error: Command failed: /bin/sh -c casperjs
module.js:327
throw err;
^
Error: Cannot find module '/var/task/node_modules/lib/phantomjs'
at Function.Module._resolveFilename (module.js:325:15)
at Function.Module._load (module.js:276:25)
at Module.require (module.js:353:17)
at require (internal/module.js:12:17)
at Object.<anonymous> (/var/task/node_modules/.bin/phantomjs:16:15)
at Module._compile (module.js:409:26)
at Object.Module._extensions..js (module.js:416:10)
at Module.load (module.js:343:32)
at Function.Module._load (module.js:300:12)
at Function.Module.runMain (module.js:441:10)
2016-08-07T15:36:37.349Z b9a1b509-5cb4-11e6-ae82-256a0a2817b9 sout:
2016-08-07T15:36:37.349Z b9a1b509-5cb4-11e6-ae82-256a0a2817b9 serr: module.js:327
throw err;
^
Error: Cannot find module '/var/task/node_modules/lib/phantomjs'
at Function.Module._resolveFilename (module.js:325:15)
at Function.Module._load (module.js:276:25)
at Module.require (module.js:353:17)
at require (internal/module.js:12:17)
at Object.<anonymous> (/var/task/node_modules/.bin/phantomjs:16:15)
at Module._compile (module.js:409:26)
at Object.Module._extensions..js (module.js:416:10)
at Module.load (module.js:343:32)
at Function.Module._load (module.js:300:12)
at Function.Module.runMain (module.js:441:10)
I found the issue to be that including the node_modules/.bin
in the path works on both local and ec2 machines because those files simply point to the action /bin
folders in each respective library. This breaks if calls within those files use relative paths. The issue:
[ec2-user@ip-172-31-32-87 .bin]$ ls -lrt
total 0
lrwxrwxrwx 1 ec2-user ec2-user 35 Aug 7 00:52 phantomjs -> ../phantomjs-prebuilt/bin/phantomjs
lrwxrwxrwx 1 ec2-user ec2-user 24 Aug 7 00:52 casperjs -> ../casperjs/bin/casperjs
I worked around this by adding each library's respective bin to the lambda path in the Lambda handler function:
process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT']
+ ':' + process.env['LAMBDA_TASK_ROOT'] + '/node_modules/phantomjs-prebuilt/bin'
+ ':' + process.env['LAMBDA_TASK_ROOT'] + '/node_modules/casperjs/bin';
And this will now run phantom, casper, and spooky correctly in Lambda.