Tags: webserver, phantomjs, casperjs

PhantomJS not killing webserver client connections


I have a kind of proxy server running on PhantomJS's webserver module, and I noticed that this server is being killed due to its memory consumption.

Every time the server gets a new request it creates a child client process; the problem I see is that the process remains alive indefinitely.

Here is the server I'm using:

server.js
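
A minimal sketch of this kind of setup (the exact listing may differ; the target URL and handler body here are placeholders), run with `casperjs server.js`:

```javascript
// Sketch of a CasperJS-driven webserver that spins up a casper
// instance per request. URL and handler logic are placeholders.
var server = require('webserver').create();

server.listen(8080, function (request, response) {
    // A fresh casper instance for each incoming request
    var casper = require('casper').create();

    casper.start('http://example.com/', function () {
        response.statusCode = 200;
        response.write(this.getTitle());
    });

    casper.run(function () {
        // Ends the HTTP response, but (as described below) does not
        // tear down the thread created for this request.
        response.close();
    });
});

console.log('Proxy listening on port 8080');
```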

I thought response.close() closed and killed the client connection, but it does not.

Here is the list of child processes displayed on htop:

[Screenshot: htop process list showing many phantomjs child processes]

(There are even more processes; this is just a fragment of the list.)

I really need to kill those processes because they are using all the free memory. Am I missing something?

I could simply restart the server, but the memory would still be wasted in the meantime.

Thank you!

EDIT:

The processes I mentioned before are actually threads, not independent processes as I thought (check this).

Every HTTP request creates a new thread, and that's OK, but the thread is not being killed after the script ends.

Also, I found out that no new threads are created if the request handler doesn't run casper (I mean casper.run(...)).

So, new threads are created only when the server runs a casper instance; the problem is that the instance doesn't end after the run function does.

I tried casper.done() as mentioned below, but it kills the whole process instead of just the currently running thread. (I did not find any documentation for this function.)

When I execute other casper scripts outside the server, on the same machine, the spawned threads and the whole phantom process end successfully. What could be happening?

I am using PhantomJS 2.1.1 and CasperJS 1.1.1.

Please ask me anything if you need more specific information.

Thanks again for reading!


Solution

  • This is a well-known issue with casper:

    https://github.com/casperjs/casperjs/issues/1355

    It has not been fixed by the casper guys and is currently marked as an enhancement. I guess it's not on their priority list.

    Anyway, the workaround is to write a server-side component, e.g. a Node.js server, to handle the incoming requests and, for every request, run a casper script in a new child process to do the scraping. That child process is closed when casper finishes its job. While this works, it is not an optimal solution: spawning a child process for every request is not cheap, so an approach like this will be hard to scale heavily. Still, it is a sufficient workaround. More on this approach is in the link above, and a rough sketch of the setup follows below.
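
    A minimal sketch of that workaround, assuming Node's built-in http and child_process modules and a hypothetical scrape.js casper script (both names are placeholders, not from the original answer):

    ```javascript
    // Node.js front end that spawns one CasperJS child process per request.
    // 'scrape.js' stands in for whatever casper script does the scraping.
    var http = require('http');
    var execFile = require('child_process').execFile;

    http.createServer(function (req, res) {
        // Each request runs in its own casperjs process; when the script
        // finishes, the whole process exits and its memory is reclaimed.
        execFile('casperjs', ['scrape.js'], function (err, stdout, stderr) {
            if (err) {
                res.statusCode = 500;
                res.end('Scrape failed: ' + err.message);
                return;
            }
            res.statusCode = 200;
            res.end(stdout);
        });
    }).listen(3000);
    ```

    The isolation is the point of this design: the OS reclaims everything when the child exits, which is exactly what the in-process casper instances fail to do.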