Tags: php, web-crawler, php-extension, pcntl

Is the PHP extension "PCNTL" suitable for a web spider?


Recently I have been trying to write a web spider, so I looked at some web spider projects written in PHP.

In those projects, I found that the extension "PCNTL" is used frequently, but I can't find any detailed tutorials or manuals about it.

So I want to know whether the "PCNTL" extension is really suitable for a web spider. If not, what are the alternatives?


Solution

  • "PCNTL" is an extension that exposes C-like process-control functions, most notably fork (pcntl_fork).

    I am not sure whether there are good tutorials, but you can study C / C++ examples of the corresponding system calls to understand how to use these PHP functions.

    Several years ago we built a web crawler. Instead of fork, we used a shell script that started 100 instances of the crawler in parallel.

    Another alternative is curl-multi (the curl_multi_* functions), but once again there is not enough information or tutorials for it. We tried it and did not find it very reliable, but I believe you should check it out.

    Another alternative is to do it in Python: there are several libraries that offer a lot of possibilities.
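
To make the fork approach more concrete, here is a minimal sketch of how a crawler might split its URL list across child processes with pcntl_fork. It assumes the pcntl extension is loaded and that the script runs from the command line. The function names partitionUrls, crawlSlice and runWorkers are hypothetical, and crawlSlice only prints what it would fetch instead of doing real HTTP work.

```php
<?php
// Sketch only: requires the pcntl extension and the CLI SAPI.
// partitionUrls(), crawlSlice() and runWorkers() are illustrative names.

/** Distribute $urls round-robin into $workers slices. */
function partitionUrls(array $urls, int $workers): array
{
    $slices = array_fill(0, $workers, []);
    foreach (array_values($urls) as $i => $url) {
        $slices[$i % $workers][] = $url;
    }
    return $slices;
}

/** Placeholder for the real fetch/parse logic of one worker. */
function crawlSlice(array $urls): void
{
    foreach ($urls as $url) {
        echo getmypid(), ' would fetch ', $url, PHP_EOL;
    }
}

/** Fork one child per slice, wait for all of them, return the number of failed children. */
function runWorkers(array $urls, int $workers): int
{
    $pids = [];
    foreach (partitionUrls($urls, $workers) as $slice) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            // Fork failed: stop spawning, but still wait for children already started.
            fwrite(STDERR, "fork failed\n");
            break;
        }
        if ($pid === 0) {
            // Child process: do the work, then exit so it never returns to the loop.
            crawlSlice($slice);
            exit(0);
        }
        // Parent process: remember the child's PID.
        $pids[] = $pid;
    }

    $failed = 0;
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
        if (!pcntl_wifexited($status) || pcntl_wexitstatus($status) !== 0) {
            $failed++;
        }
    }
    return $failed;
}
```

Calling runWorkers($urls, 4) from a CLI script would start up to four children and block until all of them have exited. Children do not share variables with the parent after the fork, so results would have to go through a database, files, or some other channel.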
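
For the curl-multi route, a rough sketch of fetching several URLs concurrently in a single process could look like the following. It assumes the curl extension is installed. The function name fetchAll and the shape of the returned array are hypothetical choices for illustration, not part of any library.

```php
<?php
// Sketch only: requires the curl extension. fetchAll() is an illustrative name.

/** Fetch all $urls concurrently and return a map of url => [status, body, error]. */
function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are running or the multi handle reports an error.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Per-transfer result codes are reported through curl_multi_info_read().
    $results = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        $ch  = $info['handle'];
        $url = array_search($ch, $handles, true);
        $ok  = $info['result'] === CURLE_OK;
        $results[$url] = [
            'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'body'   => $ok ? curl_multi_getcontent($ch) : null,
            'error'  => $ok ? null : curl_strerror($info['result']),
        ];
    }

    foreach ($handles as $ch) {
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $results;
}
```

This version adds every URL at once, so a real crawler would probably feed the multi handle in batches to cap the number of simultaneous connections.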