Tags: php, mysql, linux, curl, curl-multi

Better support for CURL with PHP and Linux


I'm the developer of twittertrend.net. I was wondering if there is a faster way to fetch the headers of a URL than curl_multi. I process over 250 URLs a minute, and I need a really fast way to do this from PHP. Alternatively, a bash script or a C application could fetch the headers and hand them back, if that would be faster; I have primarily only programmed in PHP, but I can learn. Currently curl_multi (with 6 URLs at a time) does an OK job, but I would prefer something faster. Ultimately I would like to stick with PHP for any MySQL storage and processing.
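
For reference, a header-only batch fetch with curl_multi, roughly as described above, might look like the following sketch (the URLs and batch size are just placeholders):

```php
<?php
// Sketch of a header-only batch fetch with curl_multi.
// The URLs below are placeholders; tune the batch size to your needs.
function fetch_headers(array $urls) {
    $mh = curl_multi_init();
    $handles = array();

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request: headers only
        curl_setopt($ch, CURLOPT_HEADER, true);         // include headers in the output
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return instead of printing
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until every handle is finished.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh); // wait for activity to avoid busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $headers = array();
    foreach ($handles as $url => $ch) {
        $headers[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $headers;
}

$results = fetch_headers(array('http://example.com/', 'http://example.org/'));
```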

Thanks, James Hartig


Solution

  • I think you need a multi-process batch URL fetching daemon. PHP does not support multithreading, but there's nothing stopping you from spawning multiple PHP daemon processes.

    Having said that, PHP's lack of a proper garbage collector means that long-running processes can leak memory.

    Run a daemon which spawns a configurable (but controlled) number of instances of the PHP program. Each instance needs to read from a work queue, fetch its URLs, and write the results away in a multi-process-safe manner, so that no two processes end up doing the same work (see the sketch below).

    You'll want all of this to run autonomously as a daemon rather than from a web server. Really.
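
A minimal sketch of that layout, assuming a MySQL table named `url_queue` with `id`, `url`, `claimed_by`, `headers` and `done` columns (the table, column names and connection details are hypothetical), might look like this:

```php
<?php
// Rough sketch of the daemon layout described above: a parent process spawns a
// fixed number of PHP workers, and each worker atomically claims URLs from a
// MySQL queue table so no two workers fetch the same one.
$workers = 6;

for ($i = 0; $i < $workers; $i++) {
    $pid = pcntl_fork();
    if ($pid === 0) {        // child: become a worker
        worker_loop();
        exit(0);
    }
}
while (pcntl_wait($status) > 0) { /* parent waits for its workers */ }

function worker_loop() {
    $db = new mysqli('localhost', 'user', 'pass', 'crawler');
    while (true) {
        // Claim one unclaimed URL atomically; the single-row UPDATE acts as the lock.
        $token = uniqid(getmypid() . '-', true);
        $db->query("UPDATE url_queue SET claimed_by = '$token'
                    WHERE claimed_by IS NULL LIMIT 1");
        if ($db->affected_rows === 0) { sleep(1); continue; }  // queue is empty

        $res = $db->query("SELECT id, url FROM url_queue WHERE claimed_by = '$token'");
        $row = $res->fetch_assoc();

        // Fetch just the headers for the claimed URL.
        $ch = curl_init($row['url']);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        $headers = curl_exec($ch);
        curl_close($ch);

        // Store the result and mark the URL as done.
        $stmt = $db->prepare("UPDATE url_queue SET headers = ?, done = 1 WHERE id = ?");
        $stmt->bind_param('si', $headers, $row['id']);
        $stmt->execute();
    }
}
```

In practice you would also restart the workers periodically to sidestep the memory-leak issue mentioned above, and add error handling and per-host rate limiting.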