So the original problem is that we run an "industry standard" Java-based web application on WebSphere application servers, with around 100 million visits per year. The issue is that after a restart of these app servers, we need to hit a few of the key pages so that the main servlets get compiled before we let the public onto them ... otherwise they tend to crash in the initial crush.
On some clusters, it's about 6 pages that need to be hit, once for each of 35+ markets ... 200-ish URLs!
So the script I am working on already does all the hard work of assembling these URLs, and at the end of it all there is a list of 200 URLs in an array ... now how to hit them?
We were using CGI for this earlier, and its main problem was that it was synchronous ... taking a loooooong time. Now I am trying to make a simple url.php which will hit one single URL, which I can then call from jQuery in an asynchronous way. I don't want to hit all 200 at once of course; probably batches of 5 should mean a 500% speed increase :)
So onto the url.php. I haven't used PHP much in the past, so sockets are a bit new to me. What I have cobbled together so far is this:
function checkUrl($url, $port) {
    set_time_limit(20);
    ob_start();
    header("Content-Type: text/plain");

    $u = $url;
    $p = $port;

    // build a raw HEAD request by hand
    $post  = "HEAD / HTTP/1.1\r\n";
    $post .= "Host: $u\r\n";
    $post .= "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2\r\n";
    $post .= "Keep-Alive: 200\r\n";
    $post .= "Connection: keep-alive\r\n\r\n";

    $sock = fsockopen($u, $p, $errno, $errstr, 10);
    if (!$sock) {
        echo "$errstr ($errno)<br />\n";
    } else {
        fwrite($sock, $post, strlen($post));
        while (!feof($sock)) {
            echo fgets($sock);
        }
        ob_end_flush();
    }
}
Which works great if the URL is simply someserver.somedomain.com, but if there is a URI path tacked on the end it fails (e.g. someserver.somedomain.com/gb/en).
As I understand it, all I have done with the code so far is open the socket connection ... but how can I get it to parse the path separately?
The only output I need from this in the end is the HTTP Status code (200, 404, 301 etc) though it is important that it does fetch the complete page first in order for it to be compiled properly.
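For what it's worth, one way to handle the path question is PHP's built-in parse_url(), which splits a URL into host, path, port, etc. — the host goes to fsockopen() and the path goes into the request line instead of the hard-coded "/". A minimal sketch (the helper name is mine, not from the script above; swap HEAD for GET if the full body is needed):

```php
<?php
// Hypothetical helper: build the raw request line from the URL's path
// component rather than always requesting "/".
function buildHeadRequest($url) {
    // parse_url() needs a scheme to recognise the host part, so add one
    // if the caller passed a bare "host/path" string.
    if (strpos($url, '://') === false) {
        $url = 'http://' . $url;
    }
    $parts = parse_url($url); // array with 'host', 'path', 'port', ...
    $host  = $parts['host'];
    $path  = (isset($parts['path']) && $parts['path'] !== '') ? $parts['path'] : '/';

    $req  = "HEAD $path HTTP/1.1\r\n";
    $req .= "Host: $host\r\n";
    // "Connection: close" makes the server end the response, so the
    // feof() read loop terminates instead of waiting for a keep-alive timeout.
    $req .= "Connection: close\r\n\r\n";
    return $req;
}
```

So checkUrl() could call buildHeadRequest() for the payload and pass $parts['host'] (not the full URL) to fsockopen().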
Maybe I'm missing something, but do you have the cURL extension available? No need to get jQuery in the mix; you can run asynchronous requests straight from PHP with ease. You'll also be able to control batch size easily and put in delays and what-not per your needs. Also, I'm not sure why you would need a raw socket to hit the JSP pages; hopefully this makes your life easier!
Here's a quick test script I have, based on code from php.net I'm sure:
<?php
// create both cURL resources
$ch1 = curl_init();
$ch2 = curl_init();

// set URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://news.php.net/php.general/255000");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://news.php.net/php.general/255001");
curl_setopt($ch2, CURLOPT_HEADER, 0);

// create the multi cURL handle
$mh = curl_multi_init();

// add the two handles
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);

$active = null;

// execute the handles until both transfers finish
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

// close the handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
?>
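To tie that back to the original problem: a sketch (the function name, batch size default, and option choices are mine, not from your script) of driving the 200-URL array through curl_multi in batches of 5, fetching each page in full so the JSPs get compiled, and collecting only the status codes via curl_getinfo():

```php
<?php
// Hypothetical warm-up driver: run $urls through curl_multi in small
// batches and return url => HTTP status code (200, 404, 301, ...).
function warmUpUrls(array $urls, $batchSize = 5) {
    $results = array();
    foreach (array_chunk($urls, $batchSize) as $batch) {
        $mh = curl_multi_init();
        $handles = array();
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // download the whole body
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); // report 301s as-is
            curl_setopt($ch, CURLOPT_TIMEOUT, 20);
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }
        // run this batch to completion (same loop as the script above)
        $active = null;
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        while ($active && $mrc == CURLM_OK) {
            if (curl_multi_select($mh) != -1) {
                do {
                    $mrc = curl_multi_exec($mh, $active);
                } while ($mrc == CURLM_CALL_MULTI_PERFORM);
            }
        }
        // collect status codes and clean up
        foreach ($handles as $url => $ch) {
            $results[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $results;
}
?>
```

With 200 URLs this runs 40 batches of 5; a sleep() between batches would add the delay mentioned above if the freshly restarted app servers need breathing room.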