How can I use long polling to automatically refresh a webpage?

I am trying to figure out how to use long polling to trigger a webpage refresh (the entire page, as opposed to just a single section). Although it would be nicer to update just part of the page instead of the whole page, I would rather get the initial page refresh working first and move on from there. Could anyone point me in the right direction? I have been searching for examples of long polling online, but have not been able to find anything similar to this yet.

Basically, I would have a webpage which I could refresh remotely using long polling, based on some condition on the server (Apache on Debian). For instance, if I had a bash script based CGI page that showed AM or PM based on the server time, then when the time on the server changes from AM to PM or vice versa, the server would trigger a page refresh on the client side, so the CGI page would reload and display the correct data.

Solution

Well, first of all: if you do long polling requests, you need to keep in mind that there will be an open connection to your server for each page that is being viewed in a browser. Your server infrastructure has to be able to handle this without huge memory consumption, and without running out of free connections to serve the long polling requests.

I don't assume you use PHP, but it is a good example. If you have Apache with the PHP module, there is on the one hand a configured limit on the maximum number of connections in Apache, and on the other hand the whole PHP module is loaded for each connection, which uses a lot of memory if you have many page views. If you use PHP-FPM via FastCGI, there is also a maximum number of available workers, and you don't want to raise this number beyond a certain limit either.

So generally I would suggest not using long polling requests for public websites unless you have a server backend with some good logic for handling them.

Depending on the requirements, you could consider the following solution if you know at which intervals the page should check for a refresh.

You could add the attributes data-check-for-refresh-at and data-modified-at to your html node:

<html data-check-for-refresh-at='2013-02-04 12:00:00 GMT' data-modified-at='2013-01-01 12:00:00 GMT'>

Parse these attributes with JavaScript and send a refresh check at the given time, submitting the modified-at time with that request. If the content has changed, the server responds with the new content and with the next time at which the client should check for updates.
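To make the parsing step more concrete, here is a minimal sketch of the client side. It assumes the timestamps always have the "YYYY-MM-DD HH:MM:SS GMT" form shown above. The helper names and the "/refresh-check" endpoint are made up for illustration, and for simplicity the sketch just reloads the page instead of swapping in new content.

```javascript
// Parses timestamps in the "YYYY-MM-DD HH:MM:SS GMT" form used in the example attributes.
// Returns milliseconds since the Unix epoch, or NaN if the text does not match.
function parseGmtTimestamp(text) {
  const match = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}) GMT$/.exec(String(text).trim());
  if (!match) return NaN;
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  return Date.UTC(year, month - 1, day, hour, minute, second);
}

// Milliseconds to wait before the next refresh check (0 if that time has already passed).
function delayUntilCheck(checkAtText, nowMs) {
  const checkAtMs = parseGmtTimestamp(checkAtText);
  if (Number.isNaN(checkAtMs)) return NaN;
  return Math.max(0, checkAtMs - nowMs);
}

// Browser wiring; skipped when no DOM is available.
if (typeof document !== "undefined") {
  const { checkForRefreshAt, modifiedAt } = document.documentElement.dataset;
  const waitMs = delayUntilCheck(checkForRefreshAt, Date.now());
  if (!Number.isNaN(waitMs)) {
    setTimeout(async () => {
      // "/refresh-check" is a hypothetical endpoint that answers "changed" or "did-not-change".
      const response = await fetch("/refresh-check?modified-at=" + encodeURIComponent(modifiedAt));
      if ((await response.text()).trim() === "changed") {
        location.reload();
      }
    }, waitMs);
  }
}
```

The timestamp is parsed by hand because date strings that are not in ISO 8601 form are not guaranteed to parse the same way in every browser.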

Another important point: the client should add a random offset to this refresh time. Otherwise all clients would send their refresh request at the same moment, and you would effectively DDoS yourself.
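The random offset could look roughly like this. The function name and the 30 second spread are arbitrary choices for illustration; pick a spread that fits how stale the page is allowed to be.

```javascript
// Adds a random offset in [0, maxOffsetMs) to a base delay.
// `random` is injectable so the function can be tested deterministically.
function withRandomOffset(baseDelayMs, maxOffsetMs, random = Math.random) {
  return baseDelayMs + Math.floor(random() * maxOffsetMs);
}

// Example: a check that is due in 60 seconds is spread over the following 30 seconds.
const spreadDelayMs = withRandomOffset(60000, 30000);
console.log("next check in " + spreadDelayMs + " ms");
```

With this, clients that all received the same check time end up distributed over the spread window instead of hitting the server in the same instant.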

EDIT (Based on comments)

First, a short explanation of how it should be done in a real system:

The server should not use one thread or process per connection. Instead it should use an event driven approach (registering callbacks that are invoked when streams are ready to be read or written). When a long polling request arrives, the server stores which changes the client wants to be informed about. After that the connection just sleeps: no CPU cycles are wasted on it until the client needs to be informed, and the memory usage is quite low. When a URL changes, the server is told that it should notify all clients listening for changes of this URL, and it then sends the responses to those clients (a publish/subscribe system). Depending on the number of clients to be notified, the notifications should probably be queued and handled in an intelligent way, so that the outgoing traffic is better balanced. With this approach you are more likely to run into the limit on open ports or file descriptors than into problems with CPU or memory usage.

Of course this is a very simplistic description, but I think it is sufficient to get an idea of how it would be implemented.
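As a rough illustration of the publish/subscribe part, here is an in-memory sketch in JavaScript (e.g. for Node.js, which is event driven). The class and method names are invented for this example. It leaves out the HTTP layer, the queuing of outgoing notifications, and the check whether the URL already changed before the client subscribed.

```javascript
// Minimal in-memory publish/subscribe registry for long polling.
// A waiting client costs one Set entry and one timer, not a thread or process.
class ChangeNotifier {
  constructor() {
    this.waiting = new Map(); // url -> Set of pending subscriptions
  }

  // Registers a client; `respond` is called exactly once with "changed" or "did-not-change".
  subscribe(url, timeoutMs, respond) {
    const subscription = { respond, timer: null };
    if (!this.waiting.has(url)) this.waiting.set(url, new Set());
    this.waiting.get(url).add(subscription);
    // Answer after timeoutMs even if nothing changed, so the client can re-send its request.
    subscription.timer = setTimeout(() => {
      this.remove(url, subscription);
      respond("did-not-change");
    }, timeoutMs);
    return subscription;
  }

  // Called by the update code when `url` changed; answers every waiting client.
  // Returns the number of clients that were notified.
  publish(url) {
    const subscriptions = this.waiting.get(url);
    if (!subscriptions) return 0;
    this.waiting.delete(url);
    for (const subscription of subscriptions) {
      clearTimeout(subscription.timer);
      subscription.respond("changed");
    }
    return subscriptions.size;
  }

  // Drops a single subscription, e.g. after its timeout fired.
  remove(url, subscription) {
    const subscriptions = this.waiting.get(url);
    if (!subscriptions) return;
    subscriptions.delete(subscription);
    if (subscriptions.size === 0) this.waiting.delete(url);
  }
}

// Usage: one client waits, then the URL is reported as changed.
const notifier = new ChangeNotifier();
notifier.subscribe("the/url/to/observe.html", 30000, (status) => console.log(status));
notifier.publish("the/url/to/observe.html"); // answers the waiting client with "changed"
```

In a real server, `respond` would write the HTTP response of the sleeping long polling connection. It is just a callback here to keep the sketch self-contained.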

Quick & Dirty Solution

(This is more pseudocode than real code, so it will not work with copy and paste. It is also assumed that the server creates the $notificationFile files before any long polling request arrives.)

The long polling request will call a PHP script like this:

<?php
set_time_limit(0); // a long polling request may outlive the default execution time limit

/*
$urlToCheck and $modificationTimeToCheckAgainst should be initialized with the values sent by the client as parameters of the long polling request
$someTime should be the maximum time (in seconds) the long polling request is kept alive
*/
$someTime = 30; // placeholder value, choose what fits your setup
$forceResponseTimeout = microtime(true) + $someTime;
$urlToCheck = "the/url/to/observe.html";
$modificationTimeToCheckAgainst = strtotime("2013-02-05 00:00:00 GMT"); // Unix timestamp in seconds, so it can be compared with filemtime()

$notificationFile = "./tmp/observer-file-".sha1($urlToCheck);

$responseStatus = "did-not-change";
while( microtime(true) < $forceResponseTimeout ) {
   clearstatcache(); // needed, otherwise filemtime() returns a cached modification time (polling the file like this is also not the best idea for keeping CPU usage low)
   if( filemtime($notificationFile) > $modificationTimeToCheckAgainst ) {
        $responseStatus = "changed";
        break;
   }
   usleep(100000); // usleep() takes microseconds, so this is 100 ms; polling in a loop is still a bad idea because it wastes CPU, even with the sleep
}

echo $responseStatus; // a JSON response should be created here; the client then knows whether it should re-send the long polling request or do a refresh

The update script should look like this:

<?php
$urlThatIsUpdated = "the/url/to/observe.html";

// do the update of the file here

$notificationFile = "./tmp/observer-file-".sha1($urlThatIsUpdated);

touch($notificationFile); // updates the modification time of the notification file, which is then picked up by the long polling script above
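For completeness, the client side of this could be sketched as follows. It assumes the long polling script answers with the plain text "changed" or "did-not-change" as above. The "/long-poll.php" URL and its query parameter are hypothetical, and a real client would also send the modification time it already knows.

```javascript
// Repeats the long polling request until the server reports a change.
// `requestStatus` performs one long polling request and resolves to the status text.
// `maxRequests` is a safety limit; by default the loop runs until a change is seen.
async function pollUntilChanged(requestStatus, onChanged, maxRequests = Infinity) {
  for (let sent = 0; sent < maxRequests; sent++) {
    let status;
    try {
      status = (await requestStatus()).trim();
    } catch (error) {
      // Network error or aborted request: wait a moment before asking again.
      await new Promise((resolve) => setTimeout(resolve, 1000));
      continue;
    }
    if (status === "changed") {
      onChanged();
      return true;
    }
    // "did-not-change": the server gave up waiting, so send the next request.
  }
  return false;
}

// Browser wiring; skipped when no DOM is available.
if (typeof document !== "undefined") {
  // "/long-poll.php" is a hypothetical URL for the long polling script.
  const pollUrl = "/long-poll.php?url=" + encodeURIComponent(location.pathname);
  pollUntilChanged(
    () => fetch(pollUrl, { cache: "no-store" }).then((response) => response.text()),
    () => location.reload()
  );
}
```

Each request stays open until the server answers, so the loop does not hammer the server: a new request is only sent once the previous one has returned.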