I have an application that runs periodically as a scheduled task. The task is launched once a minute, normally takes only a few seconds to do its work, then exits.
But there's a ~1 in 80,000 chance (every two or three months) that the application will hang. The root cause is that we're using Microsoft's ServerXmlHttpRequest component to perform some work, and sometimes it just decides to hang. The virtue of ServerXmlHttpRequest over XmlHttpRequest is that the latter is not recommended where reliability and security are important (which is true of an unattended server component):
The ServerXMLHTTP object offers functionality similar to that of the XMLHTTP object. Unlike XMLHTTP, however, the ServerXMLHTTP object does not rely on the WinInet control for HTTP access to remote XML documents. ServerXMLHTTP uses a new HTTP client stack. Designed for server applications, this server-safe subset of WinInet offers the following advantages:
- Reliability — The HTTP client stack offers longer uptimes. WinInet features that are not critical for server applications, such as URL caching, auto-discovery of proxy servers, HTTP/1.1 chunking, offline support, and support for Gopher and FTP protocols are not included in the new HTTP subset.
- Security — The HTTP client stack does not allow a user-specific state to be shared with another user's session. ServerXMLHTTP provides support for client certificates.
I need the task to keep running periodically, killing the existing process if it has hung.
The Windows Task Scheduler does have an option to forcibly stop a task that has been running too long ("Stop the task if it runs longer than").
The only downside to that approach is that it doesn't work: it does not stop the task. The hung process keeps running.
Given that i cannot trust the Microsoft ServerXmlHttpRequest to not arbitrarily lock up, and the task scheduler is unable to terminate the scheduled task, i need some way to do it myself.
I tried looking into using the Job Objects API:
A job object allows groups of processes to be managed as a unit. Job objects are namable, securable, sharable objects that control attributes of the processes associated with them. A job can enforce limits such as working set size, process priority, and end-of-job time limit on each process that is associated with the job.
That one note sounded like exactly what i needed:
A job can enforce limits such as end-of-job time limit on each process that is associated with the job.
The only downside to that approach is that it does not work. A job cannot impose a wall-clock time limit on a process; it can only impose a limit on user-mode CPU time:
PerProcessUserTimeLimit
If LimitFlags specifies JOB_OBJECT_LIMIT_PROCESS_TIME, this member is the per-process user-mode execution time limit, in 100-nanosecond ticks.
If the process is idle (for example, sitting in a MsgWaitForMultipleObjects call, as ServerXmlHttpRequest is), then it accumulates no user time. I tested it: i created a job with a 1 second time limit and placed my own process into it. As long as i don't move the mouse around my test application, it quite happily sits there for longer than one second.
The only other technique i can imagine, given that my main thread is indefinitely blocked, is to spawn another thread that sleeps for my three minutes, then calls ExitProcess:
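The gap between CPU time and wall-clock time can be seen in a minimal Python sketch. It is only an analogy for the job object's accounting: `time.process_time()` reports user plus system CPU time for the current process, and `time.sleep()` stands in for the blocked wait. The function name `measure_blocked_wait` is made up for this illustration.

```python
import time


def measure_blocked_wait(wait_seconds):
    """Return (wall_seconds, cpu_seconds) spent while the process only waits."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # CPU time (user + system) of this process
    time.sleep(wait_seconds)          # stands in for a call that blocks
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return wall_used, cpu_used


if __name__ == "__main__":
    wall, cpu = measure_blocked_wait(0.5)
    print(f"wall-clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

The wall-clock figure grows with the wait while the CPU figure stays near zero, which is why a limit expressed in CPU time never triggers for a process that is merely blocked.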
    // Optional "/watchdog <seconds>" switch; 0 (the default) means no watchdog
    Int32 watchdogTimeoutSeconds = FindCmdLineSwitch("watchdog", 0);
    if (watchdogTimeoutSeconds > 0)
    {
        // Pseudocode: creates and starts the watchdog thread
        Thread thread = new Thread(KillMeCallback, new IntPtr(watchdogTimeoutSeconds));
    }

    void KillMeCallback(IntPtr data)
    {
        Int32 secondsUntilProcessIsExited = data.ToInt32();
        if (secondsUntilProcessIsExited <= 0)
            return;

        Sleep(secondsUntilProcessIsExited*1000); //seconds --> milliseconds

        // Still running after the timeout: log it, then kill the whole process
        LogToEventLog(ExtractFilename(Application.ExeName),
            "Watchdog fired after "+secondsUntilProcessIsExited.ToString()+" seconds. Process will be forcibly exited.",
            EVENTLOG_WARNING_TYPE, 999);
        ExitProcess(999);
    }
And that works. The only downside is that it's a bad idea.
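For comparison, the same watchdog idea can be sketched in runnable Python. This is an illustration under stated assumptions, not the code used in the application: `start_watchdog` and its `exit_func` parameter are hypothetical names, `threading.Timer` replaces the sleeping thread, and `os._exit` plays the role of ExitProcess.

```python
import os
import threading


def start_watchdog(timeout_seconds, exit_code=2, exit_func=os._exit):
    """Start a background timer that hard-exits the process after a timeout.

    Returns the timer, or None when timeout_seconds <= 0 (watchdog disabled).
    exit_func is injectable (a hypothetical convenience) so the behaviour
    can be exercised without killing the calling process.
    """
    if timeout_seconds <= 0:
        return None
    timer = threading.Timer(timeout_seconds, exit_func, args=(exit_code,))
    timer.daemon = True   # never keeps a healthy process alive at shutdown
    timer.start()
    return timer
```

Calling `start_watchdog(180)` early in startup would end the process after three minutes even if the main thread is blocked. A hard exit like this skips normal cleanup (in Python, `os._exit` does not run `atexit` handlers or flush buffered output), so anything that must be logged has to be written before the exit call.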
Can anyone think of anything better?
Edit
For now i will implement a watchdog switch:
Contoso.exe /watchdog 180
so that the process is forcibly exited after 180 seconds. The duration is configurable, and the watchdog can easily be removed completely in the field.
I used the route where i pass a special WatchDog argument to my process on the command line:
>Contoso.exe /watchdog 180
During initialization i check for the presence of the WatchDog option, with an integer number of seconds after it:
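    // Look for "/watchdog <seconds>" or "-watchdog <seconds>" on the command line
    String s = Toolkit.FindCmdLineOption("watchdog", ["/", "-"]);
    if (s != "")
    {
        Int32 seconds = StrToIntDef(s, 0); // non-numeric values fall back to 0
        if (seconds > 0)
            RunInThread(WatchdogThreadProc, Pointer(seconds));
    }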
and my thread procedure:
    void WatchdogThreadProc(Pointer Data)
    {
        Int32 secondsUntilProcessIsExited = Int32(Data);
        if (secondsUntilProcessIsExited <= 0)
            return;

        Sleep(secondsUntilProcessIsExited*1000); //seconds -> milliseconds

        // Still running after the timeout: log it, then kill the whole process
        LogToEventLog(ExtractFileName(ParamStr(0)),
            Format("Watchdog fired after %d seconds. Process will be forcibly exited.", secondsUntilProcessIsExited),
            EVENTLOG_WARNING_TYPE, 999);
        ExitProcess(2);
    }
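The option parsing above can be sketched in Python as follows. The helper `find_watchdog_seconds` is a hypothetical stand-in for `Toolkit.FindCmdLineOption` plus `StrToIntDef`, and the case-insensitive match is an assumption, not something the original helpers are known to do.

```python
def find_watchdog_seconds(argv, name="watchdog", prefixes=("/", "-")):
    """Return the positive integer that follows /watchdog or -watchdog.

    Returns 0 when the switch is absent, has no value, or the value is
    not a positive integer, so the caller can treat 0 as "no watchdog".
    """
    switches = {prefix + name.lower() for prefix in prefixes}
    # Stop one short of the end: the switch needs a value after it.
    for index, arg in enumerate(argv[:-1]):
        if arg.lower() in switches:   # assumption: match ignores case
            try:
                seconds = int(argv[index + 1])
            except ValueError:
                return 0
            return seconds if seconds > 0 else 0
    return 0
```

With `argv` set to `["Contoso.exe", "/watchdog", "180"]` the function returns 180. Any malformed or missing value yields 0, which leaves the watchdog switched off rather than failing at startup.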