Watchdog Timers

Some devices won't notify you when something goes wrong—they simply don't respond when you talk to them. Each device object has an associated IO_TIMER object that you can use to avoid indefinitely waiting for an operation to finish. While the timer is running, the I/O Manager will call a timer callback routine once a second. Within the timer callback routine, you can take steps to terminate any outstanding operations that should have finished but didn't.

You initialize the timer object at AddDevice time:

NTSTATUS AddDevice(...)
  {
  ...
  IoInitializeTimer(fdo, (PIO_TIMER_ROUTINE) OnTimer, pdx);
  ...
  }

where fdo is the address of your device object, OnTimer is the timer callback routine, and pdx is a context argument for the I/O Manager's calls to OnTimer.

You start the timer counting by calling IoStartTimer, and you stop it from counting by calling IoStopTimer. In between, your OnTimer routine is called once a second.

The PIOFAKE sample on the companion disc illustrates one way of using the IO_TIMER as a watchdog. I put a timer member into the device extension for this fake device:

typedef struct _DEVICE_EXTENSION {
  ...
  LONG timer;
  ...
  } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

When I process an IRP_MJ_CREATE after a period with no handles open to the device, I start the timer counting. When I process the IRP_MJ_CLOSE that closes the last handle, I stop the timer:

NTSTATUS DispatchCreate(...)
  {
  ...
  if (InterlockedIncrement(&pdx->handles) == 1)
    {
    pdx->timer = -1;
    IoStartTimer(fdo);
    }
  ...
  }

NTSTATUS DispatchClose(...)
  {
  ...
  if (InterlockedDecrement(&pdx->handles) == 0)
    IoStopTimer(fdo);
  ...
  }
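The first-open and last-close logic can be tried out in user mode. The following is only a sketch under stated assumptions, not driver code: FakeDevice, fake_init, fake_create, and fake_close are hypothetical names, a boolean stands in for the IoStartTimer and IoStopTimer calls, and C11 atomics stand in for the Interlocked functions. Note that atomic_fetch_add returns the old value, whereas InterlockedIncrement returns the new one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of the device extension. */
typedef struct {
    atomic_long handles;  /* number of open handles */
    bool timer_running;   /* models the IoStartTimer / IoStopTimer state */
} FakeDevice;

static void fake_init(FakeDevice *dev)
{
    atomic_init(&dev->handles, 0);
    dev->timer_running = false;
}

/* Models DispatchCreate: only the 0 -> 1 transition starts the timer. */
static void fake_create(FakeDevice *dev)
{
    /* atomic_fetch_add yields the OLD value; add 1 to get the value
       that InterlockedIncrement would have returned. */
    long now = atomic_fetch_add(&dev->handles, 1) + 1;
    if (now == 1)
        dev->timer_running = true;   /* where IoStartTimer would go */
}

/* Models DispatchClose: only the 1 -> 0 transition stops the timer. */
static void fake_close(FakeDevice *dev)
{
    long now = atomic_fetch_sub(&dev->handles, 1) - 1;
    assert(now >= 0);                /* a close without a matching create */
    if (now == 0)
        dev->timer_running = false;  /* where IoStopTimer would go */
}
```

Opening two handles and closing one leaves the timer running in this model. Only the second close stops it.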

The timer cell begins life with the value -1. I set it to 10 (meaning 10 seconds) in the StartIo routine and again after each interrupt. Thus, I allow 10 seconds for the device to digest an output byte and to generate an interrupt that indicates readiness for the next byte. (See the sidebar "More About PIOFAKE" for an explanation of the way this nonexistent device works.) The work to be done by the OnTimer routine at each 1-second tick of the timer needs to be synchronized with the interrupt service routine (ISR). Consequently, I use KeSynchronizeExecution to call a helper routine (CheckTimer) at device IRQL (DIRQL) under protection of the interrupt spin lock. The timer-tick routines dovetail with the ISR and DPC routines as shown in this excerpt:

VOID OnTimer(PDEVICE_OBJECT fdo, PDEVICE_EXTENSION pdx)
  {
  KeSynchronizeExecution(pdx->InterruptObject,
    (PKSYNCHRONIZE_ROUTINE) CheckTimer, pdx);
  }

VOID CheckTimer(PDEVICE_EXTENSION pdx)
  {
  if (pdx->timer <= 0 || --pdx->timer > 0)        // 1
    return;
  PIRP Irp = GetCurrentIrp(&pdx->dqReadWrite);    // 2
  if (!Irp)
    return;
  Irp->IoStatus.Status = STATUS_IO_TIMEOUT;       // 3
  Irp->IoStatus.Information = 0;
  IoRequestDpc(pdx->DeviceObject, Irp, NULL);
  }

BOOLEAN OnInterrupt(...)
  {
  ...
  if (pdx->timer <= 0)                            // 4
    return TRUE;
  if (!pdx->nbytes)
    {
    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = pdx->numxfer;
    pdx->timer = -1;
    IoRequestDpc(pdx->DeviceObject, Irp, NULL);
    }
  ...
  pdx->timer = 10;                                // 5
  }

VOID DpcForIsr(...)
  {
  ...
  PIRP Irp = StartNextPacket(&pdx->dqReadWrite, fdo);
  IoCompleteRequest(Irp, IO_NO_INCREMENT);        // 6
  ...
  }

  1. A timer value of -1 means that no request is currently pending. A value of 0 means that the current request has timed out. In either case, we don't want or need to do any more work in this routine. The second part of the if expression decrements the timer. If it hasn't counted down to 0 yet, we return without doing anything else.
  2. This driver uses a DEVQUEUE, so we call the DEVQUEUE routine GetCurrentIrp to get the address of the request we're currently processing. If this value is NULL, the device is currently idle.
  3. At this point, we've decided we want to terminate the current request because nothing has happened for 10 seconds. We request a DPC after filling in the IRP status fields. This particular status code (STATUS_IO_TIMEOUT) turns into a Win32 error code (ERROR_SEM_TIMEOUT) for which the standard error text ("The semaphore timeout period has expired") doesn't really indicate what's gone wrong. If the application that has requested this operation is under your control, you should provide a more meaningful explanation.
  4. If the timer equals 0, the current request has timed out and the CheckTimer routine has already requested a DPC. If it equals -1, no request is pending. In either case, we don't need or want to do any more work in the ISR besides dismissing the interrupt. When the ISR itself finishes a request, it sets timer to -1 before requesting the DPC, which prevents the next invocation of CheckTimer from requesting another DPC for this same request.
  5. We allow 10 seconds between interrupts.
  6. Whatever requested this DPC also filled in the IRP's status fields. We therefore need to call only IoCompleteRequest.
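The countdown rules in these notes can be exercised outside the kernel. The following is a minimal user-mode sketch, not the PIOFAKE code: the Watchdog structure and the watchdog_* functions are hypothetical names, a boolean stands in for the result of GetCurrentIrp, and a counter stands in for the DPC request. It is single threaded, so it doesn't model the synchronization that KeSynchronizeExecution provides in the real driver.

```c
#include <assert.h>
#include <stdbool.h>

#define WATCHDOG_SECONDS 10    /* same allowance as in the excerpt */
#define WATCHDOG_IDLE    (-1)  /* no request pending */

/* Hypothetical model of the watchdog state kept in the device extension. */
typedef struct {
    long timer;        /* -1 idle, 0 timed out, > 0 seconds remaining */
    bool irp_pending;  /* stands in for GetCurrentIrp() != NULL */
    int  timeouts;     /* number of timeout completions requested */
} Watchdog;

static void watchdog_init(Watchdog *w)
{
    w->timer = WATCHDOG_IDLE;
    w->irp_pending = false;
    w->timeouts = 0;
}

/* Models StartIo: a request becomes current and the countdown is armed. */
static void watchdog_start_io(Watchdog *w)
{
    w->irp_pending = true;
    w->timer = WATCHDOG_SECONDS;
}

/* Models an interrupt that arrives with more bytes still to send. */
static void watchdog_interrupt(Watchdog *w)
{
    if (w->timer <= 0)
        return;                     /* idle or already timed out */
    w->timer = WATCHDOG_SECONDS;    /* device responded: rearm */
}

/* Models CheckTimer, called once per simulated second.
   Returns true on the tick that times the request out. */
static bool watchdog_tick(Watchdog *w)
{
    assert(w->timer >= WATCHDOG_IDLE);
    if (w->timer <= 0 || --w->timer > 0)
        return false;               /* idle, timed out, or time remains */
    if (!w->irp_pending)
        return false;               /* device is idle */
    w->irp_pending = false;         /* assumption: the DPC completes it */
    w->timeouts++;                  /* where IoRequestDpc would go */
    return true;
}
```

After watchdog_start_io, nine calls to watchdog_tick return false and the tenth returns true, leaving timer at 0 so that later ticks do nothing. A call to watchdog_interrupt before the tenth tick restarts the 10-second allowance.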