Managing PnP State Transitions

As I said at the outset of this chapter, WDM drivers need to track their devices through the state transitions diagrammed in Figure 6-1. This state tracking also ties in with how you queue and cancel I/O requests. Cancellation in turn implicates the global cancel spin lock, which is a performance bottleneck in a multi-CPU system. The standard model of IRP processing can't solve all these interrelated problems. In this section, therefore, I'll present a new type of object—called a DEVQUEUE—that you can use in your PnP request handlers and in place of the standard model routines StartPacket and StartNextPacket. DEVQUEUE is my own invention, but it's based on sample drivers, especially PNPPOWER and CANCEL, that used to be in the DDK. See also the discussion of IRP cancellation in Ervin Peretz's "The Windows Driver Model Simplifies Management of Device Driver I/O Requests," (Microsoft Systems Journal, January 1999). A portion of the IRP cancellation logic I'm describing also derives from work by Peretz and other Microsoft employees and by Jamie Hanrahan that had not been published at the time I was writing this book.

I described the KDEVICE_QUEUE queue object in the previous chapter as encompassing an idle state, a busy but empty state, and a busy but not empty state. The support routines you use to manipulate a KDEVICE_QUEUE assume that if the device is not currently busy, all you want to do is start any new request running on the device. It's precisely this behavior that we need to overcome to successfully manage PnP states. Figure 6-4 illustrates the states of a DEVQUEUE.

Figure 6-4. States of a DEVQUEUE object.

In the READY state, the queue operates much like a KDEVICE_QUEUE, accepting and forwarding requests to your StartIo routine in such a way that the device stays busy. In the STALLED state, however, the queue does not forward IRPs to StartIo even when the device is idle. In the REJECTING state, the queue doesn't even accept new IRPs. Figure 6-5 illustrates the flow of IRPs through the queue.

Figure 6-5. Flow of IRPs through a DEVQUEUE.
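
To make the three states concrete, here's a small user-mode model in plain C of what happens to a request that arrives while the queue is in each state. This is only a sketch, not the real DEVQUEUE code: the names QueueState, Disposition, and OnArrival are hypothetical and exist purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the three DEVQUEUE states (not the real code). */
typedef enum { QS_READY, QS_STALLED, QS_REJECTING } QueueState;

/* What happens to a request that arrives at the queue. */
typedef enum { DISP_START_NOW, DISP_QUEUE, DISP_REJECT } Disposition;

static Disposition OnArrival(QueueState state, bool deviceBusy)
{
    switch (state) {
    case QS_REJECTING:
        return DISP_REJECT;        /* the queue won't even accept the IRP */
    case QS_STALLED:
        return DISP_QUEUE;         /* held back even if the device is idle */
    case QS_READY:
    default:
        /* behaves like a KDEVICE_QUEUE: start if idle, else wait in line */
        return deviceBusy ? DISP_QUEUE : DISP_START_NOW;
    }
}
```

The only case that differs from a KDEVICE_QUEUE with an idle device is the second argument being false while the state is QS_STALLED or QS_REJECTING: the request is queued or rejected instead of being started.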

Using DEVQUEUE for IRP Queuing and Cancellation

You define a DEVQUEUE object for each queue of requests you'll manage in the driver. For example, if your device manages reads and writes in a single queue, you'd define one DEVQUEUE:

typedef struct _DEVICE_EXTENSION {
  ...
  DEVQUEUE dqReadWrite; 
  ...
  } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

Table 6-3 lists the support functions you can use with a DEVQUEUE.

Table 6-3. DEVQUEUE service routines.

Support Function         Description
AbortRequests            Aborts current and future requests
AllowRequests            Undoes effect of previous AbortRequests
AreRequestsBeingAborted  Are we currently aborting new requests?
CancelRequest            Generic cancel routine
CheckBusyAndStall        Checks for idle device and stalls requests in one atomic operation
CleanupRequests          Cancels all requests for a given file object in order to service IRP_MJ_CLEANUP
GetCurrentIrp            Determines which IRP is currently being processed by associated StartIo routine
InitializeQueue          Initializes DEVQUEUE object
RestartRequests          Restarts a stalled queue
StallRequests            Stalls the queue
StartNextPacket          Dequeues and starts the next request
StartPacket              Starts or queues a new request
WaitForCurrentIrp        Waits for current IRP to finish

For the moment, I'll just discuss the support functions that replace functions like StartPacket and StartNextPacket in the standard IRP processing model. For each queue, you provide a separate StartIo routine. Your DriverEntry routine would not store anything in the DriverStartIo pointer field of the driver object. Instead, during AddDevice, you'd initialize your queue object(s) like so:

NTSTATUS AddDevice(...)
  {
  ...
  PDEVICE_EXTENSION pdx = ...;
  InitializeQueue(&pdx->dqReadWrite, StartIo);
  ...
  }

The dispatch function for an IRP that uses a DEVQUEUE would follow this pattern:

NTSTATUS DispatchWrite(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  <some power management stuff you haven't heard about yet>
  IoMarkIrpPending(Irp);
  StartPacket(&pdx->dqReadWrite, fdo, Irp, OnCancel);
  return STATUS_PENDING;
  }

That is, instead of calling IoStartPacket, you call the queue's StartPacket function with the address of the queue object, the device object, the IRP, and your cancel routine. At the start of a dispatch routine, you'll also have a small bit of code to handle restoring power after a period of disuse; I'll discuss that code in Chapter 8.

Here's a sketch of the new kind of StartIo routine you use with a DEVQUEUE:

VOID StartIo(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  <some PnP stuff you haven't heard about yet>
  // start request on device
  }

StartIo doesn't worry about IRP cancellation. The cancel routine you use in this scheme is different from a standard one—it simply delegates all work to the DEVQUEUE:

VOID OnCancel(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  CancelRequest(&pdx->dqReadWrite, Irp);
  }

CancelRequest will release the global cancel spin lock, which your cancel routine owns when it gets control, and will then cancel the IRP in a thread-safe and multiprocessor-safe way.

The deferred procedure call (DPC) routine you use when the request finishes also looks a little different from the standard-model one I showed you in Chapter 5, as you can see here:

VOID DpcForIsr(PKDPC Dpc, PDEVICE_OBJECT device, PIRP junk, PVOID context)
  {
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) device->DeviceExtension;
  PIRP Irp = GetCurrentIrp(&pdx->dqReadWrite);
  ...
  StartNextPacket(&pdx->dqReadWrite, device);
  <some PnP stuff you haven't heard about yet>
  CompleteRequest(Irp, ...);
  }

Like IoStartNextPacket, the StartNextPacket function removes the next IRP from the queue and sends it to your (queue-specific) StartIo routine. It also returns the address of the IRP you were processing or NULL to indicate that your device was not processing an IRP. A NULL return value indicates that the IRP was cancelled or aborted for some reason, so it would be incorrect for you to try to complete it. Since you'll obtain the address of the finishing IRP by calling GetCurrentIrp, don't use the IRP pointer that comes to you as the third argument to the DPC routine. I named it junk to reinforce the point.

The DEVQUEUE also simplifies the handling of an IRP_MJ_CLEANUP. In fact, the code is almost trivial:

NTSTATUS DispatchCleanup(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);
  CleanupRequests(&pdx->dqReadWrite, stack->FileObject,
    STATUS_CANCELLED);
  return CompleteRequest(Irp, STATUS_SUCCESS, 0);
  }

Using DEVQUEUE with PnP Requests

The real point of using a DEVQUEUE instead of a KDEVICE_QUEUE is that a DEVQUEUE makes it easier to manage the transitions between PnP states. In all of my sample drivers, the device extension contains a state variable with the imaginative name state. I also define an enumeration named DEVSTATE whose values correspond to the PnP states. When you initialize your device object in AddDevice, you'll call InitializeQueue for each of your device queues and also indicate that the device is in the STOPPED state:

NTSTATUS AddDevice(...)
  {
  ...
  PDEVICE_EXTENSION pdx = ...;
  InitializeQueue(&pdx->dqReadWrite, StartIo);
  pdx->state = STOPPED;
  ...
  }

After AddDevice returns, the system sends IRP_MJ_PNP requests to direct you through the various PnP states the device can assume.

Starting the Device

A newly initialized DEVQUEUE is in a STALLED state, such that a call to StartPacket will queue a request even when the device is idle. You'll keep the queue(s) in the STALLED state until you successfully process IRP_MN_START_DEVICE, whereupon you'll execute code like the following:

NTSTATUS HandleStartDevice(...)
  {
  status = StartDevice(...);
  if (NT_SUCCESS(status))
    {
    pdx->state = WORKING;
    RestartRequests(&pdx->dqReadWrite, fdo);
    }
  }

You record WORKING as the current state of your device, and you call RestartRequests for each of your queues to release any IRPs that might have arrived between the time AddDevice ran and the time you received the IRP_MN_START_DEVICE request.

Is It Okay to Stop the Device?

The PnP Manager always asks your permission before sending you an IRP_MN_STOP_DEVICE. The query takes the form of an IRP_MN_QUERY_STOP_DEVICE request that you can succeed or fail as you choose. The query basically means, "Would you be able to immediately stop your device if the system were to send you an IRP_MN_STOP_DEVICE in a few nanoseconds?" You can handle this query in two slightly different ways. Here's the first way, which is appropriate when your device might be busy with an IRP that either finishes quickly or can be easily terminated in the middle:

NTSTATUS HandleQueryStop(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  Irp->IoStatus.Status = STATUS_SUCCESS;
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  if (pdx->state != WORKING)                               // 1
    return DefaultPnpHandler(fdo, Irp);
  if (!OkayToStop(pdx))                                    // 2
    return CompleteRequest(Irp, STATUS_UNSUCCESSFUL, 0);
  StallRequests(&pdx->dqReadWrite);                        // 3
  WaitForCurrentIrp(&pdx->dqReadWrite);
  pdx->state = PENDINGSTOP;                                // 4
  return DefaultPnpHandler(fdo, Irp);
  }

  1. This statement handles a peculiar situation that can arise for a boot device: the PnP Manager might send you a QUERY_STOP when you haven't initialized yet. You want to ignore such a query, which is tantamount to saying "yes."
  2. At this point, you perform some sort of investigation to see if it will be okay to revert to the STOPPED state. I'll discuss factors bearing on the investigation immediately below.
  3. StallRequests puts the DEVQUEUE into the STALLED state so that any new IRP just goes into the queue. WaitForCurrentIrp waits until the current request, if there is one, finishes on the device. These two steps make the device quiescent until we know whether the device is really going to stop or not.
  4. At this point, we have no reason to demur. We therefore record our state as PENDINGSTOP. Then we pass the request down the stack so that other drivers can have a chance to accept or decline this query.

The other basic way of handling QUERY_STOP is appropriate when your device might be busy with a request that will take a long time and can't be stopped in the middle, such as a tape retension operation that can't be stopped without potentially breaking the tape. In this case, you can use the DEVQUEUE's CheckBusyAndStall function. That function returns TRUE if the device is busy, whereupon you'd fail the QUERY_STOP with STATUS_UNSUCCESSFUL. The function returns FALSE if the device is idle, in which case it also stalls the queue. (The operations of checking the state of the device and stalling the queue need to be protected by a spin lock, which is why I wrote this function in the first place.)
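
Here's a minimal user-mode sketch of this second approach, assuming a simplified queue that has only a busy flag and a stall counter. ModelQueue, ModelDevice, ModelCheckBusyAndStall, and ModelQueryStop are hypothetical names for illustration. A real driver would call CheckBusyAndStall on its DEVQUEUE and would complete or forward the IRP rather than return a boolean.

```c
#include <stdbool.h>

typedef enum { MS_STOPPED, MS_WORKING, MS_PENDINGSTOP } ModelState;

typedef struct {
    bool busy;        /* stands in for CurrentIrp != NULL */
    long stallcount;  /* nonzero means the queue is stalled */
} ModelQueue;

typedef struct {
    ModelState state;
    ModelQueue queue;
} ModelDevice;

/* Returns true if the device is busy. Otherwise stalls the queue and
   returns false. The real function performs both steps while holding
   the queue's spin lock so that they act as one atomic operation. */
static bool ModelCheckBusyAndStall(ModelQueue *q)
{
    bool busy = q->busy;
    if (!busy)
        ++q->stallcount;
    return busy;
}

/* Returns true to succeed the stop query, false to fail it. */
static bool ModelQueryStop(ModelDevice *dev)
{
    if (dev->state != MS_WORKING)
        return true;                 /* nothing to undo later; say "yes" */
    if (ModelCheckBusyAndStall(&dev->queue))
        return false;                /* long operation in progress: veto */
    dev->state = MS_PENDINGSTOP;     /* idle, and the queue is now stalled */
    return true;
}
```

Notice that a veto leaves both the state and the stall counter untouched, so there's nothing to clean up if the query fails.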

You can fail a stop query for many reasons. Disk devices that are used for paging, for example, cannot be stopped. Neither can devices that are used for storing hibernation or crash dump files. (You'll know about these characteristics as a result of an IRP_MN_DEVICE_USAGE_NOTIFICATION request, which I'll discuss later in "Other Configuration Functionality.") Other reasons may also apply to your device.

Even if you succeed the query, one of the drivers underneath you might fail it for some reason. Even if all the drivers succeed the query, the PnP Manager might decide not to shut you down. In any of these cases, you'll receive another PnP request with the minor code IRP_MN_CANCEL_STOP_DEVICE to tell you that your device won't be shut down. You should then clear whatever state you set during the initial query:

NTSTATUS HandleCancelStop(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  Irp->IoStatus.Status = STATUS_SUCCESS;
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  if (pdx->state != PENDINGSTOP)
    return DefaultPnpHandler(fdo, Irp);
  NTSTATUS status = ForwardAndWait(fdo, Irp);
  if (NT_SUCCESS(status))
    {
    pdx->state = WORKING;
    RestartRequests(&pdx->dqReadWrite, fdo);
    }
  return CompleteRequest(Irp, status, Irp->IoStatus.Information);
  }

We first check to see whether a stop operation is even pending. Some higher-level driver might have vetoed a query that we never saw, so we'd still be in the WORKING state. If we're not in the PENDINGSTOP state, we simply forward the IRP. Otherwise, we send the CANCEL_STOP IRP synchronously to the lower-level drivers. That is, we use our ForwardAndWait helper function to send the IRP down the stack and await its completion. We wait for low-level drivers because we're about to resume processing IRPs, and the drivers might have work to do before we send them an IRP. If the lower layers successfully handle this IRP_MN_CANCEL_STOP_DEVICE, we change our state variable to indicate that we're back in the WORKING state, and we call RestartRequests to unstall the queues we stalled when we succeeded the query.

While the Device Is Stopped

If, on the other hand, all device drivers succeed the query and the PnP Manager decides to go ahead with the shutdown, you'll get an IRP_MN_STOP_DEVICE next. Your subdispatch function would look like this one:

NTSTATUS HandleStopDevice(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  Irp->IoStatus.Status = STATUS_SUCCESS;
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  if (pdx->state != PENDINGSTOP)                           // 1
    {
    <complicated stuff>
    }
  StopDevice(fdo, pdx->state == WORKING);                  // 2
  pdx->state = STOPPED;                                    // 3
  return DefaultPnpHandler(fdo, Irp);                      // 4
  }

  1. We expect the system to send us a QUERY_STOP before it sends us a STOP, so we should already be in the PENDINGSTOP state with all of our queues stalled. There is, however, a bug in Windows 98 such that we can sometimes get a STOP (without a QUERY_STOP) instead of a REMOVE. You need to take some action at this point that causes you to reject any new IRPs, but you mustn't really remove your device object or do the other things you do when you really receive a REMOVE request.
  2. StopDevice is the helper function I've already discussed that deconfigures the device.
  3. We now enter the STOPPED state. We're in almost the same situation as we were when AddDevice was done. That is, all queues are stalled, and the device has no I/O resources. The only difference is that we've left our registered interfaces enabled, which means that applications will not have received removal notifications and will leave their handles open. Applications can also open new handles in this situation. Both aspects are just as they should be, because the stop condition won't last long.
  4. As I previously discussed, the last thing we do to handle IRP_MN_STOP_DEVICE is pass the request down to the lower layers of the driver hierarchy.

Is It Okay to Remove the Device?

Just as the PnP Manager asks your permission before shutting your device down with a stop device request, it also might ask your permission before removing your device. This query takes the form of an IRP_MN_QUERY_REMOVE_DEVICE request that you can, once again, succeed or fail as you choose. And, just as with the stop query, the PnP Manager will use an IRP_MN_CANCEL_REMOVE_DEVICE request if it changes its mind about removing the device. The two handlers look like this:

NTSTATUS HandleQueryRemove(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  Irp->IoStatus.Status = STATUS_SUCCESS;
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  if (OkayToRemove(fdo))                                   // 1
    {
    StallRequests(&pdx->dqReadWrite);                      // 2
    WaitForCurrentIrp(&pdx->dqReadWrite);
    pdx->prevstate = pdx->state;                           // 3
    pdx->state = PENDINGREMOVE;
    return DefaultPnpHandler(fdo, Irp);
    }
  return CompleteRequest(Irp, STATUS_UNSUCCESSFUL, 0);
  }

NTSTATUS HandleCancelRemove(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  Irp->IoStatus.Status = STATUS_SUCCESS;
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  if (pdx->state != PENDINGREMOVE)                         // 4
    return DefaultPnpHandler(fdo, Irp);
  NTSTATUS status = ForwardAndWait(fdo, Irp);
  if (NT_SUCCESS(status))
    {
    pdx->state = pdx->prevstate;                           // 5
    if (pdx->state == WORKING)
      RestartRequests(&pdx->dqReadWrite, fdo);
    }
  return CompleteRequest(Irp, status, Irp->IoStatus.Information);
  }

  1. This OkayToRemove helper function provides the answer to the question, "Is it okay to remove this device?" In general, this answer includes some device-specific ingredients, such as whether the device holds a paging or hibernation file, and so on.
  2. Just as I showed you for IRP_MN_QUERY_STOP_DEVICE, you want to stall the request queue and wait for a short period, if necessary, until the current request finishes.
  3. If you look at Figure 6-1 carefully, you'll notice that it's possible to get a QUERY_REMOVE when you're in either the WORKING or STOPPED state. The right thing to do if the current query is later cancelled is to return to the original state. Hence, I have a prevstate variable in the device extension to record the prequery state.
  4. We get the CANCEL_REMOVE request when something either above or below us vetoes a QUERY_REMOVE. If we never saw the query, we'll still be in the WORKING state and don't need to do anything with this IRP. Otherwise, we need to forward it to the lower levels before we process it because we want the lower levels to be ready to process the IRPs we're about to release from our queues.
  5. Here, we undo the steps we took when we succeeded the QUERY_REMOVE. We revert to the previous state. If the previous state was WORKING, we stalled the queues when we handled the query and need to unstall them now.
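
The bookkeeping with state and prevstate can be modeled in a few lines of ordinary C. This is only a sketch of the state tracking, with the IRP forwarding and the queue calls left out. DevState, ModelPnp, ModelQueryRemove, and ModelCancelRemove are hypothetical names that don't appear in my sample drivers.

```c
/* Hypothetical stand-ins for the DEVSTATE values used in the text. */
typedef enum { DS_STOPPED, DS_WORKING, DS_PENDINGSTOP,
               DS_PENDINGREMOVE, DS_REMOVED } DevState;

typedef struct {
    DevState state;      /* current PnP state */
    DevState prevstate;  /* state to return to if the query is cancelled */
} ModelPnp;

/* Succeeding a QUERY_REMOVE: remember where we came from. */
static void ModelQueryRemove(ModelPnp *p)
{
    p->prevstate = p->state;
    p->state = DS_PENDINGREMOVE;
}

/* Handling CANCEL_REMOVE. Returns 1 if the caller should restart its
   queues (we're back in WORKING), 0 otherwise. */
static int ModelCancelRemove(ModelPnp *p)
{
    if (p->state != DS_PENDINGREMOVE)
        return 0;                    /* we never saw the query */
    p->state = p->prevstate;         /* revert to the prequery state */
    return p->state == DS_WORKING;
}
```

A cancelled query that began in the stopped state returns 0 here, which corresponds to leaving the queues stalled until a later IRP_MN_START_DEVICE.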

Synchronizing Removal

It turns out that the I/O Manager can send you PnP requests simultaneously with other substantive I/O requests, such as requests that involve reading or writing. It's entirely possible, therefore, for you to receive an IRP_MN_REMOVE_DEVICE at a time when you're still processing another IRP. It's up to you to prevent untoward consequences, and the standard way to do that involves using an IO_REMOVE_LOCK object and several associated kernel-mode support routines.

The basic idea behind the standard scheme for preventing premature removal is that you acquire the remove lock each time you start processing a request and you release the lock when you're done. Before you remove your device object, you make sure that the lock is free. If not, you wait until all references to the lock are released. Figure 6-6 illustrates the process.

Figure 6-6. Operation of an IO_REMOVE_LOCK.

To handle the mechanics of this process, you define a variable in the device extension:

struct DEVICE_EXTENSION {
  ...
  IO_REMOVE_LOCK RemoveLock;
  ...
  };

You initialize the lock object during AddDevice:

NTSTATUS AddDevice(PDRIVER_OBJECT DriverObject, PDEVICE_OBJECT pdo)
  {
  ...
  IoInitializeRemoveLock(&pdx->RemoveLock, 0, 0, 256);
  ...
  }

The last three parameters to IoInitializeRemoveLock are, respectively, a tag value, an expected maximum lifetime for a lock, and a maximum lock count, none of which are used in the free build of the operating system.

These preliminaries set the stage for what you do during the lifetime of the device object. Whenever you receive an I/O request, you call IoAcquireRemoveLock. IoAcquireRemoveLock will return STATUS_DELETE_PENDING if a removal operation is underway. Otherwise, it will acquire the lock and return STATUS_SUCCESS. Whenever you finish an I/O operation, you call IoReleaseRemoveLock, which will release the lock and might unleash a heretofore pending removal operation. In the context of some purely hypothetical dispatch function that completes the IRP it's handed, the code might look like this:

NTSTATUS DispatchSomething(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  NTSTATUS status = IoAcquireRemoveLock(&pdx->RemoveLock, Irp);
  if (!NT_SUCCESS(status))
    return CompleteRequest(Irp, status, 0);
  ...
  IoReleaseRemoveLock(&pdx->RemoveLock, Irp);
  return CompleteRequest(Irp, <some code>, <info value>);
  }

The second argument to IoAcquireRemoveLock and IoReleaseRemoveLock is just a tag value that a checked build of the OS can use to match up acquisition and release calls, by the way.

The calls to acquire and release the remove lock dovetail with additional logic in the PnP dispatch function and the remove device subdispatch function. First, DispatchPnp has to obey the rule about locking and unlocking the device, so it will contain the following code that I didn't show you earlier in "IRP_MJ_PNP Dispatch Function":

NTSTATUS DispatchPnp(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  NTSTATUS status = IoAcquireRemoveLock(&pdx->RemoveLock, Irp);
  if (!NT_SUCCESS(status))
    return CompleteRequest(Irp, status, 0);
  ...
  status = (*fcntab[fcn])(fdo, Irp);
  if (fcn != IRP_MN_REMOVE_DEVICE)
    IoReleaseRemoveLock(&pdx->RemoveLock, Irp);
  return status;
  }

In other words, DispatchPnp locks the device, calls the subdispatch routine, and then (usually) unlocks the device afterward. The subdispatch routine for IRP_MN_REMOVE_DEVICE has additional special logic that you also haven't seen yet:

NTSTATUS HandleRemoveDevice(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  Irp->IoStatus.Status = STATUS_SUCCESS;
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  AbortRequests(&pdx->dqReadWrite, STATUS_DELETE_PENDING); // 1
  DeregisterAllInterfaces(pdx);
  StopDevice(fdo, pdx->state == WORKING);
  pdx->state = REMOVED;
  NTSTATUS status = DefaultPnpHandler(fdo, Irp);           // 2
  IoReleaseRemoveLockAndWait(&pdx->RemoveLock, Irp);       // 3
  RemoveDevice(fdo);
  return status;
  }

  1. Windows 98 doesn't send the SURPRISE_REMOVAL request, so this REMOVE IRP may be the first indication you have that the device has disappeared. Calling StopDevice allows you to release all your I/O resources in case you didn't get an earlier IRP that caused you to release them. Calling AbortRequests causes you to complete any queued IRPs and to start rejecting any new IRPs.
  2. We pass this request to the lower layers now that we've done our work.
  3. The PnP dispatch routine acquired the remove lock. We now call the special function IoReleaseRemoveLockAndWait to release that lock reference and wait until all references to the lock are released. Once the IoReleaseRemoveLockAndWait routine returns, any subsequent call to IoAcquireRemoveLock will elicit a STATUS_DELETE_PENDING status to indicate that device removal is under way.
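
If it helps to see the counting, here's a single-threaded user-mode model of the acquire, release, and release-and-wait pattern. It's a sketch under simplifying assumptions and not how the operating system implements IO_REMOVE_LOCK: ModelRemoveLock and its functions are hypothetical, and instead of blocking, the last function just reports how many references the real routine would still be waiting for.

```c
#include <stdbool.h>

typedef struct {
    long count;     /* outstanding acquisitions */
    bool removing;  /* set once removal has begun */
} ModelRemoveLock;

static void ModelInitLock(ModelRemoveLock *lock)
{
    lock->count = 0;
    lock->removing = false;
}

/* Returns true if the lock was acquired, false if a delete is pending
   (the analog of STATUS_DELETE_PENDING). */
static bool ModelAcquire(ModelRemoveLock *lock)
{
    if (lock->removing)
        return false;
    ++lock->count;
    return true;
}

static void ModelRelease(ModelRemoveLock *lock)
{
    --lock->count;
}

/* Models the start of a release-and-wait: marks removal as under way,
   drops the caller's own reference, and returns the number of other
   references that must be released before removal can proceed. */
static long ModelReleaseAndBeginRemove(ModelRemoveLock *lock)
{
    lock->removing = true;
    --lock->count;
    return lock->count;
}
```

In this model, a request that acquired the lock before the remove arrived keeps the count at 1 until it calls ModelRelease, and any attempt to acquire in the meantime fails.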

NOTE
You'll notice that the IRP_MN_REMOVE_DEVICE handler might block while some IRP finishes. This is certainly okay in Windows 98 and Windows 2000, which were designed with this possibility in mind—the IRP gets sent in the context of a system thread that is allowed to block. Some WDM functionality (a Microsoft developer even called it "embryonic") is present in OEM releases of Microsoft Windows 95, but you can't block a remove device request there. Consequently, if your driver needs to run in Windows 95, you need to discover that fact and avoid blocking. That discovery process is left as an exercise for you.

These are the mechanics of locking and unlocking the device to forestall removing the device while it's still in use. You still need to know when to invoke IoAcquireRemoveLock and IoReleaseRemoveLock to bring that mechanism into play. Basically, an IRP dispatch function that will complete the request quickly should acquire and release the lock.

A dispatch routine that queues an IRP should not acquire the remove lock, however. For a queued IRP, you acquire the lock inside StartIo and release it inside your DPC routine. So, we can expand the earlier skeleton of StartIo and DpcForIsr as follows:

VOID StartIo(PDEVICE_OBJECT fdo, PIRP Irp)
  {
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  NTSTATUS status = IoAcquireRemoveLock(&pdx->RemoveLock, Irp); // 1
  if (!NT_SUCCESS(status))
    {
    CompleteRequest(Irp, status, 0);                       // 2
    return;
    }

  // start request on device
  }

VOID DpcForIsr(PKDPC Dpc, PDEVICE_OBJECT device, PIRP junk,
  PVOID context)
  {
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) device->DeviceExtension;
  PIRP Irp = GetCurrentIrp(&pdx->dqReadWrite);
  ...
  StartNextPacket(&pdx->dqReadWrite, device);
  IoReleaseRemoveLock(&pdx->RemoveLock, Irp);              // 3
  CompleteRequest(Irp, ...);
  }

  1. We acquire the lock here rather than in the dispatch routine. We don't want the fact that we've got an IRP sitting in our queue to prevent the PnP Manager from shutting us down. It's also better to not have to worry about the remove lock in our cancel routine.
  2. IoAcquireRemoveLock fails only if a delete operation is pending. Its return value can be either STATUS_SUCCESS or STATUS_DELETE_PENDING. In the failure case, don't call StartNextPacket—there's no point in trying to start a new operation when the device is about to disappear. Were we to call StartNextPacket, it would recursively call this routine, which would try to acquire the remove lock and fail, whereupon it would call StartNextPacket, which would call StartIo, which would…<BSOD due to stack overflow>. You get the idea.
  3. This call to IoReleaseRemoveLock balances the call inside StartIo.

You should also acquire the remove lock when you successfully process an IRP_MJ_CREATE. In contrast to the other situations we've considered, you don't release the lock before returning from the DispatchCreate routine. The balancing call to IoReleaseRemoveLock occurs instead in the dispatch routine for IRP_MJ_CLOSE. In other words, you hold the remove lock for the entire time something has a handle open to your device. Here's a sketch of what I mean:

NTSTATUS DispatchCreate(...)
  {
  ...
  IoAcquireRemoveLock(&pdx->RemoveLock, stack->FileObject);
  return CompleteRequest(...);
  }

NTSTATUS DispatchClose(...)
  {
  ...
  IoReleaseRemoveLock(&pdx->RemoveLock, stack->FileObject);
  return CompleteRequest(...);
  }

For debugging purposes, the balancing calls to IoAcquireRemoveLock and IoReleaseRemoveLock should use the same value for the second argument. You wouldn't use the IRP pointer as I've done in my other examples because the CREATE and CLOSE requests are different IRPs. The file object will be the same in both requests, though, which is why I used the file object in this example.
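
To see why the tags need to match, consider this toy user-mode tracker. It's purely a hypothetical illustration of how a debugging aid could pair acquisitions with releases, and I'm not claiming the checked build works this way internally. TagTracker, TrackAcquire, and TrackRelease are invented names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TAGS 8

typedef struct {
    const void *tags[MODEL_MAX_TAGS];  /* tags of outstanding acquisitions */
    size_t count;
} TagTracker;

/* Records an acquisition under the given tag. Returns false if the
   toy table is full. */
static bool TrackAcquire(TagTracker *t, const void *tag)
{
    if (t->count == MODEL_MAX_TAGS)
        return false;
    t->tags[t->count++] = tag;
    return true;
}

/* Looks for an outstanding acquisition with the same tag and removes
   it. Returns false if no acquisition used this tag, which is the
   mismatch a debugging aid would want to flag. */
static bool TrackRelease(TagTracker *t, const void *tag)
{
    for (size_t i = 0; i < t->count; ++i) {
        if (t->tags[i] == tag) {
            t->tags[i] = t->tags[--t->count];  /* fill the hole */
            return true;
        }
    }
    return false;
}
```

Acquiring and releasing with the same file object address pairs up cleanly. Acquiring with the address of the CREATE IRP and releasing with the address of the CLOSE IRP would be reported as a mismatch, because those are two different pointers.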

If the end user uses the Device Manager to remove a device when some application has an open handle, the operating system declines to remove the device and so informs the user. In that situation, the fact that you've also claimed the remove lock won't influence the course of events because you'll never get the IRP_MN_REMOVE_DEVICE that would cause you to wait for all holders of the lock to release it. If it's possible for the device to be physically removed from the computer without first going through the Device Manager, however, a correctly written application will be looking for a WM_DEVICECHANGE message that signals departure of the device. (See the discussion of user-mode notifications near the end of this chapter in "PnP Notifications".) The application will then close its handles. You should delay IRP_MN_REMOVE_DEVICE until the handles are actually closed, and the locking logic I've just described allows you to do that.

How DEVQUEUE Works

In contrast to other examples in this book, I'm going to show you the full implementation of the DEVQUEUE object even though the source code is on the companion disc. I'm making an exception in this case because I think an annotated listing of the functions will make it easier for you to understand how to use it.

Initializing a DEVQUEUE

The DEVQUEUE object has this declaration in my DEVQUEUE.H header file:

typedef struct _DEVQUEUE {
  LIST_ENTRY head;
  KSPIN_LOCK lock;
  PDRIVER_STARTIO StartIo;
  LONG stallcount;
  PIRP CurrentIrp;
  KEVENT evStop;
  NTSTATUS abortstatus;
  } DEVQUEUE, *PDEVQUEUE;

InitializeQueue initializes one of these objects like this:

VOID NTAPI InitializeQueue(PDEVQUEUE pdq, PDRIVER_STARTIO StartIo)
  {
  InitializeListHead(&pdq->head);                          // 1
  KeInitializeSpinLock(&pdq->lock);                        // 2
  pdq->StartIo = StartIo;                                  // 3
  pdq->stallcount = 1;                                     // 4
  pdq->CurrentIrp = NULL;                                  // 5
  KeInitializeEvent(&pdq->evStop, NotificationEvent, FALSE); // 6
  pdq->abortstatus = (NTSTATUS) 0;                         // 7
  }

  1. We use an ordinary (noninterlocked) doubly-linked list to queue IRPs. We don't need to use an interlocked list because we'll always access it within the protection of our own spin lock.
  2. This spin lock guards access to the queue and other fields in the DEVQUEUE structure. It also takes the place of the global cancel spin lock for guarding nearly all of the cancellation process, thereby improving system performance.
  3. Each queue has its own associated StartIo function that we call automatically in the appropriate places.
  4. The stall counter indicates how many times something has requested that IRP delivery to StartIo be stalled. Initializing the counter to 1 means that the IRP_MN_START_DEVICE handler must call RestartRequests to release an IRP.
  5. The CurrentIrp field records the IRP most recently sent to the StartIo routine. Initializing this field to NULL indicates that the device is initially idle.
  6. We use this event to block WaitForCurrentIrp when necessary. We'll set this event inside StartNextPacket, which should always be called when the current IRP completes.
  7. We reject incoming IRPs in two situations. The first situation is after we irrevocably commit to removing the device, when we must start failing new IRPs with STATUS_DELETE_PENDING. The second situation is during a period of low power, when, depending on the type of device we're managing, we might choose to fail new IRPs with the STATUS_DEVICE_POWERED_OFF code. The abortstatus field records the status code we should use in rejecting IRPs in these situations.

Stalling the Queue

Stalling the IRP queue involves two DEVQUEUE functions:

VOID NTAPI StallRequests(PDEVQUEUE pdq)
  {
  InterlockedIncrement(&pdq->stallcount);                  // 1
  }

BOOLEAN NTAPI CheckBusyAndStall(PDEVQUEUE pdq)
  {
  KIRQL oldirql;
  KeAcquireSpinLock(&pdq->lock, &oldirql);                 // 2
  BOOLEAN busy = pdq->CurrentIrp != NULL;                  // 3
  if (!busy)                                               // 4
    InterlockedIncrement(&pdq->stallcount);
  KeReleaseSpinLock(&pdq->lock, oldirql);
  return busy;
  }

  1. To stall requests, we just need to set the stall counter to a nonzero value. It's unnecessary to protect the increment with a spin lock because any thread that might be racing with us to change the value will also be using an interlocked increment or decrement.
  2. Since CheckBusyAndStall needs to operate as an atomic function, we first take the queue's spin lock.
  3. CurrentIrp being non-NULL is the signal that the device is busy handling one of the requests from this queue.
  4. If the device is currently idle, this statement starts stalling the queue, thereby preventing the device from becoming busy later on.
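
Because the stall indicator is a counter rather than a flag, stall requests nest. Here's a minimal user-mode sketch of that behavior using C11 atomics in place of the kernel's interlocked functions. It assumes that restarting the queue decrements the same counter and that the queue becomes ready only when the count reaches zero. StallModel and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_long stallcount;  /* number of outstanding stall requests */
} StallModel;

/* A new queue starts out stalled, as in InitializeQueue. */
static void ModelInitStall(StallModel *m)
{
    atomic_init(&m->stallcount, 1);
}

/* Analog of an interlocked increment: no lock needed. */
static void ModelStall(StallModel *m)
{
    atomic_fetch_add(&m->stallcount, 1);
}

/* Analog of an interlocked decrement. Returns true only when this
   call released the last outstanding stall (the previous value was 1),
   which is when queued requests could start flowing again. */
static bool ModelRestart(StallModel *m)
{
    return atomic_fetch_sub(&m->stallcount, 1) == 1;
}

static bool ModelIsStalled(StallModel *m)
{
    return atomic_load(&m->stallcount) > 0;
}
```

If two parts of the driver each stall the queue, the first restart leaves it stalled and only the second one releases it.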

Queuing IRPs

IRPs get added to the queue when a dispatch function calls StartPacket:

VOID NTAPI StartPacket(PDEVQUEUE pdq, PDEVICE_OBJECT fdo,
  PIRP Irp, PDRIVER_CANCEL cancel)
  {
  KIRQL oldirql;
  KeAcquireSpinLock(&pdq->lock, &oldirql);                 // 1
  if (pdq->abortstatus)                                    // 2
    {
    KeReleaseSpinLock(&pdq->lock, oldirql);
    Irp->IoStatus.Status = pdq->abortstatus;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    }
  else if (pdq->CurrentIrp || pdq->stallcount)             // 3
    {
    IoSetCancelRoutine(Irp, cancel);                       // 4
    if (Irp->Cancel && IoSetCancelRoutine(Irp, NULL))
      {
      KeReleaseSpinLock(&pdq->lock, oldirql);
      Irp->IoStatus.Status = STATUS_CANCELLED;
      IoCompleteRequest(Irp, IO_NO_INCREMENT);
      }
    else
      {
      InsertTailList(&pdq->head, &Irp->Tail.Overlay.ListEntry); // 5
      KeReleaseSpinLock(&pdq->lock, oldirql);
      }
    }
  else
    {
    pdq->CurrentIrp = Irp;                                 // 6
    KeReleaseSpinLock(&pdq->lock, DISPATCH_LEVEL);
    (*pdq->StartIo)(fdo, Irp);
    KeLowerIrql(oldirql);
    }
  }

  1. Acquiring the spin lock allows us to access fields in the DEVQUEUE without interference from the other support routines—principally StartNextPacket—that might be trying to access the same queue.
  2. As I described earlier, we sometimes need to reject IRPs on arrival. If abortstatus is nonzero, we just complete the request. Our caller will be returning STATUS_PENDING, so it's up to us to do the completion.
  3. If the device is currently busy, or if some other part of the driver has stalled this queue, we need to queue the IRP for later processing.
  4. We might be in a race with an instance of IoCancelIrp that is trying to cancel this very IRP. We first install our own cancel routine in the IRP by using IoSetCancelRoutine, which performs an (atomic) interlocked exchange. Then we test the Cancel flag. If we find the Cancel flag set, our cancel routine might or might not have been called by now, depending on the exact order in which our code and IoCancelIrp executed their program steps. If our cancel routine was called, a second call to IoSetCancelRoutine will return NULL; we can then enqueue the IRP and rely on the cancel routine to immediately dequeue the IRP and complete it. If our cancel routine has not yet been called, it won't be possible for it to ever be called after the second invocation of IoSetCancelRoutine; we will complete the IRP now in this case.
  5. This is where we actually queue the IRP. The Tail.Overlay.ListEntry field of an IRP was designed for uses like this one.
  6. The last case is when the queue is in the READY state and the device is not currently busy. We set the CurrentIrp pointer in the DEVQUEUE, release the spin lock, and call the StartIo routine at DISPATCH_LEVEL.

I'd like to discuss a pesky nonproblem in the above code. Programs that change CurrentIrp do so while owning our spin lock, so we can be sure there's no ambiguity in our test of CurrentIrp. The stall counter, on the other hand, can be incremented without the spin lock inside StallRequests. It should be obvious that the only potential problem occurs when the counter is being incremented from 0 to 1 more or less simultaneously with us, because we behave the same way no matter what nonzero value the counter might have. Consider the potential race with a call to StallRequests that will increment the counter from 0 to 1. If we beat the increment and find the counter 0, we'll go ahead and start a request. That's okay, because the caller of StallRequests is willing to have the device be busy. (If the caller weren't willing, it would have used CheckBusyAndStall instead.) If we find the counter already incremented, we'll queue the IRP, which is also consistent with what the caller of StallRequests intended.
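
Stripped of locking, cancellation, and IRP completion, StartPacket is a three-way decision. The following user-mode sketch models only that decision. All of the names in it (Disposition, PacketQueueModel, the integer IRP ids) are hypothetical and exist just for this illustration.

```cpp
#include <cassert>
#include <deque>

// Hypothetical model types; none of these names come from the DDK.
enum class Disposition { Rejected, Queued, Started };

struct PacketQueueModel {
    long abortstatus = 0;     // nonzero: the queue is REJECTING
    long stallcount = 0;      // nonzero: the queue is STALLED
    int currentIrp = 0;       // 0 means idle; otherwise the id of the active IRP
    std::deque<int> pending;  // queued IRP ids, oldest first
};

// Mirrors the order of the tests in StartPacket: reject first, then
// queue, and start the request only when nothing stands in the way.
inline Disposition startPacket(PacketQueueModel& q, int irp) {
    if (q.abortstatus != 0)
        return Disposition::Rejected;
    if (q.currentIrp != 0 || q.stallcount != 0) {
        q.pending.push_back(irp);
        return Disposition::Queued;
    }
    q.currentIrp = irp;
    return Disposition::Started;
}
```

Because the model treats every nonzero stall count alike, it also shows why the only interesting race is the transition of the counter from 0 to 1.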

Dequeuing IRPs

The function that dequeues most IRPs is StartNextPacket, which is called from a DPC routine:



PIRP NTAPI StartNextPacket(PDEVQUEUE pdq, PDEVICE_OBJECT fdo)
  {
  KIRQL oldirql;
  KeAcquireSpinLock(&pdq->lock, &oldirql);              // 1
  PIRP CurrentIrp = (PIRP) InterlockedExchangePointer   // 2
    ((PVOID*) &pdq->CurrentIrp, NULL);
  if (CurrentIrp)
    KeSetEvent(&pdq->evStop, 0, FALSE);                 // 3
  while (!pdq->stallcount                               // 4
    && !pdq->abortstatus
    && !IsListEmpty(&pdq->head))
    {
    PLIST_ENTRY next = RemoveHeadList(&pdq->head);      // 5
    PIRP Irp = CONTAINING_RECORD(next, IRP, Tail.Overlay.ListEntry);
    if (!IoSetCancelRoutine(Irp, NULL))                 // 6
      {
      InitializeListHead(&Irp->Tail.Overlay.ListEntry);
      continue;
      }
    pdq->CurrentIrp = Irp;
    KeReleaseSpinLockFromDpcLevel(&pdq->lock);
    (*pdq->StartIo)(fdo, Irp);                          // 7
    KeLowerIrql(oldirql);
    return CurrentIrp;
    }
  KeReleaseSpinLock(&pdq->lock, oldirql);
  return CurrentIrp;
  }

  1. We first acquire the queue's spin lock so that we can muck about with the internal structure of the object without interference.
  2. We'll be returning the address of the current IRP as our return value, and we also want to set the CurrentIrp pointer to NULL. Because of the spin lock, we don't need to use an atomic operation to extract and nullify CurrentIrp, but doing so can't hurt either.
  3. Some routine might be waiting inside WaitForCurrentIrp for the current request to finish. This call to KeSetEvent will satisfy that wait.
  4. This series of tests determines whether we can and should dequeue a request. The queue must not be stalled. Neither can we be in the REJECTING state, in which we're rejecting new IRPs. Finally, the queue must contain a request before it makes sense to call RemoveHeadList.
  5. This code removes the oldest entry in our IRP queue.
  6. Nullifying the cancel routine pointer in the IRP will prevent IoCancelIrp from trying to cancel the IRP. It's possible that IoCancelIrp is in the process of trying to cancel this IRP on another CPU at this very moment, in which case we should get NULL as the return value from IoSetCancelRoutine. When CancelRequest gains control, it will need to acquire the queue's spin lock before proceeding further. At that point, it will blindly try to remove this IRP from whatever queue it happens to be on. Calling InitializeListHead on the IRP's own chaining field will make it safe for CancelRequest to do this when it eventually gains control of the spin lock and proceeds.
  7. This is where we finally pass the newly dequeued IRP to the StartIo routine for processing.
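
The dequeuing loop can be modeled in user mode to show how a request that is already being cancelled gets skipped. This is a sketch under simplifying assumptions: there is no locking, IRPs are plain integer ids, and the hypothetical cancelRoutineSet flag stands for "IoSetCancelRoutine(Irp, NULL) would return a non-NULL pointer."

```cpp
#include <cassert>
#include <deque>

// Hypothetical model of a queued IRP.
struct PendingIrp {
    int id;
    bool cancelRoutineSet;  // false: IoCancelIrp already took the pointer
};

struct DequeueModel {
    long stallcount = 0;
    long abortstatus = 0;
    int currentIrp = 0;     // 0 means idle
    std::deque<PendingIrp> pending;
};

// Returns the id of the request that just finished (0 if none) and
// starts the oldest pending request that is not being cancelled.
inline int startNextPacket(DequeueModel& q) {
    int finished = q.currentIrp;
    q.currentIrp = 0;
    while (q.stallcount == 0 && q.abortstatus == 0 && !q.pending.empty()) {
        PendingIrp irp = q.pending.front();
        q.pending.pop_front();
        if (!irp.cancelRoutineSet)
            continue;       // the cancel routine will complete this one
        q.currentIrp = irp.id;
        break;
    }
    return finished;
}
```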

The RestartRequests function balances a call to StallRequests or CheckBusyAndStall. It's complicated—very slightly—by the need to send the first IRP to the StartIo routine. Luckily, it can just call StartNextPacket:

VOID NTAPI RestartRequests(PDEVQUEUE pdq, PDEVICE_OBJECT fdo)
  {
  if (InterlockedDecrement(&pdq->stallcount) > 0)
    return;
  StartNextPacket(pdq, fdo);
  }
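
Since the stall counter is a count rather than a flag, stalls nest: the queue restarts only when the last outstanding stall is balanced. The following user-mode sketch illustrates that behavior. RestartModel and its helpers are hypothetical, locking is omitted, and the model assumes the device is idle while the queue is stalled.

```cpp
#include <cassert>
#include <deque>

// Hypothetical model; IRPs are plain integer ids.
struct RestartModel {
    long stallcount = 0;
    int currentIrp = 0;       // 0 means idle
    std::deque<int> pending;  // oldest first
};

// Simplified StartNextPacket: start the oldest pending request
// unless the queue is still stalled.
inline void startNext(RestartModel& q) {
    q.currentIrp = 0;
    if (q.stallcount == 0 && !q.pending.empty()) {
        q.currentIrp = q.pending.front();
        q.pending.pop_front();
    }
}

inline void restartRequests(RestartModel& q) {
    if (--q.stallcount > 0)
        return;               // another stall is still outstanding
    startNext(q);
}
```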

Cancelling IRPs

StartPacket registers a cancel routine supplied by its caller, which in turn simply delegates the work to the queue's CancelRequest function:

VOID NTAPI CancelRequest(PDEVQUEUE pdq, PIRP Irp)
  {
  KIRQL oldirql = Irp->CancelIrql;
  IoReleaseCancelSpinLock(DISPATCH_LEVEL);
  KeAcquireSpinLockAtDpcLevel(&pdq->lock);
  RemoveEntryList(&Irp->Tail.Overlay.ListEntry);
  KeReleaseSpinLock(&pdq->lock, oldirql);
  Irp->IoStatus.Status = STATUS_CANCELLED;
  IoCompleteRequest(Irp, IO_NO_INCREMENT);
  }

We're called while we own the global cancel spin lock, which we release almost immediately. After this, everything is protected by the queue's spin lock instead. When IoCancelIrp called IoAcquireCancelSpinLock, it saved the previous interrupt request level (IRQL) value in the CancelIrql field of the IRP, and we need to eventually revert to that same IRQL; hence, we save it in the oldirql variable.

NOTE
The caller of IoCancelIrp is responsible for making sure that the IRP has not already been completed.
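
All of the cancel logic in the queue rests on one idea: the cancel routine pointer is exchanged atomically, so exactly one party can swap it to NULL and get a non-NULL value back. Here is a minimal user-mode sketch of that idea. IrpModel, setCancelRoutine, and claim are hypothetical names; setCancelRoutine only imitates the exchange semantics of IoSetCancelRoutine.

```cpp
#include <atomic>
#include <cassert>

using CancelRoutine = void (*)();

// Hypothetical model of the one IRP field that matters here.
struct IrpModel {
    std::atomic<CancelRoutine> cancelRoutine{nullptr};
};

// Imitates IoSetCancelRoutine: atomically swap in a new pointer and
// return the previous one.
inline CancelRoutine setCancelRoutine(IrpModel& irp, CancelRoutine fresh) {
    return irp.cancelRoutine.exchange(fresh);
}

// Whoever swaps the pointer to null and gets a non-null value back has
// claimed the IRP; a later caller sees null and must leave it alone.
inline bool claim(IrpModel& irp) {
    return setCancelRoutine(irp, nullptr) != nullptr;
}

inline void dummyCancel() {}
```

In the model, the first call to claim succeeds and every later call fails, which is the property that lets the dequeuing code and the cancel path agree on who completes the IRP.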

IRPs can also be cancelled as a result of an IRP_MJ_CLEANUP, which we'll receive prior to an IRP_MJ_CLOSE. The DEVQUEUE CleanupRequests function is almost identical to the standard-model DispatchCleanup routine I showed you in the previous chapter. The only substantive difference between the two is that we only need to acquire the queue's spin lock:




VOID NTAPI CleanupRequests(PDEVQUEUE pdq, PFILE_OBJECT fop,
  NTSTATUS status)
  {
  LIST_ENTRY cancellist;
  InitializeListHead(&cancellist);                      // 1
  KIRQL oldirql;
  KeAcquireSpinLock(&pdq->lock, &oldirql);
  PLIST_ENTRY first = &pdq->head;
  PLIST_ENTRY next;
  for (next = first->Flink; next != first; )            // 2
    {
    PIRP Irp = CONTAINING_RECORD(next, IRP, Tail.Overlay.ListEntry);
    PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);  // 3
    PLIST_ENTRY current = next;   // remember this entry before advancing
    next = next->Flink;                                 // 4
    if (fop && stack->FileObject != fop)
      continue;
    if (!IoSetCancelRoutine(Irp, NULL))                 // 5
      continue;
    RemoveEntryList(current);                           // 6
    InsertTailList(&cancellist, current);
    }
  KeReleaseSpinLock(&pdq->lock, oldirql);               // 7
  while (!IsListEmpty(&cancellist))
    {
    next = RemoveHeadList(&cancellist);
    PIRP Irp = CONTAINING_RECORD(next, IRP, Tail.Overlay.ListEntry);
    Irp->IoStatus.Status = status;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    }
  }

  1. Our strategy will be to move the IRPs that need to be cancelled into a private queue under protection of the queue's spin lock. Hence, we initialize the private queue and acquire the spin lock before doing anything else.
  2. This loop traverses the entire queue until we return to the list head. Note the absence of a loop increment step—the third clause in the for statement. I'll explain why none is desirable in a moment.
  3. If we're being called to help out with IRP_MJ_CLEANUP, the fop argument is the address of a file object that is about to be closed. We're supposed to isolate the IRPs that pertain to the same file object, which requires us to first find the stack location.
  4. If we decide to remove this IRP from the queue, we won't thereafter have an easy way to find the next IRP in the main queue. We therefore perform the loop increment step here.
  5. This especially clever statement is due to Jamie Hanrahan. We need to worry that someone might be trying to cancel the IRP that we're currently looking at during this iteration. They could get only as far as the point where CancelRequest tries to acquire the spin lock. Before getting that far, however, they necessarily had to execute the statement inside IoCancelIrp that nullifies the cancel routine pointer. If we find that pointer NULL when we call IoSetCancelRoutine, therefore, we can be sure that someone really is trying to cancel this IRP. By simply skipping this IRP during this iteration, we allow the cancel routine to complete it later on.
  6. Here's where we take the IRP out of the main queue and put it in the private queue instead.
  7. Once we finish moving IRPs into the private queue, we can release our spin lock. Then we go ahead and cancel all the IRPs we moved.
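
The traversal pattern is worth isolating, because unlinking an entry while walking a circular list is easy to get wrong. The following user-mode sketch uses a hand-written list in the style of LIST_ENTRY. Link, Request, and the helper functions are hypothetical stand-ins written for this illustration, and an integer fileId stands in for the file object pointer. The loop keeps a pointer to the entry it is examining and advances before it unlinks anything.

```cpp
#include <cassert>

// Minimal circular doubly linked list in the style of LIST_ENTRY.
struct Link {
    Link* flink;
    Link* blink;
};

inline void initHead(Link* head) { head->flink = head->blink = head; }

inline void insertTail(Link* head, Link* entry) {
    entry->flink = head;
    entry->blink = head->blink;
    head->blink->flink = entry;
    head->blink = entry;
}

inline void removeEntry(Link* entry) {
    entry->blink->flink = entry->flink;
    entry->flink->blink = entry->blink;
}

struct Request {
    Link link;   // must stay the first member for the cast below
    int fileId;  // stands in for the file object pointer
};

// Moves every request that belongs to fileId (or all requests when
// fileId is 0) from 'queue' to 'cancelled'. Returns the number moved.
inline int moveMatching(Link* queue, Link* cancelled, int fileId) {
    int moved = 0;
    for (Link* next = queue->flink; next != queue; ) {
        Link* current = next;
        next = next->flink;  // advance before unlinking 'current'
        Request* r = reinterpret_cast<Request*>(current);
        if (fileId != 0 && r->fileId != fileId)
            continue;
        removeEntry(current);
        insertTail(cancelled, current);
        ++moved;
    }
    return moved;
}
```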

CleanupRequests can be called from elsewhere in the driver, by the way. For example, earlier I showed you a call from the IRP_MN_REMOVE_DEVICE handler, which supplied a NULL file object pointer (in order to select all IRPs) and a status code of STATUS_DELETE_PENDING.

Awaiting the Current IRP

The handler for IRP_MN_STOP_DEVICE might need to wait for the current IRP, if any, to finish by calling WaitForCurrentIrp:



VOID NTAPI WaitForCurrentIrp(PDEVQUEUE pdq)
  {
  KeClearEvent(&pdq->evStop);                           // 1
  ASSERT(pdq->stallcount != 0);                         // 2
  KIRQL oldirql;
  KeAcquireSpinLock(&pdq->lock, &oldirql);              // 3
  BOOLEAN mustwait = pdq->CurrentIrp != NULL;
  KeReleaseSpinLock(&pdq->lock, oldirql);
  if (mustwait)
    KeWaitForSingleObject(&pdq->evStop, Executive, KernelMode,
      FALSE, NULL);
  }

  1. StartNextPacket signals the evStop event each time it's called. We want to be sure that the wait we're about to perform doesn't complete because of a now stale signal, so we clear the event before doing anything else.
  2. It doesn't make sense to call this routine without first stalling the queue. Otherwise, StartNextPacket would just start the next packet if there were one, and the device would become busy again.
  3. If the device is currently busy, we'll wait on the evStop event until something calls StartNextPacket to signal that event. We need to protect our inspection of CurrentIrp with the spin lock because, in general, testing a pointer for NULL isn't an atomic operation. If the pointer is NULL now, it can't change later because we've assumed that the queue is stalled.

Aborting Requests

Surprise removal of the device demands that we immediately halt every outstanding IRP that might try to touch the hardware. In addition, we want to make sure that all further IRPs get rejected. The AbortRequests function helps with these tasks:

VOID NTAPI AbortRequests(PDEVQUEUE pdq, NTSTATUS status)
  {
  pdq->abortstatus = status;
  CleanupRequests(pdq, NULL, status);
  }

Setting abortstatus puts the queue into the REJECTING state so that all future IRPs will be rejected with whatever status value our caller supplied. Calling CleanupRequests at this point—with a NULL file object pointer so that CleanupRequests will process the entire queue—empties the queue.

We don't dare try to do anything with the IRP, if any, that's currently active on the hardware. Drivers that don't use the hardware abstraction layer (HAL) to access the hardware—USB drivers, for example, which rely on the hub and host-controller drivers—can count on another driver to fail the current IRP. Drivers that use the HAL might, however, need to worry about hanging the system or, at the very least, leaving an IRP in limbo because the nonexistent hardware can't generate the interrupt that would let the IRP finish. To deal with situations like this, you call AreRequestsBeingAborted:

NTSTATUS NTAPI AreRequestsBeingAborted(PDEVQUEUE pdq)
  {
  return pdq->abortstatus;
  }

It would be silly, by the way, to use the queue spin lock in this routine. Suppose that we were to capture the instantaneous value of abortstatus in a thread-safe and multiprocessor-safe way. The value we return could become obsolete as soon as we release the spin lock.

NOTE
If your device might be removed in such a way that an outstanding request simply hangs, you should also have a watchdog timer of some sort running that will let you kill the IRP after some period of time. See the "Watchdog Timers" section in Chapter 9, "Specialized Topics."

Sometimes we need to undo the effect of a previous call to AbortRequests. AllowRequests lets us do that:

VOID NTAPI AllowRequests(PDEVQUEUE pdq)
  {
  pdq->abortstatus = (NTSTATUS) 0;
  }
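
Taken together, the stallcount and abortstatus fields determine which of the three states in Figure 6-4 the queue is in. The following sketch spells out that mapping in user-mode code. QueueState and stateOf are hypothetical names, and the precedence of abortstatus over the stall counter is inferred from the order of the tests in StartPacket.

```cpp
#include <cassert>

// Hypothetical names for the three states shown in Figure 6-4.
enum class QueueState { Ready, Stalled, Rejecting };

// Derives the state from the two fields. abortstatus wins because
// StartPacket tests it before it looks at the stall counter.
inline QueueState stateOf(long stallcount, long abortstatus) {
    if (abortstatus != 0)
        return QueueState::Rejecting;
    if (stallcount != 0)
        return QueueState::Stalled;
    return QueueState::Ready;
}
```

In this model, AbortRequests moves the queue to Rejecting regardless of the stall count, and AllowRequests returns it to Stalled or Ready depending on whether any stalls remain outstanding.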