Managing Power Transitions

Performing power management tasks correctly requires very accurate coding, and there are many complicating factors. For example, your device might have the ability to wake up the system from a sleeping state. Deciding whether to succeed or fail a query, and deciding which device power state corresponds to a given new system power state, depends on whether your wake-up feature is currently armed. You may have powered down your own device because of inactivity, and you need to provide for restoring power when a substantive IRP comes along. Maybe your device is an "inrush" device that needs a large spike of current to power on, in which case the Power Manager treats you specially. And so on.

When I thought about solving all the problems of handling query-power and set-power operations in a traditional way—that is, with normal-looking dispatch and completion routines—I was daunted by the sheer number of different subroutines that would be required and that would end up doing fairly similar things. I therefore decided to build my power support around a finite state machine that could easily deal with the asynchronous nature of the activities.

I'll explain this finite state machine as it appears in GENERIC.SYS, which is a support driver that most of the code samples on the companion disc use. Appendix B, "Using GENERIC.SYS," explains the client interface to GENERIC.SYS in complete detail. GENERIC.SYS amounts to a kernel-mode DLL containing helper functions for WDM drivers. You could think of it as a generic class driver with broad applicability. Client drivers, including most of my own sample drivers, delegate handling of power IRPs to GENERIC.SYS by calling GenericDispatchPower. GENERIC.SYS also implements the DEVQUEUE object I discussed in Chapter 6, "Plug and Play."

Overview of the Finite State Machine

I wrote a function named HandlePowerEvent to implement the finite state machine that manages power IRPs. I call this function with two arguments:

NTSTATUS HandlePowerEvent(PPOWCONTEXT ctx, enum POWEVENT event);

The first argument is a context structure that contains a state variable, among other things:

typedef struct _POWCONTEXT {
  LONG id;
  LONG eventcount;
  PGENERIC_EXTENSION pdx;
  PIRP irp;
  enum POWSTATE state;
  NTSTATUS status;
  PKEVENT pev;
  DEVICE_POWER_STATE devstate;
  UCHAR MinorFunction;
  BOOLEAN UnstallQueue;
} POWCONTEXT, *PPOWCONTEXT;

The id and eventcount fields are for debugging. If you compile POWER.CPP in the GENERIC project with the preprocessor macro VERBOSETRACE defined as a nonzero value, the POWTRACE macro will produce volumes of trace messages. I used this feature to debug the finite state machine. The prebuilt version of GENERIC.SYS on the companion disc was built without VERBOSETRACE to cut down on the sheer number of trace messages you'd be confronted with when you began to try out my samples.

The pdx member points to GENERIC's portion of the device extension for a given device. There are just a couple of members in the device extension that are important for power management, and I'll mention them later in "Initial Handling for a New IRP." The irp member points to the power IRP that the finite state machine is currently working on; state is the state variable for the machine. The status member is the ending status of an IRP. In some situations, we want to wait while HandlePowerEvent originates and completes a device power IRP; we use the event pointed to by pev to await completion in those situations. The devstate member holds the device power state we want to use in a device IRP, and MinorFunction holds the minor function code (IRP_MN_QUERY_POWER or IRP_MN_SET_POWER) we want to use in that IRP. Finally, UnstallQueue indicates whether we want the state machine to unstall the substantive IRP queue when it finishes handling the current power IRP.

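The second argument to HandlePowerEvent is an event code that indicates why we're calling the function. There are just three event codes: NewIrp, which GenericDispatchPower uses to start work on a new power IRP; MainIrpComplete, which MainCompletionRoutine uses when the lower drivers have completed that IRP; and AsyncNotify, which callbacks such as PoCompletionRoutine use when some asynchronous activity finishes.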

HandlePowerEvent uses the value of the state variable and the event code to determine an action to take. See Table 8-3. (In the table, by the way, an empty cell denotes an impossible situation that leads to an ASSERT failure in the checked build of GENERIC.SYS.) An action corresponds to a series of program steps that advance the power IRP along its processing path.

Table 8-3. Table giving initial action for each event and state.

State                  Event
                       NewIrp        MainIrpComplete       AsyncNotify
InitialState           TriageNewIrp
SysPowerUpPending                    SysPowerUpComplete
SubPowerUpPending                                          SubPowerUpComplete
SubPowerDownPending                                        SubPowerDownComplete
SysPowerDownPending                  SysPowerDownComplete
DevPowerUpPending                    DevPowerUpComplete
DevPowerDownPending                  CompleteMainIrp
ContextSavePending                                         ContextSaveComplete
ContextRestorePending                                      ContextRestoreComplete
DevQueryUpPending                    DevQueryUpComplete
DevQueryDownPending                  DevQueryDownComplete
QueueStallPending                                          QueueStallComplete
FinalState
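
One natural way to hold a table like this in code is a two-dimensional array indexed by state and event; the SendDeviceIrp code later in this chapter indexes something called actiontable in just this way. The following is only a user-mode sketch, not GENERIC's actual table. It covers just the five states that this chapter walks through for system IRPs, and the enumerator order, the InvalidAction placeholder for an empty cell, and the InitialAction helper are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical user-mode model of the state/event table. Only the fact that
// InitialState is 0 comes from the text; the rest of the numbering is assumed.
enum POWSTATE { InitialState = 0, SysPowerUpPending, SubPowerUpPending,
  SubPowerDownPending, SysPowerDownPending, NUMPOWSTATES };
enum POWEVENT { NewIrp = 0, MainIrpComplete, AsyncNotify, NUMPOWEVENTS };
enum POWACTION { InvalidAction = 0,        // stands for an empty cell
  TriageNewIrp, SysPowerUpComplete, SubPowerUpComplete,
  SubPowerDownComplete, SysPowerDownComplete };

// Rows are states; columns are NewIrp, MainIrpComplete, AsyncNotify.
static const POWACTION actiontable[NUMPOWSTATES][NUMPOWEVENTS] = {
  { TriageNewIrp,  InvalidAction,        InvalidAction },        // InitialState
  { InvalidAction, SysPowerUpComplete,   InvalidAction },        // SysPowerUpPending
  { InvalidAction, InvalidAction,        SubPowerUpComplete },   // SubPowerUpPending
  { InvalidAction, InvalidAction,        SubPowerDownComplete }, // SubPowerDownPending
  { InvalidAction, SysPowerDownComplete, InvalidAction },        // SysPowerDownPending
};

// Look up the first action for a (state, event) pair. An empty cell is an
// impossible combination, so the sketch asserts, much as a checked build would.
POWACTION InitialAction(POWSTATE state, POWEVENT event)
  {
  POWACTION action = actiontable[state][event];
  assert(action != InvalidAction);
  return action;
  }
```

With a table like this, the first line of the engine reduces to a single array lookup, and adding a state means adding a row rather than another chain of if statements.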

Since many of the events require multiple actions in some situations, I coded HandlePowerEvent in what may seem at first like a peculiar way, as follows:

NTSTATUS HandlePowerEvent(...)
  {
  NTSTATUS status;
  POWACTION action = ...;
  while (TRUE)
    {
    switch (action)
      {
    case <someaction>:
      action = <someotheraction>;
      continue;
    case <anotheraction>:
      break;
      }
    break;
    }
  return status;
  }

That is, the function amounts to a switch on the action code embedded within an infinite loop. An action case that performs a continue statement repeats the loop; this is how I string together a series of actions during one call to the function. An action case that performs a break from the switch reaches another break statement that exits from the loop, whereupon the function returns.
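
To see the control flow in isolation, here is a small user-mode sketch of the same loop-around-a-switch shape. It borrows two action names from this chapter but does none of their real work; the RunActions name and the trace string it returns are purely illustrative assumptions.

```cpp
#include <cassert>
#include <string>

enum POWACTION { TriageNewIrp, ForwardMainIrp };

// Hypothetical model: returns the names of the actions performed during ONE
// call, to show how "continue" chains actions and "break" ends the call.
std::string RunActions(POWACTION action)
  {
  std::string trace;
  while (true)
    {
    switch (action)
      {
    case TriageNewIrp:
      trace += "TriageNewIrp ";
      action = ForwardMainIrp;   // pick the next action...
      continue;                  // ...and go around the loop again
    case ForwardMainIrp:
      trace += "ForwardMainIrp ";
      break;                     // leave the switch
      }
    break;                       // leave the loop
    }
  return trace;                  // the only return statement
  }
```

Calling RunActions(TriageNewIrp) performs both actions in a single call, while RunActions(ForwardMainIrp) performs only the second one.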

I adopted this coding style for the state machine because I really took to heart the structured programming precepts I learned in my youth. I wanted there to be just one return statement in this whole function to make it easier to prove that the function worked correctly. To aid in the proof, I developed a couple of rules for myself that I could test either by inspection or with ASSERT statements at the end of the function. Here are the rules:

Initial Handling for a New IRP

When we receive a new query-power or set-power IRP, we create a context structure to drive the finite state machine and call HandlePowerEvent:

NTSTATUS GenericDispatchPower(PGENERIC_EXTENSION pdx, PIRP Irp)
  {
  NTSTATUS status = IoAcquireRemoveLock(pdx->RemoveLock, Irp);   // 1
  if (!NT_SUCCESS(status))
    return CompleteRequest(Irp, status);
  PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);
  ULONG fcn = stack->MinorFunction;
  if (fcn == IRP_MN_SET_POWER || fcn == IRP_MN_QUERY_POWER)
    {
    PPOWCONTEXT ctx = (PPOWCONTEXT) ExAllocatePool(NonPagedPool, // 2
      sizeof(POWCONTEXT));
    RtlZeroMemory(ctx, sizeof(POWCONTEXT));
    ctx->pdx = pdx;
    ctx->irp = Irp;
    status = HandlePowerEvent(ctx, NewIrp);
    }
  IoReleaseRemoveLock(pdx->RemoveLock, Irp);
  return status;
  }

  1. The client driver provides a remove lock that both it and GENERIC use to guard against premature removal of the device object. The actual code in GENERIC is a little more complicated than I'm showing you here, in that the remove lock isn't required. The actual code therefore tests the RemoveLock pointer for NULL before using it. There are other unimportant respects, including error checking, in which GENERIC differs from the simplified version I'm showing throughout this chapter.
  2. For set and query operations, we allocate nonpaged memory for the context structure and initialize it. The state variable gets initialized to InitialState, which is numerically equal to 0, by the call to RtlZeroMemory.

The initial state of the finite state machine is InitialState. When we call HandlePowerEvent for the NewIrp event, the first action taken will be the following, which I named TriageNewIrp:

case TriageNewIrp:
  {
  status = STATUS_PENDING;                                       // 1
  IoMarkIrpPending(Irp);
  IoAcquireRemoveLock(pdx->RemoveLock, Irp);                     // 2
  if (stack->Parameters.Power.Type == SystemPowerState)
    {        // system IRP
    if (stack->Parameters.Power.State.SystemState < pdx->syspower) // 3
      {
      action = ForwardMainIrp;                                   // 4
      ctx->state = SysPowerUpPending;                            // 5
      }
    else
      {
      action = SelectDState;
      ctx->state = SubPowerDownPending;
      }
    }        // system IRP
  else
    {        // device IRP
    ctx->state = QueueStallPending;
    if (!pdx->StalledForPower)
      {
      ctx->UnstallQueue = TRUE;                                  // 6
      pdx->StalledForPower = TRUE;
      NTSTATUS qstatus = StallRequestsAndNotify(pdx->dqReadWrite,
        GenericSaveRestoreComplete, ctx);
      if (qstatus == STATUS_PENDING)
        break;                                                   // 7
      }
    action = QueueStallComplete;
    }        // device IRP
  continue;
  }

  1. We always pend the power IRPs that come to us. In nearly every case, we need to delay completing the IRP until after some asynchronous activity occurs.
  2. We acquire the remove lock an extra time beyond the acquisition that occurs in the dispatch routine. We'll release this instance of the lock when we finally complete the IRP.
  3. If the power state in the IRP is numerically less than the syspower value we carry around in the device extension, the IRP relates to a higher system power state.
  4. This statement illustrates how HandlePowerEvent can perform more than one action during a single invocation. Later on we'll execute a continue statement that repeats the infinite loop. The action value will be different, however, which will cause us to execute a different piece of code.
  5. This statement illustrates how action cases can alter the state of the finite state machine. To simplify the conditional compilation I used for debugging print statements, the actual code in GENERIC uses a macro named SETSTATE to perform this assignment, by the way.
  6. We're about to call a function (StallRequestsAndNotify) that might cause recursion into this function. We're not allowed to touch the context structure afterwards, so we set this flag now. The flag means that CompleteMainIrp should call RestartRequests to unstall the queue.
  7. This statement illustrates how an action case can cause HandlePowerEvent to return. This break statement exits from the switch on action. Immediately after the switch statement is another break, which exits from the while loop in which the switch is embedded.

Basically, TriageNewIrp is distinguishing between system power IRPs (that is, IRPs whose Type is SystemPowerState) that increase the power level, system power IRPs that leave the power level alone or reduce it, and device power IRPs (that is, IRPs whose Type is DevicePowerState), regardless of whether they raise or lower the power level. The state machine doesn't distinguish at this stage between QUERY_POWER and SET_POWER requests, so they end up being treated very similarly up to a point.
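
The three-way split can be stated compactly as a pure function. The sketch below is a user-mode model, not code from GENERIC: the enumerations are stand-ins that mirror the DDK ordering (a numerically smaller system state means more power), and the TriageState helper is a hypothetical name introduced only for illustration.

```cpp
#include <cassert>

// Stand-ins for the DDK enumerations so the sketch builds in user mode.
// As in the DDK, a numerically smaller value means a higher power level.
enum SYSTEM_POWER_STATE { PowerSystemUnspecified = 0, PowerSystemWorking,
  PowerSystemSleeping1, PowerSystemSleeping2, PowerSystemSleeping3,
  PowerSystemHibernate, PowerSystemShutdown };
enum POWER_STATE_TYPE { SystemPowerState = 0, DevicePowerState };
enum POWSTATE { SysPowerUpPending, SubPowerDownPending, QueueStallPending };

// Hypothetical helper: which state does TriageNewIrp leave the machine in?
// For a device IRP the system-state arguments are ignored.
POWSTATE TriageState(POWER_STATE_TYPE type, SYSTEM_POWER_STATE requested,
  SYSTEM_POWER_STATE current)
  {
  if (type == DevicePowerState)
    return QueueStallPending;      // device IRP, rising or falling
  if (requested < current)
    return SysPowerUpPending;      // system power is rising
  return SubPowerDownPending;      // system power unchanged or falling
  }
```

Note that a system IRP requesting the state the system is already in lands in the same category as a power-down request.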

For us to know whether power is rising or falling, our device extension needs two variables for keeping track of system power and device power states:

typedef struct _GENERIC_EXTENSION {
  ...
  DEVICE_POWER_STATE devpower; // current dev power state
  SYSTEM_POWER_STATE syspower; // current sys power state
  } GENERIC_EXTENSION, *PGENERIC_EXTENSION;

We initialize these values to PowerDeviceD0 and PowerSystemWorking, respectively, when the client driver first registers with GENERIC.SYS.

You can guess from context that the device extension also has a BOOLEAN member named StalledForPower. This flag, when set, indicates that the substantive IRP queue is presently stalled for purposes of power management. Incidentally, you'll notice (if you've got the right sort of nasty and suspicious mind to be doing device driver programming, that is) that I'm not explicitly synchronizing access to the power state fields or this flag. No additional synchronization is required beyond the serialization that the Power Manager already imposes.

I'll discuss the three initial categories of IRPs separately now.

System Power IRPs That Increase Power

If a system power IRP implies an increase in the system power level, you'll forward it immediately to the next lower driver. In your completion routine for the system power IRP, you'll request the corresponding device power IRP and return STATUS_MORE_PROCESSING_REQUIRED to temporarily halt the completion process. In a completion routine for the device power IRP, you'll finish the completion processing for the system power IRP. Figure 8-5 diagrams the flow of the IRP through all of the drivers. Figure 8-6 is a state diagram that shows how our finite state machine handles the IRP.

Figure 8-5. IRP flow when increasing system power.

Figure 8-6. State transitions when increasing system power.

In terms of how the code works, I showed you earlier that TriageNewIrp puts the machine into the SysPowerUpPending state and requests the ForwardMainIrp action, which is as follows:

case ForwardMainIrp:
  {
  IoCopyCurrentIrpStackLocationToNext(Irp);
  IoSetCompletionRoutine(Irp, (PIO_COMPLETION_ROUTINE)
    MainCompletionRoutine, (PVOID) ctx, TRUE, TRUE, TRUE);
  PoCallDriver(pdx->LowerDeviceObject, Irp);
  break;
  }

HandlePowerEvent will now return STATUS_PENDING, as mandated by the code we already saw in TriageNewIrp. This return value percolates back out through GenericDispatchPower and, presumably, the client driver's IRP_MJ_POWER dispatch function.

Our next contact with this IRP is when the bus driver completes it. Our own MainCompletionRoutine gets control as part of the completion process, saves the IRP's ending status in the context structure's status field, and invokes the finite state machine:

NTSTATUS MainCompletionRoutine(PDEVICE_OBJECT junk, PIRP Irp,
  PPOWCONTEXT ctx)
  {
  ctx->status = Irp->IoStatus.Status;
  return HandlePowerEvent(ctx, MainIrpComplete);
  }

Our initial action will be SysPowerUpComplete:

case SysPowerUpComplete:
  {
  if (!NT_SUCCESS(ctx->status))
    action = CompleteMainIrp;                                    // 1
  else
    {
    if (stack->MinorFunction == IRP_MN_SET_POWER)                // 2
      pdx->syspower = stack->Parameters.Power.State.SystemState;
    action = SelectDState;
    ctx->state = SubPowerUpPending;
    status = STATUS_MORE_PROCESSING_REQUIRED;                    // 3
    }
  continue;
  }

  1. If the IRP failed in the lower levels of the driver hierarchy, we're going to let it complete without doing any more work on this power event. I'll explain in the next section, "Dealing with Failure," what CompleteMainIrp does.
  2. This is where we record the new system power state. We use the syspower value when we check to see whether a new system IRP is raising or lowering power.
  3. We've been called from MainCompletionRoutine and now want to interrupt completion of the system IRP while we process the device IRP we're about to originate. Hence, we'll cause MainCompletionRoutine to return STATUS_MORE_PROCESSING_REQUIRED.

Dealing with Failure

If the IRP failed, you can see that we'll do the CompleteMainIrp action next:

case CompleteMainIrp:
  {
  PoStartNextPowerIrp(Irp);                                      // 1
  if (event == MainIrpComplete)                                  // 2
    status = ctx->status;
  else                                                           // 3
    {
    Irp->IoStatus.Status = ctx->status;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    }
  IoReleaseRemoveLock(pdx->RemoveLock, Irp);                     // 4
  if (ctx->UnstallQueue)                                         // 5
    {
    pdx->StalledForPower = FALSE;
    RestartRequests(pdx->dqReadWrite, pdx->DeviceObject);
    }
  action = DestroyContext;
  continue;
  }

  1. Here's the call to PoStartNextPowerIrp that we must make for each power IRP while we still own it.
  2. If we were entered to handle a MainIrpComplete event, our caller must have been MainCompletionRoutine, and the first action routine will have set status equal to STATUS_MORE_PROCESSING_REQUIRED to short-circuit the completion process. Since we've decided we want to complete this IRP after all—that's why we're at CompleteMainIrp—the right thing to do is to return a different status code and allow the completion process to take its normal course.
  3. If we were entered for any other event, we need to explicitly complete the IRP.
  4. This IoReleaseRemoveLock call balances the call to IoAcquireRemoveLock that we did during TriageNewIrp.
  5. I'll explain what this block of code is all about when I talk about device IRPs later in this chapter.
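
The remove-lock bookkeeping described in note 4, together with note 2 of TriageNewIrp, is easy to check by counting. The following user-mode sketch replaces the real remove lock with a bare counter; RemoveLockModel and the two helper functions are hypothetical names, and a real IO_REMOVE_LOCK does far more than count.

```cpp
#include <cassert>

// Hypothetical stand-in for a remove lock: just a count of outstanding
// acquisitions.
struct RemoveLockModel
  {
  int count = 0;
  void Acquire() { ++count; }
  void Release() { assert(count > 0); --count; }
  };

// Acquisitions still outstanding when GenericDispatchPower returns for a
// set-power or query-power IRP that is still in progress.
int OutstandingAfterDispatch(RemoveLockModel& lock)
  {
  lock.Acquire();     // GenericDispatchPower, on entry
  lock.Acquire();     // TriageNewIrp: held until the IRP completes
  lock.Release();     // GenericDispatchPower, on the way out
  return lock.count;
  }

// Acquisitions outstanding once CompleteMainIrp has run.
int OutstandingAfterCompletion(RemoveLockModel& lock)
  {
  lock.Release();     // CompleteMainIrp balances TriageNewIrp
  return lock.count;
  }
```

One acquisition stays outstanding for as long as the power IRP is pending, which is what keeps the device object from disappearing underneath the state machine.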

When handling a system power IRP that increases power, the machine enters CompleteMainIrp after a MainIrpComplete event. CompleteMainIrp will therefore arrange to return the error status we originally fetched (inside MainCompletionRoutine) from the IRP. That will permit the completion process to continue. There are other code paths we haven't studied yet in which CompleteMainIrp calls IoCompleteRequest instead. CompleteMainIrp finishes by requesting yet another action:

case DestroyContext:
  {
  if (ctx->pev)                                                  // 1
    KeSetEvent(ctx->pev, IO_NO_INCREMENT, FALSE);
  else                                                           // 2
    ExFreePool(ctx);
  break;
  }

  1. This branch is taken when SendDeviceSetPower calls the state machine engine to create and wait for a device IRP.
  2. This branch is taken when GenericDispatchPower calls the state machine engine to process an IRP.

DestroyContext is, of course, the last action the finite state machine ever performs.

Mapping the System State to a Device State

The other possible path out of SysPowerUpComplete generates a device power IRP with a power state that corresponds to the system power state. We perform the mapping of system to device states in the SelectDState action:

case SelectDState:
  {
  SYSTEM_POWER_STATE sysstate = 
    stack->Parameters.Power.State.SystemState;
  if (sysstate == PowerSystemWorking)
    ctx->devstate = PowerDeviceD0;
  else
    {
    DEVICE_POWER_STATE maxstate =
      pdx->devcaps.DeviceState[sysstate];
    DEVICE_POWER_STATE minstate = pdx->WakeupEnabled ?
      pdx->devcaps.DeviceWake : PowerDeviceD3;
    ctx->devstate = minstate > maxstate ? minstate : maxstate;
    }
  ctx->MinorFunction = stack->MinorFunction;
  action = SendDeviceIrp;
  continue;
  }

By the way, the Power Manager never transitions directly from one low system power state to another: it always moves via PowerSystemWorking. That's why I coded SelectDState to choose one mapping for PowerSystemWorking and a different mapping for all other system power states.

In general, we always want to put our device into the lowest power state that's consistent with current device activity, with our own wake-up feature (if any), with device capabilities, and with the impending state of the system. These factors can interplay in a relatively complex way. To explain them fully, I need to digress briefly and talk about a Plug and Play IRP that I avoided discussing in Chapter 6: IRP_MN_QUERY_CAPABILITIES.

The PnP Manager sends a capabilities query shortly after starting your device and perhaps at other times. The parameter for the request is a DEVICE_CAPABILITIES structure that contains several fields relevant to power management. Since this is the only time in this book I'm going to discuss this structure, I'm showing you the entire declaration:

typedef struct _DEVICE_CAPABILITIES {
    USHORT Size;
    USHORT Version;
    ULONG DeviceD1:1;
    ULONG DeviceD2:1;
    ULONG LockSupported:1;
    ULONG EjectSupported:1;
    ULONG Removable:1;
    ULONG DockDevice:1;
    ULONG UniqueID:1;
    ULONG SilentInstall:1;
    ULONG RawDeviceOK:1;
    ULONG SurpriseRemovalOK:1;
    ULONG WakeFromD0:1;
    ULONG WakeFromD1:1;
    ULONG WakeFromD2:1;
    ULONG WakeFromD3:1;
    ULONG HardwareDisabled:1;
    ULONG NonDynamic:1;
    ULONG Reserved:16;

    ULONG Address;
    ULONG UINumber;

    DEVICE_POWER_STATE DeviceState[PowerSystemMaximum];
    SYSTEM_POWER_STATE SystemWake;
    DEVICE_POWER_STATE DeviceWake;
    ULONG D1Latency;
    ULONG D2Latency;
    ULONG D3Latency;
} DEVICE_CAPABILITIES, *PDEVICE_CAPABILITIES;

Table 8-4 describes the fields in this structure that relate to power management.

Table 8-4. Power-management fields in DEVICE_CAPABILITIES structure.

Field         Description
DeviceState   Array of highest device states possible for each system state
SystemWake    Lowest system power state from which the device can generate a wake-up signal for the system; PowerSystemUnspecified indicates that the device can't wake up the system
DeviceWake    Lowest power state from which the device can generate a wake-up signal; PowerDeviceUnspecified indicates that the device can't generate a wake-up signal
D1Latency     Approximate worst-case time (in 100-microsecond units) required for device to switch from D1 to D0 states
D2Latency     Approximate worst-case time (in 100-microsecond units) required for device to switch from D2 to D0 states
D3Latency     Approximate worst-case time (in 100-microsecond units) required for device to switch from D3 to D0 states
WakeFromD0    Flag indicating whether device's system wake-up feature is operative when the device is in the indicated state
WakeFromD1    Same as above
WakeFromD2    Same as above
WakeFromD3    Same as above
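
The latency fields are easy to misread because of their unusual unit. As a worked example, a D3Latency of 50 means 50 x 100 = 5000 microseconds, or 5 milliseconds. The helper below is a hypothetical convenience function, not part of the DDK, that performs the conversion.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: convert a D1Latency, D2Latency, or D3Latency value,
// expressed in 100-microsecond units, to microseconds.
unsigned long LatencyToMicroseconds(unsigned long latency)
  {
  assert(latency <= ULONG_MAX / 100UL);   // guard against overflow
  return latency * 100UL;
  }
```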

You normally handle the query capabilities IRP synchronously by passing it down and waiting for the lower layers to complete it. After the pass-down, you'll make any desired changes to the capabilities recorded by the bus driver. Your subdispatch routine would look like this one:

NTSTATUS HandleQueryCapabilities(IN PDEVICE_OBJECT fdo,
  IN PIRP Irp)
  {
  PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  PDEVICE_CAPABILITIES pdc = stack->
    Parameters.DeviceCapabilities.Capabilities;
  if (pdc->Version < 1)                                          // 1
    return DefaultPnpHandler(fdo, Irp);
  NTSTATUS status = ForwardAndWait(fdo, Irp);
  if (NT_SUCCESS(status))
    {
    stack = IoGetCurrentIrpStackLocation(Irp);
    pdc = stack->Parameters.DeviceCapabilities.Capabilities;
    <stuff>                                                      // 2
    pdx->devcaps = *pdc;                                         // 3
    }
  return CompleteRequest(Irp, status);
  }

  1. The device capabilities structure has a version number member, which is currently always equal to 1. The structure is designed to always be upward compatible, so you'll be able to work with the version defined in the DDK that you build your driver with and with any later incarnation of the structure. If, however, you're confronted with a structure that's older than you're able to work with, you should just ignore this IRP by passing it along.
  2. Here's where you can override any capabilities that were set by the bus driver.
  3. It's a good idea to make a copy of the capabilities structure. I already described how you'll use the DeviceState map when you receive a system power IRP. You might have occasion to consult other fields in the structure, too.

Don't bother altering the capabilities structure before you pass this IRP down: the bus driver will completely reinitialize it. When you regain control, you can modify SystemWake and DeviceWake to specify a higher power state than the bus driver thought was appropriate. You can't specify a lower power state for the wake-up fields, and you can't override the bus driver's decision that your device is incapable of waking the system. If your device is ACPI-compliant, the ACPI filter will set the LockSupported, EjectSupported, and Removable flags automatically based on the ACPI Source Language (ASL) description of the device, so you won't need to worry about these capabilities.
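
The rules for the wake-up fields amount to a one-way adjustment. The user-mode sketch below expresses them for DeviceWake; the same idea would apply to SystemWake. AdjustDeviceWake is a hypothetical helper, the enumeration is a stand-in that mirrors the DDK ordering, and treating PowerDeviceUnspecified in the second argument as "no preference" is a convention assumed just for this example.

```cpp
#include <cassert>

// Stand-in for the DDK enumeration; a smaller value means a higher power state.
enum DEVICE_POWER_STATE { PowerDeviceUnspecified = 0, PowerDeviceD0,
  PowerDeviceD1, PowerDeviceD2, PowerDeviceD3 };

// Hypothetical helper: choose the DeviceWake value to leave in the structure,
// given what the bus driver reported and what the function driver prefers.
DEVICE_POWER_STATE AdjustDeviceWake(DEVICE_POWER_STATE fromBus,
  DEVICE_POWER_STATE wanted)
  {
  if (fromBus == PowerDeviceUnspecified)
    return fromBus;                   // bus says no wake-up: can't override
  if (wanted == PowerDeviceUnspecified)
    return fromBus;                   // no preference (assumed convention)
  return wanted < fromBus ? wanted : fromBus;   // only ever raise the state
  }
```

For instance, if the bus driver reports D2 and the function driver knows that wake-up really works only from D1, the result is D1; a request to lower the value from D1 to D3 is ignored.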

You might want to set the SurpriseRemovalOK flag at point "2" in the capabilities handler. Setting the flag suppresses the dialog box that Windows 2000 normally presents when it detects the sudden and unexpected removal of a device. It's normally okay for the end user to remove a universal serial bus (USB) or 1394 device without first telling the system, and the function driver should set this flag to avoid annoying the user.

To return to our discussion of SelectDState, suppose we're dealing with a set-power request that will take the computer from Working to Sleeping1; we'll therefore execute the second branch of the if statement in SelectDState. Let's suppose that the bus driver knows that our device can be in any of the states D0, D1, D2, or D3 when the system is in Sleeping1. When it answered the PnP capabilities query, it would therefore have filled in DeviceState[PowerSystemSleeping1] in the device capabilities structure with the value PowerDeviceD0 because D0 is the highest power state our device can occupy for this system state. We'll initially record PowerDeviceD0, then, as the value of maxstate.

Our device might also have a wake-up feature. I'll say more about wake-up later on. If so, the bus driver will have set the DeviceWake member of the capabilities structure equal to the lowest power state from which wake-up can occur. Let's suppose that value is PowerDeviceD1. If our wake-up feature happens to be enabled right now, we'll set minstate to PowerDeviceD1.

If we don't have a wake-up feature, however, or if we have one and it's not currently enabled, we're free to choose any device power state lower than the maxstate value we derived from the device capabilities structure. We could blindly choose D3, but that wouldn't be right for every type of device because generally speaking it takes longer to resume from D3 to D0 than from D2 or D1. The choice you make in this case therefore depends on factors for which I can't give you cut-and-dried guidance. If your device is capable of the D2 state, for example, you might decide to enter D2 for any of the system sleeping states and reserve D3 for the hibernate and shutdown states.
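
The Sleeping1 example can be checked with a small user-mode model of the SelectDState arithmetic. This is a sketch under stated assumptions: the enumeration is a stand-in mirroring the DDK ordering, and SelectDeviceState is a hypothetical free function whose parameters replace the device extension and IRP fields that the real action reads.

```cpp
#include <cassert>

// Stand-in for the DDK enumeration; a larger value means a lower power state.
enum DEVICE_POWER_STATE { PowerDeviceUnspecified = 0, PowerDeviceD0,
  PowerDeviceD1, PowerDeviceD2, PowerDeviceD3 };

// Hypothetical model of SelectDState:
//   systemWorking  - is the IRP's system state PowerSystemWorking?
//   maxstate       - DeviceState[sysstate] from the capabilities
//   wakeupEnabled  - is the wake-up feature currently armed?
//   deviceWake     - DeviceWake from the capabilities
DEVICE_POWER_STATE SelectDeviceState(bool systemWorking,
  DEVICE_POWER_STATE maxstate, bool wakeupEnabled,
  DEVICE_POWER_STATE deviceWake)
  {
  if (systemWorking)
    return PowerDeviceD0;
  DEVICE_POWER_STATE minstate = wakeupEnabled ? deviceWake : PowerDeviceD3;
  // Take the numerically larger value, that is, the lower-power state.
  return minstate > maxstate ? minstate : maxstate;
  }
```

With maxstate equal to D0 and DeviceWake equal to D1, an armed wake-up feature yields D1, and a disarmed one yields D3, the "blind" choice that the preceding paragraph cautions may not suit every device.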

It seems reasonable to leave your device in a low power state when the system resumes from a sleeping state. The DDK suggests you do this, and so does good sense. There are two situations in which you would need to restore your device to D0 when the system goes to Working. The first situation is when your device has the INRUSH characteristic. In this case, the Power Manager won't send power IRPs to any other INRUSH device until you've powered on your device. The second situation is when you've got substantive IRPs queued and waiting to run once power is back. Notwithstanding what a good idea it seems to be to just leave your device in a low power state, you'll notice that the code fragment I just showed you for SelectDState unconditionally picks the D0 state. In my testing, Windows 2000 seemed to hang coming out of standby if I didn't do that. Maybe there's a mistake in my code or in the operating system. Stay tuned to my errata page for more information about this.

Requesting a Device Power IRP

In Chapter 5, "The I/O Request Packet," I discussed support functions such as IoAllocateIrp that you can use to build IRPs. You don't use those functions when you want to create power IRPs, though. (Actually, you would use one of those functions for an IRP_MN_POWER_SEQUENCE request, but not for the other IRP_MJ_POWER requests.) Instead, you use PoRequestPowerIrp, as shown here in the code for the SendDeviceIrp action we'd perform after SelectDState:

case SendDeviceIrp:
  {
  if (win98 && ctx->devstate == pdx->devpower)                   // 1
    {
    ctx->status = STATUS_SUCCESS;
    action = actiontable[ctx->state][AsyncNotify];
    continue;
    }
  POWER_STATE powstate;
  powstate.DeviceState = ctx->devstate;
  NTSTATUS postatus = PoRequestPowerIrp(pdx->Pdo,                // 2
    ctx->MinorFunction, powstate, (PREQUEST_POWER_COMPLETE)
    PoCompletionRoutine, ctx, NULL);
  if (NT_SUCCESS(postatus))                                      // 3
    break;
  action = CompleteMainIrp;                                      // 4
  ctx->status = postatus;
  continue;
  }

  1. Refer to "Windows 98 Compatibility Notes" at the end of this chapter for an explanation of what this section of code is all about.
  2. The first argument to PoRequestPowerIrp is the address of the physical device object (PDO) for our device. Note that the IRP we're requesting will actually get sent to the topmost filter device object (FiDO) anyway. The second argument is the minor function code for the IRP we want to send. This will either be IRP_MN_QUERY_POWER or IRP_MN_SET_POWER in our case. The third argument is a POWER_STATE that should contain a device power state value when we're requesting a query or set operation. The fourth and fifth arguments are, respectively, the address of a callback routine for when the IRP finishes and a context parameter for that function. The last argument is an optional address of a PIRP variable to receive the address of the IRP that PoRequestPowerIrp creates.
  3. PoRequestPowerIrp normally returns STATUS_PENDING after creating and launching the power IRP you've requested. This, and any success code, in fact, mean that our callback function will eventually be called. It will generate another call to HandlePowerEvent, so we're done with this invocation of the engine.
  4. If PoRequestPowerIrp fails, it never created the IRP and our callback function will never be called. We therefore want to fail the system IRP with whatever status code we've gotten.

In the system power-up scenario I'm currently discussing, our state machine will be in the SubPowerUpPending state when we get to SendDeviceIrp. The status variable will be STATUS_MORE_PROCESSING_REQUIRED, which is the right value for MainCompletionRoutine to return if we're going to wait for the device IRP to finish. Normally, then, when we break from SendDeviceIrp, we'll interrupt the completion processing for the system power IRP for the time being.
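
The reason STATUS_PENDING counts as success in SendDeviceIrp is that NT_SUCCESS is true for any non-negative status value. The user-mode sketch below reproduces the relevant definitions so the decision can be tried outside the kernel; the NEXTSTEP enumeration and the AfterRequest helper are hypothetical names used only to label the two outcomes.

```cpp
#include <cassert>
#include <cstdint>

// User-mode stand-ins for the kernel definitions. NTSTATUS is a 32-bit
// signed value, and NT_SUCCESS is true for any non-negative status.
typedef std::int32_t NTSTATUS;
#define NT_SUCCESS(s) (((NTSTATUS)(s)) >= 0)
const NTSTATUS STATUS_SUCCESS = 0x00000000;
const NTSTATUS STATUS_PENDING = 0x00000103;
const NTSTATUS STATUS_INSUFFICIENT_RESOURCES = (NTSTATUS) 0xC000009AL;

enum NEXTSTEP { WaitForCallback, FailMainIrp };   // hypothetical labels

// What SendDeviceIrp does after PoRequestPowerIrp returns postatus.
NEXTSTEP AfterRequest(NTSTATUS postatus)
  {
  if (NT_SUCCESS(postatus))
    return WaitForCallback;    // callback will run and re-enter the engine
  return FailMainIrp;          // no IRP was created; fail the main IRP
  }
```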

I'll discuss what happens to the device IRP we request via PoRequestPowerIrp later on.

Finishing the System IRP

Eventually, the device IRP that SendDeviceIrp requests will finish, whereupon the Power Manager will call the PoCompletionRoutine callback routine. It in turn calls HandlePowerEvent with the event code AsyncNotify. Our first action in the SubPowerUpPending state will be SubPowerUpComplete:

case SubPowerUpComplete:
  {
  if (status == -1)
    status = STATUS_SUCCESS;
  action = CompleteMainIrp;
  continue;
  }

The only job performed by this action routine is to alter the status variable. The reason we do that is that we have an ASSERT statement at the end of HandlePowerEvent to make sure someone changes status. In this exact scenario, it doesn't matter what status value we return because PoCompletionRoutine is a void function. But you don't want to trigger an ASSERT and a BSOD unless something is really wrong.

The next action after SubPowerUpComplete is CompleteMainIrp, which leads to DestroyContext. You've already seen what those action routines do.

System Power IRPs That Decrease Power

If the system power IRP implies no change or a reduction in the system power level, you'll request a device power IRP with the same minor function code (set or query) and a device power state that corresponds to the system state. When the device power IRP completes, you'll forward the system power IRP to the next lower driver. You'll need a completion routine for the system power IRP so that you can make the requisite call to PoStartNextPowerIrp and so that you can perform some additional cleanup. See Figure 8-7 for an illustration of how the IRPs flow through the system in this case.

Figure 8-7. IRP flow when decreasing system power.

Figure 8-8 diagrams how our finite state machine handles this type of IRP. TriageNewIrp puts the state machine into the SubPowerDownPending state and jumps to the SelectDState action. You already saw that SelectDState selects a device power state and leads to a SendDeviceIrp action to request a device power IRP. In the system power-down scenario, we'll be specifying a lower power state in this device IRP.

Figure 8-8. State transitions when decreasing system power.

When the device IRP finishes, we execute SubPowerDownComplete:

case SubPowerDownComplete:
  {
  if (status == -1)
    status = STATUS_SUCCESS;
  if (NT_SUCCESS(ctx->status))
    {
    ctx->state = SysPowerDownPending;
    action = ForwardMainIrp;
    }
  else
    action = CompleteMainIrp;
  continue;
  }

As you can see, if the device IRP fails, we fail the system IRP too. If the device IRP succeeds, we enter the SysPowerDownPending state and exit via ForwardMainIrp. When the system IRP finishes, and MainCompletionRoutine runs, we'll execute SysPowerDownComplete:

case SysPowerDownComplete:
  {
  if (stack->MinorFunction == IRP_MN_SET_POWER)
    pdx->syspower = stack->Parameters.Power.State.SystemState;
  action = CompleteMainIrp;
  continue;
  }

The only purpose of this action is to record the new system power state in our device extension and then to exit via CompleteMainIrp and DestroyContext.

Device Power IRPs

All we actually do with system power IRPs is act as a conduit for them and request a device IRP either as the system IRP travels down the driver stack or as it travels back up. We have more work to do with device power IRPs, however.

To begin with, we don't want our device occupied by any substantive I/O operations while a change in the device power state is under way. Therefore, as early as we can in a sequence that leads to powering down our device, we stop processing new operations and wait for any outstanding operation to finish. Since we're not allowed to block the system thread in which we receive power IRPs, the wait has to be asynchronous: we arrange to be notified when the current IRP finishes, and only then do we continue processing the device IRP.

If the device power IRP implies an increase in the device power level, we'll forward it to the next lower driver. Refer to Figure 8-9 for an illustration of how the IRP flows through the system. The bus driver will process a device set-power IRP by, for example, using whatever bus-specific mechanism is appropriate to turn on the flow of electrons to your device, and it will complete the IRP. Your completion routine will initiate whatever operations are required to restore context information to the device, and it will return STATUS_MORE_PROCESSING_REQUIRED to interrupt the completion process for the device IRP. When the context restore operation finishes, you'll resume processing substantive IRPs and finish completing the device IRP.

Figure 8-9. IRP flow when increasing device power.

If the device power IRP implies no change or a reduction in the device power level, you perform any device-specific processing (asynchronously, as we've discussed) and then forward the device IRP to the next lower driver. See Figure 8-10. The "device-specific processing" for a set operation includes saving device context information, if any, in memory so that you can restore it later. There probably isn't any device-specific processing for a query operation beyond deciding whether to succeed or fail the query. The bus driver completes the request. In the case of a query operation, you can expect the bus driver to complete the request with STATUS_SUCCESS to indicate acquiescence in the proposed power change. In the case of a set operation, you can expect the bus driver to take whatever bus-dependent steps are required to put your device into the specified device power state. Your completion routine cleans up by calling PoStartNextPowerIrp, among other things.

Figure 8-10. IRP flow when decreasing device power.

I invented StallRequestsAndNotify for use in TriageNewIrp. (It's so new that Chapter 6, where all the other DEVQUEUE functions are described, was already beyond my reach when I created it.) The first step it performs is to stall the request queue. If the device is currently busy, it records a callback routine address—in this case, GenericSaveRestoreComplete, which I'm overloading for purposes of receiving a notification—and returns STATUS_PENDING. TriageNewIrp will then exit in the QueueStallPending state.

If the device isn't busy, StallRequestsAndNotify returns STATUS_SUCCESS without arranging any callback; the device can't become busy now because the queue is stalled. TriageNewIrp will then go directly to the QueueStallComplete action.
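The two outcomes just described can be modeled in a few lines of user-mode C. This is only a sketch under stated assumptions: SKETCH_QUEUE, SketchStallRequestsAndNotify, SketchStartNextPacket, and SketchCountNotify are hypothetical names, the structure is a drastically reduced stand-in for a DEVQUEUE, and the spin lock that real kernel code would need around these fields is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef long NTSTATUS;
#define STATUS_SUCCESS ((NTSTATUS)0)
#define STATUS_PENDING ((NTSTATUS)0x103)

typedef void (*NOTIFY_ROUTINE)(void *context);

/* Hypothetical, much-reduced model of a request queue. */
typedef struct {
    int            stallcount;     /* > 0 means the queue is stalled */
    int            busy;           /* nonzero while a request is in progress */
    NOTIFY_ROUTINE notify;         /* called once when the current request ends */
    void          *notifycontext;
} SKETCH_QUEUE;

/* Stall the queue. If a request is still in progress, remember whom to
   tell when it finishes and report STATUS_PENDING. */
NTSTATUS SketchStallRequestsAndNotify(SKETCH_QUEUE *q,
                                      NOTIFY_ROUTINE notify, void *context)
{
    ++q->stallcount;               /* first step: always stall */
    if (!q->busy)
        return STATUS_SUCCESS;     /* idle: no callback is arranged */
    q->notify = notify;
    q->notifycontext = context;
    return STATUS_PENDING;
}

/* Model of the client driver reporting that the current request is done. */
void SketchStartNextPacket(SKETCH_QUEUE *q)
{
    q->busy = 0;
    if (q->notify != NULL) {
        NOTIFY_ROUTINE notify = q->notify;
        void *context = q->notifycontext;
        assert(q->stallcount > 0); /* a notification implies a stall */
        q->notify = NULL;          /* one-shot callback */
        notify(context);
    }
    /* Dequeuing the next request when stallcount == 0 is omitted. */
}

/* Helper for experiments: counts how often it is called. */
void SketchCountNotify(void *context)
{
    ++*(int *)context;
}
```

With a busy queue, the stall call returns STATUS_PENDING and the callback fires exactly once, when SketchStartNextPacket runs. With an idle queue, the call returns STATUS_SUCCESS and the callback is never invoked, which corresponds to TriageNewIrp going straight to QueueStallComplete.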

We reach the QueueStallComplete routine either directly from TriageNewIrp (when the device is idle or if the queue was previously stalled for some other power-related reason) or when the client driver calls StartNextPacket to indicate that it's finished processing the current IRP. StartNextPacket calls the notification routine we gave to StallRequestsAndNotify, and that routine signals an AsyncNotify event to the state machine. QueueStallComplete now separates the device IRP into one of four categories, as follows:

case QueueStallComplete:
  {
  if (stack->MinorFunction == IRP_MN_SET_POWER)
    {
    if (stack->Parameters.Power.State.DeviceState < pdx->devpower)
      {
      action = ForwardMainIrp;
      SETSTATE(DevPowerUpPending);
      }
    else
      action = SaveContext;
    }
  else
    {
    if (stack->Parameters.Power.State.DeviceState < pdx->devpower)
      {
      action = ForwardMainIrp;
      SETSTATE(DevQueryUpPending);
      }
    else
      action = DevQueryDown;
    }
  continue;
  }

The upshot of QueueStallComplete is that we perform the next action indicated in Table 8-5 for the type of IRP we're dealing with.

Table 8-5. Next action for device IRPs.

Minor Function        More or Less Power?   Next Action
IRP_MN_QUERY_POWER    More power            ForwardMainIrp
IRP_MN_QUERY_POWER    Less or same power    DevQueryDown
IRP_MN_SET_POWER      More power            ForwardMainIrp
IRP_MN_SET_POWER      Less or same power    SaveContext
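The same classification can be written as a small pure function, which also makes the direction of the comparison explicit: in the DEVICE_POWER_STATE enumeration D0 has the smallest value of the four D states, so a numerically smaller requested state means more power. The sketch below is user-mode C for illustration only. NextActionFor, the MinorFunction enumeration, and the NextAction enumeration are hypothetical names; the power-state values follow the order of the WDM enumeration.

```c
#include <assert.h>

/* Same ordering as the WDM enumeration: larger value, less power. */
typedef enum {
    PowerDeviceUnspecified = 0,
    PowerDeviceD0,
    PowerDeviceD1,
    PowerDeviceD2,
    PowerDeviceD3
} DEVICE_POWER_STATE;

/* Hypothetical stand-ins for IRP_MN_QUERY_POWER and IRP_MN_SET_POWER. */
enum MinorFunction { MinorQueryPower, MinorSetPower };

enum NextAction { ForwardMainIrp, SaveContext, DevQueryDown };

/* Pick the next action for a device power IRP once the queue is stalled. */
enum NextAction NextActionFor(enum MinorFunction minor,
                              DEVICE_POWER_STATE requested,
                              DEVICE_POWER_STATE current)
{
    assert(requested >= PowerDeviceD0 && requested <= PowerDeviceD3);

    if (requested < current)
        return ForwardMainIrp;      /* more power: lower drivers go first */

    /* Same or less power: do our own processing before forwarding. */
    return minor == MinorSetPower ? SaveContext : DevQueryDown;
}
```

Note that ForwardMainIrp is chosen for both minor functions in the "more power" case. What differs in the real state machine is the state recorded alongside it (DevPowerUpPending or DevQueryUpPending), which this sketch leaves out.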

Setting a Higher Device Power State

Figure 8-11 diagrams the state transitions that occur for an IRP_MN_SET_POWER that specifies a higher device power state than that which is current.

Figure 8-11. State transitions when setting a higher device power state.

ForwardMainIrp will install a completion routine and send the IRP down the driver stack. When MainCompletionRoutine eventually gains control, it signals a MainIrpComplete event. We will be in the DevPowerUpPending state, so we'll execute the DevPowerUpComplete action:

case DevPowerUpComplete:
  {
  if (!NT_SUCCESS(ctx->status) || stack->MinorFunction !=
    IRP_MN_SET_POWER)
    {
    action = CompleteMainIrp;
    continue;
    }
  status = STATUS_MORE_PROCESSING_REQUIRED;
  DEVICE_POWER_STATE oldpower = pdx->devpower;
  pdx->devpower = stack->Parameters.Power.State.DeviceState;
  if (pdx->RestoreDeviceContext)
    {
    ctx->state = ContextRestorePending;
    (*pdx->RestoreDeviceContext)(pdx->DeviceObject, oldpower,
      pdx->devpower, ctx);
    break;
    }
  action = ContextRestoreComplete;
  continue;
  }

The main task we need to accomplish is restoring any device context that was lost during the previous power-down transition. Since we're not allowed to block our thread, we initiate whatever operations are required and return STATUS_MORE_PROCESSING_REQUIRED to interrupt the completion of the device IRP. When the restore operations finish, the client driver calls GenericSaveRestoreComplete, which signals an AsyncNotify event. We'll be in the ContextRestorePending state at that point, so we'll perform the ContextRestoreComplete action:

case ContextRestoreComplete:
  {
  if (event == AsyncNotify)
    status = STATUS_SUCCESS;
  action = CompleteMainIrp;
  if (!NT_SUCCESS(ctx->status) || pdx->devpower != PowerDeviceD0)
    continue;
  ctx->UnstallQueue = TRUE;
  continue;
  }

The main result of this action routine is that we unstall the queue of substantive IRPs at the conclusion of an IRP_MN_SET_POWER to the D0 state. We exit via CompleteMainIrp and DestroyContext.

Querying for a Higher Device Power State

You shouldn't expect to receive an IRP_MN_QUERY_POWER that refers to a higher power state than your device is already in, but you shouldn't crash the system if you happen to receive one. The following code shows what GENERIC does when such a query completes in the lower-level drivers. (Refer to Figure 8-12 for a state diagram.)

case DevQueryUpComplete:
  {
  if (NT_SUCCESS(ctx->status) && pdx->QueryPower)
    if (!(*pdx->QueryPower)(pdx->DeviceObject, pdx->devpower,
      stack->Parameters.Power.State.DeviceState))
      ctx->status = STATUS_UNSUCCESSFUL;
  action = CompleteMainIrp;
  continue;
  }

That is, GENERIC allows the client driver to accept or veto the query by calling its QueryPower function, and then it exits via CompleteMainIrp and DestroyContext.

Figure 8-12. State transitions for a query about a higher device power state.

Setting a Lower Device Power State

If the IRP is an IRP_MN_SET_POWER for the same or a lower device power state than current, the finite state machine goes through the state transitions diagrammed in Figure 8-13.

Figure 8-13. State transitions when setting a lower device power state.

SaveContext will initiate an asynchronous process to save any device context that will be lost when the device loses power:

case SaveContext:
  {
  DEVICE_POWER_STATE devpower = 
    stack->Parameters.Power.State.DeviceState;
  if (pdx->SaveDeviceContext && devpower > pdx->devpower)
    {
    ctx->state = ContextSavePending;
    (*pdx->SaveDeviceContext)(pdx->DeviceObject, pdx->devpower,
      devpower, ctx);
    break;
    }
  action = ContextSaveComplete;
  continue;
  }

When the save operations finish, the client driver calls GenericSaveRestoreComplete, which signals an AsyncNotify event. We'll be in the ContextSavePending state at that point, so we'll perform the ContextSaveComplete action:

case ContextSaveComplete:
  {
  if (event == AsyncNotify)                     // 1
    status = STATUS_SUCCESS;
  ctx->state = DevPowerDownPending;
  action = ForwardMainIrp;
  DEVICE_POWER_STATE devpower =
    stack->Parameters.Power.State.DeviceState;
  if (devpower <= pdx->devpower)                // 2
    continue;
  pdx->devpower = devpower;                     // 3
  if (devpower > PowerDeviceD0)                 // 4
    ctx->UnstallQueue = FALSE;
  continue;
  }

  1. We'll come directly here from GenericSaveRestoreComplete, and we need to change status to prevent an ASSERT failure (but not for any other reason).
  2. If we didn't actually change power, there's no more work to do here.
  3. This is where we record the new device power state when we're powering down.
  4. If the device is now in a low-power or no-power state, we want to leave the substantive IRP queue stalled.

The next action, ForwardMainIrp, sends the device IRP down the driver stack. The bus driver will turn the physical flow of current off and complete the IRP. We'll see it next when MainCompletionRoutine signals a MainIrpComplete event, which takes us directly to CompleteMainIrp and thence to DestroyContext.

Querying for a Lower Device Power State

An IRP_MN_QUERY_POWER that specifies the same or a lower device power state than current is the basic vehicle by which a function driver gets to vote on changes in power levels. Although the DDK doesn't specifically say you should create one of these requests when you handle a system query, it's a good idea to do so. You have to handle device queries anyway and might as well put all the query logic in one place. Figure 8-14 shows how our state machine will handle such a query.

The DevQueryDown action follows QueueStallComplete for this kind of IRP:

case DevQueryDown:
  {
  DEVICE_POWER_STATE devpower = 
    stack->Parameters.Power.State.DeviceState;
  if (devpower > pdx->devpower 
    && pdx->QueryPower 
    && !(*pdx->QueryPower)(pdx->DeviceObject,
    pdx->devpower, devpower))
    {
    ctx->status = STATUS_UNSUCCESSFUL;
    action = DevQueryDownComplete;
    continue;
    }
  ctx->state = DevQueryDownPending;
  action = ForwardMainIrp;
  continue;
  }

Figure 8-14. State transitions for a query about a lower device power state.

GENERIC basically lets the client driver decide whether the query should succeed. If the client driver says "Yes," we enter the DevQueryDownPending state and exit via ForwardMainIrp to send the query down the driver stack. Completion of the IRP sends us to the DevQueryDownComplete action:

case DevQueryDownComplete:
  {
  if (NT_SUCCESS(ctx->status))
    ctx->UnstallQueue = FALSE;
  action = CompleteMainIrp;
  continue;
  }

The basic action we take is to leave the substantive IRP queue stalled if the query succeeds. (CompleteMainIrp will unstall the queue if it sees the UnstallQueue flag set in the context structure. Clearing the flag causes this step to be skipped.) Recall that we first stalled the queue when we received the query. We'll leave it stalled until someone eventually sends us a set-power IRP to put the device into D0.
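The combined effect of DevQueryDown and DevQueryDownComplete on the status and on the UnstallQueue flag can be traced with a small user-mode model. This is a sketch, not GENERIC's code: SKETCH_CONTEXT, SketchQueryDown, SketchVeto, and SketchAccept are hypothetical names, the client callback omits the device object argument, and the forwarding step is replaced by a lowerStatus parameter that stands for whatever the lower drivers would return. The sketch also assumes that UnstallQueue is TRUE when the query path begins, which is what the text implies when it says that clearing the flag causes the unstall to be skipped.

```c
#include <assert.h>
#include <stddef.h>

typedef long NTSTATUS;
#define STATUS_SUCCESS      ((NTSTATUS)0)
#define STATUS_UNSUCCESSFUL ((NTSTATUS)(-0x3FFFFFFF)) /* 0xC0000001 as signed 32-bit */
#define NT_SUCCESS(s)       ((NTSTATUS)(s) >= 0)

typedef enum {
    PowerDeviceUnspecified = 0,
    PowerDeviceD0, PowerDeviceD1, PowerDeviceD2, PowerDeviceD3
} DEVICE_POWER_STATE;

/* Hypothetical client callback: nonzero accepts the change, zero vetoes it. */
typedef int (*QUERY_POWER)(DEVICE_POWER_STATE oldstate,
                           DEVICE_POWER_STATE newstate);

typedef struct {
    NTSTATUS status;
    int      UnstallQueue;  /* nonzero: unstall the queue on completion */
} SKETCH_CONTEXT;

/* Model of a query for the same or a lower power state. */
void SketchQueryDown(SKETCH_CONTEXT *ctx, QUERY_POWER query,
                     DEVICE_POWER_STATE current, DEVICE_POWER_STATE requested,
                     NTSTATUS lowerStatus)
{
    assert(requested >= current);   /* this path never raises power */
    ctx->UnstallQueue = 1;          /* assumption: set when the queue was stalled */

    if (requested > current && query != NULL && !query(current, requested))
        ctx->status = STATUS_UNSUCCESSFUL;  /* vetoed: never forwarded */
    else
        ctx->status = lowerStatus;          /* forwarded: lower drivers decide */

    if (NT_SUCCESS(ctx->status))
        ctx->UnstallQueue = 0;      /* query succeeded: stay stalled */
}

int SketchVeto(DEVICE_POWER_STATE oldstate, DEVICE_POWER_STATE newstate)
{
    (void)oldstate; (void)newstate;
    return 0;
}

int SketchAccept(DEVICE_POWER_STATE oldstate, DEVICE_POWER_STATE newstate)
{
    (void)oldstate; (void)newstate;
    return 1;
}
```

In this model a veto leaves the flag set, so the queue would be released when the failed query completes, while an accepted query that the lower drivers also accept clears the flag and the queue stays stalled. A query for the current state never consults the client, because the first test requires the requested state to be strictly lower in power.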