Kernel Dispatcher Objects

The Windows NT kernel provides five types of synchronization objects that you can use to control the flow of nonarbitrary threads. See Table 4-1 for a summary of these kernel dispatcher object types and their uses. At any moment, one of these objects is in one of two states: signalled or not-signalled. At times when it's permissible for you to block a thread in whose context you're running, you can wait for one or more objects to reach the signalled state by calling KeWaitForSingleObject or KeWaitForMultipleObjects. The kernel also provides routines for initializing and controlling the state of each of these objects.

Table 4-1. Kernel dispatcher objects.

Object      Data Type    Description
Event       KEVENT       Blocks a thread until some other thread detects that an event has occurred
Semaphore   KSEMAPHORE   Used instead of an event when an arbitrary number of wait calls can be satisfied
Mutex       KMUTEX       Excludes other threads from executing a particular section of code
Timer       KTIMER       Delays execution of a thread for some period of time
Thread      KTHREAD      Blocks one thread until another thread terminates

In the next few sections, I'll describe how to use the kernel dispatcher objects. I'll start by explaining when you can block a thread by calling one of the wait primitives, and then I'll discuss the support routines that you use with each of the object types. I'll finish this section by discussing the related concepts of thread alerts and asynchronous procedure call delivery.

How and When You Can Block

To understand when and how it's permissible for a WDM driver to block a thread on a kernel dispatcher object, you have to know some basic facts about threads. In general, whatever thread was executing at the time of a software or hardware interrupt continues to be the "current" thread while the kernel processes the interrupt. We speak of executing kernel-mode code "in the context" of this current thread. In response to interrupts of various kinds, the Windows NT scheduler might decide to switch threads, of course, in which case a new thread becomes "current."

We use the terms arbitrary thread context and nonarbitrary thread context to describe the precision with which we can know the thread in whose context we're currently operating in a driver subroutine. If we know that we're in the context of the thread which initiated an I/O request, the context is not arbitrary. Most of the time, however, a WDM driver can't know this fact because chance usually controls which thread is active when the interrupt occurs that results in the driver being called. When applications issue I/O requests, they cause a transition from user mode to kernel mode. The I/O Manager routines that create an IRP and send it to a driver dispatch routine continue to operate in this nonarbitrary thread context, as does the first dispatch routine to see the IRP. We use the term highest-level driver to describe the driver whose dispatch routine first receives the IRP.

As a general rule, only the highest-level driver for a given device can know for sure that it's operating in a nonarbitrary thread context. This is because driver dispatch routines often put requests onto queues and return back to their callers. Queued requests are then removed from their queues and forwarded to lower-level drivers from within callback routines that execute later. Once a dispatch routine pends a request, all subsequent processing of that request must occur in arbitrary thread context.

Having explained these facts about thread context, we can state a simple rule about when it's okay to block a thread:

Block only the thread that originated the request you're working on.

To follow this rule, you generally have to be the highest-level driver for the device that's getting sent the IRP. One important exception occurs for requests like IRP_MN_START_DEVICE—see Chapter 6—that all drivers process in a synchronous way. That is, drivers don't queue or pend certain requests. When you receive one of these requests, you can trace the call/return stack directly back to the originator of the request. As we'll see in Chapter 6, it's not only okay for you to block the thread in which you process these requests, but blocking and waiting is the prescribed way to handle them.

One more rule should be obvious from the fact that thread switching doesn't occur at elevated IRQL:

You can't block a thread if you're executing at or above DISPATCH_LEVEL.

As a practical matter, this rule means that you must be in your DriverEntry or AddDevice function to block the current thread, or else in a driver dispatch function. All of these functions execute at PASSIVE_LEVEL. I'm hard-pressed to think of why you might need to block to finish DriverEntry or AddDevice, even, because those functions merely initialize data structures for downstream use.

Waiting on a Single Dispatcher Object

You call KeWaitForSingleObject as illustrated in the following example:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LARGE_INTEGER timeout;
NTSTATUS status = KeWaitForSingleObject(object, WaitReason,
  WaitMode, Alertable, &timeout);

As suggested by the ASSERT, you must be executing at or below DISPATCH_LEVEL to even call this service routine.

In this call, object points to the object on which you wish to wait. While this argument is typed as a PVOID, it should be a pointer to one of the dispatcher objects listed in Table 4-1. The object must be in nonpaged memory—for example, in a device extension structure or other data area allocated from the nonpaged pool. For most purposes, the execution stack can be considered nonpaged.

WaitReason is a purely advisory value chosen from the KWAIT_REASON enumeration. No code in the kernel actually cares what value you supply here, so long as you don't specify WrQueue. (Internally, scheduler code bases some decisions on whether a thread is currently blocked for this "reason.") The reason a thread is blocked is saved in an opaque data structure, though. If you knew more about that data structure and were trying to debug a deadlock of some kind, you could perhaps gain clues from the reason code. The bottom line: always specify Executive for this parameter; there's no reason to say anything else.

WaitMode is one of the two values of the MODE enumeration: KernelMode or UserMode. Alertable is a simple Boolean value. Unlike WaitReason, these parameters do make a difference to the way the system behaves, by controlling whether the wait can be terminated early in order to deliver asynchronous procedure calls of various kinds. I'll explain these interactions in more detail in "Thread Alerts and APCs" later in this chapter. Waiting in user mode also authorizes the Memory Manager to swap your thread's kernel-mode stack out. You'll see examples in this book and elsewhere where drivers create event objects, for instance, as automatic variables. A bug check would result if some other thread were to call KeSetEvent at elevated IRQL at a time when the event object was absent from memory. The bottom line: you should probably always wait in KernelMode and specify FALSE for the alertable parameter.

The last parameter to KeWaitForSingleObject is the address of a 64-bit timeout value, expressed in 100-nanosecond units. A positive number for the timeout is an absolute timestamp relative to the same January 1, 1601, epoch of the system clock. You can determine the current time by calling KeQuerySystemTime. A negative number is an interval relative to the current time. If you specify an absolute time, a subsequent change to the system clock alters the duration of the timeout you might experience. That is, the timeout doesn't expire until the system clock equals or exceeds whatever absolute value you specify. In contrast, if you specify a relative timeout, the duration of the timeout you experience is unaffected by changes in the system clock.
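The sign convention is easy to get backward, so a small helper can make the intent explicit. What follows is a minimal user-mode sketch rather than DDK code: the names relative_timeout_from_ms, timeout_is_relative, and UNITS_PER_MS are hypothetical, and in a driver you would store the result in the QuadPart member of the LARGE_INTEGER whose address you pass to KeWaitForSingleObject.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 100-nanosecond units in one millisecond. */
#define UNITS_PER_MS 10000LL

/* Hypothetical helper: build a relative timeout for a wait call.
   Relative intervals are expressed as negative numbers, so the
   result is the negated count of 100-ns units. */
int64_t relative_timeout_from_ms(int64_t ms)
  {
  assert(ms >= 0);              /* a negative interval makes no sense */
  return -(ms * UNITS_PER_MS);
  }

/* Returns nonzero if the value would be treated as an interval
   relative to the current time rather than as an absolute timestamp.
   Zero is neither: it requests an immediate return. */
int timeout_is_relative(int64_t timeout)
  {
  return timeout < 0;
  }
```

With this helper, a five-millisecond wait would be written as relative_timeout_from_ms(5), which yields -50000.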

Specifying a zero timeout causes KeWaitForSingleObject to return immediately with a status code indicating whether the object is in the signalled state. If you're executing at DISPATCH_LEVEL, you must specify a zero timeout because blocking is not allowed. Each kernel dispatcher object offers a KeReadStateXxx service function that allows you to determine the state of the object. Reading the state is not completely equivalent to waiting for zero time, however: when KeWaitForSingleObject discovers that the wait is satisfied, it performs the side effects that the particular object requires. In contrast, reading the state of the object does not perform the side effects, even if the object is already signalled and a wait would be satisfied if it were requested right now.

Specifying a NULL pointer for the timeout parameter is okay and indicates an infinite wait.

The return value indicates one of several possible results. STATUS_SUCCESS is the result you expect and indicates that the wait was satisfied. That is, either the object was in the signalled state when you made the call to KeWaitForSingleObject, or else the object was in the not-signalled state and later became signalled. When the wait is satisfied in this way, there may be side effects that need to be performed on the object. The nature of these side effects depends on the type of the object, and I'll explain them later in this chapter in connection with discussing each type of object. (For example, a synchronization type of event will be reset after your wait is satisfied.)

A return value of STATUS_TIMEOUT indicates that the specified timeout occurred without the object reaching the signalled state. If you specify a zero timeout, KeWaitForSingleObject returns immediately with either this code (indicating that the object is not-signalled) or STATUS_SUCCESS (indicating that the object is signalled). This return value is not possible if you specify a NULL timeout parameter pointer, because you thereby request an infinite wait.

Two other return values are possible. STATUS_ALERTED and STATUS_USER_APC mean that the wait has terminated without the object having been signalled because the thread has received an alert or a user-mode APC, respectively. I'll discuss these concepts a bit further on in "Thread Alerts and APCs."

Waiting on Multiple Dispatcher Objects

KeWaitForMultipleObjects is a companion function to KeWaitForSingleObject that you use when you want to wait for one or all of several dispatcher objects simultaneously. Call this function as in the example below.

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LARGE_INTEGER timeout;
NTSTATUS status = KeWaitForMultipleObjects(count, objects,
  WaitType, WaitReason, WaitMode, Alertable, &timeout, waitblocks);

Here, objects is the address of an array of pointers to dispatcher objects, and count is the number of pointers in the array. The count must be less than or equal to the value MAXIMUM_WAIT_OBJECTS, which currently equals 64. The array, as well as each of the objects to which the elements of the array point, must be in nonpaged memory. WaitType is one of the enumeration values WaitAll or WaitAny and specifies whether you want to wait until all of the objects are simultaneously in the signalled state or whether, instead, you want to wait until any one of the objects is signalled.

The waitblocks argument points to an array of KWAIT_BLOCK structures that the kernel will use to administer the wait operation. You don't need to initialize these structures in any way—the kernel just needs to know where the storage is for the group of wait blocks that it will use to record the status of each of the objects during the pendency of the wait. If you're waiting for a small number of objects (specifically, a number no bigger than THREAD_WAIT_OBJECTS, which currently equals 3), you can supply NULL for this parameter. If you supply NULL, KeWaitForMultipleObjects uses a preallocated array of wait blocks that lives in the thread object. If you're waiting for more objects than this, you must provide nonpaged memory that's at least count * sizeof(KWAIT_BLOCK) bytes in length.
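The rule about when you must supply your own wait blocks can be captured in a few lines. This is a user-mode sketch under stated assumptions: the two constants simply repeat the values quoted in the text (the real definitions live in the DDK headers), wait_block_bytes_needed is a hypothetical name, and block_size stands in for sizeof(KWAIT_BLOCK), which isn't available outside the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Values quoted in the text; a driver gets these from the DDK headers. */
#define MAXIMUM_WAIT_OBJECTS 64
#define THREAD_WAIT_OBJECTS  3

/* Hypothetical helper: how many bytes of nonpaged storage the caller
   must supply for wait blocks. Returns 0 when NULL may be passed
   because the wait blocks built into the thread object suffice. */
size_t wait_block_bytes_needed(size_t count, size_t block_size)
  {
  assert(count >= 1 && count <= MAXIMUM_WAIT_OBJECTS);
  if (count <= THREAD_WAIT_OBJECTS)
    return 0;                   /* NULL waitblocks argument is okay */
  return count * block_size;    /* caller provides the whole array */
  }
```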

The remaining arguments to KeWaitForMultipleObjects are the same as the corresponding arguments to KeWaitForSingleObject, and most return codes have the same meaning.

If you specify WaitAll, the return value STATUS_SUCCESS indicates that all the objects managed to reach the signalled state simultaneously. If you specify WaitAny, the return value is numerically equal to the objects array index of the single object that satisfied the wait. If more than one of the objects happens to be signalled, you'll be told about one of them: maybe the lowest numbered of all the ones that are signalled at that moment, but maybe some other one. You can think of this value as being STATUS_WAIT_0 plus the array index. The usual NT_SUCCESS test isn't sufficient by itself before extracting the array index, because STATUS_TIMEOUT, STATUS_ALERTED, and STATUS_USER_APC are also nonnegative values that pass that test. Check instead that the status lies within the range of valid indexes:

NTSTATUS status = KeWaitForMultipleObjects(...);
if (status >= STATUS_WAIT_0 && status < STATUS_WAIT_0 + (NTSTATUS) count)
  {
  ULONG iSignalled = (ULONG) status - (ULONG) STATUS_WAIT_0;
  ...
  }
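To see why the range of the status matters when you extract an index, here is a user-mode sketch that can be compiled outside a driver. The MODEL_ names and wait_any_index are hypothetical stand-ins; the numeric values are the ones ntstatus.h assigns to the corresponding status codes, and model_nt_success applies the same nonnegative test that NT_SUCCESS does.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t NTSTATUS_MODEL;   /* stand-in for NTSTATUS */

/* Values as defined in ntstatus.h. */
#define MODEL_STATUS_WAIT_0   ((NTSTATUS_MODEL) 0x00000000)
#define MODEL_STATUS_USER_APC ((NTSTATUS_MODEL) 0x000000C0)
#define MODEL_STATUS_ALERTED  ((NTSTATUS_MODEL) 0x00000101)
#define MODEL_STATUS_TIMEOUT  ((NTSTATUS_MODEL) 0x00000102)

/* Same test as the NT_SUCCESS macro: any nonnegative status passes. */
int model_nt_success(NTSTATUS_MODEL status)
  {
  return status >= 0;
  }

/* Hypothetical helper: returns the index of the object that satisfied
   a WaitAny request, or -1 if the status doesn't denote one of the
   count objects (timeout, alert, user-mode APC, or an error). */
int wait_any_index(NTSTATUS_MODEL status, int count)
  {
  assert(count >= 1 && count <= 64);
  if (status < MODEL_STATUS_WAIT_0 || status >= MODEL_STATUS_WAIT_0 + count)
    return -1;
  return (int) (status - MODEL_STATUS_WAIT_0);
  }
```

Note that model_nt_success(MODEL_STATUS_TIMEOUT) is nonzero even though no object was signalled, which is why the helper tests the range instead.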

When KeWaitForMultipleObjects returns a success code, it also performs the side effects required by the object(s) that satisfied the wait. If more than one object is signalled but you specified WaitAny, only the one that's deemed to satisfy the wait has its side effects performed.

Kernel Events

You use the service functions listed in Table 4-2 to work with kernel event objects. To initialize an event object, first reserve nonpaged storage for an object of type KEVENT and then call KeInitializeEvent:

ASSERT(KeGetCurrentIrql() == PASSIVE_LEVEL);
KeInitializeEvent(event, EventType, initialstate);

Event is the address of the event object. EventType is one of the enumeration values NotificationEvent or SynchronizationEvent. A notification event has the characteristic that, when it is set to the signalled state, it stays signalled until it is explicitly reset to the not-signalled state. Furthermore, all threads that wait on a notification event are released when the event is signalled. This is like a manual-reset event in user mode. A synchronization event, on the other hand, gets reset to the not-signalled state as soon as a single thread gets released. This is what happens in user mode when someone calls SetEvent on an auto-reset event object. The only side effect performed on an event object by KeWaitXxx is to reset a synchronization event to not-signalled. Finally, initialstate is TRUE to specify that the initial state of the event is to be signalled and FALSE to specify that the initial state is to be not-signalled.
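The difference between the two event types, and between reading an event's state and waiting on it, may be easier to see in a toy model. The following user-mode sketch is only an illustration under simplifying assumptions: it tracks nothing but the signalled flag, it has no real threads or blocking, and every MODEL_ or model_ name is hypothetical rather than part of the DDK.

```c
#include <assert.h>

typedef enum { MODEL_NOTIFICATION, MODEL_SYNCHRONIZATION } MODEL_EVENT_TYPE;

typedef struct
  {
  MODEL_EVENT_TYPE type;
  int signalled;
  } MODEL_EVENT;

void model_init_event(MODEL_EVENT* e, MODEL_EVENT_TYPE type, int initialstate)
  {
  e->type = type;
  e->signalled = initialstate != 0;
  }

/* Like KeSetEvent: signal the event and return the previous state. */
int model_set_event(MODEL_EVENT* e)
  {
  int was = e->signalled;
  e->signalled = 1;
  return was;
  }

/* Like KeReadStateEvent: report the state with no side effects. */
int model_read_state(const MODEL_EVENT* e)
  {
  return e->signalled;
  }

/* Models a zero-timeout wait. Returns 1 if the wait is satisfied.
   A satisfied wait resets a synchronization event; a notification
   event stays signalled until someone resets it explicitly. */
int model_try_wait(MODEL_EVENT* e)
  {
  if (!e->signalled)
    return 0;
  if (e->type == MODEL_SYNCHRONIZATION)
    e->signalled = 0;
  return 1;
  }
```

In this model, two successive calls to model_try_wait both succeed on a signalled notification event, whereas only the first succeeds on a synchronization event.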

Table 4-2. Service functions for use with kernel event objects.

Service Function    Description
KeClearEvent        Sets event to not-signalled without reporting previous state
KeInitializeEvent   Initializes event object
KeReadStateEvent    Determines current state of event
KeResetEvent        Sets event to not-signalled and returns previous state
KeSetEvent          Sets event to signalled and returns previous state

NOTE
In this series of sections on synchronization primitives, I'm repeating the IRQL restrictions that the DDK documentation describes. In the current release of Microsoft Windows 2000, the DDK is sometimes more restrictive than the OS actually is. For example, KeClearEvent can be called at any IRQL, not just at or below DISPATCH_LEVEL. KeInitializeEvent can be called at any IRQL, not just at PASSIVE_LEVEL. However, you should regard the statements in the DDK as being tantamount to saying that Microsoft might someday impose the documented restriction, which is why I haven't tried to report the true state of affairs.

You can call KeSetEvent to place an event into the signalled state:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LONG wassignalled = KeSetEvent(event, boost, wait);

As implied by the ASSERT, you must be running at or below DISPATCH_LEVEL to call this function. The event argument is a pointer to the event object in question, and boost is a value to be added to a waiting thread's priority if setting the event results in satisfying someone's wait. See the sidebar ("That Pesky Third Argument to KeSetEvent") for an explanation of the Boolean wait argument, which a WDM driver would almost never want to specify as TRUE. The return value is nonzero if the event was already in the signalled state before the call and 0 if the event was in the not-signalled state.

A multitasking scheduler needs to artificially boost the priority of a thread that waits for I/O operations or synchronization objects in order to avoid starving threads that spend lots of time waiting. This is because a thread that blocks for some reason generally relinquishes its time slice and won't regain the CPU until either it has a relatively higher priority than other eligible threads or other threads that have the same priority finish their time slices. A thread that never blocks, however, gets to complete its time slices. Unless a boost is applied to the thread that repeatedly blocks, therefore, it will spend a lot of time waiting for CPU-bound threads to finish their time slices.

You and I won't always have a good idea of what value to use for a priority boost. A good rule of thumb to follow is to specify IO_NO_INCREMENT unless you have a good reason not to. If setting the event is going to wake up a thread that's dealing with a time-sensitive data flow (such as a sound driver), supply the boost that's appropriate to that kind of device (such as IO_SOUND_INCREMENT). The important thing is to not boost the waiter for a silly reason. For example, if you're trying to handle an IRP_MJ_PNP request synchronously—see Chapter 6—you'll be waiting for lower-level drivers to handle the IRP before you proceed and your completion routine will be calling KeSetEvent. Since Plug and Play requests have no special claim on the processor and occur only infrequently, specify IO_NO_INCREMENT even for a sound card.

You can determine the current state of an event (at any IRQL) by calling KeReadStateEvent:

LONG signalled = KeReadStateEvent(event);

The return value is nonzero if the event is signalled, 0 if it's not-signalled.

NOTE
KeReadStateEvent is not supported in Microsoft Windows 98 even though the other KeReadStateXxx functions described here are. The absence of support has to do with how events and other synchronization primitives are implemented in Windows 98.

You can determine the current state of an event and, immediately thereafter, place it in the not-signalled state by calling the KeResetEvent function (at or below DISPATCH_LEVEL):

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LONG signalled = KeResetEvent(event);

If you're not interested in the previous state of the event, you can save a little time by calling KeClearEvent instead, as shown below.

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
KeClearEvent(event);

KeClearEvent is faster because it doesn't need to capture the current state of the event before setting it to not-signalled.

Kernel Semaphores

A kernel semaphore is an integer counter with associated synchronization semantics. The semaphore is considered signalled when the counter is positive and not-signalled when the counter is 0. The counter cannot take on a negative value. Releasing a semaphore increases the counter, whereas successfully waiting on a semaphore decrements the counter. If the decrement makes the count 0, the semaphore is then considered not-signalled, with the consequence that other KeWaitXxx callers who insist on finding it signalled will block. Note that if more threads are waiting for a semaphore than the value of the counter, not all of the waiting threads will be unblocked.

The kernel provides three service functions to control the state of a semaphore object. (See Table 4-3.) You initialize a semaphore by making the following function call at PASSIVE_LEVEL:

ASSERT(KeGetCurrentIrql() == PASSIVE_LEVEL);
KeInitializeSemaphore(semaphore, count, limit);

In this call, semaphore points to a KSEMAPHORE object in nonpaged memory. Count is the initial value of the counter, and limit is the maximum value that the counter will be allowed to take on, which must be at least as large as the initial count.

Table 4-3. Service functions for use with kernel semaphore objects.

Service Function        Description
KeInitializeSemaphore   Initializes semaphore object
KeReadStateSemaphore    Determines current state of semaphore
KeReleaseSemaphore      Sets semaphore object to the signalled state

If you create a semaphore with a limit of 1, the object is somewhat similar to a mutex in that only one thread at a time will be able to claim it. A kernel mutex has some features that a semaphore lacks, however, to help prevent deadlocks. Accordingly, there's almost no point in creating a semaphore with a limit of 1.

If you create a semaphore with a limit bigger than 1, you have an object that allows multiple threads to access some resource. A familiar theorem in queuing theory dictates that providing a single queue for multiple servers is more fair (that is, results in less variation in waiting times) than providing a separate queue for each of several servers. The average waiting time is the same in both cases, but the variation in waiting times is smaller. (This is why queues in stores are increasingly organized so that customers wait in a single line for the next available clerk.) This kind of semaphore allows you to organize a set of software or hardware servers to take advantage of that theorem.

The owner (or one of the owners) of a semaphore releases its claim to the semaphore by calling KeReleaseSemaphore:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LONG wassignalled = KeReleaseSemaphore(semaphore, boost, delta, wait);

This operation adds delta, which must be positive, to the counter associated with semaphore, thereby putting the semaphore into the signalled state and allowing other threads to be released. In most cases, you would specify 1 for this parameter to indicate that one claimant of the semaphore is releasing its claim. The boost and wait parameters have the same import as the corresponding parameters to KeSetEvent, discussed earlier. The return value is 0 if the previous state of the semaphore was not-signalled and nonzero if the previous state was signalled.

KeReleaseSemaphore doesn't allow you to increase the counter beyond the limit specified when you initialized the semaphore. If you try, it does not adjust the counter at all, and it raises an exception with the code STATUS_SEMAPHORE_LIMIT_EXCEEDED. Unless someone has a structured exception handler to trap the exception, a bug check will eventuate.
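The counter rules can be summarized in a toy model. This user-mode sketch makes several simplifying assumptions that I want to state loudly: there are no real threads or blocking, a failed release returns -1 where the kernel would raise STATUS_SEMAPHORE_LIMIT_EXCEEDED, and all of the MODEL_ and model_ names are hypothetical.

```c
#include <assert.h>

typedef struct
  {
  long count;   /* current counter; signalled when positive */
  long limit;   /* maximum value the counter may reach */
  } MODEL_SEMAPHORE;

void model_init_semaphore(MODEL_SEMAPHORE* s, long count, long limit)
  {
  assert(limit > 0 && count >= 0 && count <= limit);
  s->count = count;
  s->limit = limit;
  }

/* Models a zero-timeout wait: succeeds and decrements the counter
   only if the semaphore is currently signalled. */
int model_try_wait_semaphore(MODEL_SEMAPHORE* s)
  {
  if (s->count == 0)
    return 0;
  --s->count;
  return 1;
  }

/* Models a release. Returns 0 if the semaphore was previously
   not-signalled and a positive value if it was signalled. If the
   release would exceed the limit, the counter is left unchanged and
   -1 is returned (the kernel raises an exception instead). */
long model_release_semaphore(MODEL_SEMAPHORE* s, long delta)
  {
  long previous = s->count;
  assert(delta > 0);
  if (delta > s->limit - s->count)
    return -1;
  s->count += delta;
  return previous;
  }
```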

You can also interrogate the current state of a semaphore with this call:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LONG signalled = KeReadStateSemaphore(semaphore);

The return value is nonzero if the semaphore is signalled and 0 if the semaphore is not-signalled. You shouldn't assume that the return value is the current value of the counter—it could be any nonzero value if the counter is positive.

Kernel Mutexes

The word mutex is a contraction of mutual exclusion. A kernel mutex object provides one method (and not necessarily the best one) to serialize access by competing threads to some shared resource. The mutex is signalled if no thread owns it and not-signalled if some thread currently does own it. When a thread gains control of a mutex after calling one of the KeWaitXxx routines, the kernel also takes some steps to help avoid possible deadlocks. These are the side effects referred to in the earlier discussion of KeWaitForSingleObject (in the section "Waiting on a Single Dispatcher Object"). The kernel ensures that the thread can't be paged out, and it forestalls all but the delivery of "special" kernel APCs (such as the one that IoCompleteRequest uses to complete I/O requests).

It's generally better to use an executive fast mutex rather than a kernel mutex, as I'll explain in more detail later in "Fast Mutex Objects." The main difference between the two is that a kernel mutex can be acquired recursively, whereas an executive fast mutex cannot. That is, the owner of a kernel mutex can make a subsequent call to KeWaitXxx specifying the same mutex and have the wait immediately satisfied. A thread that does this must release the mutex an equal number of times before the mutex will be considered free.
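Recursive acquisition amounts to keeping an ownership depth alongside the owner's identity. The user-mode sketch below illustrates just that bookkeeping under simplifying assumptions: threads are represented by hypothetical integer IDs, nothing actually blocks, and an assertion stands in for the bug check the kernel would issue if a nonowner released the mutex. None of the MODEL_ or model_ names come from the DDK.

```c
#include <assert.h>

typedef struct
  {
  int owner;    /* hypothetical thread ID; 0 means unowned */
  int depth;    /* acquisitions the owner hasn't yet released */
  } MODEL_MUTEX;

void model_init_mutex(MODEL_MUTEX* m)
  {
  m->owner = 0;
  m->depth = 0;
  }

/* The mutex is signalled exactly when no thread owns it. */
int model_mutex_signalled(const MODEL_MUTEX* m)
  {
  return m->depth == 0;
  }

/* Models a zero-timeout wait by the given thread. The wait is
   satisfied if the mutex is free or already owned by that thread. */
int model_try_acquire(MODEL_MUTEX* m, int thread)
  {
  assert(thread != 0);
  if (m->depth > 0 && m->owner != thread)
    return 0;
  m->owner = thread;
  ++m->depth;
  return 1;
  }

/* Models a release by the owner. Returns 1 if the mutex became free
   and 0 if the owner still holds it from an outer acquisition. */
int model_release(MODEL_MUTEX* m, int thread)
  {
  assert(m->depth > 0 && m->owner == thread);
  if (--m->depth > 0)
    return 0;
  m->owner = 0;
  return 1;
  }
```

In the model, a thread that acquires the mutex twice must release it twice before another thread's attempt succeeds.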

The reason you would use a mutex in the first place (instead of relying on elevated IRQL and a spin lock) is that you need to serialize access to an object for a long time or in pageable code. By gating access to a resource through a mutex, you allow other threads to run on the other CPUs of a multiprocessor system, and you also allow your code to cause page faults while still locking out other threads. Table 4-4 lists the service functions you use with mutex objects.

Table 4-4. Service functions for use with kernel mutex objects.

Service Function    Description
KeInitializeMutex   Initializes mutex object
KeReadStateMutex    Determines current state of mutex
KeReleaseMutex      Sets mutex object to the signalled state

To create a mutex, you reserve nonpaged memory for a KMUTEX object and make the following initialization call:

ASSERT(KeGetCurrentIrql() == PASSIVE_LEVEL);
KeInitializeMutex(mutex, level);

where mutex is the address of the KMUTEX object, and level is a parameter originally intended to help avoid deadlocks when your own code uses more than one mutex. Since the kernel currently ignores the level parameter, I'm not going to attempt to describe what it used to mean.

The mutex begins life in the signalled—that is, unowned—state. An immediate call to KeWaitXxx would take control of the mutex and put it into the not-signalled state.

You can interrogate the current state of a mutex with this function call:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LONG signalled = KeReadStateMutex(mutex);

The return value is 0 if the mutex is currently owned, nonzero if it's currently unowned.

The thread that owns a mutex can release ownership and return the mutex to the signalled state with this function call:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LONG wassignalled = KeReleaseMutex(mutex, wait);

The wait parameter means the same thing as the corresponding argument to KeSetEvent. The return value is always 0 to indicate that the mutex was previously owned because, if this were not the case, KeReleaseMutex would have bugchecked (it being an error for anyone but the owner to release a mutex).

Just for the sake of completeness, I want to mention a macro in the DDK named KeWaitForMutexObject. (See WDM.H.) It is defined simply as follows:

#define KeWaitForMutexObject KeWaitForSingleObject

Using this special name offers no benefit at all. You don't even get the benefit of having the compiler insist that the first argument be a pointer to a KMUTEX instead of any random pointer type.

Kernel Timers

The kernel provides a timer object that functions something like an event that automatically signals itself at a specified absolute time or after a specified interval. It's also possible to create a timer that signals itself repeatedly and to arrange for a DPC callback following the expiration of the timer. Table 4-5 lists the service functions you use with timer objects. With so many different ways of using timers, it will be easiest to describe the use of these functions in several different scenarios.

Table 4-5. Service functions for use with kernel timer objects.

Service Function      Description
KeCancelTimer         Cancels an active timer
KeInitializeTimer     Initializes a one-time notification timer
KeInitializeTimerEx   Initializes a one-time or repetitive notification or synchronization timer
KeReadStateTimer      Determines current state of a timer
KeSetTimer            (Re)specifies expiration time for a notification timer
KeSetTimerEx          (Re)specifies expiration time and other properties of a timer

Notification Timers Used like Events

In this scenario, we'll create a notification timer object and wait until it expires. First allocate a KTIMER object in nonpaged memory. Then, running at or below DISPATCH_LEVEL, initialize the timer object.

PKTIMER timer;      //  someone gives you this
ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
KeInitializeTimer(timer);

At this point, the timer is in the not-signalled state and isn't counting down—a wait on the timer would never be satisfied. To start the timer counting, call KeSetTimer as follows:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LARGE_INTEGER duetime;
BOOLEAN wascounting = KeSetTimer(timer, &duetime, NULL);

The duetime value is a 64-bit time value expressed in 100-nanosecond units. If the value is positive, it is an absolute time relative to the same January 1, 1601, epoch used for the system timer. If the value is negative, it is an interval relative to the current time. If you specify an absolute time, a subsequent change to the system clock alters the duration of the timeout you experience. That is, the timer doesn't expire until the system clock equals or exceeds whatever absolute value you specify. In contrast, if you specify a relative timeout, the duration of the timeout you experience is unaffected by changes in the system clock. These are the same rules that apply to the timeout parameter to KeWaitXxx.

The return value from KeSetTimer, if TRUE, indicates that the timer was already counting down (in which case our call to KeSetTimer would have cancelled it and started the count all over again).

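At any time, you can determine the current state of a timer. The return value is TRUE if the timer is signalled (that is, if it has expired) and FALSE if it is not-signalled:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
BOOLEAN signalled = KeReadStateTimer(timer);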

KeInitializeTimer and KeSetTimer are actually older service functions that have been superseded by newer functions. We could have initialized the timer with this call:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
KeInitializeTimerEx(timer, NotificationTimer);

We could also have used the extended version of the set timer function, KeSetTimerEx:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LARGE_INTEGER duetime;
BOOLEAN wascounting = KeSetTimerEx(timer, &duetime, 0, NULL);

I'll explain a bit further on in this chapter the purpose of extra parameters in these extended versions of the service functions.

Once the timer is counting down, it's still considered to be not-signalled until the specified due time arrives. At that point, the object becomes signalled, and all waiting threads are released. The system guarantees only that the expiration of the timer will be noticed no sooner than the due time you specify. If you specify a due time with a precision finer than the granularity of the system timer (which you can't control), the timeout will be noticed later than the exact instant you specify.
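One simplified way to picture the granularity effect is to assume, purely for illustration, that the system examines timers only on clock ticks that fall at whole multiples of a fixed period. The sketch below is a user-mode model built on that assumption; model_noticed_at is a hypothetical name, both arguments are in the same arbitrary time units, and the real scheduling of timer expiration is more involved than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: returns the time of the first clock tick at or
   after the due time, given ticks at whole multiples of tick_period.
   The result is never earlier than the due time. */
int64_t model_noticed_at(int64_t due, int64_t tick_period)
  {
  int64_t ticks;
  assert(due >= 0 && tick_period > 0);
  ticks = (due + tick_period - 1) / tick_period;   /* round up */
  return ticks * tick_period;
  }
```

In this model, a due time that falls between ticks is noticed at the following tick, and a due time that coincides with a tick is noticed at that tick.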

Notification Timers Used with a DPC

In this scenario, we want expiration of the timer to trigger a DPC. You would choose this method of operation if you wanted to be sure that you could service the timeout no matter what priority level your thread had. (Since you can only wait at PASSIVE_LEVEL, regaining control of the CPU after the timer expires is subject to the normal vagaries of thread scheduling. The DPC, however, executes at elevated IRQL and thereby effectively preempts all threads.)

We initialize the timer object in the same way. We also have to initialize a KDPC object for which we allocate nonpaged memory. For example:

PKDPC dpc;  //  points to KDPC you've allocated
ASSERT(KeGetCurrentIrql() == PASSIVE_LEVEL);
KeInitializeTimer(timer);
KeInitializeDpc(dpc, DpcRoutine, context);

You can initialize the timer object by using either KeInitializeTimer or KeInitializeTimerEx, as you please. DpcRoutine is the address of a deferred procedure call routine, which must be in nonpaged memory. The context parameter is an arbitrary pointer-sized value (typed as a PVOID, and therefore 32 bits wide on a 32-bit platform) that will be passed as an argument to the DPC routine. The dpc argument is a pointer to a KDPC object for which you provide nonpaged storage. (It might be in your device extension, for example.)

When we want to start the timer counting down, we specify the DPC object as one of the arguments to KeSetTimer or KeSetTimerEx:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LARGE_INTEGER duetime;
BOOLEAN wascounting = KeSetTimer(timer, duetime, dpc);

You could also use the extended form KeSetTimerEx if you wanted to. The only difference between this call and the one we examined in the previous section is that we've specified the DPC object address as an argument. When the timer expires, the system will queue the DPC for execution as soon as conditions permit. This would be at least as soon as you'd be able to wake up from a wait at PASSIVE_LEVEL. Your DPC routine would have the following skeletal appearance:

VOID DpcRoutine(PKDPC dpc, PVOID context, PVOID junk1, PVOID junk2)
  {
  ...
  }

For what it's worth, even when you supply a DPC argument to KeSetTimer or KeSetTimerEx, you can still call KeWaitXxx to wait at PASSIVE_LEVEL if you want. On a single-CPU system, the DPC would occur before the wait could finish because it executes at higher IRQL.

Synchronization Timers

Like event objects, timer objects come in both notification and synchronization flavors. A notification timer allows any number of waiting threads to proceed once it expires. A synchronization timer, by contrast, allows only a single thread to proceed. Once some thread's wait is satisfied, the timer switches to the not-signalled state. To create a synchronization timer, you must use the extended form of the initialization service function:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
KeInitializeTimerEx(timer, SynchronizationTimer);

SynchronizationTimer is one of the values of the TIMER_TYPE enumeration. The other value is NotificationTimer.

If you use a DPC with a synchronization timer, think of queuing the DPC as being an extra thing that happens when the timer expires. That is, expiration puts the timer into the signalled state and queues a DPC. One thread can be released as a result of the timer being signalled.
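
The difference between the two flavors can be sketched with a user-mode model of the timer's signalled state. This is not kernel code: the model_ names are hypothetical, and the model ignores scheduling entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-mode model of a timer's signalled state. */
typedef enum { MODEL_NOTIFICATION, MODEL_SYNCHRONIZATION } model_timer_type;

typedef struct {
    model_timer_type type;
    bool signalled;
    int dpcs_queued;   /* how many times expiration queued a DPC */
} model_timer;

/* Expiration signals the timer and, if a DPC was supplied, queues it too. */
void model_expire(model_timer *t, bool has_dpc)
{
    assert(t != NULL);
    t->signalled = true;
    if (has_dpc)
        t->dpcs_queued++;
}

/* Returns true if a waiting thread would be released. A synchronization
   timer resets to not-signalled as soon as one wait is satisfied. */
bool model_try_wait(model_timer *t)
{
    assert(t != NULL);
    if (!t->signalled)
        return false;
    if (t->type == MODEL_SYNCHRONIZATION)
        t->signalled = false;
    return true;
}
```

After one call to model_expire, any number of model_try_wait calls succeed on a notification timer, but only the first succeeds on a synchronization timer. The DPC count goes up either way.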

Periodic Timers

So far, I've discussed only timers that expire exactly once. By using the extended set timer function, you can also request a periodic timeout:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LARGE_INTEGER duetime;
BOOLEAN wascounting = KeSetTimerEx(timer, duetime, period, dpc);

Here, period is a periodic timeout, expressed in milliseconds (ms), and dpc is an optional pointer to a KDPC object. A timer of this kind expires once at the due time and periodically thereafter. To achieve exactly periodic expiration, specify the same relative due time as the interval. Specifying a zero due time causes the timer to immediately expire, whereupon the periodic behavior takes over. It often makes sense to start a periodic timer in conjunction with a DPC object, by the way, because doing so allows you to be notified without having to repeatedly wait for the timeout.
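
The arithmetic behind the due time and the period can be sketched in portable C. The helper names are hypothetical. In a driver, the first value is what you would store in duetime.QuadPart, because a relative due time is a negative count of 100-nanosecond units, whereas the period is in milliseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a delay in milliseconds to a relative due time:
   a negative count of 100-nanosecond units. */
int64_t relative_due_time_100ns(int32_t ms)
{
    assert(ms >= 0);
    return -(int64_t)ms * 10000;
}

/* Hypothetical helper: time in ms, measured from the KeSetTimerEx call, of
   the n-th expiration. n = 0 is the first expiration at the due time; later
   expirations follow at multiples of the period. */
int64_t nth_expiration_ms(int32_t due_ms, int32_t period_ms, int32_t n)
{
    assert(due_ms >= 0 && period_ms > 0 && n >= 0);
    return (int64_t)due_ms + (int64_t)n * period_ms;
}
```

With a 500-ms due time and a 500-ms period, the expirations fall at 500, 1000, and 1500 ms. With a zero due time, they fall at 0, 500, and 1000 ms.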

An Example

One use for kernel timers is to conduct a polling loop in a system thread dedicated to the task of repeatedly checking a device for activity. Not many devices nowadays need to be served by a polling loop, but yours may be one of the few exceptions. I'll discuss this subject in Chapter 9, "Specialized Topics," and the companion disc includes a sample driver (POLLING) that illustrates all of the concepts involved. Part of that sample is the following loop that polls the device at fixed intervals. The logic of the driver is such that the loop can be broken by setting a kill event. Consequently, the driver uses KeWaitForMultipleObjects. The code is actually a bit more complicated than the following fragment, which I've edited to concentrate on the part related to the timer:





VOID PollingThreadRoutine(PDEVICE_EXTENSION pdx)
  {
  NTSTATUS status;
  KTIMER timer;
  KeInitializeTimerEx(&timer, SynchronizationTimer);        // 1
  PVOID pollevents[] = {                                    // 2
    (PVOID) &pdx->evKill,
    (PVOID) &timer,
    };
  ASSERT(arraysize(pollevents) <= THREAD_WAIT_OBJECTS);
  
  LARGE_INTEGER duetime = {0};
  #define POLLING_INTERVAL 500
  KeSetTimerEx(&timer, duetime, POLLING_INTERVAL, NULL);    // 3
  while (TRUE)
    {
    status = KeWaitForMultipleObjects(arraysize(pollevents), // 4
      pollevents, WaitAny, Executive, KernelMode, FALSE, NULL, NULL);
    if (status == STATUS_WAIT_0)
      break;
    if (<device needs attention>)                           // 5
      <do something>;
    }
  KeCancelTimer(&timer);
  PsTerminateSystemThread(STATUS_SUCCESS);
  }

  1. Here we initialize a kernel timer object to act as a synchronization timer. It would have worked just as well to initialize it as a notification timer because only one thread—this one—will ever wait on the timer.
  2. We'll need to supply an array of dispatcher object pointers as one of the arguments to KeWaitForMultipleObjects, and this is where we set that up. The first element of the array is the kill event that some other part of the driver might set when it's time for this system thread to exit. The second element is the timer object. The ASSERT statement that follows this array verifies that we have few enough objects in our array such that we can implicitly use the default array of wait blocks in our thread object.
  3. The KeSetTimerEx statement starts a periodic timer running. The duetime is 0, so the timer goes immediately into the signalled state. It will expire every 500 ms thereafter.
  4. Within our polling loop, we wait for the timer to expire or for the kill event to be set. If the wait terminates because of the kill event, we leave the loop, clean up, and exit this system thread. If the wait terminates because the timer has expired, we go on to the next step.
  5. This is where our device driver would do something related to our hardware.
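
The way the loop interprets the wait status can be sketched separately. For a WaitAny wait, a status of STATUS_WAIT_0 + i reports that the object at index i of the array satisfied the wait. The MODEL_ and WAKE_ names below are stand-ins I've made up so that the fragment compiles outside a driver; in real code the definitions come from the DDK headers.

```c
#include <assert.h>

/* Stand-ins for the kernel definitions, so the sketch compiles on its own. */
typedef long MODEL_NTSTATUS;
#define MODEL_STATUS_WAIT_0 ((MODEL_NTSTATUS) 0x00000000L)

typedef enum { WAKE_KILL, WAKE_TIMER, WAKE_OTHER } wake_reason;

/* In the polling thread's array, index 0 is the kill event and index 1
   is the timer. */
wake_reason classify_wake(MODEL_NTSTATUS status)
{
    if (status == MODEL_STATUS_WAIT_0)
        return WAKE_KILL;      /* leave the loop and terminate the thread */
    if (status == MODEL_STATUS_WAIT_0 + 1)
        return WAKE_TIMER;     /* time to poll the device */
    return WAKE_OTHER;         /* not expected for this nonalertable wait */
}
```

The sample only needs to test for STATUS_WAIT_0, because the timer is the only other object in the array and the wait is nonalertable with no timeout.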

Alternatives to Kernel Timers

Rather than using a kernel timer object, you can use two other timing functions that might be more appropriate. First of all, you can call KeDelayExecutionThread to wait at PASSIVE_LEVEL for a given interval. This function is obviously less cumbersome than creating, initializing, setting, and awaiting a timer by using separate function calls:

ASSERT(KeGetCurrentIrql() == PASSIVE_LEVEL);
LARGE_INTEGER duetime;
NTSTATUS status = KeDelayExecutionThread(WaitMode, Alertable, &duetime);

Here, WaitMode, Alertable, and the returned status code have the same meaning as the corresponding parameters to KeWaitXxx, and duetime is the same kind of timestamp that I discussed previously in connection with kernel timers.

If your requirement is to delay for a very brief period of time (less than 50 microseconds), you can call KeStallExecutionProcessor at any IRQL:

KeStallExecutionProcessor(nMicroSeconds);

The purpose of this delay is to allow your hardware time to prepare for its next operation before your program continues executing. The delay might end up being significantly longer than you request because KeStallExecutionProcessor can be preempted by activities that occur at a higher IRQL than that which the caller is using.
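
One way to summarize the choice among these timing methods is the following decision helper. It's only a sketch of the rules of thumb in this section: the enumeration names are hypothetical, and falling back to a timer with a DPC at elevated IRQL is my reading of the earlier discussion rather than a rule from the DDK.

```c
#include <assert.h>

/* Hypothetical names; the values mirror the usual ordering of IRQLs. */
typedef enum { IRQL_PASSIVE = 0, IRQL_APC = 1, IRQL_DISPATCH = 2 } model_irql;

typedef enum {
    USE_STALL,         /* KeStallExecutionProcessor */
    USE_DELAY_THREAD,  /* KeDelayExecutionThread */
    USE_TIMER_DPC      /* kernel timer with a DPC */
} delay_method;

delay_method choose_delay(unsigned long microseconds, model_irql irql)
{
    assert(irql <= IRQL_DISPATCH);

    /* Very brief delays: busy-wait, which is allowed at any IRQL. */
    if (microseconds < 50)
        return USE_STALL;

    /* Longer delays at PASSIVE_LEVEL: block the current thread. */
    if (irql == IRQL_PASSIVE)
        return USE_DELAY_THREAD;

    /* Otherwise don't block; let a timer queue a DPC later. */
    return USE_TIMER_DPC;
}
```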

Using Threads for Synchronization

The Process Structure component of the operating system provides a few routines that WDM drivers can use for creating and controlling system threads. I'll be discussing these routines later on in Chapter 9 from the perspective of how you can use these functions to help you manage a device that requires periodic polling. For the sake of thoroughness, I want to mention here that you can use a pointer to a kernel thread object in a call to KeWaitXxx to wait for the thread to complete. The thread terminates itself by calling PsTerminateSystemThread.

Before you can wait for a thread to terminate, you need to first obtain a pointer to the opaque KTHREAD object that internally represents that thread, which poses a bit of a problem. While running in the context of a thread, you can determine your own KTHREAD easily:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
PKTHREAD thread = KeGetCurrentThread();

Unfortunately, when you call PsCreateSystemThread to create a new thread, you can retrieve only an opaque HANDLE for the thread. To get the KTHREAD pointer, you use an Object Manager service function:

HANDLE hthread;
PKTHREAD thread;
PsCreateSystemThread(&hthread, ...);
ObReferenceObjectByHandle(hthread, THREAD_ALL_ACCESS, NULL, KernelMode,
  (PVOID*) &thread, NULL);
ZwClose(hthread);

ObReferenceObjectByHandle converts your handle into a pointer to the underlying kernel object. Once you have the pointer, you can discard the handle by calling ZwClose. At some point, you need to release your reference to the thread object by making a call to ObDereferenceObject:

ObDereferenceObject(thread);
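
Why is it safe to close the handle before you're done with the thread object? The following user-mode model gives the idea. It's a deliberately simplified picture in which an object stays alive as long as it has an open handle or an outstanding pointer reference; the model_ names are hypothetical, and the real Object Manager bookkeeping is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an object that is kept alive by open
   handles and by pointer references. */
typedef struct {
    int handles;
    int references;
} model_object;

bool model_alive(const model_object *o)
{
    return o->handles > 0 || o->references > 0;
}

/* Models ObReferenceObjectByHandle: needs a valid handle, adds a reference. */
void model_reference_by_handle(model_object *o)
{
    assert(o->handles > 0);
    o->references++;
}

/* Models ZwClose. */
void model_close_handle(model_object *o)
{
    assert(o->handles > 0);
    o->handles--;
}

/* Models ObDereferenceObject. */
void model_dereference(model_object *o)
{
    assert(o->references > 0);
    o->references--;
}
```

In the model, an object that starts with one handle survives the close as long as the reference was taken first. It goes away only after the matching dereference.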

Thread Alerts and APCs

Internally, the Windows NT kernel uses thread alerts as a way of waking threads. It uses an asynchronous procedure call as a way of waking a thread to execute some particular subroutine in that thread's context. The support routines that generate alerts or APCs are not exposed for use by WDM driver writers. But, since the DDK documentation and header files contain a great many references to these concepts, I want to finish this discussion of kernel dispatcher objects by explaining them.

I'll start by describing the "plumbing"—how these two mechanisms work. When someone blocks a thread by calling one of the KeWaitXxx routines, they specify by means of a Boolean argument whether the wait is to be "alertable." An alertable wait might finish early—that is, without any of the wait conditions or the timeout being satisfied—because of a thread alert. Thread alerts originate in user mode when someone calls the native API function NtAlertThread. The kernel returns the special status value STATUS_ALERTED when a wait terminates early because of an alert.

An APC is a mechanism whereby the operating system can execute a function in the context of a particular thread. The asynchronous part of an APC stems from the fact that the system effectively interrupts the target thread to execute an out-of-line subroutine. The action of an APC is somewhat similar to what happens when a hardware interrupt causes a processor to suddenly and, from the point of view of whatever code happens to be running at the time, unpredictably execute an interrupt service routine.

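APCs come in three flavors: user-mode, kernel-mode, and special kernel-mode. User-mode code requests a user-mode APC by calling the Win32 API QueueUserAPC. Kernel-mode code requests an APC by calling an undocumented function for which the DDK headers have no prototype. Diligent reverse engineers probably already know the name of this routine and something about how to call it, but it's really just for internal use and I'm not going to say any more about it. The system queues APCs to a specific thread until appropriate execution conditions exist. Appropriate execution conditions depend on the type of APC. A special kernel-mode APC executes as soon as possible, that is, as soon as activity at APC_LEVEL can be scheduled in the target thread, and it can even awaken a blocked thread. A normal kernel-mode APC executes only when the target thread is running and no other kernel-mode APC is executing in that thread. A user-mode APC executes only after the thread has performed an alertable wait in user mode.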

If the system awakens a thread to deliver an APC, the wait primitive on which the thread was previously blocked returns with one of the special status values STATUS_KERNEL_APC or STATUS_USER_APC.

How APCs Work with I/O Requests

The kernel uses the APC concept for several purposes. We're concerned in this book just with writing device drivers, though, so I'm only going to explain how APCs relate to the process of performing an I/O operation. In one of many possible scenarios, when a user-mode program performs a synchronous ReadFile operation on a handle, the Win32 subsystem calls a kernel-mode routine named (as is widely known despite its being undocumented) NtReadFile. NtReadFile creates and submits an IRP to the appropriate device driver, which often returns STATUS_PENDING to indicate that it hasn't finished the operation. NtReadFile returns this status code to ReadFile, which thereupon calls NtWaitForSingleObject to wait on the file object to which the user-mode handle points. NtWaitForSingleObject, in turn, calls KeWaitForSingleObject to perform a nonalertable, user-mode wait on an event object within the file object.

When the device driver eventually finishes the read operation, it calls IoCompleteRequest, which, in turn, queues a special kernel-mode APC. The APC routine calls KeSetEvent to signal the file object, thereby releasing the application to continue execution. Some sort of APC is required because some of the tasks that need to be performed when an I/O request is completed (such as buffer copying) must occur in the address context of the requesting thread. A kernel-mode APC is required because the thread in question is not in an alertable wait state. A special APC is required because the thread is actually ineligible to run at the time we need to deliver the APC. In fact, the APC routine is the mechanism for awakening the thread.

Kernel-mode routines can also call NtReadFile. Drivers should call ZwReadFile instead, which uses the same system service interface to reach NtReadFile that user-mode programs use. (Note that NtReadFile is not documented for use by device drivers.) If you obey the injunctions in the DDK documentation when you call ZwReadFile, your call to NtReadFile will look almost like a user-mode call and will be processed in almost the same way, with just two differences. The first, which is quite minor, is that any waiting will be done in kernel mode. The other difference is that if you specified in your call to ZwCreateFile that you wanted to do synchronous operations, the I/O Manager will automatically wait for your read to finish. The wait will be alertable or not, depending on the exact option you specify to ZwCreateFile.

How to Specify Alertable and WaitMode Parameters

Now you have enough background to understand the ramifications of the Alertable and WaitMode parameters in the calls to the various wait primitives. As a general rule, you'll rarely write code that responds synchronously to requests from user mode. You could do so for, say, certain I/O control requests. Generally speaking, however, it's better to pend any operations that take a long time to finish (by returning STATUS_PENDING from your dispatch routine) and to finish them asynchronously. So, to continue speaking generally, you don't often call a wait primitive in the first place. Thread blocking is appropriate in a device driver in only a few scenarios, which I'll describe in the following sections.

Kernel Threads. Sometimes you'll create your own kernel-mode thread, for example when your device needs to be polled periodically. In this scenario, any waits performed will be in kernel mode because the thread runs exclusively in kernel mode.

Handling Plug and Play Requests. I'll show you in Chapter 6 how to handle the I/O requests that the PnP Manager sends your way. Several such requests require synchronous handling on your part. In other words, you pass them down the driver stack to lower levels and wait for them to complete. You'll be calling KeWaitForSingleObject to wait in kernel mode because the PnP Manager calls you within the context of a kernel-mode thread. In addition, if you needed to perform subsidiary requests as part of handling a PnP request (for example, to talk to a universal serial bus (USB) device), you'd be waiting in kernel mode.

Handling Other I/O Requests. When you're handling other sorts of I/O requests and you know that you're running in the context of a nonarbitrary thread that must get the results of your deliberations before proceeding, it might conceivably be appropriate to block that thread by calling a wait primitive. In such a case, you want to wait in the same processor mode as the entity that called you. Most of the time, you can simply rely on the RequestorMode in the IRP you're currently processing. If you somehow gained control by means other than an IRP, you could call ExGetPreviousMode to determine the previous processor mode. If you wait in user mode, and if the behavior you want to achieve is that user-mode programs should be able to terminate the wait early by calling QueueUserAPC, you should perform an alertable wait.

The last situation I mentioned—you're waiting in user mode and need to allow user-mode APCs to break in—is the only one I know of in which you'd want to allow alerts when waiting.

The bottom line: perform nonalertable waits unless you know you shouldn't.
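
The guidance in this section can be condensed into a small decision helper. This is a sketch only: the type and enumeration names are hypothetical stand-ins (model_mode plays the role of KPROCESSOR_MODE), and the three scenarios are the ones described above.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MODE_KERNEL, MODE_USER } model_mode;

typedef enum {
    WAIT_IN_SYSTEM_THREAD,   /* your own kernel-mode thread */
    WAIT_IN_PNP_HANDLER,     /* synchronous handling of a PnP request */
    WAIT_IN_OTHER_IRP        /* some other request in a nonarbitrary thread */
} wait_scenario;

typedef struct {
    model_mode wait_mode;
    bool alertable;
} wait_params;

wait_params choose_wait_params(wait_scenario scenario,
    model_mode requestor_mode, bool allow_user_apcs)
{
    /* Default: a nonalertable kernel-mode wait. */
    wait_params p = { MODE_KERNEL, false };

    if (scenario == WAIT_IN_OTHER_IRP)
    {
        /* Wait in the same processor mode as whoever called you. */
        p.wait_mode = requestor_mode;

        /* Alertable only for a user-mode wait that user-mode APCs
           should be able to end early. */
        p.alertable = (requestor_mode == MODE_USER) && allow_user_apcs;
    }
    return p;
}
```

In a driver, the two fields would become the WaitMode and Alertable arguments to KeWaitForSingleObject or KeWaitForMultipleObjects.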