
Other Kernel-Mode Synchronization Primitives

The Windows 2000 kernel offers some additional methods for synchronizing execution between threads or for guarding access to shared objects. In this section, I'll discuss the fast mutex, which is a mutual exclusion object that offers faster performance than a kernel mutex because it's optimized for the case where no contention is actually occurring. I'll also describe the category of support functions that include the word Interlocked in their name somewhere. These functions carry out certain common operations—such as incrementing or decrementing an integer or inserting or removing an entry from a linked list—in an atomic way that prevents multitasking or multiprocessing interference.

Fast Mutex Objects

Compared to kernel mutexes, fast mutexes have the strengths and weaknesses summarized in Table 4-6. On the plus side, a fast mutex is much faster to acquire and release if there's no actual contention for it. On the minus side, you must avoid trying to recursively acquire a fast mutex, and that can mean preventing the delivery of APCs while you own it. Preventing APCs means raising IRQL to APC_LEVEL or above, which effectively negates thread priority but gains you the assurance that your code will execute without interruption except while the processor handles a higher-priority interrupt.

Table 4-6. Comparison of kernel and fast mutex objects.

Kernel Mutex | Fast Mutex
Can be acquired recursively by a single thread (system maintains a claim counter) | Cannot be acquired recursively
Relatively slower | Relatively faster
Owner won't receive any but "special" kernel APCs | Owner won't receive any APCs
Owner can't be removed from "balance set" (that is, can't be paged out) | No automatic priority boost (if you run at or above APC_LEVEL)
Can be part of a multiple object wait | Cannot be used as an argument to KeWaitForMultipleObjects

Incidentally, the DDK documentation about kernel mutex objects has long said that the kernel gives a priority boost to a thread that claims a mutex. I'm reliably informed that this hasn't actually been true since 1992 (the year, that is, not the Windows 2000 build number).

Table 4-7 summarizes the service functions you use to work with fast mutexes.

Table 4-7. Service functions for use with executive fast mutexes.

Service Function | Description
ExAcquireFastMutex | Acquires ownership of mutex, waiting if necessary
ExAcquireFastMutexUnsafe | Acquires ownership of mutex, waiting if necessary, in circumstances where the caller has already disabled receipt of APCs
ExInitializeFastMutex | Initializes mutex object
ExReleaseFastMutex | Releases mutex
ExReleaseFastMutexUnsafe | Releases mutex without reenabling APC delivery
ExTryToAcquireFastMutex | Acquires mutex if possible to do so without waiting

To create a fast mutex, you must first allocate a FAST_MUTEX data structure in nonpaged memory. Then you initialize the object by "calling" ExInitializeFastMutex, which is really a macro in WDM.H:

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
ExInitializeFastMutex(FastMutex);

where FastMutex is the address of your FAST_MUTEX object. The mutex begins life in the unowned state. To acquire ownership later on, call one of these functions:

ASSERT(KeGetCurrentIrql() < DISPATCH_LEVEL);
ExAcquireFastMutex(FastMutex);

or

ASSERT(KeGetCurrentIrql() < DISPATCH_LEVEL);
ExAcquireFastMutexUnsafe(FastMutex);

The first of these functions waits for the mutex to become available, assigns ownership to the calling thread, and then raises the current processor IRQL to APC_LEVEL. Raising the IRQL has the effect of blocking delivery of all APCs. The second function doesn't change the IRQL. You need to think about potential deadlocks if you use the "unsafe" function to acquire a fast mutex. The situation you must avoid is having an APC routine that runs in the same thread context try to acquire the same mutex or any other object that can't be recursively locked. Otherwise, you run the risk of instantly deadlocking that thread.

If you don't want to wait if the mutex isn't immediately available, use the "try to acquire" function:

ASSERT(KeGetCurrentIrql() < DISPATCH_LEVEL);
BOOLEAN acquired = ExTryToAcquireFastMutex(FastMutex);

If the return value is TRUE, you now own the mutex. If it's FALSE, someone else owns the mutex and has prevented you from acquiring it.

To release control of a fast mutex and allow some other thread to claim it, call the release function corresponding to the way you acquired the fast mutex:

ASSERT(KeGetCurrentIrql() < DISPATCH_LEVEL);
ExReleaseFastMutex(FastMutex);

or

ASSERT(KeGetCurrentIrql() < DISPATCH_LEVEL);
ExReleaseFastMutexUnsafe(FastMutex);

A fast mutex is fast because the acquisition and release steps are optimized for the usual case when there's no contention for the mutex. The critical step in acquiring the mutex is to atomically decrement and test an integer counter that indicates how many threads either own or are waiting for the mutex. If the test indicates that no other thread owns the mutex, no additional work is required. If the test indicates that another thread does own the mutex, the current thread blocks on a synchronization event that's part of the FAST_MUTEX object. Releasing the mutex entails atomically incrementing and testing the counter. If the test indicates that no thread is currently waiting, no additional work is required. If another thread is waiting, however, the owner calls KeSetEvent to release one of the waiters.
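Here's a user-mode sketch of that fast path, using C11 atomics in place of the kernel's interlocked instructions and a waiter counter in place of the embedded synchronization event. This is an analog for illustration only, not the kernel's actual code; the names are invented, and the return values simply report which path was taken.

```c
#include <assert.h>
#include <stdatomic.h>

/* User-mode analog of the FAST_MUTEX fast path.  Count starts at 1
   (free); acquiring decrements it, releasing increments it.  The real
   kernel blocks on an embedded event when the count goes negative and
   calls KeSetEvent on release; here we just track would-be waiters. */
typedef struct
  {
  atomic_int Count;   /* 1 = free; 0 = owned; <0 = owned with waiters */
  int Waiters;        /* stand-in for the embedded event's wait queue */
  } FAKE_FAST_MUTEX;

void FakeInitializeFastMutex(FAKE_FAST_MUTEX *m)
  {
  atomic_init(&m->Count, 1);
  m->Waiters = 0;
  }

/* Returns 1 on the uncontended fast path (no event needed);
   0 means a real implementation would now block on the event. */
int FakeAcquireFastMutex(FAKE_FAST_MUTEX *m)
  {
  if (atomic_fetch_sub(&m->Count, 1) == 1)
    return 1;                 /* no contention: nothing more to do */
  m->Waiters++;               /* slow path: would wait here */
  return 0;
  }

/* Returns 1 on the fast path; 0 means KeSetEvent would wake a waiter. */
int FakeReleaseFastMutex(FAKE_FAST_MUTEX *m)
  {
  if (atomic_fetch_add(&m->Count, 1) == 0)
    return 1;                 /* nobody was waiting */
  m->Waiters--;               /* slow path: would signal the event */
  return 0;
  }
```

In the uncontended case, acquire and release are each a single atomic instruction, which is exactly why the fast mutex earns its name.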

Interlocked Arithmetic

You can call several service functions in a WDM driver to perform arithmetic in a way that's thread-safe and multiprocessor-safe. See Table 4-8. These routines come in two flavors. The first type of routine has a name beginning with Interlocked and performs an atomic operation in such a way that no other thread or CPU can interfere. The other flavor has a name beginning with ExInterlocked and uses a spin lock.

Table 4-8. Service functions for interlocked arithmetic.

Service Function | Description
InterlockedCompareExchange | Compares and conditionally exchanges
InterlockedDecrement | Subtracts one from an integer and returns the postdecrement value
InterlockedExchange | Sets a variable to a new value and returns the previous value
InterlockedExchangeAdd | Adds two values and returns the preaddition value of the target
InterlockedIncrement | Adds one to an integer and returns the postincrement value
ExInterlockedAddLargeInteger | Adds value to 64-bit integer
ExInterlockedAddLargeStatistic | Adds a 32-bit value to a 64-bit integer
ExInterlockedAddUlong | Adds value to ULONG and returns the preaddition value
ExInterlockedCompareExchange64 | Compares and conditionally exchanges a 64-bit value

The InterlockedXxx functions can be called at any IRQL; because they don't require a spin lock, they can even operate on pageable data when called at PASSIVE_LEVEL. The ExInterlockedXxx routines can also be called at any IRQL, but they operate on the target data at or above DISPATCH_LEVEL and therefore require nonpaged operands. The only reason to use an ExInterlockedXxx function is when you have a variable that you sometimes need to increment or decrement and sometimes need to access across a series of instructions. You would explicitly claim the spin lock around the multi-instruction accesses and use the ExInterlockedXxx function for the simple increments and decrements.

InterlockedXxx Functions

InterlockedIncrement adds one to a long integer in memory and returns the postincrement value to you:

LONG result = InterlockedIncrement(pLong);

where pLong is the address of a variable typed as a LONG (that is, a long integer). Conceptually, the operation of the function is equivalent to the statement return ++*pLong in C, but the implementation differs from that simple statement in order to provide thread safety and multiprocessor safety. InterlockedIncrement guarantees that the integer is successfully incremented even if code on other CPUs or in other eligible threads on the same CPU is simultaneously trying to alter the same variable. In the nature of the operation, it cannot guarantee that the value it returns is still the value of the variable even one machine cycle later, because other threads or CPUs will be able to modify the variable as soon as the atomic increment operation completes.
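If it helps to see the return-value convention spelled out, here's a one-line user-mode analog built on the C11 atomics library. This is an illustration of the semantics, not the system's actual implementation; the function name is invented.

```c
#include <assert.h>
#include <stdatomic.h>

/* C11 user-mode analog of InterlockedIncrement: atomically add 1 and
   return the POST-increment value.  atomic_fetch_add yields the old
   value, so we add 1 to it before returning. */
long Increment(atomic_long *p)
  {
  return atomic_fetch_add(p, 1) + 1;
  }
```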

InterlockedDecrement, shown below, is similar to InterlockedIncrement, but it subtracts one from the target variable and returns the postdecrement value, just like the C statement return --*pLong but with thread safety and multiprocessor safety.

LONG result = InterlockedDecrement(pLong);

You call InterlockedCompareExchange like this:

LONG target;
LONG result = InterlockedCompareExchange(&target, newval, oldval);

Here, target is a LONG integer used both as input and output to the function, oldval is your guess about the current contents of the target, and newval is the new value that you want installed into the target if your guess is correct. The function performs an operation similar to that indicated in the following C code, but does so via an atomic operation that's both thread-safe and multiprocessor-safe:

LONG CompareExchange(PLONG ptarget, LONG newval, LONG oldval)
  {
  LONG value = *ptarget;
  if (value == oldval)
    *ptarget = newval;
  return value;
  }

In other words, the function always returns the previous value of the target variable to you. In addition, if that previous value equals oldval, it sets the target equal to the newval you specify. The function uses an atomic operation to do the compare and exchange so that the replacement happens only if you're correct in your guess about the previous contents.
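To make that convention concrete, here's a user-mode analog of the same operation built on C11 atomics. It's a sketch, not the system's implementation: the C11 primitive returns a success flag and writes the observed value back through its second argument, so a little glue is needed to mimic the Windows convention of returning the previous value.

```c
#include <assert.h>
#include <stdatomic.h>

/* User-mode analog of InterlockedCompareExchange.  Like the Windows
   routine, it returns the PREVIOUS value of the target; the swap
   happens only when that previous value matched oldval. */
long CompareExchange(atomic_long *ptarget, long newval, long oldval)
  {
  long expected = oldval;
  /* On success the target now holds newval and 'expected' still holds
     oldval; on failure 'expected' has been updated with the value
     actually observed.  Either way, return the previous value. */
  if (atomic_compare_exchange_strong(ptarget, &expected, newval))
    return oldval;
  return expected;
  }
```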

You can also call the InterlockedCompareExchangePointer function to perform a similar sort of compare and exchange operation with a pointer. This function is either defined as a compiler intrinsic (that is, a function for which the compiler supplies an inline implementation) or a real function call, depending on how wide pointers are on the platform for which you're compiling and on the ability of the compiler to generate inline code. You could use the pointer version of the function, as shown in the following example, to add a structure to the head of a singly-linked list without needing to acquire a spin lock or raise IRQL:

typedef struct _SOMESTRUCTURE {
  struct _SOMESTRUCTURE* next;
  ... } SOMESTRUCTURE, *PSOMESTRUCTURE;
...
void InsertElement(PSOMESTRUCTURE p, PSOMESTRUCTURE* anchor)
  {
  PSOMESTRUCTURE next, first;
  do
    {
    p->next = first = *anchor;
    next = InterlockedCompareExchangePointer(anchor, p, first);
    }
  while (next != first);
  }

Each time through the loop, we make the assumption that the new element will end up being chained to the current head of the list, the address of which we save in the variable named first. Then we call InterlockedCompareExchangePointer to see whether the anchor still points to first even these few nanoseconds later. If so, InterlockedCompareExchangePointer will set the anchor to point to our new element p. The fact that the return value from InterlockedCompareExchangePointer is the same as our assumption causes the loop to terminate. If, for some reason, the anchor no longer points to the same first element, we'll discover that fact and repeat the loop.
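This retry-loop idiom works for any simple read-modify-write operation, not just list insertion. As an illustration, here's a user-mode C11 sketch that builds an atomic add out of the same compare-exchange loop; the helper name is invented, and it is not a system routine.

```c
#include <assert.h>
#include <stdatomic.h>

/* The same compare-exchange retry loop, applied to a counter: snapshot
   the current value, compute the update, and retry if another thread
   changed the variable in the meantime. */
long AtomicAddViaCas(atomic_long *p, long increment)
  {
  long oldval, newval;
  do
    {
    oldval = atomic_load(p);          /* our guess at the current value */
    newval = oldval + increment;
    /* On failure, 'oldval' is refreshed and we simply loop again. */
    }
  while (!atomic_compare_exchange_weak(p, &oldval, newval));
  return newval;                      /* post-add value */
  }
```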

The last function in this class is InterlockedExchange, which simply uses an atomic operation to replace the value of an integer variable and to return the previous value:

LONG value;
LONG oldval = InterlockedExchange(&value, newval);

As you might have guessed, there's also an InterlockedExchangePointer that exchanges a pointer value (64-bit or 32-bit, depending on the platform).

ExInterlockedXxx Functions

Each of the ExInterlockedXxx functions requires that you create and initialize a spin lock before you call it. Note that the operands of these functions must all be in nonpaged memory because the functions operate on the data at elevated IRQL.

ExInterlockedAddLargeInteger adds two 64-bit integers and returns the previous value of the target:

LARGE_INTEGER value, increment;
KSPIN_LOCK spinlock;
LARGE_INTEGER prev = ExInterlockedAddLargeInteger(&value,
  increment, &spinlock);

Value is the target of the addition and one of the operands. Increment is an integer operand that's added to the target. Spinlock is a spin lock that you previously initialized. The return value is the target's value before the addition. In other words, the operation of this function is similar to the following function except that it occurs under protection of the spin lock:

__int64 AddLargeInteger(__int64* pvalue, __int64 increment)
  {
  __int64 prev = *pvalue;
  *pvalue += increment;
  return prev;
  }

Note that the return value is the preaddition value, which contrasts with the postincrement value returned by InterlockedIncrement and similar functions. (Also, not all compilers support the __int64 integer data type, and not all computers can perform a 64-bit addition operation using atomic instructions.)

ExInterlockedAddUlong is analogous to ExInterlockedAddLargeInteger except that it works with 32-bit unsigned integers:

ULONG value, increment;
KSPIN_LOCK spinlock;
ULONG prev = ExInterlockedAddUlong(&value, increment, &spinlock);

This function likewise returns the preaddition value of the target of the operation.

ExInterlockedAddLargeStatistic is similar to ExInterlockedAddUlong in that it adds a 32-bit value to a 64-bit value. It hadn't been documented in the DDK at press time, so I'll show you its prototype here:

VOID ExInterlockedAddLargeStatistic(PLARGE_INTEGER Addend,
  ULONG Increment);

This new function is faster than ExInterlockedAddUlong because it doesn't need to return the preincrement value of the Addend variable. It therefore doesn't need to employ a spin lock for synchronization. The atomicity provided by this function is, however, only with respect to other callers of the same function. In other words, if you had code on one CPU calling ExInterlockedAddLargeStatistic at the same time as code on another CPU was accessing the Addend variable for either reading or writing, you could get inconsistent results. I can explain why this is so by showing you this paraphrase of the Intel x86 implementation of the function (not the actual source code):

mov eax, Addend
mov ecx, Increment
lock add [eax], ecx
lock adc [eax+4], 0

This code works correctly for purposes of incrementing the Addend because the lock prefixes guarantee atomicity of each addition operation and because no carries from the low-order 32 bits can ever get lost. The instantaneous value of the 64-bit Addend isn't always consistent, however, because an incrementer might be poised between the ADD and the ADC just at the instant someone makes a copy of the complete 64-bit value. Therefore, even a caller of ExInterlockedCompareExchange64 on another CPU could obtain an inconsistent value.
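To see why the instantaneous value can be torn, consider this C rendition of the same two-step addition. It's a user-mode illustration with an invented SPLIT64 type, not the kernel's code: the final sum always comes out right, but a reader who captures both halves between the two steps sees the low part updated and the high part stale.

```c
#include <assert.h>
#include <stdint.h>

/* C rendition of the two-instruction x86 sequence: add into the low
   32 bits, then propagate any carry into the high 32 bits.  In the
   real code each step is individually atomic (the lock prefix), but
   the 64-bit value as a whole is inconsistent between the steps. */
typedef struct
  {
  uint32_t Low;
  uint32_t High;
  } SPLIT64;

void AddLargeStatistic(SPLIT64 *addend, uint32_t increment)
  {
  uint32_t oldlow = addend->Low;
  addend->Low = oldlow + increment;     /* lock add [eax], ecx   */
  /* --- a reader here sees a torn 64-bit value --- */
  if (addend->Low < oldlow)             /* unsigned wraparound => carry */
    addend->High += 1;                  /* lock adc [eax+4], 0   */
  }
```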

Interlocked List Access

The Windows NT executive offers three sets of support functions for dealing with linked lists in a thread-safe and multiprocessor-safe way. These functions support doubly-linked lists, singly-linked lists, and a special kind of singly-linked list called an S-List. I discussed noninterlocked doubly-linked and singly-linked lists in the preceding chapter. To close this chapter on synchronization within WDM drivers, I'll explain how to use these interlocked accessing primitives.

If you need the functionality of a FIFO queue, you should use a doubly-linked list. If you need the functionality of a thread-safe and multiprocessor-safe pushdown stack, you should use an S-List. In both cases, to achieve thread safety and multiprocessor safety, you will allocate and initialize a spin lock. The S-List might not actually use the spin lock, however, because the presence of a sequence number might allow the kernel to implement it using just atomic compare-exchange sorts of operations.

The support functions for performing interlocked access to list objects are very similar, so I've organized this section along functional lines. I'll explain how to initialize all three kinds of list. Then I'll explain how to insert an item into all three kinds. After that, I'll explain how to remove items.

Initialization

You can initialize these lists as shown here:

LIST_ENTRY DoubleHead;
SINGLE_LIST_ENTRY SingleHead;
SLIST_HEADER SListHead;

InitializeListHead(&DoubleHead);

SingleHead.Next = NULL;

ExInitializeSListHead(&SListHead);

Don't forget that you must also allocate and initialize a spin lock for each list. Furthermore, the storage for the list heads and all the items you put into the lists must come from nonpaged memory because the support routines perform their accesses at elevated IRQL. Note that the spin lock isn't used during initialization of the list head because it doesn't make any sense to allow contention for list access before the list has been initialized.

Inserting Items

You can insert items at the head and tail of a doubly-linked list and at the head (only) of a singly-linked list or an S-List:

PLIST_ENTRY pdElement, pdPrevHead, pdPrevTail;
PSINGLE_LIST_ENTRY psElement, psPrevHead;
PKSPIN_LOCK spinlock;

pdPrevHead = ExInterlockedInsertHeadList(&DoubleHead, pdElement, spinlock);
pdPrevTail = ExInterlockedInsertTailList(&DoubleHead, pdElement, spinlock);

psPrevHead = ExInterlockedPushEntryList(&SingleHead, psElement, spinlock);

psPrevHead = ExInterlockedPushEntrySList(&SListHead, psElement, spinlock);

The return values are the addresses of the elements previously at the head (or tail) of the list in question. Note that the element addresses you use with these functions are the addresses of list entry structures that are usually embedded in larger structures of some kind, and you will need to use the CONTAINING_RECORD macro to recover the address of the surrounding structure.
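In case the mechanics aren't familiar, here's a user-mode sketch of how that recovery works. The macro follows the standard CONTAINING_RECORD idiom of subtracting the embedded field's offset from the field's address, but the structure and macro names here are invented for the example; the DDK supplies the real LIST_ENTRY and CONTAINING_RECORD.

```c
#include <assert.h>
#include <stddef.h>

/* User-mode stand-in for the DDK's CONTAINING_RECORD macro: back up
   from the embedded field's address by that field's offset to find
   the start of the surrounding structure. */
#define CONTAINING_RECORD_SIM(addr, type, field) \
  ((type *)((char *)(addr) - offsetof(type, field)))

typedef struct _LIST_ENTRY_SIM
  {
  struct _LIST_ENTRY_SIM *Flink, *Blink;
  } LIST_ENTRY_SIM;

typedef struct _MYRECORD
  {
  int Data;                   /* payload */
  LIST_ENTRY_SIM ListEntry;   /* embedded link field */
  } MYRECORD;

/* Given the address of the embedded ListEntry, recover the MYRECORD. */
MYRECORD *RecordFromEntry(LIST_ENTRY_SIM *entry)
  {
  return CONTAINING_RECORD_SIM(entry, MYRECORD, ListEntry);
  }
```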

Removing Items

You can remove items from the head of any of these lists:

pdElement = ExInterlockedRemoveHeadList(&DoubleHead, spinlock);

psElement = ExInterlockedPopEntryList(&SingleHead, spinlock);

psElement = ExInterlockedPopEntrySList(&SListHead, spinlock);

The return values are NULL if the respective lists are empty. Be sure to test the return value for NULL before applying the CONTAINING_RECORD macro to recover a containing structure pointer.

IRQL Restrictions

You can call the S-List functions only while running at or below DISPATCH_LEVEL. The ExInterlockedXxx functions for accessing doubly-linked or singly-linked lists can be called at any IRQL so long as all references to the list use an ExInterlockedXxx call. The reason there are no IRQL restrictions is that the implementations of these functions disable interrupts, which is tantamount to raising IRQL to the highest possible level. Once interrupts are disabled, these functions then acquire the spin lock you've specified. Since no other code can gain control on the same CPU, and since no code on another CPU can acquire the spin lock, your lists are protected.

NOTE
The DDK documentation states this rule in an overly restrictive way for at least some of the ExInterlockedXxx functions. It says that all callers must be running at some single IRQL less than or equal to the DIRQL of your interrupt object. There is, in fact, no requirement that all callers be at the same IRQL, because you can call the functions at any IRQL. Likewise, no restriction to <= DIRQL exists either, but there's also no reason for the code you and I write to raise IRQL higher than that.

It's perfectly okay for you to use ExInterlockedXxx calls to access a singly-linked or doubly-linked list (but not an S-List) in some parts of your code and to use the noninterlocked functions (InsertHeadList and so on) in other parts of your code if you follow a simple rule. Before using a noninterlocked primitive, acquire the same spin lock that your interlocked calls use. Furthermore, restrict list access to code running at or below DISPATCH_LEVEL. For example:

// Access list using noninterlocked calls:

VOID Function1()
  {
  ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
  KIRQL oldirql;
  KeAcquireSpinLock(spinlock, &oldirql);
  InsertHeadList(...);
  RemoveTailList(...);
  ...
  KeReleaseSpinLock(spinlock, oldirql);
  }

// Access list using interlocked calls:

VOID Function2()
  {
  ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
  ExInterlockedInsertTailList(..., spinlock);
  }

The first function must be running at or below DISPATCH_LEVEL because that's a requirement of calling KeAcquireSpinLock. The reason for the IRQL restriction on the interlocked calls in the second function is as follows: Suppose that Function1 acquires the spin lock in preparation for performing some list accesses. Acquiring the spin lock raises IRQL to DISPATCH_LEVEL. Now suppose that an interrupt occurs on the same CPU at a higher IRQL and that Function2 gains control to use one of the ExInterlockedXxx routines. The kernel will now attempt to acquire the same spin lock, and the CPU will deadlock. This problem arises from allowing code running at two different IRQLs to use the same spin lock: Function1 is at DISPATCH_LEVEL, and Function2 is—practically speaking, anyway—at HIGH_LEVEL when it tries to recursively acquire the lock.