Windows 2000 supports direct memory access (DMA) transfers based on the abstract model of a computer depicted in Figure 7-6. In this model, the computer has a collection of map registers that translate between physical CPU addresses and bus addresses. Each map register holds the address of one physical page frame. Hardware accesses memory for reading or writing by means of a "logical," or bus-specific, address. The map registers play the same role for hardware that page table entries play for software: they allow hardware to use numeric address values that differ from the ones the CPU understands.
Figure 7-6. Abstract computer model for DMA transfers.
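To make the model concrete, here is a minimal user-mode sketch of Figure 7-6. It assumes 4-KB pages and treats a logical address as a map register index times the page size plus a byte offset; that encoding, and the names `translate_logical` and `SKETCH_PAGE_SIZE`, are hypothetical simplifications rather than how any particular HAL builds bus addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4-KB pages */

/* Hypothetical model: map_registers[i] holds the frame number of one
   physical page. A logical (bus) address selects a register by its page
   index and carries the byte offset within that page unchanged.
   Returns 1 and stores the CPU physical address on success, or 0 if the
   logical address falls outside the register set. */
static int translate_logical(const uint64_t *map_registers, size_t count,
                             uint32_t logical, uint64_t *physical)
{
    size_t index = logical / SKETCH_PAGE_SIZE;
    uint32_t offset = logical % SKETCH_PAGE_SIZE;

    assert(map_registers != NULL && physical != NULL);
    if (index >= count)
        return 0;
    *physical = map_registers[index] * SKETCH_PAGE_SIZE + offset;
    return 1;
}
```

With registers holding frames 0x1234 and 0x50, logical address 0x1008 resolves to physical 0x50008: the device and the CPU use different numbers for the same byte, which is the whole point of the map register layer.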
Some CPUs, such as the Alpha, have actual hardware map registers. One of the steps in performing a DMA transfer (specifically, the MapTransfer step I'll discuss presently) programs some of these registers for your use. Other CPUs, such as the Intel x86, do not have map registers, but you write your driver as if they did. The MapTransfer step on such a computer might end up using physical memory buffers that belong to the system, in which case the DMA operation proceeds using the reserved buffer. Obviously, something has to copy data to or from that buffer before or after the transfer. In certain cases (for example, when dealing with a bus-master device that has scatter/gather capability), the MapTransfer phase might do nothing at all on an architecture without map registers.
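On a machine without map registers, the reserved system buffer behaves like what is usually called a bounce buffer. The user-mode sketch below, with hypothetical names (`bounce_buffer`, `bounce_before`, `bounce_after`), shows which copy each direction needs: output data is staged before the transfer, and input data is copied out after it. It models the idea only; in Windows 2000 this work happens inside the HAL's DMA helper routines, not in your driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a system-owned buffer the device can reach. */
typedef struct {
    unsigned char *memory;
    size_t capacity;
} bounce_buffer;

/* Before the transfer: trim the stage to the buffer size and, for output
   (memory to device), copy the caller's data into the bounce buffer.
   Returns the number of bytes this stage will move. */
static size_t bounce_before(bounce_buffer *bb, const unsigned char *user,
                            size_t len, int to_device)
{
    assert(bb != NULL && user != NULL);
    if (len > bb->capacity)
        len = bb->capacity;
    if (to_device)
        memcpy(bb->memory, user, len);
    return len;
}

/* After the transfer: for input (device to memory), copy what the device
   deposited in the bounce buffer back into the caller's buffer. */
static void bounce_after(const bounce_buffer *bb, unsigned char *user,
                         size_t len, int to_device)
{
    assert(bb != NULL && user != NULL && len <= bb->capacity);
    if (!to_device)
        memcpy(user, bb->memory, len);
}
```

Note that an output stage pays its copy up front and an input stage pays it afterward; either way, a device served this way moves every byte twice.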
The Windows 2000 kernel uses a data structure known as an adapter object to describe the DMA characteristics of a device and to control access to potentially shared resources, such as system DMA channels and map registers. You get a pointer to an adapter object by calling IoGetDmaAdapter during your StartDevice processing. The adapter object's DmaOperations member points to a structure that, in turn, contains pointers to all the other functions you need to call (see Table 7-4). These functions take the place of global functions (such as IoAllocateAdapterChannel, IoMapTransfer, and the like) that you would have used in previous versions of Windows NT. In fact, the global names are now macros that invoke the DmaOperations functions.
Table 7-4. DmaOperations function pointers for DMA helper routines.
DmaOperations Function Pointer | Description |
---|---|
PutDmaAdapter | Destroys adapter object |
AllocateCommonBuffer | Allocates a common buffer |
FreeCommonBuffer | Releases a common buffer |
AllocateAdapterChannel | Reserves adapter and map registers |
FlushAdapterBuffers | Flushes intermediate data buffers after transfer |
FreeAdapterChannel | Releases adapter object and map registers |
FreeMapRegisters | Releases map registers only |
MapTransfer | Programs one stage of a transfer |
GetDmaAlignment | Gets address alignment required for adapter |
ReadDmaCounter | Determines residual count |
GetScatterGatherList | Reserves adapter and constructs scatter/gather list |
PutScatterGatherList | Releases scatter/gather list |
How you perform a DMA transfer depends on several factors:
Although many details differ depending on how these four factors interact, the steps you perform have many features in common. Figure 7-7 illustrates the overall operation of a transfer. You start the transfer in your StartIo routine by requesting ownership of your adapter object. Ownership has meaning only if you're sharing a system DMA channel with other devices, but the Windows 2000 DMA model requires you to perform this step anyway. When the I/O Manager is able to grant you ownership, it allocates some map registers for your temporary use and calls back to an adapter control routine you provide. In your adapter control routine, you perform a transfer mapping step to arrange the first (perhaps the only) stage of the transfer. Multiple stages can be necessary if not enough map registers are available; your device must be able to tolerate any delay that occurs between stages.
Figure 7-7. Flow of ownership during DMA.
Once your adapter control routine has initialized the map registers for the first stage, you signal your device to begin operation. Your device generates an interrupt when this initial transfer completes, and your interrupt service routine schedules a DPC. The DPC routine either initiates another stage of the transfer, if necessary, or completes the request.
Somewhere along the way, you'll release the map registers and the adapter object. The timing of these two events is one of the details that differs based on the factors I summarized earlier in this section.
Now I'll go into detail about the mechanics of what's often called a packet-based DMA transfer, wherein you transfer a discrete amount of data by using the data buffer that accompanies an I/O request packet. Let's start simply and suppose that you face what will be a very common case nowadays: your device is a PCI bus master but does not have scatter/gather capability.
To start with, when you create your device object, you'd ordinarily indicate that you want to use the direct method of data buffering by setting the DO_DIRECT_IO flag. You'd choose the direct method because you'll eventually pass the address of a memory descriptor list (MDL) as one of the arguments to the MapTransfer function. This choice poses a problem with regard to buffer alignment, though. Unless the application uses the FILE_FLAG_NO_BUFFERING flag in its call to CreateFile, the I/O Manager won't enforce the device object's AlignmentRequirement on user-mode data buffers. (It doesn't enforce the requirement for a kernel-mode caller at all, except in the checked build.) If your device or the HAL requires DMA buffers to begin on some particular boundary, therefore, you might end up copying a small portion of the user data to a correctly aligned internal buffer to meet the alignment requirement, or else failing any request that has a misaligned buffer.
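The check a driver might make before choosing the copy approach can be sketched as follows. It assumes the mask convention used by AlignmentRequirement and the FILE_xxx_ALIGNMENT constants (for example, FILE_LONG_ALIGNMENT is 3, meaning a 4-byte boundary). The function name `misaligned_head_bytes` is hypothetical, and the sketch only computes how many leading bytes would have to go through an aligned internal buffer; the rest of the buffer then starts on a correct boundary.

```c
#include <assert.h>
#include <stdint.h>

/* alignment_mask is 2^n - 1, as in AlignmentRequirement (3 means the
   address must be a multiple of 4). Returns how many leading bytes of a
   len-byte buffer at addr precede the next aligned boundary, capped at len.
   Returns 0 when addr is already aligned. */
static uint32_t misaligned_head_bytes(uintptr_t addr, uint32_t len,
                                      uint32_t alignment_mask)
{
    uint32_t head;

    assert(((alignment_mask + 1) & alignment_mask) == 0); /* 2^n - 1 */
    head = (uint32_t)((alignment_mask + 1 - (addr & alignment_mask))
                      & alignment_mask);
    return head < len ? head : len;
}
```

A buffer at 0x1001 with a mask of 3 has three leading bytes to copy; after them, the remaining data begins at 0x1004 and can be transferred directly.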
In your StartDevice function, you create an adapter object by using code like the following:
INTERFACE_TYPE bustype;
ULONG junk;
IoGetDeviceProperty(pdx->Pdo, DevicePropertyLegacyBusType,
  sizeof(bustype), &bustype, &junk);
DEVICE_DESCRIPTION dd;
RtlZeroMemory(&dd, sizeof(dd));
dd.Version = DEVICE_DESCRIPTION_VERSION;
dd.Master = TRUE;
dd.InterfaceType = bustype;
dd.MaximumLength = MAXTRANSFER;
dd.Dma32BitAddresses = TRUE;
pdx->AdapterObject = IoGetDmaAdapter(pdx->Pdo, &dd,
  &pdx->nMapRegisters);
The last statement in this code fragment is the important one. IoGetDmaAdapter will communicate with the bus driver or the HAL to create an adapter object, whose address it returns to you. The first parameter (pdx->Pdo) identifies the physical device object (PDO) for your device. The second parameter points to a DEVICE_DESCRIPTION structure that you initialize to describe the DMA characteristics of your device. The last parameter indicates where the system should store the maximum number of map registers you'll ever be allowed to attempt to reserve during a single transfer. You'll notice that I reserved two fields in the device extension (AdapterObject and nMapRegisters) to receive the two outputs from this function.
In your StopDevice function, you destroy the adapter object with this call:
VOID StopDevice(...)
{
  ...
  if (pdx->AdapterObject)
    (*pdx->AdapterObject->DmaOperations->PutDmaAdapter)
      (pdx->AdapterObject);
  pdx->AdapterObject = NULL;
  ...
}
You shouldn't expect to receive an official DMA resource when your device is a bus master. That is, your resource extraction loop won't need a CmResourceTypeDma case label. The PnP Manager doesn't assign you a DMA resource because your hardware itself contains all the electronics needed to perform DMA transfers, so nothing additional needs to be assigned to you.
Previous versions of Windows NT relied on a service function named HalGetAdapter to acquire the DMA adapter object. That function still exists for compatibility, but new WDM drivers should call IoGetDmaAdapter instead. The difference between the two is that IoGetDmaAdapter first issues an IRP_MN_QUERY_INTERFACE Plug and Play IRP to determine whether the physical device object supports the GUID_BUS_INTERFACE_STANDARD direct call interface. If so, IoGetDmaAdapter uses that interface to allocate the adapter object. If not, it simply calls HalGetAdapter.
Table 7-5 summarizes the fields in the DEVICE_DESCRIPTION structure you pass to IoGetDmaAdapter. The only fields that are relevant for a bus-master device are those shown in the preceding StartDevice code fragment. The HAL might or might not need to know whether your device recognizes 32-bit or 64-bit addresses—the Intel x86 HAL uses this flag only when you allocate a common buffer, for example—but you should indicate that capability anyway to retain portability. By zeroing the entire structure, we set ScatterGather to FALSE. Since we won't be using a system DMA channel, none of DmaChannel, DmaPort, DmaWidth, DemandMode, AutoInitialize, IgnoreCount, and DmaSpeed will be examined by the routine that creates our adapter object.
Table 7-5. Device description structure used with IoGetDmaAdapter.
Field Name | Description | Relevant To Device |
---|---|---|
Version | Version number of structure—initialize to DEVICE_DESCRIPTION_VERSION | All |
Master | Bus-master device—set based on your knowledge of device | All |
ScatterGather | Device supports scatter/gather list—set based on your knowledge of device | All |
DemandMode | Use system DMA controller's demand mode—set based on your knowledge of device | Slave |
AutoInitialize | Use system DMA controller's autoinitialize mode—set based on your knowledge of device | Slave |
Dma32BitAddresses | Can use 32-bit physical addresses | Common buffer |
IgnoreCount | Controller doesn't maintain an accurate transfer count—set based on your knowledge of device | Slave |
Reserved1 | Reserved—must be FALSE | |
Dma64BitAddresses | Can use 64-bit physical addresses | Common buffer |
DoNotUse2 | Reserved—must be 0 | |
DmaChannel | DMA channel number—initialize from Channel attribute of resource descriptor | Slave |
InterfaceType | Bus type—use result of IoGetDeviceProperty call to get DevicePropertyLegacyBusType | All |
DmaWidth | Width of transfers—set based on your knowledge of device to Width8Bits, Width16Bits, or Width32Bits | Slave |
DmaSpeed | Speed of transfers—set based on your knowledge of device to Compatible, TypeA, TypeB, TypeC, or TypeF | Slave |
MaximumLength | Maximum length of a single transfer—set based on your knowledge of device (and round up to a multiple of PAGE_SIZE) | All |
DmaPort | Microchannel-type bus port number—initialize from Port attribute of resource descriptor | Slave |
To initiate an I/O operation, your StartIo routine first has to reserve the adapter object by calling the object's AllocateAdapterChannel routine. One of the arguments to AllocateAdapterChannel is the address of an adapter control routine that the I/O Manager will call when the reservation has been accomplished. Here's an example of code you would use to prepare and execute the call to AllocateAdapterChannel:
typedef struct _DEVICE_EXTENSION {
  ...
  PADAPTER_OBJECT AdapterObject;  // device's adapter object
  ULONG nMapRegisters;            // max # map registers
  ULONG nMapRegistersAllocated;   // # allocated for this xfer
  ULONG numxfer;                  // # bytes transferred so far
  ULONG xfer;                     // # bytes to transfer during this stage
  ULONG nbytes;                   // # bytes remaining to transfer
  PVOID vaddr;                    // virtual address for current stage
  PVOID regbase;                  // map register base for this stage
  ...
} DEVICE_EXTENSION, *PDEVICE_EXTENSION;

VOID StartIo(PDEVICE_OBJECT fdo, PIRP Irp)
{
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  NTSTATUS status = IoAcquireRemoveLock(&pdx->RemoveLock, Irp);
  if (!NT_SUCCESS(status))
  {
    CompleteRequest(Irp, status, 0);
    return;
  }
  PMDL mdl = Irp->MdlAddress;
  pdx->numxfer = 0;
  pdx->xfer = pdx->nbytes = MmGetMdlByteCount(mdl);
  pdx->vaddr = MmGetMdlVirtualAddress(mdl);

  // One map register per page the buffer spans, capped at the maximum
  // IoGetDmaAdapter reported; if capped, trim the first stage to fit.
  ULONG nregs = ADDRESS_AND_SIZE_TO_SPAN_PAGES(pdx->vaddr, pdx->nbytes);
  if (nregs > pdx->nMapRegisters)
  {
    nregs = pdx->nMapRegisters;
    pdx->xfer = nregs * PAGE_SIZE - MmGetMdlByteOffset(mdl);
  }
  pdx->nMapRegistersAllocated = nregs;

  status = (*pdx->AdapterObject->DmaOperations->AllocateAdapterChannel)
    (pdx->AdapterObject, fdo, nregs, (PDRIVER_CONTROL) AdapterControl, pdx);
  if (!NT_SUCCESS(status))
  {
    IoReleaseRemoveLock(&pdx->RemoveLock, Irp);
    CompleteRequest(Irp, status, 0);
    StartNextPacket(&pdx->dqReadWrite, fdo);
  }
}
In general, several devices can share a single adapter object. In practice, sharing happens only when you rely on the system DMA controller; bus-master devices own dedicated adapter objects. But since you don't need to know how the system decides when to create adapter objects, you shouldn't make any assumptions about it. The adapter object might therefore be busy when you call AllocateAdapterChannel, in which case your request is queued until the adapter object becomes available. Also, all DMA devices on the computer share a set of map registers, so further delay can ensue until the requested number of registers becomes available. AllocateAdapterChannel handles both of these delays: it calls your adapter control procedure only when the adapter object and all the map registers you asked for are available.
Even though a PCI bus-mastering device owns its own adapter object, it still requires the use of map registers if it doesn't have scatter/gather capability. On CPUs such as the Alpha that have map registers, AllocateAdapterChannel reserves them for your use. On CPUs such as the Intel x86 that don't have map registers, AllocateAdapterChannel reserves use of a software surrogate, such as a contiguous area of physical memory.
As I've been discussing, AllocateAdapterChannel eventually calls your adapter control routine (at DISPATCH_LEVEL, just like your StartIo routine does). You have two tasks to accomplish. First, you should call the adapter object's MapTransfer routine to prepare the map registers and other system resources for the first stage of your I/O operation. In the case of a bus-mastering device, MapTransfer will return a logical address that represents the starting point for the first stage. This logical address might be the same as a CPU physical memory address, and it might not be. All you need to know about it is that it's the right address to program into your hardware. MapTransfer might also trim the length of your request to fit the map registers it's using, which is why you need to supply the address of the variable that contains the current stage length as an argument.
Your second task is to perform whatever device-dependent steps are required to inform your device of the physical address and to start the operation on your hardware:
IO_ALLOCATION_ACTION AdapterControl(PDEVICE_OBJECT fdo, PIRP junk,
  PVOID regbase, PDEVICE_EXTENSION pdx)
{
  PIRP Irp = GetCurrentIrp(&pdx->dqReadWrite);
  PMDL mdl = Irp->MdlAddress;
  PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);
  BOOLEAN isread = stack->MajorFunction == IRP_MJ_READ;
  pdx->regbase = regbase;          // remember for MapTransfer/Flush later
  KeFlushIoBuffers(mdl, isread, TRUE);
  PHYSICAL_ADDRESS address =
    (*pdx->AdapterObject->DmaOperations->MapTransfer)
    (pdx->AdapterObject, mdl, regbase, pdx->vaddr, &pdx->xfer, !isread);
  ...
  return DeallocateObjectKeepRegisters;
}
An interrupt usually occurs shortly after you start the transfer, and the interrupt service routine usually requests a DPC to deal with completion of the first stage of the transfer. Your DPC routine would look something like this:
VOID DpcForIsr(PKDPC Dpc, PDEVICE_OBJECT fdo, PIRP junk, PVOID Context)
{
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  PIRP Irp = GetCurrentIrp(&pdx->dqReadWrite);
  PMDL mdl = Irp->MdlAddress;
  BOOLEAN isread = IoGetCurrentIrpStackLocation(Irp)
    ->MajorFunction == IRP_MJ_READ;

  // Finish off the stage that just completed.
  (*pdx->AdapterObject->DmaOperations->FlushAdapterBuffers)
    (pdx->AdapterObject, mdl, pdx->regbase, pdx->vaddr,
    pdx->xfer, !isread);
  pdx->nbytes -= pdx->xfer;
  pdx->numxfer += pdx->xfer;
  NTSTATUS status = STATUS_SUCCESS;
  ...
  if (pdx->nbytes && NT_SUCCESS(status))
  {                                // start the next stage
    pdx->vaddr = (PVOID) ((PUCHAR) pdx->vaddr + pdx->xfer);
    pdx->xfer = pdx->nbytes;
    ULONG nregs = ADDRESS_AND_SIZE_TO_SPAN_PAGES(pdx->vaddr, pdx->nbytes);
    if (nregs > pdx->nMapRegistersAllocated)
    {                              // vaddr is page-aligned here
      nregs = pdx->nMapRegistersAllocated;
      pdx->xfer = nregs * PAGE_SIZE;
    }
    PHYSICAL_ADDRESS address =
      (*pdx->AdapterObject->DmaOperations->MapTransfer)
      (pdx->AdapterObject, mdl, pdx->regbase, pdx->vaddr,
      &pdx->xfer, !isread);        // length is passed by address
    ...
  }
  else
  {                                // request is finished
    ULONG numxfer = pdx->numxfer;
    (*pdx->AdapterObject->DmaOperations->FreeMapRegisters)
      (pdx->AdapterObject, pdx->regbase, pdx->nMapRegistersAllocated);
    IoReleaseRemoveLock(&pdx->RemoveLock, Irp);
    StartNextPacket(&pdx->dqReadWrite, fdo);
    CompleteRequest(Irp, status, numxfer);
  }
}
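To see how the StartIo and DpcForIsr logic splits a large request into stages, here is a user-mode simulation with hypothetical names (`plan_stages`, `span_pages`), assuming 4-KB pages. It folds both code paths into one rule by subtracting the page offset of the current address when a stage is capped. After the first capped stage the address is page-aligned, so that offset is zero and the rule reduces to the DPC's nregs * PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4-KB pages */

/* Pages touched by size bytes starting at va (as ADDRESS_AND_SIZE_TO_SPAN_PAGES). */
static uint32_t span_pages(uintptr_t va, uint32_t size)
{
    uint32_t offset = (uint32_t)(va & (SKETCH_PAGE_SIZE - 1));
    return (uint32_t)(((uint64_t)offset + size + SKETCH_PAGE_SIZE - 1)
                      / SKETCH_PAGE_SIZE);
}

/* Each stage moves at most what nregs map registers can cover. Writes
   each stage's length into lengths[] and returns the number of stages
   (stopping early if max_stages is reached). */
static size_t plan_stages(uintptr_t va, uint32_t nbytes, uint32_t nregs,
                          uint32_t *lengths, size_t max_stages)
{
    size_t count = 0;

    assert(nregs > 0 && lengths != NULL);
    while (nbytes > 0 && count < max_stages) {
        uint32_t xfer = nbytes;
        if (span_pages(va, nbytes) > nregs)
            xfer = nregs * SKETCH_PAGE_SIZE
                   - (uint32_t)(va & (SKETCH_PAGE_SIZE - 1));
        lengths[count++] = xfer;
        va += xfer;          /* like pdx->vaddr += pdx->xfer */
        nbytes -= xfer;      /* like pdx->nbytes -= pdx->xfer */
    }
    return count;
}
```

For a 20,000-byte buffer at offset 0x100 with three map registers, the request takes two stages: 12,032 bytes and then 7,968 bytes.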
If your hardware has scatter/gather support, the system has a much easier time doing DMA transfers to and from your device. The scatter/gather capability permits the device to perform a transfer involving pages that aren't contiguous in physical memory.
Your StartDevice routine creates its adapter object in just about the same way I've already discussed, except (of course) that you'll set the ScatterGather flag to TRUE.
The traditional method—that is, the method you would have used in previous versions of Windows NT—to program a DMA transfer involving scatter/gather functionality is practically identical to the packet-based example considered in the previous section, "Performing DMA Transfers." The only difference is that instead of making one call to MapTransfer for each stage of the transfer, you need to make multiple calls. Each call gives you the information you need for a single element in a scatter/gather list that contains a physical address and length. When you're done with the loop, you can send the scatter/gather list to your device by using some device-specific method, and you can then initiate the transfer.
I'm going to make some assumptions about the framework into which you'll fit the construction of a scatter/gather list. First, I'll assume that you've defined a manifest constant named MAXSG that represents the maximum number of scatter/gather list elements your device can handle. To make life as simple as possible, I'm also going to assume that you can just use the SCATTER_GATHER_LIST structure defined in WDM.H to construct the list:
typedef struct _SCATTER_GATHER_ELEMENT {
  PHYSICAL_ADDRESS Address;
  ULONG Length;
  ULONG_PTR Reserved;
} SCATTER_GATHER_ELEMENT, *PSCATTER_GATHER_ELEMENT;

typedef struct _SCATTER_GATHER_LIST {
  ULONG NumberOfElements;
  ULONG_PTR Reserved;
  SCATTER_GATHER_ELEMENT Elements[];
} SCATTER_GATHER_LIST, *PSCATTER_GATHER_LIST;
Finally, I'm going to suppose that you can simply allocate a maximum-sized scatter/gather list in your AddDevice function and leave it lying around for use whenever you need it:
pdx->sglist = (PSCATTER_GATHER_LIST) ExAllocatePool(NonPagedPool,
  sizeof(SCATTER_GATHER_LIST) + MAXSG * sizeof(SCATTER_GATHER_ELEMENT));
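The size expression in that allocation is the usual header-plus-array arithmetic for a structure that ends in a flexible array member. The user-mode sketch below reproduces it with hypothetical mirror types (`sketch_sg_list`, `sketch_sg_element`) and malloc standing in for ExAllocatePool. It also checks for allocation failure, which a real driver must handle as well, because ExAllocatePool can return NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-mode mirrors of SCATTER_GATHER_ELEMENT and _LIST. */
typedef struct {
    uint64_t address;        /* stands in for PHYSICAL_ADDRESS */
    uint32_t length;
    uintptr_t reserved;
} sketch_sg_element;

typedef struct {
    uint32_t number_of_elements;
    uintptr_t reserved;
    sketch_sg_element elements[];   /* C99 flexible array member */
} sketch_sg_list;

/* Bytes needed for a list able to hold maxsg elements. */
static size_t sg_list_bytes(size_t maxsg)
{
    return sizeof(sketch_sg_list) + maxsg * sizeof(sketch_sg_element);
}

/* Allocate a maximum-sized list once (as AddDevice would) for reuse. */
static sketch_sg_list *alloc_sg_list(size_t maxsg)
{
    sketch_sg_list *list;

    assert(maxsg > 0);
    list = malloc(sg_list_bytes(maxsg));
    if (list != NULL)
        list->number_of_elements = 0;
    return list;
}
```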
With this infrastructure in place, your AdapterControl procedure would look like this:
IO_ALLOCATION_ACTION AdapterControl(PDEVICE_OBJECT fdo, PIRP junk,
  PVOID regbase, PDEVICE_EXTENSION pdx)
{
  PIRP Irp = GetCurrentIrp(&pdx->dqReadWrite);
  PMDL mdl = Irp->MdlAddress;
  BOOLEAN isread = IoGetCurrentIrpStackLocation(Irp)
    ->MajorFunction == IRP_MJ_READ;
  pdx->regbase = regbase;
  KeFlushIoBuffers(mdl, isread, TRUE);
  PSCATTER_GATHER_LIST sglist = pdx->sglist;
  ULONG xfer = pdx->xfer;          // bytes wanted in this stage
  PVOID vaddr = pdx->vaddr;        // advances one element at a time
  pdx->xfer = 0;                   // bytes actually mapped in this stage
  ULONG isg = 0;
  while (xfer && isg < MAXSG)
  {
    ULONG elen = xfer;             // MapTransfer may trim this
    sglist->Elements[isg].Address =
      (*pdx->AdapterObject->DmaOperations->MapTransfer)
      (pdx->AdapterObject, mdl, regbase, vaddr, &elen, !isread);
    sglist->Elements[isg].Length = elen;
    xfer -= elen;
    pdx->xfer += elen;
    vaddr = (PVOID) ((PUCHAR) vaddr + elen);
    ++isg;
  }
  sglist->NumberOfElements = isg;
  ...
  return DeallocateObjectKeepRegisters;
}
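The loop above asks MapTransfer for one list element at a time. The user-mode sketch below models what such a loop gathers on a machine without map registers, under the assumption that each element ends up covering one physically contiguous run of pages. The frame array stands in for an MDL's page list, and `sg_element`, `build_sg_list`, and `SKETCH_MAXSG` are hypothetical names. As in the driver code, running out of elements leaves the remaining bytes for a later stage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4-KB pages */
#define SKETCH_MAXSG 8u         /* hypothetical device limit, like MAXSG */

/* Hypothetical mirror of one SCATTER_GATHER_ELEMENT. */
typedef struct {
    uint64_t address;
    uint32_t length;
} sg_element;

/* frames[] holds the physical frame number of each page the buffer
   touches; offset is the buffer's byte offset in its first page.
   Builds one element per physically contiguous run, stopping early when
   SKETCH_MAXSG elements are used. *mapped receives the bytes described,
   which may be less than nbytes. Returns the element count. */
static size_t build_sg_list(const uint64_t *frames, uint32_t offset,
                            uint32_t nbytes, sg_element *list,
                            uint32_t *mapped)
{
    size_t isg = 0;
    size_t page = 0;
    uint32_t done = 0;

    assert(frames != NULL && list != NULL && mapped != NULL);
    assert(offset < SKETCH_PAGE_SIZE);
    while (done < nbytes) {
        uint32_t chunk = SKETCH_PAGE_SIZE - offset;
        uint64_t pa = frames[page] * SKETCH_PAGE_SIZE + offset;

        if (chunk > nbytes - done)
            chunk = nbytes - done;
        if (isg > 0 && list[isg - 1].address + list[isg - 1].length == pa) {
            list[isg - 1].length += chunk;   /* adjacent page: extend */
        } else {
            if (isg == SKETCH_MAXSG)
                break;                       /* rest goes in next stage */
            list[isg].address = pa;
            list[isg].length = chunk;
            ++isg;
        }
        done += chunk;
        offset = 0;                          /* later pages start at 0 */
        ++page;
    }
    *mapped = done;
    return isg;
}
```

With frames 10, 11, and 20 and a buffer starting at offset 0x100, the first two pages merge into one 7,936-byte element at 0xA100, and the third page becomes a separate element at 0x14000.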
Your device now performs its DMA transfer and, presumably, interrupts to signal completion. Your ISR requests a DPC, and your DPC routine initiates the next stage in the operation. The DPC routine would perform a MapTransfer loop like the one I just showed you as part of that initiation process. I'll leave the details of that code as an exercise for you.
Windows 2000 provides a shortcut to avoid the relatively cumbersome loop of calls to MapTransfer in the common case in which you can accomplish the entire transfer by using either no map registers or no more than the maximum number of map registers returned by IoGetDmaAdapter. The shortcut, which is illustrated in the SCATGATH sample on the companion disc, involves calling the GetScatterGatherList routine instead of AllocateAdapterChannel. Your StartIo routine looks like this:
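The precondition for the shortcut can be written as a small predicate. This sketch assumes, as the earlier StartIo code does, one map register per page the buffer spans. The name `can_use_get_sg_list` is hypothetical, and `needs_map_registers` is a hypothetical input standing in for your knowledge of the platform and bus; the driver cannot query it in this form.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4-KB pages */

/* Pages touched by size bytes starting at va (as ADDRESS_AND_SIZE_TO_SPAN_PAGES). */
static uint32_t span_pages(uintptr_t va, uint32_t size)
{
    uint32_t offset = (uint32_t)(va & (SKETCH_PAGE_SIZE - 1));
    return (uint32_t)(((uint64_t)offset + size + SKETCH_PAGE_SIZE - 1)
                      / SKETCH_PAGE_SIZE);
}

/* The whole transfer must fit in one go: either no map registers are
   needed at all, or the pages spanned do not exceed the maximum that
   IoGetDmaAdapter reported. */
static int can_use_get_sg_list(uintptr_t va, uint32_t nbytes,
                               int needs_map_registers,
                               uint32_t max_map_registers)
{
    assert(nbytes > 0);
    if (!needs_map_registers)
        return 1;
    return span_pages(va, nbytes) <= max_map_registers;
}
```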
VOID StartIo(PDEVICE_OBJECT fdo, PIRP Irp)
{
  PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
  PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);
  NTSTATUS status = IoAcquireRemoveLock(&pdx->RemoveLock, Irp);
  if (!NT_SUCCESS(status))
  {
    CompleteRequest(Irp, status, 0);
    return;
  }
  PMDL mdl = Irp->MdlAddress;
  ULONG nbytes = MmGetMdlByteCount(mdl);
  PVOID vaddr = MmGetMdlVirtualAddress(mdl);
  BOOLEAN isread = stack->MajorFunction == IRP_MJ_READ;
  pdx->numxfer = 0;
  pdx->nbytes = nbytes;
  status = (*pdx->AdapterObject->DmaOperations->GetScatterGatherList)
    (pdx->AdapterObject, fdo, mdl, vaddr, nbytes,
    (PDRIVER_LIST_CONTROL) DmaExecutionRoutine, pdx, !isread);
  if (!NT_SUCCESS(status))
  {
    IoReleaseRemoveLock(&pdx->RemoveLock, Irp);
    CompleteRequest(Irp, status, 0);
    StartNextPacket(&pdx->dqReadWrite, fdo);
  }
}
The call to GetScatterGatherList is the main difference between this StartIo routine and the one we looked at in the preceding section. GetScatterGatherList waits, if necessary, until you can be granted use of the adapter object and all the map registers you need. Then it builds a SCATTER_GATHER_LIST structure and passes it to the DmaExecutionRoutine. You can then program your device by using the physical addresses in the scatter/gather elements and initiate the transfer:
VOID DmaExecutionRoutine(PDEVICE_OBJECT fdo, PIRP junk,
  PSCATTER_GATHER_LIST sglist, PDEVICE_EXTENSION pdx)
{
  PIRP Irp = GetCurrentIrp(&pdx->dqReadWrite);
  pdx->sglist = sglist;            // needed later by PutScatterGatherList
  ...
}
When the transfer finishes, call the adapter object's PutScatterGatherList to release the list and the adapter:
VOID DpcForIsr(PKDPC Dpc, PDEVICE_OBJECT fdo, PIRP junk, PVOID Context)
{
  ...
  (*pdx->AdapterObject->DmaOperations->PutScatterGatherList)
    (pdx->AdapterObject, pdx->sglist, !isread);
  ...
}
To decide whether you can use GetScatterGatherList, you need to be able to predict whether you'll meet the preconditions for its use. On an Intel 32-bit platform, scatter/gather devices on a PCI or EISA bus can be sure of not needing any map registers. Even on an ISA bus, you'll be allowed to request up to 16 map register surrogates (eight if you're also a bus-mastering device) unless physical memory is so tight that the I/O system can't allocate its intermediate I/O buffers. In that case, you wouldn't be able to do DMA using the traditional method either, so there'd be no point in worrying about it.
If you can't predict with certainty at the time you code your driver that you'll be able to use GetScatterGatherList, my advice is to just fall back on the traditional loop of MapTransfer calls. You'll need to put that code in place anyway to deal with cases in which GetScatterGatherList won't work, and having two pieces of logic in your driver is just unnecessary complication.
If your device is not a bus master, DMA capability requires that it use the system DMA controller. As I've said, people often use the phrase slave DMA, which emphasizes that such a device is not master of its own DMA fate. The system DMA controllers have several characteristics that affect the internal details of how DMA transfers proceed:
Notwithstanding these factors, your driver code will be very similar to the bus-mastering code we've just discussed. Your StartDevice routine just works a little harder to set up its call to IoGetDmaAdapter, and your AdapterControl and DPC routines apportion the steps of releasing the adapter object and map registers differently.
In StartDevice, you have a little bit of additional code to determine which DMA channel the PnP Manager has assigned for you, and you also need to initialize more of the fields of the DEVICE_DESCRIPTION structure for IoGetDmaAdapter:
NTSTATUS StartDevice(...)
{
  ULONG dmachannel;                // system DMA channel #
  ULONG dmaport;                   // MCA bus port number
  ...
  for (ULONG i = 0; i < nres; ++i, ++resource)
  {
    switch (resource->Type)
    {
    case CmResourceTypeDma:
      dmachannel = resource->u.Dma.Channel;
      dmaport = resource->u.Dma.Port;
      break;
    }
  }
  ...
  INTERFACE_TYPE bustype;
  IoGetDeviceProperty(...);
  DEVICE_DESCRIPTION dd;
  RtlZeroMemory(&dd, sizeof(dd));
  dd.Version = DEVICE_DESCRIPTION_VERSION;
  dd.InterfaceType = bustype;
  dd.MaximumLength = MAXTRANSFER;
  dd.DmaChannel = dmachannel;
  dd.DmaPort = dmaport;
  dd.DemandMode = ??;
  dd.AutoInitialize = ??;
  dd.IgnoreCount = ??;
  dd.DmaWidth = ??;
  dd.DmaSpeed = ??;
  pdx->AdapterObject = IoGetDmaAdapter(...);
}
Everything about your adapter control and DPC procedures will be identical to the code we looked at earlier for handling a bus-mastering device without scatter/gather capability, except for two small details. First, AdapterControl returns a different value:
IO_ALLOCATION_ACTION AdapterControl(...)
{
  ...
  return KeepObject;
}
The return value KeepObject indicates that we want to retain control over the map registers and the DMA channel we're using. Second, since we didn't release the adapter object when AdapterControl returned, we have to do so in the DPC routine by calling FreeAdapterChannel instead of FreeMapRegisters:
VOID DpcForIsr(...)
{
  ...
  (*pdx->AdapterObject->DmaOperations->FreeAdapterChannel)
    (pdx->AdapterObject);
  ...
}
By the way, you don't need to remember how many map registers you were assigned—I previously showed you an nMapRegistersAllocated variable in the device extension to be used for this purpose—since you won't be calling FreeMapRegisters.
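The difference between the bus-master and slave cases comes down to two decisions, summarized in this sketch. The enumerators and the `release_plan` function are hypothetical stand-ins for the IO_ALLOCATION_ACTION values DeallocateObjectKeepRegisters and KeepObject and for the FreeMapRegisters and FreeAdapterChannel routines; nothing here calls the real DDK.

```c
#include <assert.h>

/* Hypothetical mirrors of the choices discussed in the text. */
typedef enum {
    SKETCH_KEEP_OBJECT,                      /* like KeepObject */
    SKETCH_DEALLOCATE_OBJECT_KEEP_REGISTERS  /* like DeallocateObjectKeepRegisters */
} sketch_adapter_action;

typedef enum {
    SKETCH_FREE_ADAPTER_CHANNEL,             /* like FreeAdapterChannel */
    SKETCH_FREE_MAP_REGISTERS                /* like FreeMapRegisters */
} sketch_dpc_release;

typedef struct {
    sketch_adapter_action adapter_control_returns;
    sketch_dpc_release dpc_calls;
    int must_remember_register_count;
} sketch_release_plan;

static sketch_release_plan release_plan(int is_bus_master)
{
    sketch_release_plan p;

    assert(is_bus_master == 0 || is_bus_master == 1);
    if (is_bus_master) {
        /* Dedicated adapter: release it at once, keep registers until done. */
        p.adapter_control_returns = SKETCH_DEALLOCATE_OBJECT_KEEP_REGISTERS;
        p.dpc_calls = SKETCH_FREE_MAP_REGISTERS;
        p.must_remember_register_count = 1;   /* FreeMapRegisters needs it */
    } else {
        /* System DMA channel: hold adapter and registers until done. */
        p.adapter_control_returns = SKETCH_KEEP_OBJECT;
        p.dpc_calls = SKETCH_FREE_ADAPTER_CHANNEL;
        p.must_remember_register_count = 0;
    }
    return p;
}
```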
As I mentioned in "Transfer Strategies," you might want to allocate a common buffer for your device to use in performing DMA transfers. A common buffer is an area of nonpaged, physically contiguous memory. Your driver uses a fixed virtual address to access the buffer. Your device uses a fixed logical address to access the same buffer.
You can use the common buffer area in several ways. You can support a device that continuously transfers data to or from memory by using the system DMA controller's autoinitialize mode. In this mode of operation, completion of one transfer triggers the controller to immediately reinitialize for another transfer.
Another use for a common buffer area is as a means to avoid extra data copying. The MapTransfer routine often copies the data you supply into auxiliary buffers owned by the I/O Manager and used for DMA. If you're stuck with doing slave DMA on an ISA bus, it's especially likely that MapTransfer will copy data to conform to the 16-MB address and buffer alignment requirements of the ISA DMA controller. But if you have a common buffer, you'll avoid the copy steps.
You'd normally allocate your common buffer at StartDevice time after creating your adapter object:
typedef struct _DEVICE_EXTENSION {
  ...
  PVOID vaCommonBuffer;
  PHYSICAL_ADDRESS paCommonBuffer;
  ...
} DEVICE_EXTENSION, *PDEVICE_EXTENSION;

dd.Dma32BitAddresses = ??;
dd.Dma64BitAddresses = ??;
pdx->AdapterObject = IoGetDmaAdapter(...);
pdx->vaCommonBuffer =
  (*pdx->AdapterObject->DmaOperations->AllocateCommonBuffer)
  (pdx->AdapterObject, <length>, &pdx->paCommonBuffer, FALSE);
Prior to calling IoGetDmaAdapter, you set the Dma32BitAddresses and Dma64BitAddresses flags in the DEVICE_DESCRIPTION structure to state the truth about your device's addressing capabilities. That is, if your device can address a buffer using any 32-bit physical address, set Dma32BitAddresses to TRUE. If it can address a buffer using any 64-bit physical address, set Dma64BitAddresses to TRUE.
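Stating the truth about addressing capability amounts to naming the highest physical address your device can reach. The hypothetical helper below checks whether a buffer lies entirely at or below such a limit, and it is written to avoid overflow at the top of the range. The 16-MB constant models the ISA DMA restriction mentioned earlier. The sketch illustrates the reasoning only, not the HAL's actual decision logic.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LIMIT_32BIT 0xFFFFFFFFull  /* any 32-bit physical address */
#define SKETCH_LIMIT_ISA   0xFFFFFFull    /* below 16 MB */

/* Returns 1 if every byte of [pa, pa + len) lies at or below highest. */
static int buffer_within_limit(uint64_t pa, uint64_t len, uint64_t highest)
{
    assert(len > 0);
    if (pa > highest)
        return 0;
    return len - 1 <= highest - pa;   /* avoids computing pa + len - 1 */
}
```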
In the call to AllocateCommonBuffer, the second argument is the byte length of the buffer you want to allocate. The fourth argument is a BOOLEAN value that indicates whether you want the allocated memory to be capable of entry into the CPU cache (TRUE) or not (FALSE).
AllocateCommonBuffer returns a virtual address. This address is the one you use within your driver to access the allocated buffer area. AllocateCommonBuffer also sets the PHYSICAL_ADDRESS pointed to by the third argument to be the logical address used by your device for its own buffer access.
NOTE
The DDK carefully uses the term logical address to refer to the address value returned by MapTransfer and the address value returned by the third argument of AllocateCommonBuffer. On many CPU architectures, a logical address will be a physical memory address that the CPU understands. On other architectures, it might be an address that only the I/O bus understands. Perhaps bus address would have been a better term.
If you're going to be performing slave DMA, you must create an MDL to describe the virtual addresses you receive. The actual purpose of the MDL is to occupy an argument slot in an eventual call to MapTransfer. MapTransfer won't end up doing any copying, but it requires the MDL to discover that it doesn't need to do any copying! You'd normally create the MDL in your StartDevice function just after allocating the common buffer:
pdx->vaCommonBuffer = ...;
pdx->mdlCommonBuffer = IoAllocateMdl(pdx->vaCommonBuffer, <length>,
  FALSE, FALSE, NULL);
MmBuildMdlForNonPagedPool(pdx->mdlCommonBuffer);
To perform an output operation, first make sure by some means (such as an explicit memory copy) that the common buffer contains the data you want to send to the device. The other DMA logic in your driver will be essentially the same as I showed you earlier (in "Performing DMA Transfers"). You'll call AllocateAdapterChannel. It will call your adapter control routine, which will call KeFlushIoBuffers—if you allocated a cacheable buffer—and then call MapTransfer. Your DPC routine will call FlushAdapterBuffers and FreeAdapterChannel. In all of these calls, you'll specify the common buffer's MDL instead of the one that accompanied the read or write IRP you're processing. Some of the service routines you call won't do as much work when you have a common buffer as when you don't, but you must call them anyway. At the end of an input operation, you might need to copy data out of your common buffer to some other place.
To fulfill a request to read or write more data than fits in your common buffer, you might need to periodically refill or empty the buffer. The adapter object's ReadDmaCounter function allows you to determine the progress of the ongoing transfer to help you decide what to do.
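One way to use the residual count is ring-buffer bookkeeping. The sketch below assumes autoinitialize mode over a single common buffer and a residual that counts bytes not yet transferred in the current pass, so the device's position is the buffer length minus the residual. The name `bytes_ready` and its `driver_pos` argument (where the driver last refilled or emptied the buffer) are hypothetical. When the two positions coincide the function reports zero; telling "nothing ready" apart from "everything ready" is left to the caller, as in any simple ring buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes between the driver's last refill/empty position and the device's
   current position: the region the device has already consumed (output)
   or filled (input) and the driver may now refill or drain. */
static uint32_t bytes_ready(uint32_t buffer_len, uint32_t residual,
                            uint32_t driver_pos)
{
    uint32_t device_pos;

    assert(buffer_len > 0 && residual <= buffer_len);
    assert(driver_pos < buffer_len);
    device_pos = (buffer_len - residual) % buffer_len;
    return (device_pos + buffer_len - driver_pos) % buffer_len;
}
```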
If your device is a bus master, allocating a common buffer allows you to dispense with calling AllocateAdapterChannel, MapTransfer, and FreeMapRegisters. You don't need to call those routines because AllocateCommonBuffer also reserves the map registers, if any, needed for your device to access the buffer. Each bus-master device has an adapter object that isn't shared with other devices and for which you therefore need never wait. Since you have a virtual address you can use to access the buffer at any time, and since your device's bus-mastering capability allows it to access the buffer by using the physical address you've received back from AllocateCommonBuffer, no additional work is required.
A few cautions are in order with respect to common buffer allocation and usage. Physically contiguous memory is scarce in a running system—so scarce that you might not be able to allocate the buffer you want unless you stake your claim quite early in the life of a new session. The Memory Manager makes a limited effort to shuffle memory pages around to satisfy your request, and that process can delay the return from AllocateCommonBuffer for a period of time. But the effort might fail, and you must be sure to handle the failure case. Not only does a common buffer tie up potentially scarce physical pages, but it can also tie up map registers that could otherwise be used by other devices. For both these reasons, you should use a common-buffer strategy advisedly.
Another caution about common buffers arises from the fact that the Memory Manager necessarily gives you one or more full pages of memory. Allocating a common buffer that's just a few bytes long is wasteful and should be avoided. On the other hand, it's also wasteful to allocate several pages of memory that don't actually need to be physically contiguous. As the DDK suggests, therefore, it's better to make several requests for smaller blocks if the blocks don't have to be contiguous.
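The cost of rounding a request up to whole pages is easy to quantify. This small sketch assumes a 4 KB page, as on the x86 (the Alpha, for example, uses 8 KB pages); the helper names are hypothetical.

```c
#include <stddef.h>

/* Assumed page size: 4 KB, as on the x86. Other architectures differ. */
#define PAGE_SIZE_BYTES 4096u

/* Number of whole pages needed to hold `length` bytes. */
static size_t pages_for(size_t length)
{
    return (length + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Bytes allocated but unused when `length` is rounded up to whole pages. */
static size_t wasted_bytes(size_t length)
{
    return pages_for(length) * PAGE_SIZE_BYTES - length;
}
```

A 16-byte common buffer, for instance, still occupies a full page and wastes 4080 bytes, and a 4097-byte buffer needs two pages, wasting 4095 bytes.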
You would ordinarily release the memory occupied by your common buffer in your StopDevice routine just before you destroy the adapter object:
(*pdx->AdapterObject->DmaOperations->FreeCommonBuffer)
    (pdx->AdapterObject, <length>, pdx->paCommonBuffer,
     pdx->vaCommonBuffer, FALSE);
The second parameter to FreeCommonBuffer is the same length value you used when you allocated the buffer. The last parameter indicates whether the memory is cacheable, and it should be the same as the last argument you used in the call to AllocateCommonBuffer.
The PKTDMA sample driver on the companion disc illustrates how to perform bus-master DMA operations without scatter/gather support using the AMCC S5933 PCI matchmaker chip. I've already discussed details of how this driver initializes the device in StartDevice and how it initiates a DMA transfer in StartIo. I've also discussed nearly all of what happens in this driver's AdapterControl and DpcForIsr routines. I indicated earlier that these routines would have some device-dependent code for starting an operation on the device; I wrote a helper function named StartTransfer for that purpose:
VOID StartTransfer(PDEVICE_EXTENSION pdx, PHYSICAL_ADDRESS address, BOOLEAN isread)
{
    // Read the current control/status register values
    ULONG mcsr = READ_PORT_ULONG((PULONG) (pdx->portbase + MCSR));
    ULONG intcsr = READ_PORT_ULONG((PULONG) (pdx->portbase + INTCSR));

    if (isread)
    {   // Device-to-host transfer: use the master-write channel
        mcsr |= MCSR_WRITE_NEED4 | MCSR_WRITE_ENABLE;
        intcsr |= INTCSR_WTCI_ENABLE;
        WRITE_PORT_ULONG((PULONG) (pdx->portbase + MWTC), pdx->xfer);
        WRITE_PORT_ULONG((PULONG) (pdx->portbase + MWAR), address.LowPart);
    }
    else
    {   // Host-to-device transfer: use the master-read channel
        mcsr |= MCSR_READ_NEED4 | MCSR_READ_ENABLE;
        intcsr |= INTCSR_RTCI_ENABLE;
        WRITE_PORT_ULONG((PULONG) (pdx->portbase + MRTC), pdx->xfer);
        WRITE_PORT_ULONG((PULONG) (pdx->portbase + MRAR), address.LowPart);
    }

    // Enable the transfer-complete interrupt, then start the transfer
    WRITE_PORT_ULONG((PULONG) (pdx->portbase + INTCSR), intcsr);
    WRITE_PORT_ULONG((PULONG) (pdx->portbase + MCSR), mcsr);
}
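This routine sets up the S5933 operations registers for a DMA transfer and then starts the transfer running. The steps in the process are:
1. Read the current contents of the MCSR and INTCSR registers.
2. Depending on the direction of the transfer, set the FIFO-management (NEED4) and transfer-enable bits in the MCSR value, set the matching transfer-count interrupt enable in the INTCSR value, and load the byte count and the buffer's bus address into the count and address registers. A read from the host's point of view uses the chip's master-write registers (MWTC and MWAR); a write uses the master-read registers (MRTC and MRAR).
3. Write the updated INTCSR value and then the updated MCSR value. Writing the MCSR with the enable bits set is what starts the transfer.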
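The direction-dependent register programming in StartTransfer can be exercised outside the kernel with a small model. This sketch is an assumption-laden stand-in, not the S5933's real interface: the register indices and all bit values are placeholders chosen for the model, and `MODEL_DEVICE`, `read_reg`, `write_reg`, and `model_start_transfer` are hypothetical names. Only the structure (read-modify-write, pick a channel by direction, write MCSR last) mirrors the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated S5933 operation registers. These indices are arbitrary
   placeholders for this model, not the chip's real port offsets. */
enum { MWAR, MWTC, MRAR, MRTC, INTCSR, MCSR, NUM_REGS };

/* Placeholder bit assignments, NOT the real S5933 bit positions. */
#define MCSR_WRITE_NEED4    0x0001u
#define MCSR_WRITE_ENABLE   0x0002u
#define MCSR_READ_NEED4     0x0004u
#define MCSR_READ_ENABLE    0x0008u
#define INTCSR_WTCI_ENABLE  0x0010u
#define INTCSR_RTCI_ENABLE  0x0020u

/* Hypothetical subset of the device extension used by this model. */
typedef struct {
    uint32_t regs[NUM_REGS]; /* stands in for the port range at portbase */
    uint32_t xfer;           /* byte count for this transfer */
} MODEL_DEVICE;

static uint32_t read_reg(const MODEL_DEVICE *d, int r) { return d->regs[r]; }
static void write_reg(MODEL_DEVICE *d, int r, uint32_t v) { d->regs[r] = v; }

/* Mirrors the structure of StartTransfer. */
static void model_start_transfer(MODEL_DEVICE *d, uint32_t address, bool isread)
{
    uint32_t mcsr = read_reg(d, MCSR);
    uint32_t intcsr = read_reg(d, INTCSR);

    if (isread) {
        /* A host read is a bus-master write into host memory. */
        mcsr |= MCSR_WRITE_NEED4 | MCSR_WRITE_ENABLE;
        intcsr |= INTCSR_WTCI_ENABLE;
        write_reg(d, MWTC, d->xfer);
        write_reg(d, MWAR, address);
    } else {
        /* A host write is a bus-master read from host memory. */
        mcsr |= MCSR_READ_NEED4 | MCSR_READ_ENABLE;
        intcsr |= INTCSR_RTCI_ENABLE;
        write_reg(d, MRTC, d->xfer);
        write_reg(d, MRAR, address);
    }
    write_reg(d, INTCSR, intcsr);
    write_reg(d, MCSR, mcsr); /* written last: the write that starts the DMA */
}
```

After `model_start_transfer(&d, 0x1000, true)` with `d.xfer` set to 8192, only the master-write count and address registers hold values, and MCSR and INTCSR carry only the write-channel bits.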
It's not obvious from this fragment of code, but the S5933 is actually capable of doing a DMA read and a DMA write at the same time. I wrote PKTDMA in such a way that only one operation (either a read or a write) can be occurring. To generalize the driver to allow both kinds of operation to occur simultaneously, you would need to (a) implement separate queues for read and write IRPs, and (b) create two device objects and two adapter objects—one pair for reading and the other for writing—so as to avoid the embarrassment of trying to queue the same object twice inside AllocateAdapterChannel. I thought putting that additional complication into the sample would end up confusing you. (I know I'm being pretty optimistic about my expository skills to imply that I haven't already confused you, but it could have been worse.)
PCI42 included an interrupt routine that did a small bit of work to move some data. PKTDMA's interrupt routine is a little simpler:
BOOLEAN OnInterrupt(PKINTERRUPT InterruptObject, PDEVICE_EXTENSION pdx)
{
    // Not our interrupt if the S5933 has nothing pending
    ULONG intcsr = READ_PORT_ULONG((PULONG) (pdx->portbase + INTCSR));
    if (!(intcsr & INTCSR_INTERRUPT_PENDING))
        return FALSE;

    // Stop bus-master activity in both directions
    ULONG mcsr = READ_PORT_ULONG((PULONG) (pdx->portbase + MCSR));
    WRITE_PORT_ULONG((PULONG) (pdx->portbase + MCSR),
        mcsr & ~(MCSR_WRITE_ENABLE | MCSR_READ_ENABLE));

    // Turn off both transfer-count interrupt enables in the value written back
    intcsr &= ~(INTCSR_WTCI_ENABLE | INTCSR_RTCI_ENABLE);

    // Request a DPC only if an IRP is actually in progress
    BOOLEAN dpc = GetCurrentIrp(&pdx->dqReadWrite) != NULL;

    // Acknowledge until nothing remains pending, saving every status value for the DPC
    while (intcsr & INTCSR_INTERRUPT_PENDING)
    {
        InterlockedOr(&pdx->intcsr, intcsr);
        WRITE_PORT_ULONG((PULONG) (pdx->portbase + INTCSR), intcsr);
        intcsr = READ_PORT_ULONG((PULONG) (pdx->portbase + INTCSR));
    }

    if (dpc)
        IoRequestDpc(pdx->DeviceObject, NULL, NULL);
    return TRUE;
}
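Beyond the usual check of the INTERRUPT_PENDING bit, the points worth noting in this ISR are these:
1. It stops any further bus-master activity by clearing both the read and write enable bits in the MCSR.
2. It clears the two transfer-count interrupt enables (WTCI and RTCI) in its copy of INTCSR, so the value it writes back to acknowledge the interrupt also disables further transfer-complete interrupts.
3. It requests a DPC only if the read/write queue has a current IRP, and it keeps acknowledging until the pending bit stays clear, using InterlockedOr to accumulate every INTCSR value it acknowledges into pdx->intcsr so that the DPC can see all the events that occurred.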
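The acknowledgment loop at the end of OnInterrupt is easiest to understand by watching it run against a simulated register. This sketch assumes, as the loop itself implies, that writing a status bit back as 1 clears it; the bit values, the `MODEL_INTCSR` type, and the helper names are hypothetical, and C11 `atomic_fetch_or` stands in for InterlockedOr. The MCSR shutdown and the DPC decision are left out to keep the focus on the loop. The model lets a second event arrive while the first is being acknowledged, which is exactly the case the loop exists to handle.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Placeholder bit assignments for this model, NOT the real S5933 layout. */
#define INTCSR_WTCI_ENABLE        0x0010u
#define INTCSR_RTCI_ENABLE        0x0020u
#define INTCSR_WRITE_COMPLETE     0x0100u /* write-1-to-clear status bit */
#define INTCSR_READ_COMPLETE      0x0200u /* write-1-to-clear status bit */
#define INTCSR_INTERRUPT_PENDING  0x8000u /* read-only summary bit */
#define STATUS_BITS (INTCSR_WRITE_COMPLETE | INTCSR_READ_COMPLETE)

typedef struct {
    uint32_t enables;          /* interrupt-enable bits */
    uint32_t status;           /* latched status bits */
    uint32_t arrive_after_ack; /* event that arrives during the first ack */
} MODEL_INTCSR;

static uint32_t intcsr_read(const MODEL_INTCSR *r)
{
    uint32_t v = r->enables | r->status;
    if (r->status)
        v |= INTCSR_INTERRUPT_PENDING;
    return v;
}

static void intcsr_write(MODEL_INTCSR *r, uint32_t v)
{
    r->enables = v & (INTCSR_WTCI_ENABLE | INTCSR_RTCI_ENABLE);
    r->status &= ~(v & STATUS_BITS);  /* writing 1 clears a status bit */
    r->status |= r->arrive_after_ack; /* simulate an event racing the ack */
    r->arrive_after_ack = 0;
}

/* Mirrors the ISR's loop: acknowledge until the pending bit stays clear,
   OR-ing every value seen into a shared variable for the DPC.
   Returns the number of acknowledgments performed. */
static int model_ack_loop(MODEL_INTCSR *r, _Atomic uint32_t *accumulated)
{
    uint32_t intcsr = intcsr_read(r);
    int acks = 0;

    if (!(intcsr & INTCSR_INTERRUPT_PENDING))
        return 0; /* not our interrupt */

    intcsr &= ~(INTCSR_WTCI_ENABLE | INTCSR_RTCI_ENABLE);
    while (intcsr & INTCSR_INTERRUPT_PENDING) {
        atomic_fetch_or(accumulated, intcsr); /* stands in for InterlockedOr */
        intcsr_write(r, intcsr);
        intcsr = intcsr_read(r);
        ++acks;
    }
    return acks;
}
```

Starting with a write-complete event pending and a read-complete event arriving mid-acknowledgment, the loop runs twice, both status bits end up in the accumulated value, and the register is left with no status and both enables off. A single acknowledgment without the loop would have lost the second event.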
You can test PKTDMA if you have an S5933DK1 development board. If you ran the PCI42 test, you already installed the S5933DK1.SYS driver to handle the ISA add-on interface card; if not, you'll need to install that driver for this test. Then install PKTDMA.SYS as the driver for the S5933 development board itself, and run the TEST.EXE test program in the PKTDMA\TEST\DEBUG directory. TEST writes 8192 bytes to PKTDMA, then issues a DeviceIoControl to S5933DK1 to read the data back from the add-on side, and verifies that it read the right values.