Blog Post

Windows Kernel Stack is a Precious Little Resource

EXCEPTION_DOUBLE_FAULT and the UNEXPECTED_KERNEL_MODE_TRAP (7f) bugcheck.

Windows Kernel Stack is a Precious Little Resource - EXCEPTION_DOUBLE_FAULT and the UNEXPECTED_KERNEL_MODE_TRAP (7f) bugcheck.

Intro

If you've done any kernel programming in Windows you probably heard about the kernel stack being quite limited in size. The following BSOD (crash, or a "bugcheck") will illustrate what happens in practice when a kernel driver exhausts the size of the Windows kernel stack.

I learned it the hard way on my own example.

The Crash

I was working on my kernel driver and was adding a debugging feature to it when all of a sudden the target OS crashed with the following mysterious bugcheck that seemed like it came out of nowhere:

WinDbg - Bugcheck[Copy]
UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
BugCheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
        use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
        use .trap on that value
Else
        .trap on the appropriate frame will show where the trap was taken
        (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT
Arg2: ffffd78058b61e50
Arg3: ffffd28f6bfd3ff0
Arg4: fffff80142817109

And when I retrieved the callstack it had the following sequence of functions:

WinDbg Callstack[Copy]
00 nt!DbgBreakPointWithStatus
01 nt!KiBugCheckDebugBreak+0x12
02 nt!KeBugCheck2+0x946
03 nt!KeBugCheckEx+0x107
04 nt!KiBugCheckDispatch+0x69
05 nt!KiDoubleFaultAbort+0x2c3
06 nt!HalpInterruptSendIpi+0x39
07 nt!HalRequestIpiSpecifyVector+0x7e
08 nt!KiIpiSendRequest+0x38b
09 nt!KeFlushMultipleRangeTb+0x12b
0a nt!MiFlushTbList+0x88
0b nt!MiDeleteNonPagedPoolTail+0x19
0c nt!MiClearNonPagedPtes+0x181
0d nt!MmFreePoolMemory+0x1aa
0e nt!RtlpHpEnvFreeVA+0x12
0f nt!RtlpHpFreeVA+0x3a
10 nt!RtlpHpSegMgrCommit+0x1e3
11 nt!RtlpHpSegPageRangeCommit+0x1ee
12 nt!RtlpHpSegPageRangeCoalesce+0x1b1
13 nt!RtlpHpSegPageRangeShrink+0xeb
14 nt!RtlpHpSegFree+0x9c
15 nt!RtlpHpFreeHeap+0xac
16 nt!ExpFreeHeapSpecialPool+0x221
17 nt!ExFreeHeapPool+0x1b42e0
18 nt!ExFreePool+0x9
19 nt!VerifierExFreePoolWithTag+0x31
1a Wdf01000!FxPoolFree+0x39 [minkernel\wdf\framework\shared\object\wdfpool.cpp @ 456] 
1b Wdf01000!FxObject::operator delete+0x11 [minkernel\wdf\framework\shared\object\fxobject.cpp @ 326] 
1c Wdf01000!FxMemoryBuffer::`scalar deleting destructor'+0x3f
1d Wdf01000!FxObject::SelfDestruct+0x1b [minkernel\wdf\framework\shared\inc\private\common\fxobject.hpp @ 453] 
1e Wdf01000!FxObject::ProcessDestroy+0xb8 [minkernel\wdf\framework\shared\object\fxobjectstatemachine.cpp @ 403] 
1f Wdf01000!FxObject::FinalRelease+0x1c [minkernel\wdf\framework\shared\object\fxobject.cpp @ 249] 
20 Wdf01000!FxObject::Release+0x4c [minkernel\wdf\framework\shared\inc\private\common\fxobject.hpp @ 881] 
21 Wdf01000!FxMemoryObject::Release+0x56 [minkernel\wdf\framework\shared\inc\private\common\FxMemoryObject.hpp @ 146] 
22 Wdf01000!FxObject::DeletedAndDisposedWorkerLocked+0x3e [minkernel\wdf\framework\shared\object\fxobjectstatemachine.cpp @ 1246] 
23 Wdf01000!FxObject::DeleteWorkerAndUnlock+0xd1 [minkernel\wdf\framework\shared\object\fxobjectstatemachine.cpp @ 972] 
24 Wdf01000!FxObject::DeleteObject+0x19d [minkernel\wdf\framework\shared\object\fxobjectstatemachine.cpp @ 124] 
25 Wdf01000!imp_WdfObjectDelete+0x6f [minkernel\wdf\framework\shared\object\fxobjectapi.cpp @ 319] 
26 MyDriver!WdfObjectDelete+0x3c [C:\Program Files (x86)\Windows Kits\10\Include\wdf\kmdf\1.15\wdfobject.h @ 743] 
27 MyDriver!std::kstring::_insertFromStrW+0x39d [C:\C++\MyDriver\kstrings.cpp @ 3495] 
28 MyDriver!std::kstring::operator+=+0x2d [C:\C++\MyDriver\kstrings.cpp @ 346] 
29 MyDriver!MYSTRUCT::toDebugStr+0x150 [C:\C++\MyDriver\MyStruct.cpp @ 66] 
2a MyDriver!DevMain::_complete_pending_read_request+0x207 [C:\C++\MyDriver\DevMain.cpp @ 18065] 
2b MyDriver!WDFREQUEST2::completeRequestWithChecks+0x3f6 [C:\C++\MyDriver\WdfRequest2.cpp @ 1038] 
2c MyDriver!DevMain::completePendingReadRequest+0x68b [C:\C++\MyDriver\DevMain.cpp @ 4560] 
2d MyDriver!DevMain::processRawURB+0x1c4f [C:\C++\MyDriver\DevMain.cpp @ 6369] 
2e MyDriver!DevMain::processURB_FromIO+0xa6 [C:\C++\MyDriver\DevMain.cpp @ 4784] 
2f MyDriver!DevMain::endpointIoInternalDeviceControl+0x33b [C:\C++\MyDriver\DevMain.cpp @ 3377] 
30 Wdf01000!FxIoQueueIoInternalDeviceControl::Invoke+0x46 [minkernel\wdf\framework\shared\inc\private\common\FxIoQueueCallbacks.hpp @ 267] 
31 Wdf01000!FxIoQueue::DispatchRequestToDriver+0x2ce [minkernel\wdf\framework\shared\irphandlers\io\fxioqueue.cpp @ 3343] 
32 Wdf01000!FxIoQueue::DispatchEvents+0x520 [minkernel\wdf\framework\shared\irphandlers\io\fxioqueue.cpp @ 3125] 
33 Wdf01000!FxIoQueue::QueueRequestFromForward+0x98 [minkernel\wdf\framework\shared\irphandlers\io\fxioqueue.cpp @ 2496] 
34 Wdf01000!FxIoQueue::ForwardRequestWorker+0x11b [minkernel\wdf\framework\shared\irphandlers\io\fxioqueue.cpp @ 1444] 
35 Wdf01000!FxIoQueue::ForwardRequest+0x26 [minkernel\wdf\framework\shared\irphandlers\io\fxioqueue.cpp @ 1781] 
36 Wdf01000!imp_WdfRequestForwardToIoQueue+0xce [minkernel\wdf\framework\shared\core\fxrequestapi.cpp @ 3149] 
37 udecx!UsbDevice_EvtControlTransferUrb+0x5c9
38 Wdf01000!FxIoQueueIoInternalDeviceControl::Invoke+0x46 [minkernel\wdf\framework\shared\inc\private\common\FxIoQueueCallbacks.hpp @ 267] 
39 Wdf01000!FxIoQueue::DispatchRequestToDriver+0x2ce [minkernel\wdf\framework\shared\irphandlers\io\fxioqueue.cpp @ 3343] 
3a Wdf01000!FxIoQueue::DispatchEvents+0x520 [minkernel\wdf\framework\shared\irphandlers\io\fxioqueue.cpp @ 3125] 
3b Wdf01000!FxIoQueue::QueueRequest+0xae [minkernel\wdf\framework\shared\irphandlers\io\fxioqueue.cpp @ 2371] 
3c Wdf01000!FxPkgIo::DispatchStep2+0x486 [minkernel\wdf\framework\shared\irphandlers\io\fxpkgio.cpp @ 482] 
3d Wdf01000!imp_WdfDeviceWdmDispatchIrpToIoQueue+0x17c [minkernel\wdf\framework\shared\core\km\fxdeviceapikm.cpp @ 495] 
3e ucx01000!UrbHandler_USBPORTStyle_Legacy_SCT_GetSetDescriptor+0x1dc
3f ucx01000!Urb_USBPORTStyle_ProcessURB+0x7b9
40 ucx01000!RootHub_Pdo_EvtInternalDeviceControlIrpPreprocessCallback+0xb5
41 Wdf01000!PreprocessIrp+0x2e [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1519] 
42 Wdf01000!DispatchWorker+0x17b [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1582] 
43 Wdf01000!FxDevice::Dispatch+0x199 [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1603] 
44 Wdf01000!FxDevice::DispatchWithLock+0x267 [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1447] 
45 nt!IopfCallDriver+0x53
46 nt!IovCallDriver+0x266
47 nt!IofCallDriver+0x1af711
48 UsbHub3!HUBPDO_EvtDeviceWdmIrpPreprocess+0x11d2
49 Wdf01000!PreprocessIrp+0x2e [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1519] 
4a Wdf01000!DispatchWorker+0x17b [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1582] 
4b Wdf01000!FxDevice::Dispatch+0x199 [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1603] 
4c Wdf01000!FxDevice::DispatchWithLock+0x267 [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1447] 
4d nt!IopfCallDriver+0x53
4e nt!IovCallDriver+0x266
4f nt!IofCallDriver+0x1af711
50 hidusb!HumCallUSB+0x1bb
51 hidusb!HumGetDescriptorRequest+0x18f
52 hidusb!HumGetReportDescriptor+0x9a
53 hidusb!HumInternalIoctl+0x192
54 HIDCLASS!HidpCallDriver+0xba
55 HIDCLASS!HidpCallDriverSynchronous+0x68
56 HIDCLASS!GetHIDRawReportDescriptor+0x99
57 HIDCLASS!HidpGetDeviceDescriptor+0xb8
58 HIDCLASS!AllocDeviceResources+0xb6
59 HIDCLASS!HidpFdoInitializeDevice+0x12
5a HIDCLASS!HIDSM_InitializingDevice+0x10
5b HIDCLASS!HIDSM_ExecuteEntryFunctionsAndPushPopStateMachinesForCurrentState+0x4a
5c HIDCLASS!HIDSM_AddEvent+0x650
5d HIDCLASS!HIDSM_AddHidsmEvent+0x10
5e HIDCLASS!HidpStartDevice+0x13d
5f HIDCLASS!HidpFdoPnp+0xe7
60 HIDCLASS!HidpIrpMajorPnp+0x71
61 HIDCLASS!HidpMajorHandler+0x1b1
62 nt!IopfCallDriver+0x53
63 nt!IovCallDriver+0x266
64 nt!IofCallDriver+0x1af711
65 nt!PnpAsynchronousCall+0xea
66 nt!PnpSendIrp+0x9e
67 nt!PnpStartDevice+0x88
68 nt!PnpStartDeviceNode+0xec
69 nt!PipProcessStartPhase1+0x73
6a nt!PipProcessDevNodeTree+0x319
6b nt!PiProcessReenumeration+0x88
6c nt!PnpDeviceActionWorker+0x206
6d nt!ExpWorkerThread+0x105
6e nt!PspSystemThreadStartup+0x55
6f nt!KiStartSystemThread+0x28

The fault and the stack itself didn't look like anything that I was expecting.

The Reasoning

Usually the EXCEPTION_DOUBLE_FAULT takes place when an exception happens when processing another exception. This usually happens as the result of some serious hardware failure, such as a DDR fault, for instance. At first I was suspecting a similar cause and re-ran my test. To my surprise the exact same bugcheck repeated itself.

That is how I knew that that was not a hardware failure.

A hardware failure would either not allow me to boot Windows, or would not reproduce at the exact same spot with the same callstack.

My next guess was a memory corruption of some important kernel structure. Mostly something that was responsible for handling exceptions in the kernel. Thus I spent the next couple of hours reviewing my recent code changes that could indicate where my newly added code could be corrupting the kernel memory.

To my frustration, I found none.

The Kernel Stack

The next possibility was the kernel stack corruption. Since the exception handler (and primary stack unwinding procedure) relies on the stack pointer being in an accessible location, and any failure in that area would result in a catastrophic failure like I saw above, I started looking where my code could've been corrupting the stack.

The size of the callstack kinda gave me the first clue - it was unusually long. But I couldn't see anything wrong with it. Yes, the KMDF (framework) functions take a good chunk of the stack, but that was outside of my control.

Then when I failed to find a reasonable explanation to what happened I started approaching it using the method of exclusion. That is when you start commenting out any freshly added code and re-run the tests until the crash goes away. It happened when I commented out my trckr debugging structure that I added to my driver.

It went something like this. My driver-specific struct was declared as such:

C++[Copy]
struct MYSTRUCT
{
    LIST_ENTRY listOthers;
    WDFREQUEST wdfRequest;
    WDFMEMORY wdfMemory;

    ULONG nRefCount;
    // ...

#ifdef DBG
    REF_CNT_TRACKER trckr[4];
#endif

    void toDebugStr() const;
};

static_assert(__is_trivially_copyable(MYSTRUCT), "The class must only have members that can be copied via memcpy");

I was testing to see if that struct was deallocated at the right moment and wanted to add some extra debugging details to it to let me track instances when something increments and then decrements its reference count in nRefCount. Thus I added the trckr member to it, that would compile only in the debugging build of the driver.

The tracking struct was defined as such:

C++[Copy]
#pragma pack(push)
#pragma pack(1)
struct REF_CNT_TRACKER_ITEM
{
    short nSpecCode;
    USHORT nRefCount;
    DWORD dwThreadID;
};
#pragma pack(pop)

struct REF_CNT_TRACKER
{
    volatile LONG nNextIdx;                     //Index in 'ringBuffer'
    REF_CNT_TRACKER_ITEM ringBuffer[0x100];
};

The exact workings of the tracking algorithm does not matter for this post. What matters though, is that adding it to MYSTRUCT had also added 2052 bytes in size.

In most part it didn't matter though because an instance of MYSTRUCT was allocated on the heap as the WDF object context:

C++[Copy]
WDFMEMORY wdfMem;
status = WdfMemoryCreateFromLookaside(wdfLookaside, &wdfMem);
if(status == STATUS_SUCCESS)
{
	MYSTRUCT* pStruct = WDFMEMORY_to_MYSTRUCT(wdfMem);
	memset(pStruct, 0, sizeof(*pStruct));

	// ...
}

Where the context was declared as such:

C++[Copy]
WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(MYSTRUCT, WDFMEMORY_to_MYSTRUCT);

The construct above will allocate the memory for MYSTRUCT from the kernel heap. Which is the way it is supposed to be. No issues there.

The Bug

The issue with the ballooned size of MYSTRUCT was in my debugging MYSTRUCT::toDebugStr function that was supposed to write the contents of MYSTRUCT into the WPP log. It was trying to be "smart about it" and first make a copy of the memory used by an instance of MYSTRUCT and then log from that memory. The thinking behind it was to eliminate a possible crash if an instance of MYSTRUCT was referring to some bad location in memory. In other words, the toDebugStr function was attempting not the crash the OS if someone asked it to log some bad instance of the MYSTRUCT.

It did it roughly as such:

Buggy![Copy]
//WARNING: BUGGY function, DO NOT USE!!!
//
void MYSTRUCT::toDebugStr() const
{
	if(isSystemVaRangeValid(this, sizeof(*this)))
    {
        MYSTRUCT _this;
        memcpy(&_this, this, sizeof(*this));

        //Do some prep-work
        //...

        //Finally log into WPP
        TraceEvents(TRACE_LEVEL_VERBOSE, TRACE_QUEUE, 
            "listOthers(%p):{F=%p,B=%p}, wdfRequest=%p, wdfMemory=%p, nRefCount=%u",
            &listOthers,
            _this.listOthers.Flink,
            _this.listOthers.Blink,
            _this.wdfRequest,
            _this.wdfMemory,
            _this.nRefCount);
	}
}

The idea was to check memory reserved for MYSTRUCT before reading from it during logging. (After all, you don't want your logging function to crash the target OS.) That check was done in the isSystemVaRangeValid function:

I know that the isSystemVaRangeValid would not guarantee that the following reading from that memory would not fail. This was just a quick way of eliminating a read from an obviously bad virtual address range.
C++[Copy]
BOOLEAN isSystemVaRangeValid(__in const void* pVa, __in size_t szcbSz)
{
    if((intptr_t)szcbSz <= 0)
    {
        //Fail if size is zero or negative
        return FALSE;
    }

    if(pVa < MmSystemRangeStart)
    {
        //Don't bother if it's not in a kernel range
        return FALSE;
    }

    intptr_t nCntPgs = ADDRESS_AND_SIZE_TO_SPAN_PAGES(pVa, szcbSz);
    for(PUCHAR* pB = (PUCHAR*)pVa; nCntPgs > 0; pB += PAGE_SIZE, nCntPgs--)
    {                                                                       
        if(!MmIsAddressValid(pB))
        {                                                                   
            //Bad address
            return FALSE;                                  
        }
    }

    return TRUE;
}

If you look at the callstack the crash happened in my MYSTRUCT::toDebugStr function. But why?

As an exercise, go ahead and try to figure it out on your own.

My actual toDebugStr was doing more work by utilizing my custom std::kstring class for easier handling of strings in the kernel space, akin to what STL does, but that is beside the point of this example.

An astute observer will notice that my MYSTRUCT::toDebugStr was creating a local copy of MYSTRUCT in _this variable. It could do so because I deliberately chose to use only primitive data types for members of MYSTRUCT. The static_assert with the Microsoft's __is_trivially_copyable C++ template ensures that MYSTRUCT does not have any complex members that require construction and destruction. With those members in place, doing a memcpy on them would create a bug.

But, if you remember about local variables, the memory for them is reserved on the stack. Thus, my _this variable in the MYSTRUCT::toDebugStr function was using the kernel stack.

At first, the original MYSTRUCT, without my later addition of REF_CNT_TRACKER was small enough not to overflow the kernel stack. But when I added that debugging struct, which increased it by 2052 bytes, that was enough to push the kernel stack over the limit.

You can learn about the limitations of the kernel stack in this MSDN article. At the moment of this writing, its size is limited to about 3 PAGE_SIZEs, or to about 12k bytes.

The sequence of failures started from the callstack being already quite long (thanks in part to the WDF's extra functions.) Then my MYSTRUCT::toDebugStr reserved additional space from the stack for storing my REF_CNT_TRACKER. But that in itself didn't send it over the edge. Purely by chance, when that function attempted to free memory in the WdfObjectDelete call (which I didn't include in my example for brevity), coupled with additional space reserved by the Driver Verifier and the need to flush TB list (in nt!MiFlushTbList), that sequence was enough to completely exhaust the limits of the kernel stack.

As a result, something inside the memory manager's logic failed to access the stack and called the nt!KiDoubleFaultAbort that raised an unrecoverable UNEXPECTED_KERNEL_MODE_TRAP bugcheck with the EXCEPTION_DOUBLE_FAULT subtype. Most likely this happened because the memory manager had no kernel stack left to handle its functions any further.

This was technically not a double-fault exception, but a lot of code inside the kernel memory manager assumes that it already handles a fault, such as a page-fault for instance, thus it could've made the double-fault assumption that we saw.

What matters in our case is that the kernel stack exhaustion had crashed the system, and that was the cause for the bugcheck.

It took me a few hours to discover what happened.

The Fix

The fix in this case was not to allocate my _this variable from the stack, and use the heap instead:

Fixed[Copy]
void MYSTRUCT::toDebugStr() const
{
	if(isSystemVaRangeValid())
    {
        //Allocate memory from the heap
        PVOID pMem = ExAllocatePool2(POOL_FLAG_NON_PAGED, sizeof(*this), '1gat');
        if(pMem)
        {
            MYSTRUCT& _this = *(MYSTRUCT*)pMem;
            memcpy(&_this, this, sizeof(*this));

            //Do some prep-work
            //...

            //Finally log into WPP
            TraceEvents(TRACE_LEVEL_VERBOSE, TRACE_QUEUE, 
                "listOthers(%p):{F=%p,B=%p}, wdfRequest=%p, wdfMemory=%p, nRefCount=%u",
                &listOthers,
                _this.listOthers.Flink,
                _this.listOthers.Blink,
                _this.wdfRequest,
                _this.wdfMemory,
                _this.nRefCount);

            //And need to remember to free it
            ExFreePool(pMem);
		}
	}
}

It requires a little bit more work, and is somewhat slower, but it fully resolves the issue of stack exhaustion that lead to this crash.

Conclusion

The sequence above and the resolution is quite straightforward. Identifying it though, was anything but.

Unfortunately, a lot of the kernel crashes are not very well documented, or even labeled to quickly clue you in to the actual cause of the crash. Thus, I'll be posting these to share the debugging process in hopes of letting you write less buggy drivers.

Stay tuned...

Related Articles