Reverse Engineering Virtual Functions Compiled With Visual Studio C++ Compiler - Part 1 - Understanding virtual function tables, vtable, __purecall, novtable, Control Flow Guard.

Intro

If you're planning to reverse engineer binary code that was compiled with a Microsoft Visual Studio C++ compiler you will most certainly encounter virtual functions. They are not rocket science to grasp, but coupled with the inherent inability of modern decompilers to properly translate them, and complicated by the compiler's use of the Control Flow Guard feature, virtual functions may throw off a novice reverse engineer.

This blog post will hopefully answer the questions posed in the title and give some guidance for when you encounter these concepts.

For easier navigation, here's the table of contents:

Setup
Disassembly
Class Sizes
Virtual Function Table - vtable / vftable
Members In Inherited Classes
__purecall Functions
novtable Directive
Control Flow Guard
Conclusion

Setup

Let's set up our work bench first. The easiest way to go is to create a small C++ project in Visual Studio, add some test code into it and then check how it looks in a compiled form. This way we won't have to deal with the intricacies of our actual binary that we need to reverse engineer, plus we can also adjust parameters for our tests by simply re-compiling this C++ project.

Below are my proposed test classes.

Please don't try to find any logic in this code. The only reason it exists is to illustrate the concept of virtual functions in C++.
Additionally, I'm initializing member variables with arbitrary constants so that I can distinguish them in the compiled code. If I set them all to 0's, the produced Assembly code would be harder to read.

Class S_0 is what you would call the base class:

C++

struct S_0
{
	volatile unsigned int _count = 100;		     //hex 0x64
	const void* _pPtr = NULL;
	int _nPtrSet = 0x234;

	virtual __declspec(noinline) void Query(int v) = 0;
	virtual __declspec(noinline) int AddRef() = 0;
	virtual __declspec(noinline) int Release() = 0;

	virtual __declspec(noinline) bool SetPtr(const void* p)
	{
		_pPtr = p;
		_nPtrSet = 0x998;

		return true;
	}

	virtual __declspec(noinline) const void* GetPtr()
	{
		return _nPtrSet != 0x234 ? _pPtr : NULL;
	}

	int GetPtrSetFlag()
	{
		return _nPtrSet;
	}
};

Note my use of the Microsoft-specific __declspec(noinline) directive. It's there only to force the compiler not to inline those functions. It may not inline them anyway, but I needed to be sure. Keeping these functions out of line improves the readability of the resulting assembly code.

You would generally not use __declspec(noinline) in your production code.

Then class S_1. It is there just so that we can derive our S_2 from it:

C++

struct S_1
{
	int _n = 0xabc;

	virtual void __declspec(noinline) SetN(int n)
	{
		_n = n;
	}

	virtual int __declspec(noinline) GetN()
	{
		return _n;
	}
};

And finally, S_2 is the class that we will be reviewing further:

C++

struct S_2 : public S_1, public S_0
{
	int _v = 0xee2;
	
	void __declspec(noinline) Query(int v)
	{
		SetN(0xFE);
		AddRef();
		if(GetPtr())
		{
			_v = v;
		}
	}

	__declspec(noinline) int AddRef()
	{
		return InterlockedIncrement(&_count);
	}

	__declspec(noinline) int Release()
	{
		int v = InterlockedDecrement(&_count);
		if(!v)
		{
			_v = 0xee2;
		}

		return v;
	}

	__declspec(noinline) bool SetPtr(const void* p)
	{
		bool bRes = __super::SetPtr(p);
		if(!bRes)
		{
			_v = 0xff1;
		}

		return bRes;
	}

	__declspec(noinline) void IncV()
	{
		if(_v != 0xee2 &&
			_v != 0xff1)
		{
			_v += 0x23;
		}
	}
};

As you can see, the main difference in the S_2 class is that it inherits from the S_1 and S_0 classes.

Last thing we need for our test is to create an instance of the S_2 class. I'll do it in a function:

C++

void __declspec(noinline) TestVirtualFunctions()
{
	S_2* s2 = new S_2();

	s2->Query(0x71);
	
	s2->SetPtr(L"Hello!");

	const void* p = s2->GetPtr();
	if(p)
	{
		s2->Release();
	}

	delete s2;
}

Then let's compile it for an optimized Release configuration and see what we've got.

I will be compiling it for the x64 (Intel) CPU since that architecture seems to be the most popular today. But you can build it for ARM64, if you please. It won't change the logic that I will describe below, although the Assembly code will be different.

Disassembly

We don't really need an external debugger at this point to get the disassembly. We can use Visual Studio for that. Simply put a breakpoint at the beginning of TestVirtualFunctions and start debugging by pressing F5. Then, when the breakpoint hits, press Ctrl+F11 to show the disassembly.

Pressing Ctrl+F11 several times will toggle the disassembly view.

I'll skip the epilogue and the prologue of that function and go straight to the meat of the issue:

TestVirtualFunctions

    mov         ecx,38h                               ; Size of the S_2 class in bytes
    call        qword ptr [__imp_operator new]        ; 'new' operator to allocate memory for the S_2 class

    mov         rbx,rax                               ; RBX = 'this' pointer to S_2 class
    test        rax,rax  
    je          lbl_null_ptr                          ; Skip down if allocation failed

    xor         eax,eax                               ; Construct S_1 class first
    mov         qword ptr [rbx+8],0ABCh               ; S_1::_n
    mov         qword ptr [rbx+18h],rax               ; Reset all members to zeros
    lea         rcx,[S_1::'vftable' (014000FDD8h)]    ; Set up S_1::'vftable' (that will be overwritten later)
    mov         qword ptr [rbx+20h],rax  
    mov         qword ptr [rbx+28h],rax  
    mov         qword ptr [rbx+30h],rax  
    mov         qword ptr [rbx],rcx                    ; Since we're constructing S_1, then use 'S_1::vftable'

    lea         rcx,[S_0::'vftable' (014000FDA8h)]     ; Then construct S_0 class
    mov         qword ptr [rbx+20h],rax                ; S_0::_pPtr = NULL
    lea         rax,[S_2::'vftable' (014000FD90h)]     ; S_2::'vftable' that replaces S_1::'vftable' at +0
    mov         qword ptr [rbx+10h],rcx                ; S_0::'vftable' goes to +10h (overwritten below)
    mov         qword ptr [rbx],rax  

    lea         rax,[S_2::'vftable' (014000FD60h)]     ; Finally, construct S_2 class (by overriding S_0 vtable)
    mov         qword ptr [rbx+10h],rax  
    mov         dword ptr [rbx+18h],64h                ; S_0::_count = 100
    mov         dword ptr [rbx+28h],234h               ; S_0::_nPtrSet
    mov         dword ptr [rbx+30h],0EE2h              ; S_2::_v
    jmp         lbl_cont_1 

lbl_null_ptr:

    mov         rbx,rax                               ; RBX = Set 'this' pointer to S_2 class to NULL

lbl_cont_1:

    mov         rax,qword ptr [rbx+10h]               ; RAX = pointer to the S_2 vtable stored at +10h (the S_0 part of S_2)
    lea         rcx,[rbx+10h]                         ; RCX = 'this' pointer, adjusted by +10h to the S_0 part of S_2
    mov         edx,71h                               ; RDX = input parameter
    call        qword ptr [rax]                       ; invoking s2->Query(0x71)

    mov         rax,qword ptr [rbx+10h]               ; same logic here (and further down below)
    lea         rdx,[string L"Hello!"]  
    lea         rcx,[rbx+10h]  
    call        qword ptr [rax+18h]                   ; invoking s2->SetPtr(L"Hello!")

    mov         rax,qword ptr [rbx+10h]  
    lea         rcx,[rbx+10h]  
    call        qword ptr [rax+20h]                   ; invoking s2->GetPtr()

    test        rax,rax  
    je          lbl_cont_2                            ; if previous call returned 0, then skip the next call

    mov         rax,qword ptr [rbx+10h]  
    lea         rcx,[rbx+10h]  
    call        qword ptr [rax+10h]                   ; invoking s2->Release()

lbl_cont_2:

For brevity I didn't include Assembly code for the last delete operator. It is not relevant for our example.

Let's look at the assembly code above, piece by piece.

First, the new operator allocates memory for the S_2 class. The interesting detail is its size in bytes. Note that it is 0x38 bytes. But why so?

Class Sizes

Let's add some test code to get sizes of each class:

C++

int size_s0 = sizeof(S_0);          //0x20
int size_s1 = sizeof(S_1);          //0x10
int size_s2 = sizeof(S_2);          //0x38

We can see that each class reserves memory to store data for its member variables.

For instance, S_0 has 3 members: _count of type unsigned int, _pPtr of type void*, and _nPtrSet of type int. Thus, remembering that we compiled it for a 64-bit CPU, and assuming that the type int takes up 4 bytes (with the Microsoft compiler), and type void* takes up 8 bytes, all 3 members should've used up 16 bytes, or 0x10. But why is sizeof(S_0) giving us 0x20 (or 32)?

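The first part of the answer lies in the way the 64-bit compiler aligns class members. For performance reasons (and also to comply with some CPU alignment requirements) the compiler places every member on a boundary that matches the alignment of its type (4 bytes for an int, 8 bytes for a pointer), and rounds the size of the whole class up to a multiple of its largest alignment, which is 8 bytes here. This means that _count is followed by a 4-byte gap, so that the 8-byte _pPtr starts on an 8-byte boundary, and _nPtrSet is followed by 4 bytes of unused padding at the end of the class.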

But it still gives us only: (4 + 4) + 8 + (4 + 4) = 24. What are the other 8 bytes used for?

Those 8-bytes go into storing a pointer to what is known as the vtable.

Thus, our layout of the memory for the S_0 class becomes:

S_0 Class Layout

+0x0    8-bytes:   S_0::'vftable'
+0x8    4-bytes:   _count
+0xC    4-bytes:   --------------  <padding>
+0x10   8-bytes:   _pPtr
+0x18   4-bytes:   _nPtrSet
+0x1C   4-bytes:   --------------  <padding>
+0x20

And, that is why the size of S_0 is 0x20 bytes.
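
The arithmetic above can be checked with a few lines of C++. This is only a minimal sketch: PlainMembers, WithVirtual and TwoInts are hypothetical stand-ins that mirror the data members of S_0, not the classes from the test project. On a typical 64-bit build PlainMembers should come out at 24 bytes and WithVirtual at 32, with the difference being the hidden vtable pointer.

```cpp
#include <cstddef>

// Hypothetical stand-ins that mirror the data members of S_0.
struct PlainMembers {          // no virtual functions, so no vtable pointer
    unsigned int count;
    const void*  ptr;
    int          ptrSet;
};

struct WithVirtual {           // same members plus one virtual function
    unsigned int count;
    const void*  ptr;
    int          ptrSet;
    virtual void f() {}
};

struct TwoInts {               // two adjacent ints need no padding between them
    int a;
    int b;
};

// Extra bytes that a class pays for having at least one virtual function.
constexpr std::size_t vptr_cost() {
    return sizeof(WithVirtual) - sizeof(PlainMembers);
}
```

Note that TwoInts takes only 8 bytes: padding shows up only where a member with a larger alignment follows a smaller one, or at the end of the class.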

We can apply similar logic to create a diagram of the size of the S_1 class:

S_1 Class Layout

+0x0    8-bytes:   S_1::'vftable'
+0x8    4-bytes:   _n
+0x0C   4-bytes:   --------------  <padding>
+0x10

S_1 has only one member _n of type int. But it also has a vtable pointer that goes first in its memory layout. Thus, we get its size as 0x10 bytes.

The final piece of the puzzle is the size of the S_2 class. As you can see, its size becomes the sum of the sizes of the S_0 and S_1 classes, plus its own member _v of type int and its padding. This is understandable, since S_2 class inherits from both S_0 and S_1. Or, in other words, it "combines" their capabilities, that we can see in the memory layout of its members.

If we bring up the memory pane in the Visual Studio, and point it to the beginning of S_2 class in memory, we can deduce the following layout of its members:

S_2 Class Layout

+0x0    8-bytes:   S_2::'vftable'
+0x8    4-bytes:   S_1::_n
+0x0C   4-bytes:   --------------  <padding>
+0x10   8-bytes:   S_0::'vftable'
+0x18   4-bytes:   S_0::_count
+0x1C   4-bytes:   --------------  <padding>
+0x20   8-bytes:   S_0::_pPtr
+0x28   4-bytes:   S_0::_nPtrSet
+0x2C   4-bytes:   --------------  <padding>
+0x30   4-bytes:   S_2::_v
+0x34   4-bytes:   --------------  <padding>
+0x38

Wow, there's a lot to unpack here. We'll get to it later.
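
In the meantime, one thing that can be checked from C++ itself is where each base class part sits inside a derived object. The following is a minimal sketch with hypothetical stand-in classes (First, Second and Both are illustrative names, not the S_1, S_0 and S_2 classes above). It measures how far the compiler moves a pointer when it is converted to each base class. That is the same kind of adjustment as the lea rcx,[rbx+10h] instruction in the disassembly.

```cpp
#include <cstddef>

// Hypothetical classes: two bases with virtual functions and a class derived from both.
struct First  { int n = 0xabc;    virtual int GetN()    { return n; } };
struct Second { int flag = 0x234; virtual int GetFlag() { return flag; } };
struct Both : First, Second { int v = 0xee2; };

// Byte distance from the start of the complete object to its First part.
std::ptrdiff_t first_base_offset() {
    Both obj;
    First* asFirst = &obj;                  // implicit upcast, no adjustment expected
    return reinterpret_cast<const char*>(asFirst) -
           reinterpret_cast<const char*>(&obj);
}

// Byte distance from the start of the complete object to its Second part.
std::ptrdiff_t second_base_offset() {
    Both obj;
    Second* asSecond = &obj;                // implicit upcast adjusts the pointer
    return reinterpret_cast<const char*>(asSecond) -
           reinterpret_cast<const char*>(&obj);
}
```

With the common 64-bit compilers the first base starts at offset 0 and the second one right after it, at sizeof(First), which is 0x10 here.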

But for now, the question becomes - where does S_1's pointer to its vtable go in S_2? We seem to be 8 bytes short there.

The answer lies in another compiler optimization. But before we can get to it we need to review what a vtable is.

Virtual Function Table - vtable / vftable

Most of the C++ folks call it vtable (which stands for "Virtual Table"), while Microsoft call it vftable (which probably means, "Virtual Function Table".) Both names mean the same thing though.

A vtable is simply an array of virtual function pointers in memory. The order of these functions follows their declarations in the C++ source code, with each function pointer being 8 bytes long (for a 64-bit build.)

The compiler uses this array of function pointers to invoke each function when it needs to.
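
Those two rules (declaration order, 8 bytes per slot) are enough to translate the displacement in an indirect call back to a function name. Here is a small sketch for the virtual functions of S_0, assuming a 64-bit build. SlotName and kS0Virtuals are hypothetical helper names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Virtual functions of S_0 in the order of their declaration in the class.
static const char* const kS0Virtuals[] = {
    "Query", "AddRef", "Release", "SetPtr", "GetPtr"
};

// Maps the displacement in "call qword ptr [rax+disp]" to a function name,
// assuming 8-byte slots and declaration-order layout.
// Returns an empty string if the displacement is past the last slot.
std::string SlotName(std::size_t displacement) {
    assert(displacement % 8 == 0);
    const std::size_t index = displacement / 8;
    const std::size_t count = sizeof(kS0Virtuals) / sizeof(kS0Virtuals[0]);
    return index < count ? std::string(kS0Virtuals[index]) : std::string();
}
```

For example, SlotName(0x18) gives SetPtr and SlotName(0x20) gives GetPtr, which matches the call qword ptr [rax+18h] and call qword ptr [rax+20h] instructions in the disassembly listed earlier.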

A careful observer will notice something. Why does it need to store function pointers in a memory array? Why can't it just encode them in a direct CALL instruction?

If you look at the compiled Assembly code, you will notice that if I call some function that is not virtual:

C++

TestVirtualFunctions();

It will get compiled into:

x64

E8 E9 FE FF FF       call        TestVirtualFunctions

Which is just a direct CALL instruction with a relative offset to the address of the TestVirtualFunctions function.

Most non-virtual functions are compiled into such a CALL instruction.
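
To see what "relative offset" means in practice, here is a minimal sketch that decodes such a 5-byte E8 instruction. The four bytes after E8 are a signed 32-bit displacement that is counted from the address of the next instruction. The function name call_rel32_target and the instruction addresses used with it are assumptions for illustration, and the sketch assumes a little-endian host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the absolute target of a 5-byte "E8 rel32" near call.
// insnAddr is the address of the E8 byte itself.
std::uint64_t call_rel32_target(std::uint64_t insnAddr, const unsigned char insn[5]) {
    assert(insn[0] == 0xE8);                       // only the direct near CALL form
    std::int32_t rel = 0;
    std::memcpy(&rel, insn + 1, sizeof(rel));      // little-endian signed displacement
    // The displacement is relative to the end of the instruction (address + 5).
    return insnAddr + 5 + static_cast<std::uint64_t>(static_cast<std::int64_t>(rel));
}
```

For the bytes E8 E9 FE FF FF the displacement is -0x117, so if the instruction were located at a made-up address 0x1000, the target would be 0x1005 - 0x117 = 0xEEE.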

The answer to the question that was posed above is simple: the vtable, or an array of function pointers, is needed to account for the polymorphism of the C++ language, where one (virtual) function name may represent different class functions, depending on the actual type of the object.
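
As a quick refresher, here is a minimal polymorphism sketch. The classes Animal, Dog and Cat are hypothetical and have nothing to do with the test project. The point is that MakeItSpeak contains a single call site, and which function it reaches is decided at run time by the object that was passed in.

```cpp
#include <string>

// Hypothetical classes for illustration only.
struct Animal {
    virtual ~Animal() {}
    virtual std::string Speak() const { return "..."; }
};

struct Dog : Animal {
    std::string Speak() const override { return "Woof"; }
};

struct Cat : Animal {
    std::string Speak() const override { return "Meow"; }
};

// One call site. The function that runs depends on the dynamic type of 'a',
// which the compiler cannot know here, so it has to go through the vtable.
std::string MakeItSpeak(const Animal& a) {
    return a.Speak();
}
```

Since the compiler cannot encode a fixed target address for a.Speak(), a direct CALL with a relative offset won't work for it.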

To account for a changeable nature of a virtual function, the compiler may invoke it as such:

x64

    mov         rax,qword ptr [rbx]  
    call        qword ptr [rax+10h]

Note that it takes a pointer to the vtable from the memory pointed by rbx, stores it in the rax register and then invokes a function at the offset 0x10 in the vtable.

It's hard to overlook how confusing this "pointer to pointer" game becomes with virtual functions. And, that is probably why you are also reading this blog post.

With such a function pointer dereference, a compiler can account for polymorphism by simply switching the vtable from one class to another. In case of the assembly code above, all it takes is for rbx to point to an object that holds a different vtable pointer.
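
The same mechanism can be written out by hand, which may help when reading decompiler output. Below is a rough sketch of what the compiler generates behind the scenes, using plain structs and function pointers. All names in it (Object, VTable, CallTwice and so on) are hypothetical, and a real compiler-generated vtable carries more than this.

```cpp
// Hand-rolled equivalent of a class with two virtual functions.
struct Object;

struct VTable {                        // one slot per virtual function
    int (*GetValue)(Object* self);     // slot at +0
    int (*Twice)(Object* self);        // slot at +8 in a 64-bit build
};

struct Object {
    const VTable* vptr;                // first field: pointer to the vtable
    int value;
};

static int BaseGetValue(Object* self)    { return self->value; }
static int BaseTwice(Object* self)       { return self->vptr->GetValue(self) * 2; }
static int DerivedGetValue(Object* self) { return self->value + 1000; }

// The "derived" table overrides slot 0 and reuses the base function in slot 1.
static const VTable kBaseVTable    = { BaseGetValue,    BaseTwice };
static const VTable kDerivedVTable = { DerivedGetValue, BaseTwice };

// Same shape as: mov rax,[rbx] / call qword ptr [rax+8]
int CallTwice(Object* obj) {
    return obj->vptr->Twice(obj);
}
```

CallTwice never changes. Pointing vptr at kDerivedVTable instead of kBaseVTable is all it takes to change the function that ends up running.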

Members In Inherited Classes

OK, so let's get back to the layout of the S_2 class in memory. What happened to the vtable for the inherited S_1 class?

To understand, let's dump the contents of the vtables from the disassembly code that we saw above.

Note that there's still an unanswered question of multiple vtables there. We'll have to get back to it later. For now though let's just dump their contents.

Like we concluded above, a vtable is just an array of function pointers.
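
If you don't have a disassembly at hand, the vtable pointer can also be read from a live object. The sketch below relies on the vtable pointer being the first pointer-sized field of the object. That holds for simple classes like these with the Microsoft, GCC and Clang compilers, but it is not guaranteed by the C++ standard. Base, Left, Right and ReadVtablePointer are hypothetical names.

```cpp
#include <cstring>

// Hypothetical classes for illustration only.
struct Base  { virtual ~Base() {} virtual int Id() const { return 0; } };
struct Left  : Base { int Id() const override { return 1; } };
struct Right : Base { int Id() const override { return 2; } };

// Reads the first pointer-sized field of the object, which is assumed
// to be its vtable pointer (an implementation detail, not standard C++).
const void* ReadVtablePointer(const Base& obj) {
    const void* vptr = nullptr;
    std::memcpy(&vptr, static_cast<const void*>(&obj), sizeof(vptr));
    return vptr;
}
```

Two objects of the same class should return the same pointer, since they share one vtable, while objects of different classes should return different ones.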

S_1::vftable (014000FDD8h)

Offset  Address      Size       Function Address       Function Declaration 
-----------------------------------------------------------------------------
+0x0    014000FDD8h  8-bytes:   00000001400015f0h      void S_1::SetN(int)
+0x8    014000FDE0h  8-bytes:   0000000140001600h      int S_1::GetN()
+0x10

I provided virtual addresses for each function pointer in the vtable (in the "Address" column), as well as the addresses of each function (in the "Function Address" column.) This will help us distinguish between them and to see their layout pattern.
You may also use these addresses in the Ctrl+F ("search on page") option to have them highlighted in other areas of this blog post.

S_0::vftable (014000FDA8h)

Offset  Address      Size       Function Address       Function Declaration 
-----------------------------------------------------------------------------
+0x0    014000FDA8h  8-bytes:   0000000140004092h      __imp__purecall
+0x8    014000FDB0h  8-bytes:   0000000140004092h      __imp__purecall
+0x10   014000FDB8h  8-bytes:   0000000140004092h      __imp__purecall
+0x18   014000FDC0h  8-bytes:   00000001400015c0h      bool S_0::SetPtr(const void*)
+0x20   014000FDC8h  8-bytes:   00000001400015d0h      const void* S_0::GetPtr()
+0x28

Note that the vtable above doesn't have a pointer to the GetPtrSetFlag function since it is not marked as virtual. That function is encoded with just a direct call instruction, like I showed earlier.

Then the strange looking "intermediate" vtable for the S_2 class:

S_2::vftable (014000FD90h) - intermediate

Offset  Address      Size       Function Address       Function Declaration 
-----------------------------------------------------------------------------
+0x0    014000FD90h  8-bytes:   00000001400015f0h      void S_1::SetN(int)
+0x8    014000FD98h  8-bytes:   0000000140001600h      int S_1::GetN()
+0x10   014000FDA0h  8-bytes:   0000000140010f78h      --------------  <padding> 1
+0x18   014000FDA8h  8-bytes:   0000000140004092h      __imp__purecall
+0x20   014000FDB0h  8-bytes:   0000000140004092h      __imp__purecall
+0x28   014000FDB8h  8-bytes:   0000000140004092h      __imp__purecall
+0x30   014000FDC0h  8-bytes:   00000001400015c0h      bool S_0::SetPtr(const void*)
+0x38   014000FDC8h  8-bytes:   00000001400015d0h      const void* S_0::GetPtr()
+0x40   014000FDD0h  8-bytes:   0000000140010f50h      --------------  <padding> 1
+0x48   014000FDD8h  8-bytes:   00000001400015f0h      void S_1::SetN(int)
+0x50   014000FDE0h  8-bytes:   0000000140001600h      int S_1::GetN()
+0x58

And finally:

S_2::vftable (014000FD60h)

Offset  Address      Size       Function Address       Function Declaration 
-----------------------------------------------------------------------------
+0x0    014000FD60h  8-bytes:   0000000140001610h     void S_2::Query(int v)
+0x8    014000FD68h  8-bytes:   0000000140001660h     int S_2::AddRef()
+0x10   014000FD70h  8-bytes:   0000000140001670h     int S_2::Release()
+0x18   014000FD78h  8-bytes:   0000000140001690h     bool S_2::SetPtr(const void*)
+0x20   014000FD80h  8-bytes:   00000001400015d0h     const void* S_0::GetPtr()
+0x28   014000FD88h  8-bytes:   0000000140010f28h     --------------  <padding> 1
+0x30   014000FD90h  8-bytes:   00000001400015f0h     void S_1::SetN(int)
+0x38   014000FD98h  8-bytes:   0000000140001600h     int S_1::GetN()
+0x40

Let's try to understand what's going on with those function pointers:

An observant reader may have noticed that the listings above overlap: the entries at the end of each dump are the same memory as the vtable that follows it. That is why I included the address of each function pointer (in the "Address" column.) You can use them to understand the placement of each vtable in memory.

We can combine the layouts of all vtables above into one:

All vftables combined

Offset  Address      Size       Function Address       Function Declaration 
-----------------------------------------------------------------------------
S_2::'vftable' (014000FD60h)

+0x0    014000FD60h  8-bytes:   0000000140001610h     void S_2::Query(int v)
+0x8    014000FD68h  8-bytes:   0000000140001660h     int S_2::AddRef()
+0x10   014000FD70h  8-bytes:   0000000140001670h     int S_2::Release()
+0x18   014000FD78h  8-bytes:   0000000140001690h     bool S_2::SetPtr(const void*)
+0x20   014000FD80h  8-bytes:   00000001400015d0h     const void* S_0::GetPtr()
+0x28   014000FD88h  8-bytes:   0000000140010f28h     --------------  <padding> 1

S_2::'vftable' (014000FD90h) - intermediate

+0x30   014000FD90h  8-bytes:   00000001400015f0h     void S_1::SetN(int)
+0x38   014000FD98h  8-bytes:   0000000140001600h     int S_1::GetN()
+0x40   014000FDA0h  8-bytes:   0000000140010f78h     --------------  <padding> 1

S_0::'vftable' (014000FDA8h)

+0x48   014000FDA8h  8-bytes:   0000000140004092h     __imp__purecall
+0x50   014000FDB0h  8-bytes:   0000000140004092h     __imp__purecall
+0x58   014000FDB8h  8-bytes:   0000000140004092h     __imp__purecall
+0x60   014000FDC0h  8-bytes:   00000001400015c0h     bool S_0::SetPtr(const void*)
+0x68   014000FDC8h  8-bytes:   00000001400015d0h     const void* S_0::GetPtr()
+0x70   014000FDD0h  8-bytes:   0000000140010f50h     --------------  <padding> 1

S_1::'vftable' (014000FDD8h)

+0x78   014000FDD8h  8-bytes:   00000001400015f0h     void S_1::SetN(int)
+0x80   014000FDE0h  8-bytes:   0000000140001600h     int S_1::GetN()
+0x88

As you can see, these vtables follow each other in a sequential order in memory.

In a nutshell, the reason the C++ compiler chose to build the final vtable in such a way was to comply with the C++ spec for constructing inherited classes.

You can turn off this behavior by using the novtable directive.

The order of virtual functions in each vtable matches the order at which they are declared in their respective classes. (See S_0, S_1 or S_2 class declarations.)
Again, keep in mind that non-virtual functions, such as S_0::GetPtrSetFlag, are not included in the vtable.
An interesting observation is that a pointer to a QWORD value 1 (marked as <padding> 1 above) seems to delimit one vtable from the next one. Most likely it is not padding, though. The Microsoft compiler stores a pointer to an RTTI structure, known as the Complete Object Locator, in the slot right before the first function pointer of each vtable, and in 64-bit builds that structure starts with a signature field that is equal to 1. So that pointer belongs to the vtable that follows it.
This layout is not officially documented, so you shouldn't rely on it.
Note that the compiler is building and including virtual functions even if they are not used anywhere in the code. An example of such a function is int S_1::GetN(). As you can see, it is not called anywhere in our code, but the compiler still included it in the vtable, as well as compiled its implementation.
This is different from regular (non-virtual) functions that may be optimized away if nothing is calling them.

__purecall Functions

You might have noticed in the vtable (from the example above) some strange functions with the name __imp__purecall.

I described the __imp prefix in a separate blog post. So I won't touch it here.

Raymond Chen had already described the nature of the "purecall" functions. So please read his blog post, titled "What is __purecall?".

In a nutshell, when C++ objects with inheritance are instantiated, they are constructed in stages. For example, in case of the S_2 class, the constructor builds up its S_1 part first, then S_0, and then the final S_2 class. This process doesn't happen atomically. While the S_0 stage runs, the object temporarily uses S_0's own vtable, and the slots for its pure virtual functions (Query, AddRef and Release) need to be filled with something. That something becomes those __imp__purecall functions.
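
The staged construction can be observed without a disassembler. In the following sketch, with hypothetical Base and Derived classes, the base constructor calls a virtual function that is not pure. Because the object is still a Base at that stage, the call lands in the Base version, and only after the construction is complete does the same call reach the Derived one.

```cpp
#include <string>

// Hypothetical classes for illustration only.
struct Base {
    std::string seenInCtor;

    Base() { seenInCtor = Name(); }    // virtual call while only the Base stage exists
    virtual ~Base() {}
    virtual std::string Name() const { return "Base"; }
};

struct Derived : Base {
    std::string Name() const override { return "Derived"; }
};

// Calls Name() through a base reference, after construction is complete.
std::string NameViaBase(const Base& b) {
    return b.Name();
}
```

For a Derived object, seenInCtor ends up holding "Base", while NameViaBase returns "Derived". If Name were a pure virtual function in Base, there would be no Base version for the constructor to reach.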

__purecall functions are nothing more than placeholders that raise a debugging assertion, and call abort(), if someone manages to invoke them through the vtable of an object that is not yet fully constructed.

Or, if a __purecall function is invoked in a release build, the process will self-terminate as a security measure.

Let's modify our S_0 class to demonstrate it:

C++

struct S_0
{
    S_0()
    {
        invokeQuery();	// BAD! DO NOT DO IT!
    }

    void invokeQuery()
    {
        Query(1);		// This will invoke a __purecall function and abort the process !!!
    }

    virtual __declspec(noinline) void Query(int v) = 0;

	//...
};

As you can see, we're invoking a virtual function from the base constructor in S_0, before the derived S_2 class is constructed. In terms of C++, "there's nothing there to invoke". But in terms of low-level Assembly language, there must be something in the vtable in memory for the Query function pointer. To facilitate debugging, the C++ compiler puts a __purecall function there as a placeholder, until a vtable with a pointer to the actual Query function takes its place. That placeholder function is invoked in the example above, which displays a debugging assertion message in a debugging build, or simply invokes the abort() function in a release build.

An interesting observation is the way modern Visual Studio C++ compiler handles invocations of virtual functions from within the class constructor. (This is generally a bad idea, due to multiple ways to cause confusion!)
For the sake of the experiment, let's see what happens on the Assembly level if we decide to do it.

If we modify our S_2 class to invoke a virtual function from its constructor:

C++

struct S_2 : public S_1, public S_0
{
	S_2()
	{
		Query(0xDA);
	}
		
	//...
};

The Query function is invoked as a direct call instruction, technically disregarding the whole vtable concept. (I'm using machine opcodes for the listings below to distinguish between the encodings of different call instructions.)

x64

BA DA 00 00 00       mov         edx, 0DAh                   ; RDX = first function parameter, or 0xDA
48 8D 4B 10          lea         rcx, [rbx+10h]              ; RCX = 'this' pointer to class S_2
E8 84 FE FF FF       call        S_2::Query (0140001610h)    ; direct call - encoding starts with E8, followed by a relative offset

But if you invoke the same Query function from outside of a constructor (or destructor) it will be encoded as an indirect call instruction, using the class vtable:

x64

48 8B 43 10          mov         rax, qword ptr [rbx+10h]    ; RAX = pointer to the 'vtable' for class S_2
BA 71 00 00 00       mov         edx, 71h                    ; RDX = first function parameter, or 0x71
48 8D 4B 10          lea         rcx, [rbx+10h]              ; RCX = 'this' pointer to class S_2
FF 10                call        qword ptr [rax]             ; indirect call - encoded as: FF 10

My guess is that this is another security measure to prevent bugs when using a vtable that may not be fully constructed, or if it is being torn down from a destructor. By using a direct call instruction, the compiler removes any ambiguity. It also follows from the C++ rule that a virtual call made inside a constructor or destructor resolves to the class that is being constructed or destroyed, so the compiler already knows the target.

novtable Directive

Another interesting test is to see what happens if we use the novtable directive with a class.

Remember our __purecall function test above? Let's modify it to use novtable:

C++

struct __declspec(novtable) S_0
{
	S_0()
	{
		invokeQuery();	// BAD! DO NOT DO IT!
	}

	void invokeQuery()
	{
		Query(1);		// This will generate a null-pointer dereference!
	}

	virtual __declspec(noinline) void Query(int v) = 0;

	//...
};

The addition of __declspec(novtable) will remove the staged construction of the vtable, which will simplify it, but will also remove all debugging precautions, such as the __purecall function, that I described earlier.

So the invocation of the Query function in the code sample above will attempt to read its function pointer at address 0, which will crash the process.

Such a bug will be pretty difficult to diagnose in a complex program. Thus, do not use the novtable directive! It's not worth saving just a few machine cycles in the constructor in exchange for less readable debugging errors.

Finally, let's see what happens on the Assembly level if we construct our classes with the novtable directive.

For the sake of simplicity, let's declare all of our classes: S_0, S_1 and S_2, with the novtable directive:

C++

struct __declspec(novtable) S_0
{
	//...
};

struct __declspec(novtable) S_1
{
	//...
};

struct __declspec(novtable) S_2 : public S_1, public S_0
{
	//...
};

In this case the creation of the S_2 class:

C++

S_2* s2 = new S_2();

Will turn it from its previous Assembly layout, to a more simplified form:

MASM

    mov         ecx,38h
    call        qword ptr [__imp_operator new]        ; 'new' operator to allocate memory

    mov         rbx,rax                               ; RBX = 'this' pointer to class S_2
    test        rax,rax
    je          lbl_null_ptr                          ; Skip down if allocation failed

    xorps       xmm0,xmm0                             ; Reset 128-bit XMM0 register to 0
    xor         eax,eax                               ; Reset 64-bit RAX register to 0

    movups      xmmword ptr [rbx],xmm0                ; Zero out S_2 class memory (or 38h bytes total)
    movups      xmmword ptr [rbx+10h],xmm0
    movups      xmmword ptr [rbx+20h],xmm0
    mov         qword ptr [rbx+30h],rax

    mov         dword ptr [rbx+8],0ABCh               ; Set S_1::_n
    mov         dword ptr [rbx+18h],64h               ; Initialize S_2 class members
    mov         qword ptr [rbx+20h],rax
    mov         dword ptr [rbx+28h],234h
    mov         dword ptr [rbx+30h],0EE2h

    jmp         lbl_cont_1

lbl_null_ptr:
    xor         ebx,ebx                               ; RBX = set 'this' pointer to class S_2 to NULL

lbl_cont_1:

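So this is the result of using the novtable directive from the Assembly level. It does simplify it a bit. But note that this listing never writes a vtable pointer: the slots at +0 and +10h stay zeroed. That is what happens when novtable is also applied to S_2, the class that actually gets instantiated, so a virtual call through such an object would have no vtable to read from.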

Control Flow Guard

You might have heard of a security measure that Microsoft calls "Control Flow Guard", or CFG. (The rest of the world calls it "Control Flow Integrity". But it's basically the same thing.)

The birth of the CFG comes on the heels of exploitation of the invocation of virtual functions in a vtable by malicious software.

In a nutshell, if a binary exploit allows an attacker to inject malicious code into a process (say, into a web browser), the way virtual functions are invoked can allow an attacker to pivot such an exploit into the execution of their API of choice.

For instance, in the following invocation of a virtual function:

MASM

    mov         rax, qword ptr [rbx]
    lea         rcx, [rbx]
    call        qword ptr [rax]

An attacker can hijack an indirect CALL instruction by making the RAX register point to memory that holds the address of some system API that they desire to invoke (such as WinExec for instance) and then jump to the address of that CALL instruction to execute that API.

WinExec function is an attractive API for an attacker because it allows them to start a process of their choosing by specifying just two input parameters.

Microsoft have addressed this type of binary exploit by turning the indirect CALL instructions that are used to invoke virtual functions into a call to a short CFG security shim. The shim checks the address of the virtual function against a CFG bitmap that marks the addresses of all allowed functions. If the address is marked in the CFG bitmap, the virtual call succeeds. Otherwise, the CFG security shim crashes the process.

CFG must be enabled in the properties of the C++ project before such project is compiled. This is needed for the compiler to add CFG security shims into Assembly code.
To enable CFG in the Visual Studio, go to properties of your C++ project, then navigate to "C/C++", and click on "Code Generation". After that set "Control Flow Guard" option to "Yes (/guard:cf)". Click OK and recompile the project.

Enabling CFG will slow down your compiled code to a small degree.

After that invoking a virtual function:

C++

s2->Query(0x71);

Will generate Assembly code with the use of a __guard_dispatch_* CFG security shim. It may look like this:

MASM

    mov         rax,qword ptr [rbx+10h]                  ; RAX = address of the vtable
    lea         rcx,[rbx+10h]                            ; RCX = 'this' pointer to the class containing the virtual function
    mov         edx,71h                                  ; RDX = first function input parameter
    mov         rax,qword ptr [rax]                      ; RAX = address of the virtual function to call
    call        qword ptr [__guard_dispatch_icall_fptr]  ; Invocation of the CFG security shim

To understand how the CFG security shim works, let's check its Assembly code:

CFG security shims may have slightly different internal names and implementation, depending on the type of virtual functions that they guard, and on the version of the C++ compiler that was used for compilation. Additionally, user-mode and kernel-mode CFG security shims are compiled differently.
The code sample below demonstrates a user-mode CFG security shim that was generated by the Visual Studio 2022 C++ compiler at the time of this writing. Note that it may change in the future without any notice.

__guard_dispatch_icall_fptr

    ; RAX = address of a virtual function to call

    mov         r11,qword ptr [CFG_bitmap]   ; R11 = base of CFG bitmap
    mov         r10,rax
    shr         r10,9                        ; R10 = index of the 64-bit bitmap entry, or virtual function address divided by 512
    mov         r11,qword ptr [r11+r10*8]    ; R11 = 64-bit CFG bitmap entry for the virtual function address

    mov         r10,rax
    shr         r10,3                        ; R10 = bit number as virtual function address divided by 8

    test        al,0Fh                       ; Check if virtual function address is aligned on 16 bytes
    jnz         lbl_1                        ; and jump if it is not ...

    bt          r11,r10                      ; CF = R11 & (1 << (R10 % 64))
    jnc         lbl_2                        ; jump if CF==0

    jmp         rax                          ; All good - jump to our virtual function

lbl_1:

    ; If virtual function address is not aligned on 16 bytes
    ; (Since most functions are, the code will rarely get to this clause.)

    btr         r10,0                        ; Clear bit 0 in R10
    bt          r11,r10                      ; CF = R11 & (1 << (R10 % 64))
    jnc         lbl_bad                      ; jump if CF==0

lbl_2:

    or          r10,1                        ; Set bit 0 in R10
    bt          r11,r10                      ; CF = R11 & (1 << (R10 % 64))
    jnc         lbl_bad                      ; jump if CF==0

    jmp         rax                          ; All good - jump to our virtual function

lbl_bad:

    mov         r10d,1                       ; CFG tests failed! Crash the process ...
    jmp         crash_process

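Note that the code in the CFG shim must be careful not to clobber the nonvolatile CPU registers, or the registers that carry the input parameters for the virtual function (RCX, RDX, R8 and R9). Thus, it's using only the RAX, R10 and R11 registers.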

The exact layout of the CFG bitmap is outside of the scope of this blog post. But just to go over it briefly: the CFG bitmap contains bits that are set for valid addresses of functions in a module. (Judging by the shifts in the code above, each bit in the CFG bitmap covers an 8-byte chunk of the address space, so every 16-byte block gets a pair of bits, and each 64-bit entry covers 512 bytes.) The code above checks for that and crashes the process if the bits for a virtual function address are not set in the CFG bitmap.
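
For those who read C++ more easily than Assembly, here is a toy model of the same check. It mirrors the branch structure of the shim listed above, but everything else is an assumption made for this sketch: ToyCfgBitmap, MarkValid and Allows are hypothetical names, the bitmap is a small vector that covers addresses starting from 0, and the way Windows actually populates the real bitmap is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of the CFG bitmap lookup. Each 64-bit entry covers 512 bytes.
struct ToyCfgBitmap {
    std::vector<std::uint64_t> entries;

    explicit ToyCfgBitmap(std::size_t bytesCovered)
        : entries((bytesCovered + 511) / 512, 0) {}

    // Marks a 16-byte-aligned address as a valid call target (sketch only).
    void MarkValid(std::uint64_t addr) {
        assert((addr & 0xF) == 0 && (addr >> 9) < entries.size());
        entries[addr >> 9] |= std::uint64_t(1) << ((addr >> 3) & 63);
    }

    // Returns true where the shim would jump to the target address.
    bool Allows(std::uint64_t addr) const {
        assert((addr >> 9) < entries.size());
        const std::uint64_t entry = entries[addr >> 9];   // shr r10,9 / mov r11,[r11+r10*8]
        std::uint64_t bit = addr >> 3;                     // shr r10,3
        auto test = [entry](std::uint64_t b) {             // bt uses only the low 6 bits
            return ((entry >> (b & 63)) & 1) != 0;
        };
        if ((addr & 0xF) == 0) {                           // test al,0Fh
            if (test(bit)) return true;                    // aligned and marked: allowed
        } else {
            bit &= ~std::uint64_t(1);                      // lbl_1: btr r10,0
            if (!test(bit)) return false;                  // lbl_bad
        }
        return test(bit | 1);                              // lbl_2: or r10,1 / bt r11,r10
    }
};
```

For example, after MarkValid(0x230) on a bitmap that covers 2048 bytes, Allows(0x230) returns true, while Allows(0x240) and the misaligned Allows(0x238) both return false.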

Conclusion

This blog post turned out to be another lengthy one. Thus I ran out of space to describe how you can use a decompiler, like Ghidra, to facilitate your reverse engineering work with Assembly code that contains virtual functions and CFG security shims.

I will get back to explaining that in the next post. So stay tuned ...
