Intro
If you're planning to reverse engineer binary code that was compiled with the Microsoft Visual Studio C++ compiler, you will almost certainly encounter virtual functions. They are not rocket science to grasp, but coupled with an inherent inability of modern decompilers to properly translate them, and complicated by the compiler's use of the Control Flow Guard feature, virtual functions may throw off a novice reverse engineer.
This blog post will hopefully answer the questions posed in the title and give some guidance for when you encounter these concepts.
Table Of Contents
For easier navigation, here's the table of contents:
- Setup
- Disassembly
- Class Sizes
- Virtual Function Table - vtable / vftable
- Members In Inherited Classes
- __purecall Functions
- novtable Directive
- Control Flow Guard
- Conclusion
Setup
Let's set up our work bench first. The easiest way to go is to create a small C++ project in Visual Studio, add some test code into it and then check how it looks in a compiled form. This way we won't have to deal with the intricacies of our actual binary that we need to reverse engineer, plus we can also adjust parameters for our tests by simply re-compiling this C++ project.
Below are my proposed test classes.
Please don't try to find any logic in this code. The only reason it exists is to illustrate the concept of virtual functions in C++. Additionally, I'm initializing member variables with random constants so that I can distinguish them in the compiled code. If I set them all to 0's, the produced Assembly code would be harder to read.
Class S_0
is what you would call the base class:
struct S_0
{
    volatile unsigned int _count = 100;     //hex 0x64
    const void* _pPtr = NULL;
    int _nPtrSet = 0x234;

    virtual __declspec(noinline) void Query(int v) = 0;
    virtual __declspec(noinline) int AddRef() = 0;
    virtual __declspec(noinline) int Release() = 0;

    virtual __declspec(noinline) bool SetPtr(const void* p)
    {
        _pPtr = p;
        _nPtrSet = 0x998;
        return true;
    }

    virtual __declspec(noinline) const void* GetPtr()
    {
        return _nPtrSet != 0x234 ? _pPtr : NULL;
    }

    int GetPtrSetFlag()
    {
        return _nPtrSet;
    }
};
Note my use of the Microsoft-specific __declspec(noinline)
directive. It's there only to force the compiler not to inline those functions. It may not inline them anyway, but I needed to be sure. Keeping the functions out of line improves the readability of the resulting assembly code.
You would generally not use __declspec(noinline)
in your production code.
Then class S_1
. It is there just so that we can derive our S_2
from it:
struct S_1
{
    int _n = 0xabc;

    virtual void __declspec(noinline) SetN(int n)
    {
        _n = n;
    }

    virtual int __declspec(noinline) GetN()
    {
        return _n;
    }
};
And finally, S_2
is the class that we will be reviewing further:
struct S_2 : public S_1, public S_0
{
    int _v = 0xee2;

    void __declspec(noinline) Query(int v)
    {
        SetN(0xFE);
        AddRef();
        if(GetPtr())
        {
            _v = v;
        }
    }

    __declspec(noinline) int AddRef()
    {
        return InterlockedIncrement(&_count);
    }

    __declspec(noinline) int Release()
    {
        int v = InterlockedDecrement(&_count);
        if(!v)
        {
            _v = 0xee2;
        }
        return v;
    }

    __declspec(noinline) bool SetPtr(const void* p)
    {
        bool bRes = __super::SetPtr(p);
        if(!bRes)
        {
            _v = 0xff1;
        }
        return bRes;
    }

    __declspec(noinline) void IncV()
    {
        if(_v != 0xee2 &&
            _v != 0xff1)
        {
            _v += 0x23;
        }
    }
};
As you can see, the main difference in the S_2
class is that it inherits from the S_1
and S_0
classes.
Last thing we need for our test is to create an instance of the S_2
class. I'll do it in a function:
void __declspec(noinline) TestVirtualFunctions()
{
    S_2* s2 = new S_2();

    s2->Query(0x71);
    s2->SetPtr(L"Hello!");

    const void* p = s2->GetPtr();
    if(p)
    {
        s2->Release();
    }

    delete s2;
}
Then let's compile it for an optimized Release
configuration and see what we've got.
I will be compiling it for the x64 (Intel) CPU since that architecture seems to be the most popular today. But you can build it for ARM64, if you please. It won't change the logic that I will describe below, although the Assembly code will be different.
Disassembly
We don't really need an external debugger at this point to get the disassembly. We can use Visual Studio for that. Simply put a breakpoint at the beginning of the TestVirtualFunctions
and start debugging, by pressing F5. Then when the breakpoint hits, press Ctrl+F11 to show disassembly.
Pressing Ctrl+F11 several times will toggle the disassembly view.
I'll skip the epilogue and the prologue of that function and go straight to the meat of the issue:
mov ecx,38h ; Size of the S_2 class in bytes
call qword ptr [__imp_operator new] ; 'new' operator to allocate memory for the S_2 class
mov rbx,rax ; RBX = 'this' pointer to S_2 class
test rax,rax
je lbl_null_ptr ; Skip down if allocation failed
xor eax,eax ; Construct S_1 class first
mov qword ptr [rbx+8],0ABCh ; S_1::_n
mov qword ptr [rbx+18h],rax ; Reset all members to zeros
lea rcx,[S_1::'vftable' (014000FDD8h)] ; Set up S_1::'vftable' (that will be overwritten later)
mov qword ptr [rbx+20h],rax
mov qword ptr [rbx+28h],rax
mov qword ptr [rbx+30h],rax
mov qword ptr [rbx],rcx ; Since we're constructing S_1, then use 'S_1::vftable'
lea rcx,[S_0::'vftable' (014000FDA8h)] ; Then construct S_0 class
mov qword ptr [rbx+20h],rax
lea rax,[S_2::'vftable' (014000FD90h)]
mov qword ptr [rbx+10h],rcx
mov qword ptr [rbx],rax
lea rax,[S_2::'vftable' (014000FD60h)] ; Finally, construct S_2 class (by overriding S_0 vtable)
mov qword ptr [rbx+10h],rax
mov dword ptr [rbx+18h],64h
mov dword ptr [rbx+28h],234h
mov dword ptr [rbx+30h],0EE2h
jmp lbl_cont_1
lbl_null_ptr:
mov rbx,rax ; RBX = Set 'this' pointer to S_2 class to NULL
lbl_cont_1:
mov rax,qword ptr [rbx+10h] ; RAX = pointer to the vtable for the S_2 class
lea rcx,[rbx+10h] ; RCX = 'this' pointer to the S_2 class
mov edx,71h ; RDX = input parameter
call qword ptr [rax] ; invoking s2->Query(0x71)
mov rax,qword ptr [rbx+10h] ; same logic here (and further down below)
lea rdx,[string L"Hello!"]
lea rcx,[rbx+10h]
call qword ptr [rax+18h] ; invoking s2->SetPtr(L"Hello!")
mov rax,qword ptr [rbx+10h]
lea rcx,[rbx+10h]
call qword ptr [rax+20h] ; invoking s2->GetPtr()
test rax,rax
je lbl_cont_2 ; if previous call returned 0, then skip the next call
mov rax,qword ptr [rbx+10h]
lea rcx,[rbx+10h]
call qword ptr [rax+10h] ; invoking s2->Release()
lbl_cont_2:
For brevity I didn't include Assembly code for the last delete
operator. It is not relevant for our example.
Let's look at the assembly code above, piece by piece.
First, the new
operator allocates memory for the S_2
class. The interesting detail is its size in bytes. Note that it is 0x38
bytes. But why so?
Class Sizes
Let's add some test code to get sizes of each class:
int size_s0 = sizeof(S_0); //0x20
int size_s1 = sizeof(S_1); //0x10
int size_s2 = sizeof(S_2); //0x38
We can see that each class reserves memory to store data for its member variables.
For instance, S_0
has 3 members: _count
of type unsigned int
, _pPtr
of type void*
, and _nPtrSet
of type int
. Thus, remembering that we compiled it for a 64-bit CPU, and assuming that the type int
takes up 4 bytes (with the Microsoft compiler), and type void*
takes up 8 bytes, all 3 members should've used up 16 bytes, or 0x10. But why is sizeof(S_0)
giving us 0x20
(or 32)?
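The first part of the answer lies in the way the 64-bit compiler is aligning class members. For performance reasons (and also to comply with some CPU alignment requirements) the compiler places every member at an offset that is a multiple of its own size, so an 8-byte pointer must start on an 8-byte boundary, and it rounds the size of the whole class up to a multiple of 8 bytes. Which means that the 1st and 3rd members of the type int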
, or _count
and _nPtrSet
, have a 4-byte gap, or unused padding after them.
But it still gives us only: (4 + 4) + 8 + (4 + 4) = 24. What are the other 8 bytes used for?
Those 8-bytes go into storing a pointer to what is known as the vtable
.
Thus, our layout of the memory for the S_0
class becomes:
+0x0 8-bytes: S_0::'vftable'
+0x8 4-bytes: _count
+0xC 4-bytes: -------------- <padding>
+0x10 8-bytes: _pPtr
+0x18 4-bytes: _nPtrSet
+0x1C 4-bytes: -------------- <padding>
+0x20
And, that is why the size of S_0
is 0x20 bytes.
We can apply similar logic to create a diagram of the size of the S_1
class:
+0x0 8-bytes: S_1::'vftable'
+0x8 4-bytes: _n
+0x0C 4-bytes: -------------- <padding>
+0x10
S_1
has only one member _n
of type int
. But it also has a vtable
, that goes first in its memory layout. Thus, we get its size as 0x10
bytes.
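The two layouts above can be double-checked from C++ itself. Below is a minimal sketch that uses hypothetical stand-ins for S_0 and S_1 (the names Base0, Base1, OffsetIn and CheckLayouts are mine, not from the test project). It assumes that the compiler places the hidden vtable pointer at offset 0, which is what we saw in the memory layout above, and it expresses every offset in pointer-sized slots rather than in hard-coded bytes:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for S_0 and S_1: the same data members, plus one
// virtual function each so that the compiler adds the hidden vtable pointer.
struct Base0
{
    volatile unsigned int count = 100;
    const void* ptr = nullptr;
    int ptrSet = 0x234;
    virtual void Query(int) {}
};

struct Base1
{
    int n = 0xabc;
    virtual void SetN(int v) { n = v; }
};

// Distance in bytes from the start of an object to one of its members.
template <typename Obj, typename Member>
std::size_t OffsetIn(const Obj& obj, const Member& member)
{
    return static_cast<std::size_t>(
        reinterpret_cast<const volatile char*>(&member) -
        reinterpret_cast<const volatile char*>(&obj));
}

// Assumption: the vtable pointer occupies the first pointer-sized slot.
void CheckLayouts()
{
    const std::size_t slot = sizeof(void*);         // 8 in a 64-bit build
    Base0 b0;
    Base1 b1;

    assert(sizeof(Base0) == 4 * slot);              // 0x20 on x64
    assert(OffsetIn(b0, b0.count)  == 1 * slot);    // right after the vtable pointer
    assert(OffsetIn(b0, b0.ptr)    == 2 * slot);    // after count and its padding
    assert(OffsetIn(b0, b0.ptrSet) == 3 * slot);

    assert(sizeof(Base1) == 2 * slot);              // 0x10 on x64
    assert(OffsetIn(b1, b1.n) == 1 * slot);
}
```

Calling CheckLayouts() from any function runs the checks. I deliberately left the S_2-like class out of this sketch, because the placement of a derived class member after the last base class is one of the places where compilers are allowed to differ.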
The final piece of the puzzle is the size of the S_2
class. As you can see, its size becomes the sum of the sizes of the S_0
and S_1
classes, plus its own member _v
of type int
and its padding. This is understandable, since S_2
class inherits from both S_0
and S_1
. Or, in other words, it "combines" their capabilities, that we can see in the memory layout of its members.
If we bring up the memory pane in the Visual Studio, and point it to the beginning of S_2
class in memory, we can deduce the following layout of its members:
+0x0 8-bytes: S_2::'vftable'
+0x8 4-bytes: S_1::_n
+0x0C 4-bytes: -------------- <padding>
+0x10 8-bytes: S_0::'vftable'
+0x18 4-bytes: S_0::_count
+0x1C 4-bytes: -------------- <padding>
+0x20 8-bytes: S_0::_pPtr
+0x28 4-bytes: S_0::_nPtrSet
+0x2C 4-bytes: -------------- <padding>
+0x30 4-bytes: S_2::_v
+0x34 4-bytes: -------------- <padding>
+0x38
Wow, there's a lot to unpack here. We'll get to it later.
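One detail of this layout explains the lea rcx,[rbx+10h] instructions in the disassembly: the S_0 part of the object does not start at the beginning of S_2, so the compiler has to adjust the 'this' pointer before it calls anything that belongs to S_0. Here is a rough sketch of the same effect with hypothetical classes (First, Second and Derived are my stand-ins for S_1, S_0 and S_2). It assumes the usual layout where the first base class sits at offset 0:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for S_1, S_0 and S_2.
struct First
{
    int n = 0xabc;
    virtual int GetN() { return n; }
};

struct Second
{
    int count = 100;
    virtual int AddRef() { return ++count; }
};

struct Derived : First, Second
{
    int v = 0xee2;
    int AddRef() override { return count += 2; }
};

// Byte distance between the start of the object and its Second sub-object.
std::size_t SecondBaseOffset(Derived& d)
{
    Second* asSecond = &d;      // implicit upcast: the compiler adds the offset here
    return static_cast<std::size_t>(
        reinterpret_cast<std::uintptr_t>(asSecond) -
        reinterpret_cast<std::uintptr_t>(&d));
}

void CheckThisAdjustment()
{
    Derived d;
    // Assumption: First occupies the beginning of Derived, Second follows it.
    assert(SecondBaseOffset(d) == sizeof(First));   // 0x10 on x64, as in [rbx+10h]

    Second* s = &d;
    assert(s->AddRef() == 102);                     // still reaches Derived::AddRef
}
```

The second check shows that the adjusted pointer still dispatches to the overriding function, which is only possible because the Second part of the object carries its own vtable pointer.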
But for now, the question becomes - where does S_1
's pointer to its vtable
go in S_2
? We seem to be 8 bytes short there.
The answer lies in another compiler optimization. But before we can get to it we need to review what a vtable
is.
Virtual Function Table - vtable / vftable
Most of the C++ folks call it vtable
(which stands for "Virtual Table"), while Microsoft call it vftable
(which probably means, "Virtual Function Table".) Both names mean the same thing though.
A vtable
is simply an array of virtual function pointers in memory. The order of these functions follows their declarations in the C++ source code, with each function pointer being 8 bytes long (for a 64-bit build.)
The compiler uses this array of function pointers to invoke each function when it needs to.
A careful observer will notice something. Why does it need to store function pointers in a memory array? Why can't it just encode them in a direct CALL
instruction?
If you look at the compiled Assembly code, you will notice that a call to a function that is not virtual, such as TestVirtualFunctions, gets compiled into a direct CALL instruction with a relative offset to the address of that function. Most non-virtual functions are compiled into such a CALL instruction.
The answer to the question that was posed above is simple: the vtable
, or an array of function pointers, is needed to account for the polymorphism of the C++ language, where one (virtual) function name may represent different class functions. (Check the polymorphism link for an example.)
To account for a changeable nature of a virtual function, the compiler may invoke it as such:
Note that it takes a pointer to the vtable
from the memory pointed by rbx
, stores it in the rax
register and then invokes a function at the offset 0x10
in the vtable
.
It's hard to overlook how confusing this "pointer to pointer" game becomes with virtual functions. And, that is probably why you are also reading this blog post.
By having such function pointer dereference, a compiler can account for the polymorphism by simply switching the vtable
from one class to another. In case of the assembly code above, all it needs to do is to provide a different pointer value in rbx
register.
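If the "pointer to pointer" game is easier for you to follow in C++ than in Assembly, here is a hand-made imitation of it. This is only a sketch of the idea, not what the compiler literally emits: Object, Method, CallSlot and the two tables are hypothetical names that I made up for the illustration.

```cpp
#include <cassert>

// A hand-made "object": its first field points to a table of function pointers.
struct Object;
using Method = int (*)(Object* self, int arg);

struct Object
{
    const Method* vtable;   // plays the role of the hidden vtable pointer
    int value;
};

int BaseGet(Object* self, int)        { return self->value; }
int BaseAdd(Object* self, int arg)    { return self->value + arg; }
int DerivedAdd(Object* self, int arg) { return self->value + 2 * arg; }

// Two tables with the same slot order. Only slot 1 differs.
const Method kBaseTable[]    = { BaseGet, BaseAdd };
const Method kDerivedTable[] = { BaseGet, DerivedAdd };

// Roughly what an indirect virtual call does:
// load the table pointer from the object, then call through one of its slots.
int CallSlot(Object* obj, int slot, int arg)
{
    const Method* table = obj->vtable;
    return table[slot](obj, arg);
}

void CheckDispatch()
{
    Object a{ kBaseTable, 10 };
    Object b{ kDerivedTable, 10 };

    assert(CallSlot(&a, 1, 5) == 15);   // BaseAdd
    assert(CallSlot(&b, 1, 5) == 20);   // DerivedAdd, same call site, different table
    assert(CallSlot(&b, 0, 0) == 10);   // slot 0 is shared by both tables
}
```

Note that CallSlot has no idea which "class" it is dealing with. The only thing that changes between the two calls is the table pointer stored in the object.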
Members In Inherited Classes
OK, so let's get back to the layout of the S_2
class in memory. What happened to the vtable
for the inherited S_1
class?
To understand, let's dump the contents of the vtable
s from the disassembly code that we saw above.
Note that there's still an unanswered question of multiple vtable
s there. We'll have to get back to it later. For now though let's just dump their contents.
Like we concluded above, a vtable
is just an array of function pointers. First, the vtable for the S_1 class:
Offset Address Size Function Address Function Declaration
-----------------------------------------------------------------------------
+0x0 014000FDD8h 8-bytes: 00000001400015f0h void S_1::SetN(int)
+0x8 014000FDE0h 8-bytes: 0000000140001600h int S_1::GetN()
+0x10
I provided virtual addresses for each function pointer in the vtable (in the "Address" column), as well as the addresses of each function (in the "Function Address" column.) This will help us distinguish between them and see their layout pattern. You may also use these addresses in the Ctrl+F ("search on page") option to have them highlighted in other areas of this blog post.
Then the vtable for the S_0 class:
Offset Address Size Function Address Function Declaration
-----------------------------------------------------------------------------
+0x0 014000FDA8h 8-bytes: 0000000140004092h __imp__purecall
+0x8 014000FDB0h 8-bytes: 0000000140004092h __imp__purecall
+0x10 014000FDB8h 8-bytes: 0000000140004092h __imp__purecall
+0x18 014000FDC0h 8-bytes: 00000001400015c0h bool S_0::SetPtr(const void*)
+0x20 014000FDC8h 8-bytes: 00000001400015d0h const void* S_0::GetPtr()
+0x28
Note that the vtable above doesn't have a pointer to the GetPtrSetFlag function since it is not marked as virtual. That function is encoded with just a direct call instruction, like I showed earlier.
Then the strange looking "intermediate" vtable
for the S_2
class:
Offset Address Size Function Address Function Declaration
-----------------------------------------------------------------------------
+0x0 014000FD90h 8-bytes: 00000001400015f0h void S_1::SetN(int)
+0x8 014000FD98h 8-bytes: 0000000140001600h int S_1::GetN()
+0x10 014000FDA0h 8-bytes: 0000000140010f78h -------------- <padding> 1
+0x18 014000FDA8h 8-bytes: 0000000140004092h __imp__purecall
+0x20 014000FDB0h 8-bytes: 0000000140004092h __imp__purecall
+0x28 014000FDB8h 8-bytes: 0000000140004092h __imp__purecall
+0x30 014000FDC0h 8-bytes: 00000001400015c0h bool S_0::SetPtr(const void*)
+0x38 014000FDC8h 8-bytes: 00000001400015d0h const void* S_0::GetPtr()
+0x40 014000FDD0h 8-bytes: 0000000140010f50h -------------- <padding> 1
+0x48 014000FDD8h 8-bytes: 00000001400015f0h void S_1::SetN(int)
+0x50 014000FDE0h 8-bytes: 0000000140001600h int S_1::GetN()
+0x58
And finally:
Offset Address Size Function Address Function Declaration
-----------------------------------------------------------------------------
+0x0 014000FD60h 8-bytes: 0000000140001610h void S_2::Query(int v)
+0x8 014000FD68h 8-bytes: 0000000140001660h int S_2::AddRef()
+0x10 014000FD70h 8-bytes: 0000000140001670h int S_2::Release()
+0x18 014000FD78h 8-bytes: 0000000140001690h bool S_2::SetPtr(const void*)
+0x20 014000FD80h 8-bytes: 00000001400015d0h const void* S_0::GetPtr()
+0x28 014000FD88h 8-bytes: 0000000140010f28h -------------- <padding> 1
+0x30 014000FD90h 8-bytes: 00000001400015f0h void S_1::SetN(int)
+0x38 014000FD98h 8-bytes: 0000000140001600h int S_1::GetN()
+0x40
Let's try to understand what's going on with those function pointers:
- An observant reader may have noticed that every following vtable contains the contents of all previous vtables. That is why I used addresses for each function pointer (in the "Address" column.) You can use them to understand the placement of each vtable in memory.
  We can combine the layouts of all vtables above into one:
  All vftables combined
  Offset Address Size Function Address Function Declaration
  -----------------------------------------------------------------------------
  S_2::'vftable' (014000FD60h)
  +0x0 014000FD60h 8-bytes: 0000000140001610h void S_2::Query(int v)
  +0x8 014000FD68h 8-bytes: 0000000140001660h int S_2::AddRef()
  +0x10 014000FD70h 8-bytes: 0000000140001670h int S_2::Release()
  +0x18 014000FD78h 8-bytes: 0000000140001690h bool S_2::SetPtr(const void*)
  +0x20 014000FD80h 8-bytes: 00000001400015d0h const void* S_0::GetPtr()
  +0x28 014000FD88h 8-bytes: 0000000140010f28h -------------- <padding> 1
  S_2::'vftable' (014000FD90h) - intermediate
  +0x30 014000FD90h 8-bytes: 00000001400015f0h void S_1::SetN(int)
  +0x38 014000FD98h 8-bytes: 0000000140001600h int S_1::GetN()
  +0x40 014000FDA0h 8-bytes: 0000000140010f78h -------------- <padding> 1
  S_0::'vftable' (014000FDA8h)
  +0x48 014000FDA8h 8-bytes: 0000000140004092h __imp__purecall
  +0x50 014000FDB0h 8-bytes: 0000000140004092h __imp__purecall
  +0x58 014000FDB8h 8-bytes: 0000000140004092h __imp__purecall
  +0x60 014000FDC0h 8-bytes: 00000001400015c0h bool S_0::SetPtr(const void*)
  +0x68 014000FDC8h 8-bytes: 00000001400015d0h const void* S_0::GetPtr()
  +0x70 014000FDD0h 8-bytes: 0000000140010f50h -------------- <padding> 1
  S_1::'vftable' (014000FDD8h)
  +0x78 014000FDD8h 8-bytes: 00000001400015f0h void S_1::SetN(int)
  +0x80 014000FDE0h 8-bytes: 0000000140001600h int S_1::GetN()
  +0x88
  As you can see, these vtables follow each other in a sequential order in memory.
  In a nutshell, the reason the C++ compiler chose to build the final vtable in such a way was to comply with the C++ spec for constructing inherited classes.
  You can turn off this behavior by using the novtable directive.
- The order of virtual functions in each vtable matches the order in which they are declared in their respective classes. (See S_0, S_1 or S_2 class declarations.)
  Again, keep in mind that non-virtual functions, such as S_0::GetPtrSetFlag, are not included in the vtable.
- An interesting observation is that a pointer to a QWORD value 1 (marked as padding in the tables above) seems to delimit one vtable from the next one in the same class. It appears at the end of a vtable if another vtable, that is a part of the class inheritance, follows it in a sequence. Most likely this is not real padding, but the pointer to the RTTI "complete object locator" that the Microsoft compiler stores immediately before each vftable when run-time type information is enabled. In 64-bit builds that structure begins with a signature value of 1.
  This is an undocumented behavior, so you shouldn't rely on it.
- Note that the compiler is building and including virtual functions even if they are not used anywhere in the code. An example of such a function is int S_1::GetN(). As you can see, it is not called anywhere in our code, but the compiler still included it in the vtable, as well as compiled its implementation.
  This is different from regular (non-virtual) functions that may be optimized away if nothing is calling them.
__purecall Functions
You might have noticed in the vtable
(from the example above) some strange functions with the name __imp__purecall
.
I described the __imp
prefix in a separate blog post. So I won't touch it here.
Raymond Chen had already described the nature of the "purecall" functions. So please read his blog post, titled "What is __purecall?".
In a nutshell, when C++ objects with inheritance are instantiated, they are constructed in stages. For example, in the case of the S_2 class, the constructor first builds up its base classes in the order in which they are declared (S_1 first, then S_0, as we saw in the disassembly) and only then the final S_2 class. This process doesn't happen atomically, and thus for a very brief period the object uses the vtable of its base class S_0, whose slots for the pure virtual functions need to be filled up with something. That something becomes those __imp__purecall functions.
__purecall functions are nothing more than just placeholders for a debugging assertion, and an abort()
, if someone manages to invoke them in a vtable
that is not yet fully constructed.
Or, if __purecall
function is invoked in a release build, the process will self-terminate as a security measure.
Let's modify our S_0
class to demonstrate it:
struct S_0
{
    S_0()
    {
        invokeQuery();      // BAD! DO NOT DO IT!
    }

    void invokeQuery()
    {
        Query(1);           // This will invoke a __purecall function and abort the process !!!
    }

    virtual __declspec(noinline) void Query(int v) = 0;
    //...
};
As you can see, we're invoking a virtual function from a base constructor in S_0
, before it is constructed in a derived S_2
class. In terms of C++, "there's nothing there to invoke". But in terms of a low-level Assembly language, there must be something in the vtable
in memory for the Query
function pointer. To facilitate debugging, C++ compiler adds a __purecall
function as a placeholder before a pointer to the actual Query
function is inserted into the vtable
. That placeholder function is invoked in the example above, which will display a debugging assertion message in a debugging build, or will simply invoke the abort()
function in a release build.
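The staged construction can also be observed safely, without crashing anything, if the virtual function in the base class is not pure. The sketch below uses hypothetical classes (Base, Derived, Probe and Name are my names, not part of the test project). While the Base constructor runs, the object still uses the Base vtable, so the call lands in Base::Name even though the object will end up being a Derived:

```cpp
#include <cassert>
#include <string>

struct Base
{
    std::string seenInCtor;

    Base()
    {
        seenInCtor = Probe();       // runs while only the Base part exists
    }

    // Non-virtual helper, similar in spirit to invokeQuery() above.
    std::string Probe() { return Name(); }

    virtual std::string Name() { return "Base"; }
    virtual ~Base() = default;
};

struct Derived : Base
{
    std::string Name() override { return "Derived"; }
};

void CheckStagedConstruction()
{
    Derived d;
    assert(d.seenInCtor == "Base");     // the Derived override was not reachable yet

    Base& asBase = d;
    assert(asBase.Name() == "Derived"); // after construction the final vtable is in place
}
```

If Name were declared pure (= 0) in Base, the same call from the constructor would have nothing valid to land on, and that is exactly the slot that the __purecall placeholder fills.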
An interesting observation is the way the modern Visual Studio C++ compiler handles invocations of virtual functions from within the class constructor. (This is generally a bad idea, due to multiple ways to cause confusion!) For the sake of the experiment, let's see what happens on the Assembly level if we decide to do it.
If we modify our S_2 class to invoke a virtual function from its constructor, the Query function is invoked as a direct call instruction, technically disregarding the whole vtable concept. (I'm using machine opcodes for the listings below to distinguish between the encodings of different call instructions.)
BA DA 00 00 00 mov edx, 0DAh ; RDX = first function parameter, or 0xDA
48 8D 4B 10 lea rcx, [rbx+10h] ; RCX = 'this' pointer to class S_2
E8 84 FE FF FF call S_2::Query (0140001610h) ; direct call - encoding starts with E8, followed by a relative offset
But if you invoke the same Query function from outside of a constructor (or destructor) it will be encoded as an indirect call instruction, using the class vtable:
48 8B 43 10 mov rax, qword ptr [rbx+10h] ; RAX = pointer to the 'vtable' for class S_2
BA 71 00 00 00 mov edx, 71h ; RDX = first function parameter, or 0x71
48 8D 4B 10 lea rcx, [rbx+10h] ; RCX = 'this' pointer to class S_2
FF 10 call qword ptr [rax] ; indirect call - encoded as: FF 10
My guess is that this is another security measure to prevent bugs when using a vtable that may not be fully constructed, or if it is being torn down from a destructor. By using a direct call instruction, the compiler removes any ambiguity.
novtable Directive
Another interesting test is to see what happens if we use the novtable
directive with a class.
Remember our __purecall
function test above? Let's modify it to use novtable
:
struct __declspec(novtable) S_0
{
    S_0()
    {
        invokeQuery();      // BAD! DO NOT DO IT!
    }

    void invokeQuery()
    {
        Query(1);           // This will generate a null-pointer dereference!
    }

    virtual __declspec(noinline) void Query(int v) = 0;
    //...
};
The addition of __declspec(novtable)
will remove the staged construction of the vtable
, which will simplify it, but will also remove all debugging precautions, such as the __purecall
function, that I described earlier.
So the invocation of the Query
function in the code sample above will attempt to read its function pointer at address 0, which will crash the process.
Such a bug will be pretty difficult to diagnose in a complex program. Thus, do not use the novtable
directive! It's not worth saving just a few machine cycles in the constructor in exchange for less readable debugging errors.
Finally, let's see what happens on the Assembly level if we construct our classes with the novtable
directive.
For the sake of simplicity, let's declare all of our classes: S_0
, S_1
and S_2
, with the novtable
directive:
struct __declspec(novtable) S_0
{
    //...
};

struct __declspec(novtable) S_1
{
    //...
};

struct __declspec(novtable) S_2 : public S_1, public S_0
{
    //...
};
In this case the creation of the S_2 class with new S_2() will turn from its previous Assembly layout into a simpler form:
mov ecx,38h
call qword ptr [__imp_operator new] ; 'new' operator to allocate memory
mov rbx,rax ; RBX = 'this' pointer to class S_2
test rax,rax
je lbl_null_ptr ; Skip down if allocation failed
xorps xmm0,xmm0 ; Reset 128-bit XMM0 register to 0
xor eax,eax ; Reset 64-bit RAX register to 0
movups xmmword ptr [rbx],xmm0 ; Zero out S_2 class memory (or 38h bytes total)
movups xmmword ptr [rbx+10h],xmm0
movups xmmword ptr [rbx+20h],xmm0
mov qword ptr [rbx+30h],rax
mov dword ptr [rbx+8],0ABCh ; Set S_1::_n
mov dword ptr [rbx+18h],64h ; Initialize S_2 class members
mov qword ptr [rbx+20h],rax
mov dword ptr [rbx+28h],234h
mov dword ptr [rbx+30h],0EE2h
jmp lbl_cont_1
lbl_null_ptr:
xor ebx,ebx ; RBX = set 'this' pointer to class S_2 to NULL
lbl_cont_1:
So this is the result of using the novtable
directive from the Assembly level. It does simplify it a bit.
Control Flow Guard
You might have heard of a security measure that Microsoft calls "Control Flow Guard", or CFG. (The rest of the world calls it "Control Flow Integrity". But it's basically the same thing.)
The birth of the CFG comes on the heels of exploitation of the invocation of virtual functions in a vtable
by malicious software.
In a nutshell, if a binary exploit allows an attacker to inject malicious code into a process (say, into a web browser), the way virtual functions are invoked can allow an attacker to pivot such an exploit into the execution of their API of choice.
For instance, when a virtual function is invoked the way we saw in the disassembly above, an attacker can hijack the indirect CALL
instruction by setting up the RAX
register to point to the address of some system API that they desire to invoke (such as WinExec
for instance) and then jump to the address of that CALL
instruction to execute that API.
WinExec
function is an attractive API for an attacker because it allows them to start a process of their choosing by specifying just two input parameters.
Microsoft have addressed this type of binary exploit by modifying the indirect CALL
instructions, that are used to invoke virtual functions, into a short CFG security shim that checks the address of a virtual function against a CFG bitmap, that holds addresses of all allowed functions. If such address is present in the CFG bitmap, the virtual call succeeds. Otherwise, the CFG security shim crashes the process.
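To make the idea of "check first, then call" more tangible, here is a toy model of it in C++. Please treat it only as an illustration under loud assumptions: the real CFG shim is generated by the compiler and works with a bitmap that is maintained by the operating system, while my hypothetical CallGuard class uses an ordinary std::set and throws an exception instead of crashing the process.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

using Target = int (*)(int);

int Legit(int x) { return x + 1; }
int Rogue(int x) { return x * 1000; }

// Hypothetical stand-in for the CFG bitmap: a set of approved call targets.
class CallGuard
{
public:
    void Allow(Target t) { allowed_.insert(t); }

    // Validates the target first and only then performs the indirect call.
    int Dispatch(Target t, int arg) const
    {
        if (allowed_.count(t) == 0)
            throw std::runtime_error("call target is not in the allow-list");
        return t(arg);
    }

private:
    std::set<Target> allowed_;
};

void CheckGuard()
{
    CallGuard guard;
    guard.Allow(Legit);

    assert(guard.Dispatch(Legit, 1) == 2);      // approved target goes through

    bool blocked = false;
    try
    {
        guard.Dispatch(Rogue, 1);               // never registered with the guard
    }
    catch (const std::runtime_error&)
    {
        blocked = true;
    }
    assert(blocked);
}
```

The important design point is that the decision is made on the value of the function pointer itself, right before the call, so it doesn't matter how that value got corrupted.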
CFG must be enabled in the properties of the C++ project before such project is compiled. This is needed for the compiler to add CFG security shims into Assembly code. To enable CFG in Visual Studio, go to the properties of your C++ project, then navigate to "C/C++", and click on "Code Generation". After that set the "Control Flow Guard" option to "Yes (/guard:cf)". Click OK and recompile the project.
Enabling CFG will slow down your compiled code to a small degree.
After that, invoking a virtual function such as s2->Query(0x71) will generate Assembly code with the use of a __guard_dispatch_*
CFG security shim. It may look like this:
mov rax,qword ptr [rbx+10h] ; RAX = address of the vtable
lea rcx,[rbx+10h] ; RCX = 'this' pointer to the class containing the virtual function
mov edx,71h ; RDX = first function input parameter
mov rax,qword ptr [rax] ; RAX = address of the virtual function to call
call qword ptr [__guard_dispatch_icall_fptr] ; Invocation of the CFG security shim
To understand how the CFG security shim works, let's check its Assembly code:
CFG security shims may have slightly different internal names and implementation, depending on the type of virtual functions that they guard, and on the version of the C++ compiler that was used for compilation. Additionally, user-mode and kernel-mode CFG security shims are compiled differently. The code sample below demonstrates a user-mode CFG security shim that was generated by the Visual Studio 2022 C++ compiler at the time of this writing. Note that it may change in the future without any notice.
; RAX = address of a virtual function to call
mov r11,qword ptr [CFG_bitmap] ; R11 = base of CFG bitmap
mov r10,rax
shr r10,9 ; R10 = derive CFG bitmap index as virtual function address divided by 512
mov r11,qword ptr [r11+r10*8] ; R11 = address in CFG bitmap for the virtual function address
mov r10,rax
shr r10,3 ; R10 = bit number as virtual function address divided by 8
test al,0Fh ; Check if virtual function address is aligned on 16 bytes
jnz lbl_1 ; and jump if it is not ...
bt r11,r10 ; CF = R11 & (1 << (R10 % 64))
jnc lbl_2 ; jump if CF==0
jmp rax ; All good - jump to our virtual function
lbl_1:
; If virtual function address is not aligned on 16 bytes
; (Since most functions are, the code will rarely get to this clause.)
btr r10,0 ; Clear bit 0 in R10
bt r11,r10 ; CF = R11 & (1 << (R10 % 64))
jnc lbl_bad ; jump if CF==0
lbl_2:
or r10,1 ; Set bit 0 in R10
bt r11,r10 ; CF = R11 & (1 << (R10 % 64))
jnc lbl_bad ; jump if CF==0
jmp rax ; All good - jump to our virtual function
lbl_bad:
mov r10d,1 ; CFG tests failed! Crash the process ...
jmp crash_process
Note that the code in the CFG shim must be careful not to clobber the nonvolatile CPU registers. Thus, it's using only the RAX, R10 and R11 registers.
The exact layout of the CFG bitmap is outside of the scope of this blog post, but let me go over it briefly. The CFG bitmap contains bits that are set for valid addresses of functions in a module. (Judging by the shifts in the code above, each bit in the CFG bitmap represents an 8-byte chunk of the address space, so a pair of bits describes every 16-byte slot and each 64-bit entry covers 512 bytes.) The code above checks for that and crashes the process if a bit for a virtual function address is not set in the CFG bitmap.
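The index arithmetic from the shim can be replayed in C++. The sketch below is only my model of the Assembly listing above, not the real bitmap: CfgBitmapModel is a hypothetical class that keeps its 64-bit entries in a std::unordered_map, and its Mark function is my own invention for filling in test data. The check itself follows the branches of the listing: for a 16-byte aligned address either bit of the pair is accepted, and for an unaligned address both bits must be set.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical sparse model of the bitmap: entry index -> 64 bits.
class CfgBitmapModel
{
public:
    // shr r10,9 : one 64-bit entry describes 512 bytes of address space
    static std::uint64_t EntryIndex(std::uint64_t addr) { return addr >> 9; }

    // shr r10,3 : bit number (the BT instruction uses it modulo 64)
    static unsigned BitNumber(std::uint64_t addr)
    {
        return static_cast<unsigned>((addr >> 3) & 63);
    }

    // Test-data helper (an assumption of this sketch): marks the 16-byte slot
    // that contains 'addr', setting the even bit and optionally the odd one.
    void Mark(std::uint64_t addr, bool bothBits)
    {
        const unsigned even = BitNumber(addr) & ~1u;
        std::uint64_t& entry = entries_[EntryIndex(addr)];
        entry |= 1ull << even;
        if (bothBits)
            entry |= 1ull << (even | 1u);
    }

    // Mirrors the decision tree of the shim in the listing.
    bool IsCallAllowed(std::uint64_t addr) const
    {
        const auto it = entries_.find(EntryIndex(addr));
        const std::uint64_t entry = (it == entries_.end()) ? 0 : it->second;
        const unsigned bit = BitNumber(addr);
        const auto isSet = [entry](unsigned b) { return ((entry >> b) & 1) != 0; };

        if ((addr & 0xF) == 0)                      // test al,0Fh
            return isSet(bit) || isSet(bit | 1u);   // aligned: either bit is enough
        return isSet(bit & ~1u) && isSet(bit | 1u); // unaligned: both bits required
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> entries_;
};

void CheckBitmapModel()
{
    const std::uint64_t query = 0x140001610ull;     // address of S_2::Query in our dump

    assert(CfgBitmapModel::EntryIndex(query) == 0xA0000Bull);
    assert(CfgBitmapModel::BitNumber(query) == 2);

    CfgBitmapModel bitmap;
    bitmap.Mark(query, false);
    assert(bitmap.IsCallAllowed(query));            // aligned and marked
    assert(!bitmap.IsCallAllowed(query + 0x10));    // next slot was never marked
    assert(!bitmap.IsCallAllowed(query + 4));       // unaligned needs both bits
}
```

For the address of S_2::Query from our vtable dump this gives entry 0xA0000B and bit 2, which you can verify by hand: 0x140001610 divided by 512 and by 8 respectively, with the latter taken modulo 64.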
Conclusion
This blog post turned into another lengthy one. Thus I ran out of space to describe how you can use a decompiler, like Ghidra, to facilitate your reverse engineering work with Assembly code that contains virtual functions and CFG security shims.
I will get back to explaining that in the next post. So stay tuned ...