This article contains functions and features that are not documented by the original manufacturer. By following advice in this article, you're doing so at your own risk. The methods presented in this article may rely on internal implementation and may not work in the future.
Intro
If you've ever looked at the compiled Assembly language code of a native application built with Microsoft Visual Studio, you may have noticed that some function calls have an __imp_ prefix added to them.
But what is it? And why is it there?
This blog post will shed some light on what that __imp_ prefix is and how you can take advantage of its presence.
Some Theory
Very briefly: when the Microsoft compiler builds your native application for the Intel x86/x64 architecture, it needs to replace each function call in your source code with a CPU instruction that is named, unsurprisingly, call. To work, this instruction needs the address of the function in memory to jump to. For the Intel architecture the following options are available:
- CALL rel32 - (Opcode: E8, n0, n1, n2, n3) executes a relative short call with displacement relative to the next instruction.
- CALL ptr [addr] - (Opcode: FF, 15, a0, a1, a2, a3) executes an absolute indirect long call to an address that is read from memory.
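To make the two encodings a bit more concrete, here is a small sketch of how the rel32 displacement of the short call could be computed. The function name and the 32-bit addresses are hypothetical and only for illustration; the instruction lengths follow from the opcode bytes listed above.

```cpp
#include <cassert>
#include <cstdint>

// Lengths of the two encodings listed above:
// E8 + 4 displacement bytes, and FF 15 + 4 address bytes.
constexpr unsigned kShortCallLength = 5;
constexpr unsigned kLongCallLength = 6;

// Hypothetical helper: computes the rel32 operand for an E8 call located at
// callAddr that should reach target. The displacement is measured from the
// address of the NEXT instruction, i.e. from callAddr + 5.
static std::int32_t Rel32ForCall(std::uint32_t callAddr, std::uint32_t target)
{
    const std::uint32_t nextInstr = callAddr + kShortCallLength;
    return static_cast<std::int32_t>(target - nextInstr);
}
```

A forward call yields a positive displacement and a backward call a negative one. The one-byte difference between the two lengths is exactly what makes it awkward to swap one encoding for the other after the code has been laid out.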
The issue is that during the build stage the compiler has no idea where the function will end up in memory. It may be somewhere in the same module, in which case it can use the short call; or it can be in a module that is mapped during load-time linking, for which the long call will be required. Unfortunately, because those two call instructions have different lengths, the compiler cannot just leave a gap and fill it in later. It couldn't pick the longer instruction and then shrink the gap either: many Intel CPU instructions rely on relative offsets, which would break if we shrank gaps in the machine code.
You also have to remember that when engineers were initially dealing with this dilemma, the hardware was very slow. So picking one option over another would either affect the compilation time (by making the compiler run several passes), or slow down the load-time linking, or make the resulting code run slower.
They chose the following compromise:
- The compiler will initially assume a short call, since statistically there should be more of those.
- The linker will then fill in the relative addresses in all short call instructions, but if any of them happen to need load-time linking, it will create a separate stub with a JMP instruction to accommodate that.
At the Assembly level, the initial short call refers to the JMP stub, and the JMP stub itself is just a long jump through the function's __imp_ pointer, doing steps very similar to the long call instruction.
The downside in this case is that we are wasting CPU cycles on the short call and then on the long jump, instead of just doing the long call initially. This won't be a big deal with the modern hardware, but, remember that during the time when this was devised, CPUs were much, much slower than now. So it mattered.
To address this issue, the Microsoft compiler allowed developers to mark declarations of imported functions with the __declspec(dllimport) directive. So if you knew that your ImportedFunc was imported from an outside module, you'd put __declspec(dllimport) on its declaration. This instructs the compiler to treat it as a function requiring relocation during the load-time linking and to emit the long call instruction for it, thus bypassing the need for the JMP stub.
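A minimal sketch of such a declaration is shown next. The MY_IMPORT macro and the int(int, int) signature of ImportedFunc are assumptions made for illustration; the macro expands to nothing on non-Microsoft compilers so that the snippet stays portable.

```cpp
#include <type_traits>

// Hypothetical helper macro: only the Microsoft compiler understands
// __declspec(dllimport), so it expands to nothing elsewhere.
#if defined(_MSC_VER)
#  define MY_IMPORT __declspec(dllimport)
#else
#  define MY_IMPORT
#endif

// Declaration of a function that lives in another module (assumed signature).
// With dllimport, calls are emitted as indirect calls through __imp_ImportedFunc.
MY_IMPORT int ImportedFunc(int a, int b);
```

The attribute only affects how calls are generated; the function type itself is unchanged, so existing callers don't need to be modified.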
Note that modern compilers don't rely on the __declspec(dllimport) directive as much if you have the /GL (Whole Program Optimization) option enabled (as well as the /LTCG (Link-time code generation) option).
The __imp_ Prefix
You might have noticed that same __imp_ prefix again in the JMP stub. So let me explain where that comes from.
The process of load-time linking to an imported module includes a stage, called relocation, where the addresses of functions in the loaded (or mapped) module need to be supplied to all long call CPU instructions in the code.
From the efficiency standpoint, writing addresses of imported functions all over the module's address space in RAM (wherever the long call instructions are located) would thrash the cache, and thus slow down the load-time linking process.
Instead, the PE file has the addresses of all imported functions grouped into a linear array of pointers that is collectively called the "Import Address Table", or IAT for short. It can be accessed via the IMAGE_DIRECTORY_ENTRY_IAT entry in the PE header.
For the Microsoft compiler though, the IAT is represented as an array of global variables, with each element bearing the name of the imported function, preceded by the __imp_ prefix.
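The idea of an IAT slot can be modeled in portable C++ as a global function pointer that is filled in once and then called through. This is only a simulation: ImportedAdd, imp_ImportedAdd, SimulateLoader and CallThroughSlot are hypothetical names, and no real loader or PE file is involved.

```cpp
#include <cassert>

using AddFn = int (*)(int, int);

// Hypothetical stand-in for a function that lives in another module.
static int ImportedAdd(int a, int b) { return a + b; }

// Model of one IAT slot: a global pointer variable. In a real MSVC build the
// slot would be named __imp_<function>; this name is illustrative only.
static AddFn imp_ImportedAdd = nullptr;

// Model of what the loader does at load time: write the resolved address
// into the slot, once, in one place.
static void SimulateLoader() { imp_ImportedAdd = &ImportedAdd; }

// Model of the long call: read the pointer from the slot and call through it.
static int CallThroughSlot(int a, int b) { return imp_ImportedAdd(a, b); }
```

Because every call site reads the same slot, the loader only has to patch one pointer per imported function, no matter how many places call it.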
My guess is that __imp_ stands for "implementation", or maybe for "implied", and probably has nothing to do with the mystical character in the video game.
And that's where that __imp_ comes from.
Note that the __imp_ prefix is undocumented by Microsoft, even though they have been using it since the early days of their compilers.
In most cases though, you won't need to be concerned with any of this from your C++ code, as modern compilers are smart enough to pick the right CPU call instruction with high efficiency.
But there are some unique situations where we can use the __imp_ prefix to our advantage. Let me follow up with those.
Coding in Assembly
When coding in the Assembly language, the use of the __imp_ prefix for imported function calls may be more important, and, to be honest, quite a few low-level developers seem to neglect it. Let me show it in an example.
First, let's make our three test functions with different calling conventions and place them in a DLL:
extern "C" __declspec(dllexport) int __cdecl TestFunc1(int v, int b)
{
    return v + b;
}
extern "C" __declspec(dllexport) int __stdcall TestFunc2(int v, int b)
{
    return v - b;
}
extern "C" __declspec(dllexport) int __fastcall TestFunc3(int v, int b)
{
    return v * b;
}
Note that the calling conventions will matter only for the x86 build. For the x64 build they will all be treated as __fastcall (the compiler ignores these keywords there and uses the single x64 calling convention).
Then, say, for whatever reason we want to call them from our own function written in Assembly language. In this case I will have to write a separate implementation for each bitness.
64-bit Implementation
The x64 implementation is much easier, so let's start with it. Also, we don't need to test all 3 functions, since they will be called exactly the same way. Thus, let's just pick TestFunc1.
So let's add the following into the .asm file:
extrn TestFunc1 : PROC          ; note that TestFunc1 is defined as PROC
extrn __imp_TestFunc1 : QWORD   ; while __imp_TestFunc1 is defined as QWORD!
.code
ALIGN 16
asm_func PROC
    sub rsp, 28h            ; reserve shadow space (plus alignment) for x64 calling convention
    mov rdx, 5
    mov rcx, 10
    call TestFunc1          ; uses slower JMP stub
    mov rdx, 5
    mov rcx, 10
    call __imp_TestFunc1    ; Direct call
    add rsp, 28h            ; Restore stack
    ret
asm_func ENDP
END
There are several points of interest to review in the code above:
- Compile and walk the asm_func function with the Visual Studio debugger. Then step into the TestFunc1 call. In it you will see a JMP stub. But when you step into the __imp_TestFunc1 call from our asm_func function, it will lead straight to the TestFunc1 function in our imported DLL. So by using the __imp_ prefix we are saving an unnecessary jmp from the JMP stub. This may not be a big deal in C++, but since you're coding in Assembly, this may make a difference for you.
- Note that TestFunc1 is defined as PROC on top, while __imp_TestFunc1 is defined as QWORD. This is important! If you define __imp_TestFunc1 as PROC, the MASM compiler will treat it as the beginning of the code you want to call, instead of a pointer to read the address from. This will cause a crash.
32-bit Implementation
The x86 implementation is a little different. Let's review it next.
So let's code the following in the .asm file:
.686p
.model flat, C
OPTION LANGUAGE: SYSCALL ; needed to prevent prepending of names with _
EXTERN __imp__TestFunc1 : DWORD ; __cdecl
EXTERN __imp__TestFunc2@8 : DWORD ; __stdcall
EXTERN @TestFunc3@8 : PROC
EXTERN __imp_@TestFunc3@8 : DWORD ; __fastcall
OPTION LANGUAGE: C ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
EXTERN TestFunc1 : PROC
EXTERN TestFunc2@8 : PROC
.code
ALIGN 8
asm_func PROC
; __cdecl calling convention
push 5
push 10
call TestFunc1 ; uses slower JMP stub
add esp,8
push 5
push 10
call __imp__TestFunc1 ; Direct call
add esp,8
; __stdcall calling convention
push 5
push 10
call TestFunc2@8 ; uses slower JMP stub
push 5
push 10
call __imp__TestFunc2@8 ; Direct call
; __fastcall calling convention
mov edx, 5
mov ecx, 10
call @TestFunc3@8 ; uses slower JMP stub
mov edx, 5
mov ecx, 10
call __imp_@TestFunc3@8 ; Direct call
ret
asm_func ENDP
END
The code is very similar to the x64 implementation, except that all 3 calling conventions need different handling. Thus the top of the file is dedicated to declaring each function under its properly decorated name.
Then, just as for the x64 implementation, note that each call to an external function not prepended with the __imp_ prefix will first be redirected to a JMP stub. So it may be prudent to avoid that by calling through the function's pointer in the IAT, that is, by prepending its name with the __imp_ prefix. In that case the call will go directly to the imported function.
Overriding System Function Calls
This is more of an obfuscation practice than anything else. Or, such a technique may be used by an antivirus or security product to install function trampolines.
Since the Microsoft compiler keeps pointers to all imported functions in the IAT, we can override them with our own functions.
For instance, take the CloseHandle system API that is normally used to close a handle. We can use its address in the IAT to put our own trampoline in. For our silly purpose, let's just make it show a message box.
For that we need to code it to match the declaration of the original CloseHandle function:
BOOL __stdcall CloseHandleOverride(HANDLE hHandle)
{
    MessageBox(NULL, L"Hello world from CloseHandle!", L"Message", MB_OK);
    return FALSE;
}
Then when our app starts, we need to do the actual override in the IAT. For that we will use the __imp_CloseHandle global variable that the compiler will hold for us. It will contain the address of the mapped CloseHandle import in our process.
We can reference it by declaring it as an extern "C" pointer variable with exactly that name.
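A minimal sketch of such a declaration follows. It is an assumption modeled on the void* declarations used for the other __imp_ variables in this post, not a documented interface.

```cpp
// Assumed declaration of the IAT slot for CloseHandle (x64 naming).
// The name must match the linker symbol exactly, and extern "C" prevents
// C++ name mangling. This only declares the variable; the storage itself
// is provided by the import library at link time.
extern "C" void* __imp_CloseHandle;
```

Declaring the slot as void* keeps the declaration independent of the function's signature, at the cost of needing a cast when a function address is assigned to it.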
For the x64
build of the project, we can then do the following:
DWORD dwOldProtect;
// Make the IAT slot writable, swap the pointer, then restore the old protection
if (VirtualProtect(&__imp_CloseHandle, sizeof(__imp_CloseHandle), PAGE_READWRITE, &dwOldProtect))
{
    __imp_CloseHandle = (void*)CloseHandleOverride;
    VirtualProtect(&__imp_CloseHandle, sizeof(__imp_CloseHandle), dwOldProtect, &dwOldProtect);
}
Note that the memory page that contains the IAT is initially set to read-only for security reasons. Thus we need to change its protection status to PAGE_READWRITE with the VirtualProtect function first, and then revert it back after our override.
After that, calling CloseHandle anywhere in our module (each module has its own IAT) will result in our CloseHandleOverride being called instead.
Note that this example is totally impractical, and in a real override you would save the original address of the CloseHandle function and call it at the end. I didn't want to complicate this simple test and thus didn't do it.
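For completeness, the "save the original and call it at the end" pattern can be sketched in portable C++ with an ordinary function pointer standing in for the IAT slot. All names here (RealClose, g_slot, CloseOverride, InstallOverride) are hypothetical, and no VirtualProtect call is needed because the simulated slot is a normal writable variable.

```cpp
#include <cassert>

using CloseFn = bool (*)(int handle);

static int g_realCalls = 0;   // how many times the "real" function ran
static int g_hookCalls = 0;   // how many times our override ran

// Hypothetical stand-in for the real imported function.
static bool RealClose(int) { ++g_realCalls; return true; }

static CloseFn g_slot = &RealClose;   // stand-in for the IAT slot
static CloseFn g_original = nullptr;  // saved original pointer

// The override does its own work first, then chains to the original.
static bool CloseOverride(int handle)
{
    ++g_hookCalls;
    return g_original(handle);
}

// Saves the current slot value and redirects the slot to the override.
// The guard makes a second call harmless (no self-chaining).
static void InstallOverride()
{
    if (g_original) return;
    g_original = g_slot;
    g_slot = &CloseOverride;
}
```

In the real scenario the saved pointer would be read from the __imp_ variable before it is overwritten, inside the same VirtualProtect bracket.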
32-bit Variant
In x86
, or 32-bit variant, the mangling of function names is somewhat more complex. The calling conventions used by the Microsoft C compiler
are different for x86
, thus it uses additional decorations for function name mangling, as such:
- __cdecl - uses the underscore _ prefix (except when functions that use C linkage are exported). Ex: _ImportedFunc
- __stdcall - uses the underscore _ prefix and the @n suffix, where n is the size of the arguments in bytes. Ex: _ImportedFunc@8
- __fastcall - uses the @ prefix and the @n suffix, where n is the size of the arguments in bytes. Ex: @ImportedFunc@8
You can check this naming scheme by loading the compiled binary file into the WinAPI Search app.
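The three decoration rules above can be written down as a small helper. DecorateX86 and IatSymbolX86 are hypothetical functions for illustration; they only build strings and do not query the compiler or linker.

```cpp
#include <cassert>
#include <string>

enum class CallConv { Cdecl, Stdcall, Fastcall };

// Builds the x86 decorated name of a C-linkage function.
// argBytes is the total size of the arguments in bytes (ignored for __cdecl).
static std::string DecorateX86(CallConv cc, const std::string& name, unsigned argBytes)
{
    switch (cc)
    {
    case CallConv::Cdecl:    return "_" + name;
    case CallConv::Stdcall:  return "_" + name + "@" + std::to_string(argBytes);
    case CallConv::Fastcall: return "@" + name + "@" + std::to_string(argBytes);
    }
    return name;
}

// Name of the matching IAT variable: the __imp_ prefix plus the decorated name.
static std::string IatSymbolX86(CallConv cc, const std::string& name, unsigned argBytes)
{
    return "__imp_" + DecorateX86(cc, name, argBytes);
}
```

For CloseHandle, which is __stdcall and takes one 4-byte argument, this gives __imp__CloseHandle@4, and the results for the three test functions match the names used in the 32-bit .asm file earlier.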
So in the case of an x86 build, we can't just use the __imp_CloseHandle global variable like we did for x64. We need to slightly adjust it using the undocumented /ALTERNATENAME: linker option:
#define ALTNAME(x,n) __pragma(comment(linker, "/ALTERNATENAME:__imp_" #x "=__imp__" #x "@" #n))
ALTNAME(CloseHandle, 4)
#define impCloseHandle _imp_CloseHandle // __imp__CloseHandle@4
extern "C" extern void* impCloseHandle;
And then use that preprocessor definition in the C++ code in a similar way as we did for the x64 build:
DWORD dwOldProtect;
// Make the IAT slot writable, swap the pointer, then restore the old protection
if (VirtualProtect(&impCloseHandle, sizeof(impCloseHandle), PAGE_READWRITE, &dwOldProtect))
{
    impCloseHandle = (void*)CloseHandleOverride;
    VirtualProtect(&impCloseHandle, sizeof(impCloseHandle), dwOldProtect, &dwOldProtect);
}
This will give us the same silly override of the CloseHandle system function in our process.
Delay Load Imports
One other internal use of the __imp_ prefix, that Rbmm pointed out to me, is for marking the Delay Load Imports.
Delayed loading is a very old concept where a module (or DLL) is loaded only upon request, that is, when one of its imported functions is first used, rather than during the load-time linking stage.
You can define a DLL to be delay loaded via the /DELAYLOAD command line switch, or through the project properties window.
In that case the compiler creates global stub functions with the names of the original functions in the delay loaded module prepended with the __imp_load_ prefix.
Those stub function pointers are originally stored in the IAT. But when a delay loaded function is loaded upon the first request, its pointer in the IAT is overwritten by the actual
function pointer in the loaded module. Thus, any subsequent calls to that function will resolve to the actual function being called.
For instance, if TestFunc1 was from a delay loaded module, the compiler will create a stub function with the name __imp_load_TestFunc1 whose pointer will be originally stored in the IAT.
Thus before that delay-load function is loaded, __imp_TestFunc1 will point to __imp_load_TestFunc1. So we can use the following logic to determine if some specific delay-load function was already loaded:
extern "C" extern void* __imp_load_TestFunc1;
extern "C" extern void* __imp_TestFunc1;
bool Is_TestFunc1_Loaded()
{
    //RETURN: true if TestFunc1 delay-load function is loaded
    return &__imp_load_TestFunc1 != __imp_TestFunc1;
}
#include <assert.h>
int main()
{
    assert(!Is_TestFunc1_Loaded());
    TestFunc1(10, 5);
    assert(Is_TestFunc1_Loaded());
}
Or, you can just use the following macro:
#define IS_DELAY_LOAD_FUNC_LOADED(f) (&__imp_load_ ##f != __imp_ ##f)
int main()
{
    assert(!IS_DELAY_LOAD_FUNC_LOADED(TestFunc1));
    TestFunc1(10, 5);
    assert(IS_DELAY_LOAD_FUNC_LOADED(TestFunc1));
}
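The mechanism behind that check can also be modeled without any DLL. In this portable sketch a function pointer plays the role of the __imp_ slot and a local function plays the role of the __imp_load_ stub; all names are hypothetical and nothing is actually loaded.

```cpp
#include <cassert>

using Fn = int (*)(int, int);

// Stand-in for the function inside the delay-loaded DLL.
static int RealTestFunc(int a, int b) { return a + b; }

// Stand-in for the __imp_load_ stub (defined below).
static int LoadStub(int a, int b);

// Stand-in for the __imp_ slot: it initially points at the stub.
static Fn g_slot = &LoadStub;

// On the first call the stub "loads" the module, overwrites the slot with the
// real function pointer and forwards the call. Later calls bypass the stub.
static int LoadStub(int a, int b)
{
    g_slot = &RealTestFunc;
    return g_slot(a, b);
}

// Same comparison as in the check above: loaded once the slot no longer
// points at the stub.
static bool IsLoaded() { return g_slot != &LoadStub; }
```

The first call through the slot pays for the load, and every later call is an ordinary indirect call to the real function.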
You can also use this to verify at build time that a function really comes from a delay loaded module. First, let's define a preprocessor macro that will check it for us:
#define CHECK_DELAY_LOAD(f) extern "C" extern void* __imp_load_ ##f; \
void test_delay_load ##f(){(__imp_load_ ##f) ? 1 : 0; }
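And then you can use this macro on an imported function name, for example CHECK_DELAY_LOAD(TestFunc1), with a comment next to it that explains what is being checked.
Note that you need to use the CHECK_DELAY_LOAD macro at global scope, that is, outside of any function definition.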
With the check above you will get a linker error if TestFunc1 is not in a module declared for delayed loading:
error LNK2001: unresolved external symbol __imp_load_TestFunc1
I admit that this is not a very helpful error message. But if you search your solution for TestFunc1, it should clue you in to the cause of the problem. For that, make sure to keep an explanatory comment next to each use of the CHECK_DELAY_LOAD macro.
Static Linking Tricks
Another interesting situation that Rbmm showed me was the use of the __imp_ prefix with static linking to system functions.
Say you have some API that is not available in all versions of the operating system. Let's take OpenThemeDataForDpi for instance. It is available on "Windows 10, version 1703" or later.
So just coding something like this:
HTHEME hTheme = OpenThemeDataForDpi(HWND_DESKTOP, L"TaskbarPearl", GetDpiForWindow(HWND_DESKTOP));
will result in the module not being able to load on any operating system prior to Windows 10, version 1703.
As a solution, you can link to it dynamically, or during run-time. But there's also another clever way to do it.
First, define a global FARPROC pointer with C linkage named __imp_OpenThemeDataForDpi, in place of the function's IAT entry, and set it to 0. Note that we're using the __imp_ prefix for that.
Then use the following construct to call the OpenThemeDataForDpi function:
HTHEME GetStartButtonTheme()
{
    //RETURN:
    //      = Handle for the Start Button HTHEME object
    //      = NULL if error
    #define NOT_SUPPORTED (FARPROC)(-1)
    //Use the singleton approach
    if (!__imp_OpenThemeDataForDpi)
    {
        __imp_OpenThemeDataForDpi = GetProcAddress(LoadLibrary(L"uxtheme.dll"), "OpenThemeDataForDpi");
        if (!__imp_OpenThemeDataForDpi)
        {
            //Operating system doesn't support this API
            __imp_OpenThemeDataForDpi = NOT_SUPPORTED;
            return NULL;
        }
    }
    else if (__imp_OpenThemeDataForDpi == NOT_SUPPORTED)
    {
        //Previously tested - API not supported
        return NULL;
    }
    return OpenThemeDataForDpi(HWND_DESKTOP, L"TaskbarPearl", GetDpiForWindow(HWND_DESKTOP));
}
The code above will use the __imp_OpenThemeDataForDpi global variable in the IAT to store the pointer to the OpenThemeDataForDpi function. And in case the operating system doesn't support that API, our module will load just fine but our GetStartButtonTheme function will return NULL.
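The three states of that pointer (not yet resolved, resolved, and known to be unsupported) can be isolated in a portable sketch. LookupApi is a hypothetical stand-in for the GetProcAddress/LoadLibrary pair, and LazyApi is an illustrative wrapper, not part of any Windows API.

```cpp
#include <cassert>
#include <cstdint>

using ApiFn = int (*)(int);

static int g_lookups = 0;   // counts how often the (simulated) lookup ran

// Hypothetical API body used when the API is "available".
static int DoubleIt(int x) { return x * 2; }

// Hypothetical stand-in for GetProcAddress: returns nullptr if the API is missing.
static ApiFn LookupApi(bool available)
{
    ++g_lookups;
    return available ? &DoubleIt : nullptr;
}

// Sentinel meaning "already looked up, not supported" (like (FARPROC)(-1)).
static ApiFn NotSupported()
{
    return reinterpret_cast<ApiFn>(static_cast<std::intptr_t>(-1));
}

struct LazyApi
{
    ApiFn slot = nullptr;   // stand-in for the zero-initialized __imp_ variable

    // Returns false if the API is unavailable; otherwise calls it and stores the result.
    bool Call(bool available, int arg, int& result)
    {
        if (!slot)
        {
            slot = LookupApi(available);
            if (!slot) { slot = NotSupported(); return false; }
        }
        else if (slot == NotSupported())
        {
            return false;
        }
        result = slot(arg);
        return true;
    }
};
```

The sentinel is what keeps the lookup from being repeated on every call when the API is missing, so the cost of a failed lookup is paid only once.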
Conclusion
These are just some of the smaller nuances about the __imp_ prefix that may be important for low-level developers. They were originally pointed out to me by Rbmm, and I decided to throw them into a blog post to share with everyone.
In case you know of some other clever uses of the __imp_ prefix, please don't hesitate to put them in the comments below.