Blog Post

Intricacies of Microsoft Compilers

The case of a curious __imp_.

This article contains functions and features that are not documented by the original manufacturer. By following advice in this article, you're doing so at your own risk. The methods presented in this article may rely on internal implementation and may not work in the future.

Intro

If you've ever looked at the compiled Assembly language code of a native application built with Microsoft Visual Studio, you may have noticed that some function calls:

C++
CloseHandle(hHandle);

May have this __imp_ prefix added to them:

x86-64
call    qword ptr [__imp_CloseHandle] 

But what is it? And why is it there?

This blog post will shed some light on what that __imp_ prefix is and how you can take advantage of its presence.

Some Theory

Very briefly: when the Microsoft compiler builds your native application for the Intel x86/x64 architecture, it needs to translate each function call in your source code into a CPU instruction that is named, unsurprisingly, call. This instruction needs the address of the function in memory to jump to. For the Intel architecture the following options are available:

  • CALL rel32 - (Opcode: E8, n0, n1, n2, n3) executes a relative call whose 32-bit displacement is measured from the next instruction. This article refers to it as the short call.
  • CALL ptr [addr] - (Opcode: FF, 15, a0, a1, a2, a3) executes an indirect call to an absolute address that is read from memory. This article refers to it as the long call.

The issue is that during the build stage the compiler has no idea where the function will end up. It may be somewhere in the same module, in which case the short call can be used; or it may be in a module that is mapped during load-time linking, in which case the long call is required. Unfortunately, because those two call instructions have different lengths, the compiler cannot just leave a gap and fill it in later. It couldn't pick the longer instruction and then shrink the gap either. If you refer to the Intel CPU instruction set, many of its instructions rely on relative offsets, which would break if we decided to shrink gaps in the machine code.
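
To make the "relative to the next instruction" part concrete, here is a minimal sketch of how the 5-byte short call could be encoded and decoded. The helper names encode_call_rel32 and decode_call_rel32 are hypothetical and the addresses used with them are made up for illustration; this is not what any particular linker runs internally.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode "CALL rel32" (opcode E8) located at call_addr and targeting target.
// Hypothetical helper for illustration only.
std::array<std::uint8_t, 5> encode_call_rel32(std::uint64_t call_addr,
                                              std::uint64_t target) {
    // The displacement is measured from the end of the 5-byte instruction.
    const std::int64_t disp =
        static_cast<std::int64_t>(target - (call_addr + 5));
    assert(disp >= INT32_MIN && disp <= INT32_MAX);  // must fit into rel32
    const std::uint32_t d = static_cast<std::uint32_t>(disp);
    // Opcode first, then the displacement in little-endian byte order.
    return {{0xE8,
             static_cast<std::uint8_t>(d & 0xFF),
             static_cast<std::uint8_t>((d >> 8) & 0xFF),
             static_cast<std::uint8_t>((d >> 16) & 0xFF),
             static_cast<std::uint8_t>((d >> 24) & 0xFF)}};
}

// Recover the call target from the instruction bytes and its address.
std::uint64_t decode_call_rel32(std::uint64_t call_addr,
                                const std::array<std::uint8_t, 5>& code) {
    assert(code[0] == 0xE8);
    const std::uint32_t d = static_cast<std::uint32_t>(code[1]) |
                            (static_cast<std::uint32_t>(code[2]) << 8) |
                            (static_cast<std::uint32_t>(code[3]) << 16) |
                            (static_cast<std::uint32_t>(code[4]) << 24);
    // Sign-extend the 32-bit displacement before adding it.
    const std::int64_t disp = static_cast<std::int32_t>(d);
    return call_addr + 5 + static_cast<std::uint64_t>(disp);
}
```

Under these assumptions, a call placed at 0x1000 that targets 0x101E encodes as E8 19 00 00 00, because the displacement is counted from 0x1005. Moving either the call or its target changes the bytes, which is why shrinking or growing nearby instructions would invalidate such offsets.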

You also have to remember that when engineers were initially dealing with this dilemma, the hardware was very slow. So picking one option over another would either affect the compilation time (by making the compiler run several passes), or affect the load-time linking, or make the resulting code run slower.

They chose the following compromise:

  • The compiler will initially assume a short call, since statistically there should be more of those.
  • The linker will then fill in the relative addresses in all short call instructions, but if any of them happen to need load-time linking, it will create a separate stub with a JMP instruction to accommodate that.

    On the Assembly level it will look as follows. The initial short call will refer to the JMP stub:

    x86-64 Machine Code
    E8 19 00 00 00       call    ImportedFunc

    And the JMP stub itself will be just a long jump, doing steps very similar to the long call instruction:

    x86-64 Machine Code
    ; ImportedFunc PROC
    FF 25 62 0E 00 00    jmp     qword ptr [__imp_ImportedFunc]

    The downside in this case is that we are wasting CPU cycles on the short call and then on the long jump, instead of just doing the long call initially. This won't be a big deal with the modern hardware, but, remember that during the time when this was devised, CPUs were much, much slower than now. So it mattered.

    To address this issue, the Microsoft compiler allows developers to mark declarations of imported functions with the __declspec(dllimport) directive. So if you know that your ImportedFunc is imported from an outside module, you mark it as such:

    C++
    extern "C" __declspec(dllimport) int __cdecl ImportedFunc(int a, int b);

    This will instruct the compiler to treat it as the function requiring relocation during the load-time linking and to allocate the long call instruction for it, thus bypassing the need for the JMP stub.

    Note that modern compilers don't rely on the __declspec(dllimport) directive as much if you have the /GL (Whole Program Optimization) option enabled, along with the /LTCG (Link-time code generation) option.
    [Screenshot: Visual Studio 2019, C++ project properties window, Whole Program Optimization setting.]
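
The two call paths from this section can be modeled in plain C++. This is only a platform-neutral sketch: imp_ImportedFunc stands in for the __imp_ImportedFunc pointer, ImportedFunc_stub stands in for the JMP stub, and every name here is hypothetical rather than something the toolchain generates.

```cpp
#include <cassert>

using ImportedFn = int (*)(int, int);

// Counts how many times the modeled JMP stub was traversed.
int g_stub_hops = 0;

// Stand-in for the function that lives in the imported DLL.
int RealImportedFunc(int a, int b) { return a + b; }

// Stand-in for the __imp_ImportedFunc pointer filled in at load time.
ImportedFn imp_ImportedFunc = &RealImportedFunc;

// Stand-in for the JMP stub: one extra hop before reaching the target.
int ImportedFunc_stub(int a, int b) {
    ++g_stub_hops;
    return imp_ImportedFunc(a, b);
}

// Call site compiled WITHOUT __declspec(dllimport): short call to the stub.
int call_without_dllimport(int a, int b) { return ImportedFunc_stub(a, b); }

// Call site compiled WITH __declspec(dllimport): call through the pointer.
int call_with_dllimport(int a, int b) { return imp_ImportedFunc(a, b); }
```

Both call sites return the same result, but only the first one passes through the extra hop, which is the cost that __declspec(dllimport) is meant to avoid.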

The __imp_ Prefix

You might have noticed in the JMP stub that same __imp_ prefix again. So let me explain where that comes from.

The process of load-time linking to an imported module includes a stage, which I'll call relocation, where the addresses of functions in the loaded (or mapped) module need to be supplied to all long call CPU instructions in the code.

For efficiency, those addresses are not written into each long call instruction. Patching addresses all over the module's address space in RAM (wherever the long call instructions are located) would decimate the cache and thus slow down load-time linking. Instead, the PE file groups the addresses of all imported functions into a linear array of pointers that is collectively called the "Import Address Table", or IAT for short. It can be located via the IMAGE_DIRECTORY_ENTRY_IAT entry of the data directory in the PE header.

For the Microsoft compiler though, the IAT is represented as an array of global variables, with each element bearing the name of the imported function, preceded by the __imp_ prefix.

My guess is that __imp_ stands for "implementation", or maybe for "implied", and probably has nothing to do with the mystical character in the video game.

And that's where that __imp_ comes from.
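
As a rough mental model of the IAT, consider the sketch below: one linear table that gets filled once, and call sites that only refer to a slot in it. The ImportSlot type, resolve_export and bind_imports are hypothetical names for illustration; the real loader works on PE structures, not on std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using AnyFn = int (*)(int, int);

// One modeled IAT entry: what __imp_<name> would hold after loading.
struct ImportSlot {
    std::string name;
    AnyFn target;
};

// Stand-ins for functions exported by the imported module.
int add_impl(int v, int b) { return v + b; }
int sub_impl(int v, int b) { return v - b; }

// Hypothetical export lookup, standing in for the loader's name resolution.
AnyFn resolve_export(const std::string& name) {
    if (name == "TestFunc1") return &add_impl;
    if (name == "TestFunc2") return &sub_impl;
    return nullptr;
}

// Fill every slot once; returns the number of slots that were resolved.
std::size_t bind_imports(std::vector<ImportSlot>& iat) {
    std::size_t bound = 0;
    for (ImportSlot& slot : iat) {
        slot.target = resolve_export(slot.name);
        if (slot.target != nullptr) ++bound;
    }
    return bound;
}

// A call site never gets patched; it only reads its slot at call time.
int call_through(const std::vector<ImportSlot>& iat, std::size_t index,
                 int a, int b) {
    assert(index < iat.size() && iat[index].target != nullptr);
    return iat[index].target(a, b);
}
```

No matter how many call sites refer to a slot, binding touches only the table, which is the locality argument made above.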

Note that the __imp_ prefix is undocumented by Microsoft, even though they have been using it since the early days of their compilers.

In most cases though, you won't need to be concerned with any of this from your C++ code, as the modern compilers are smart enough to pick the right CPU call instruction with high efficiency.

But there are some unique situations where we can use the __imp_ prefix to our advantage. Let me follow up with those.

Coding in Assembly

When coding in the Assembly language, the use of the __imp_ prefix for the imported function calls may be more important, and, to be honest, quite a few low-level developers seem to neglect it. Let me show it in an example.

First, let's make our three test functions with different calling conventions and place them in a DLL:

C++
extern "C" __declspec(dllexport) int __cdecl TestFunc1(int v, int b)
{
    return v + b;
}

extern "C" __declspec(dllexport) int __stdcall TestFunc2(int v, int b)
{
    return v - b;
}

extern "C" __declspec(dllexport) int __fastcall TestFunc3(int v, int b)
{
    return v * b;
}

Note that the calling conventions matter only for the x86 build. For the x64 build the compiler ignores these keywords and uses the single x64 calling convention, which is similar to __fastcall.

Then, say, for whatever reason we want to call them from our own function written in Assembly language. In this case I will have to write a separate implementation for each bitness.

64-bit Implementation

The x64 implementation is much easier, so let's start with it. Also, we don't need to test all 3 functions, since they will be called in exactly the same way. Thus, let's just pick TestFunc1.

So let's add the following into the .asm file:

x86-64
extrn TestFunc1 : PROC             ; note that TestFunc1 is defined as PROC
extrn __imp_TestFunc1 : QWORD      ; while __imp_TestFunc1 is defined as QWORD!

.code
ALIGN 16


asm_func PROC
    sub     rsp, 28h               ; reserve 32-byte shadow space + 8 bytes to keep the stack 16-byte aligned
    
    mov     rdx, 5
    mov     rcx, 10
    call    TestFunc1              ; uses slower JMP stub
    
    mov     rdx, 5
    mov     rcx, 10
    call    __imp_TestFunc1        ; Direct call
    
    add     rsp, 28h               ; Restore stack
    ret
asm_func ENDP

END

There are several points of interest to review in the code above:

  1. Compile and step through the asm_func function with the Visual Studio debugger. Then step into the TestFunc1 call. In it you will see a JMP stub:
    x86-64
    ; TestFunc1 PROC
        jmp      qword ptr [__imp_TestFunc1]

    But when you step into the __imp_TestFunc1 call from our asm_func function, it will lead straight to the TestFunc1 function in our imported DLL.

    So by using the __imp_ prefix we are saving an unnecessary jmp from the JMP stub. This may not be a big deal in C++, but since you're coding in Assembly, this may make a difference for you.

  2. Note that TestFunc1 is defined as PROC at the top, while __imp_TestFunc1 is defined as QWORD. This is important! If you define __imp_TestFunc1 as PROC, the MASM assembler will treat it as the beginning of the code you want to call, so the call will jump into the IAT itself and try to execute the pointer's bytes as instructions. This will cause a crash.
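
A side note on the sub rsp, 28h line in the listing above: the value is the 32-byte shadow space plus 8 bytes of padding, since the stack must be 16-byte aligned at each call and the return address has already taken 8 bytes on entry. The arithmetic can be sketched as follows; x64_frame_bytes is a hypothetical helper, and it assumes a function like asm_func that pushes nothing before adjusting rsp.

```cpp
#include <cassert>

// Shadow (home) space that the x64 calling convention reserves for the callee.
constexpr unsigned kShadowBytes = 32;

// Bytes to subtract from rsp so that rsp is 16-byte aligned at the call.
// Assumes rsp % 16 == 8 on entry (return address just pushed) and that the
// function pushes no registers before the adjustment.
constexpr unsigned x64_frame_bytes(unsigned extra_bytes) {
    return ((kShadowBytes + extra_bytes + 8u + 15u) & ~15u) - 8u;
}

// The value used in asm_func: no locals, no stack arguments.
static_assert(x64_frame_bytes(0) == 0x28, "32 bytes shadow + 8 bytes padding");
```

With 16 extra bytes for locals or stack arguments, the same formula gives 38h.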

32-bit Implementation

The x86 implementation is a little different. Let's review it next.

So let's code the following in the .asm file:

x86
.686p
.model flat, C


OPTION LANGUAGE: SYSCALL            ;  needed to prevent prepending of names with _

EXTERN __imp__TestFunc1 : DWORD     ; __cdecl

EXTERN __imp__TestFunc2@8 : DWORD   ; __stdcall

EXTERN @TestFunc3@8 : PROC
EXTERN __imp_@TestFunc3@8 : DWORD   ; __fastcall

OPTION LANGUAGE: C                  ;  back to C naming (names are prepended with _)

EXTERN TestFunc1 : PROC
EXTERN TestFunc2@8 : PROC


.code
ALIGN 8


asm_func PROC

    ; __cdecl calling convention
    push    5
    push    10
    call    TestFunc1              ; uses slower JMP stub
    add     esp,8
    
    push    5
    push    10
    call    __imp__TestFunc1       ; Direct call
    add     esp,8
    
    ; __stdcall calling convention
    push    5
    push    10
    call    TestFunc2@8            ; uses slower JMP stub
    
    push    5
    push    10
    call    __imp__TestFunc2@8     ; Direct call
    
    ; __fastcall calling convention
    mov     edx, 5
    mov     ecx, 10
    call    @TestFunc3@8           ; uses slower JMP stub
    
    mov     edx, 5
    mov     ecx, 10
    call    __imp_@TestFunc3@8     ; Direct call
    
    ret
asm_func ENDP

END

The code is very similar to the x64 implementation, except that all 3 calling conventions need different handling. Thus the top of the file is dedicated to declaring each function with the name decoration that its calling convention requires.

Then, just as in the x64 implementation, note that each call to an external function that is not prepended with the __imp_ prefix is first redirected to a JMP stub. So it may be prudent to avoid that by calling through the function's pointer in the IAT, which you reference by prepending __imp_ to the decorated function name. In that case the call goes directly to the imported function.

Overriding System Function Calls

This is more of an obfuscation practice than anything else. Alternatively, such a technique may be used by an antivirus or security product to install function trampolines.

Since the Microsoft compiler keeps pointers to all imported functions in the IAT, we can override them with our own functions. For instance, if we take the CloseHandle system API that is normally used to close a handle:

C++
CloseHandle(hHandle);

We can use its address in the IAT to put our own trampoline in. For our silly purpose, let's just make it show a message box. For that we need to code it to match the declaration of the original CloseHandle function:

C++
BOOL __stdcall CloseHandleOverride(HANDLE hHandle)
{
	MessageBox(NULL, L"Hello world from CloseHandle!", L"Message", MB_OK);

	return FALSE;
}

Then when our app starts, we need to do the actual override in the IAT. For that we will use the __imp_CloseHandle global variable that the compiler will hold for us. It will contain the address of the mapped CloseHandle import in our process. We can reference it by declaring it as such:

C++ (x64 build)
extern "C" void* __imp_CloseHandle;

For the x64 build of the project, we can then do the following:

C++ (x64 build)
DWORD dwOldProtect;
VirtualProtect(&__imp_CloseHandle, sizeof(__imp_CloseHandle), PAGE_READWRITE, &dwOldProtect);
__imp_CloseHandle = (void*)CloseHandleOverride;
VirtualProtect(&__imp_CloseHandle, sizeof(__imp_CloseHandle), dwOldProtect, &dwOldProtect);

Note that the memory page that contains the IAT is initially set to read-only for security reasons. Thus we need to change its protection to PAGE_READWRITE with the VirtualProtect function first, and then revert it after our override.

After that, calling CloseHandle anywhere in our process:

C++
CloseHandle(hHandle);

Will result in our CloseHandleOverride being called instead:

[Screenshot: Hello world message box.]

Note that this example is totally impractical, and in a real override you would save the original address of the CloseHandle function and call it at the end. I didn't want to complicate this simple test and thus didn't do it.
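
For completeness, the "save the original and call it at the end" pattern mentioned above can be sketched in a platform-neutral way. Here g_slot stands in for __imp_CloseHandle, RealClose stands in for the real API, and all names are hypothetical. On Windows the two assignments to the slot would additionally be wrapped in the VirtualProtect calls shown earlier.

```cpp
#include <cassert>

using CloseFn = bool (*)(void* handle);

int g_real_calls = 0;  // how many times the modeled real API ran
int g_hook_calls = 0;  // how many times our override ran

// Stand-in for the real system function.
bool RealClose(void*) { ++g_real_calls; return true; }

CloseFn g_slot = &RealClose;   // stand-in for the __imp_CloseHandle pointer
CloseFn g_original = nullptr;  // original pointer, saved when hooking

// The override does its own work first, then forwards to the original.
bool CloseOverride(void* handle) {
    ++g_hook_calls;
    return g_original != nullptr ? g_original(handle) : false;
}

// Save the current pointer and put the override in its place.
void install_hook() {
    if (g_slot == &CloseOverride) return;  // already installed
    g_original = g_slot;
    g_slot = &CloseOverride;
}

// Put the saved pointer back.
void remove_hook() {
    if (g_original == nullptr) return;     // nothing to restore
    g_slot = g_original;
    g_original = nullptr;
}
```

Saving the pointer before overwriting it is what keeps the original behavior reachable, both for forwarding and for removing the override later.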

32-bit Variant

In the x86, or 32-bit, variant the mangling of function names is somewhat more complex. The Microsoft C compiler supports several calling conventions on x86, and it decorates function names differently for each of them:

  • __cdecl - uses the underscore _ prefix (except when functions that use C linkage are exported). Ex: _ImportedFunc
  • __stdcall - uses the underscore _ prefix and the @n suffix, where n is the size of the arguments in bytes. Ex: _ImportedFunc@8
  • __fastcall - uses the @ prefix and the @n suffix, where n is the size of the arguments in bytes. Ex: @ImportedFunc@8

You can check this naming scheme by loading the compiled binary file into the WinAPI Search app.
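
The decoration rules from the list above are mechanical enough to express as a small helper. This is a sketch with hypothetical names (CallConv, decorate_x86, iat_symbol_x86) that covers only the three conventions listed and only functions with C linkage.

```cpp
#include <cassert>
#include <string>

enum class CallConv { Cdecl, Stdcall, Fastcall };

// Decorated x86 name of a C-linkage function.
// arg_bytes is the total size of the arguments in bytes (ignored for __cdecl).
std::string decorate_x86(const std::string& name, CallConv cc,
                         unsigned arg_bytes) {
    switch (cc) {
    case CallConv::Cdecl:
        return "_" + name;
    case CallConv::Stdcall:
        return "_" + name + "@" + std::to_string(arg_bytes);
    case CallConv::Fastcall:
        return "@" + name + "@" + std::to_string(arg_bytes);
    }
    return name;  // not reached for the three conventions above
}

// Name of the matching IAT pointer: the __imp_ prefix plus the decorated name.
std::string iat_symbol_x86(const std::string& name, CallConv cc,
                           unsigned arg_bytes) {
    return "__imp_" + decorate_x86(name, cc, arg_bytes);
}
```

For example, iat_symbol_x86("CloseHandle", CallConv::Stdcall, 4) yields __imp__CloseHandle@4, and the __fastcall TestFunc3 from the Assembly section becomes __imp_@TestFunc3@8.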

So in the case of an x86 build, we can't just use the __imp_CloseHandle global variable like we did for x64. We need to slightly adjust it using the undocumented /ALTERNATENAME: linker option:

C++
#define ALTNAME(x,n) __pragma(comment(linker, "/ALTERNATENAME:__imp_" #x "=__imp__" #x "@" #n))

ALTNAME(CloseHandle, 4)

#define impCloseHandle _imp_CloseHandle			//  decorated as __imp_CloseHandle, aliased above to __imp__CloseHandle@4
extern "C" void* impCloseHandle;

And then use that preprocessor definition in the C++ code in a similar way as we did for the x64 build:

C++ (x86 build)
DWORD dwOldProtect;
VirtualProtect(&impCloseHandle, sizeof(impCloseHandle), PAGE_READWRITE, &dwOldProtect);
impCloseHandle = CloseHandleOverride;
VirtualProtect(&impCloseHandle, sizeof(impCloseHandle), dwOldProtect, &dwOldProtect);

This will give us the same silly override of the CloseHandle system function in our process.

Delay Load Imports

One other internal use of the __imp_ prefix, which Rbmm pointed out to me, is for marking the Delay Load Imports. Delayed loading is a very old concept where a module (or DLL) is loaded only upon request, that is, when one of its imported functions is first used, instead of during the load-time linking stage.

You can define a DLL to be delay loaded via the /DELAYLOAD command line switch, or through the project properties window:

[Screenshot: Visual Studio 2019, C++ project properties window: Configuration Properties > Linker > Input > Delay Loaded DLLs.]

In that case the linker creates global stub functions that are named after the original functions in the delay-loaded module, prepended with the __imp_load_ prefix. Pointers to those stubs are initially stored in the IAT. When a delay-loaded function is resolved on its first call, its pointer in the IAT is overwritten with the actual function pointer in the loaded module. Thus, any subsequent calls go straight to the actual function.

For instance, if TestFunc1 were from a delay-loaded module, the linker would create a stub function named __imp_load_TestFunc1 whose pointer is initially stored in the IAT. Thus, before that delay-load function is loaded, __imp_TestFunc1 points to __imp_load_TestFunc1. So we can use the following logic to determine whether a specific delay-load function has already been loaded:
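
The stub-then-overwrite behavior can be modeled without any Windows machinery. In this sketch slot_TestFunc1 stands in for the __imp_TestFunc1 pointer and LoadStub_TestFunc1 stands in for __imp_load_TestFunc1. All names are hypothetical, and the real stub would load the DLL and look up the export instead of using a hard-coded function.

```cpp
#include <cassert>

using Fn = int (*)(int, int);

// Stand-in for the real function inside the delay-loaded DLL.
int RealTestFunc1(int v, int b) { return v + b; }

// Stand-in for the load stub (declared first so the slot can point to it).
int LoadStub_TestFunc1(int v, int b);

// Stand-in for the IAT pointer: it initially points to the load stub.
Fn slot_TestFunc1 = &LoadStub_TestFunc1;

// Counts how many times the modeled "load" step ran.
int g_loads = 0;

int LoadStub_TestFunc1(int v, int b) {
    ++g_loads;                        // a real stub would load the DLL here
    slot_TestFunc1 = &RealTestFunc1;  // overwrite the pointer with the target
    return slot_TestFunc1(v, b);      // complete the original call
}

// True once the pointer no longer refers to the load stub.
bool is_TestFunc1_loaded() { return slot_TestFunc1 != &LoadStub_TestFunc1; }
```

The first call through the slot pays for the load and rewrites the pointer, and every later call bypasses the stub. Comparing the pointer against the stub's address is all that is needed to tell the two states apart.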

x64 build
extern "C" void* __imp_load_TestFunc1;
extern "C" void* __imp_TestFunc1;

bool Is_TestFunc1_Loaded()
{
	//RETURN: true if TestFunc1 delay-load function is loaded
	return &__imp_load_TestFunc1 != __imp_TestFunc1;
}

#include <assert.h>

int main()
{
	assert(!Is_TestFunc1_Loaded());

	TestFunc1(10, 5);

	assert(Is_TestFunc1_Loaded());
}

Or, you can just use the following macro:

x64 build
#define IS_DELAY_LOAD_FUNC_LOADED(f) (&__imp_load_ ##f != __imp_ ##f)

int main()
{
	assert(!IS_DELAY_LOAD_FUNC_LOADED(TestFunc1));

	TestFunc1(10, 5);

	assert(IS_DELAY_LOAD_FUNC_LOADED(TestFunc1));
}

Other than that, there's no real tangible application in it for a developer, except that you can also use it to check at build time whether a certain function is declared as a delay-loaded import.

So you can do something like this. First, let's define a preprocessor macro that will check it for us:

C++
#define CHECK_DELAY_LOAD(f) extern "C" void* __imp_load_ ##f; \
	void test_delay_load ##f(){(__imp_load_ ##f) ? 1 : 0; }

And then you can use this macro as such on an imported function name:

C++
CHECK_DELAY_LOAD(TestFunc1);		//Check if TestFunc1 is in a delayed loaded module

Note that you need to use the CHECK_DELAY_LOAD macro at global scope, that is, outside of any function definition.

With the check above you will get a linker error if TestFunc1 is not in the module declared for delayed loading:

error LNK2001: unresolved external symbol __imp_load_TestFunc1

I admit that this is not a very helpful error message. But if you search your solution for TestFunc1, it should clue you in to the cause of the problem. For that, make sure to keep the comment in place for the CHECK_DELAY_LOAD macro, as I showed above.

Static Linking Tricks

Another interesting situation that Rbmm showed me is the use of the __imp_ prefix with static linking to system functions. Say you have some API that is not available in all versions of the operating system. Let's take OpenThemeDataForDpi, for instance. It is available on "Windows 10, version 1703" or later.

So just coding something like this:

C++
HTHEME hTheme = OpenThemeDataForDpi(HWND_DESKTOP, L"TaskbarPearl", GetDpiForWindow(HWND_DESKTOP));

will result in the module not being able to load on any operating system prior to Windows 10, version 1703.

As a solution, you can link to it dynamically, or during run-time. But there's also another clever way to do it.

First, define a global pointer that takes the place of the function's entry in the IAT and set it to 0:

C++
extern "C" void* __imp_OpenThemeDataForDpi = NULL;

Note that we're using the __imp_ prefix for that.

Then use the following construct to call the OpenThemeDataForDpi function:

C++
HTHEME GetStartButtonTheme()
{
	//RETURN:
	//		= Handle for the Start Button HTHEME object
	//		= NULL if error

#define NOT_SUPPORTED ((void*)(INT_PTR)(-1))

	//Use the singleton approach
	if (!__imp_OpenThemeDataForDpi)
	{
		__imp_OpenThemeDataForDpi = (void*)GetProcAddress(LoadLibrary(L"uxtheme.dll"), "OpenThemeDataForDpi");

		if (!__imp_OpenThemeDataForDpi)
		{
			//Operating system doesn't support this API
			__imp_OpenThemeDataForDpi = NOT_SUPPORTED;

			return NULL;
		}
	}
	else if (__imp_OpenThemeDataForDpi == NOT_SUPPORTED)
	{
		//Previously tested - API not supported
		return NULL;
	}

	return OpenThemeDataForDpi(HWND_DESKTOP, L"TaskbarPearl", GetDpiForWindow(HWND_DESKTOP));
}

The code above will use the __imp_OpenThemeDataForDpi global variable in the IAT to store the pointer to the OpenThemeDataForDpi function. And in case the operating system doesn't support that API, our module will load just fine but our GetStartButtonTheme function will return NULL.

Conclusion

These are just some of the smaller nuances of the __imp_ prefix that may be important for low-level developers. They were originally pointed out to me by Rbmm, and I decided to put them into a blog post to share with everyone.

In case you know of some other clever uses of the __imp_ prefix, please don't hesitate to put them in the comments below.
