
Intricacies of Microsoft Compilers - Part 2

The use of __imp_ and __imp_load_ prefixes.

This article contains functions and features that are not documented by the original manufacturer. If you follow the advice in this article, you do so at your own risk. The methods presented in this article may rely on internal implementation details and may not work in the future.

Intro

This blog post explains how Microsoft compilers use the __imp_ and __imp_load_ prefixes internally.

Types of the CALL instruction

On the Intel architecture, the compiler can generally use two forms of the CALL CPU instruction:

  • CALL rel32 - specifies a displacement relative to the next instruction.
  • CALL [m] - calls indirectly, through an absolute target address stored in memory at [m].

If we call a function inside the same module (or, more generally, if the distance to the function is within -2 GB/+2 GB, or -0x80000000/+0x7FFFFFFF), then the CALL rel32 instruction is more efficient than CALL [m]. Here's why:

  1. 5 bytes versus 6.
  2. CALL rel32 is position-independent and doesn't need any relocation. CALL [m], on the other hand, requires an absolute address stored in memory at [m]. This means that every such memory slot needs to be described in the relocation table (2 extra bytes per entry), and this address has to be fixed up whenever the module is relocated.

That is why the compiler tries to use the CALL rel32 instruction to call functions inside the same module.
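To make the size difference concrete, here is a small C++ sketch of my own that encodes both forms as raw x86 bytes. The helper names are hypothetical, not part of any Microsoft API. CALL rel32 is opcode E8 followed by a 4-byte displacement. CALL [m] is FF 15 followed by 4 bytes that, in 32-bit code, hold the absolute address of the memory slot. In 64-bit code the same encoding is RIP-relative instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CALL rel32: opcode E8 + 4-byte displacement, counted from the end of the
// 5-byte instruction (the address of the next instruction). The subtraction
// wraps modulo 2^32, just like the CPU's arithmetic in 32-bit code.
std::vector<uint8_t> encode_call_rel32(uint32_t call_site, uint32_t target)
{
    const uint32_t rel = target - (call_site + 5);
    return { 0xE8,
             uint8_t(rel), uint8_t(rel >> 8), uint8_t(rel >> 16), uint8_t(rel >> 24) };
}

// CALL [m]: opcode FF /2 with ModRM 0x15, followed by 4 bytes. In 32-bit code
// those bytes are the absolute address of the memory slot. In 64-bit code the
// same encoding means a RIP-relative displacement instead.
std::vector<uint8_t> encode_call_indirect_abs32(uint32_t slot_address)
{
    return { 0xFF, 0x15,
             uint8_t(slot_address),       uint8_t(slot_address >> 8),
             uint8_t(slot_address >> 16), uint8_t(slot_address >> 24) };
}
```

For instance, encode_call_rel32(0x401000, 0x402000) yields E8 FB 0F 00 00, because the displacement 0xFFB is counted from the end of the 5-byte instruction.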

But when calling functions in another module, the situation changes.

First of all, it may not be possible to use CALL rel32, since it can't reach further than -2 GB/+2 GB from the instruction. Even in 32-bit code (with a 4 GB address space) this may not be enough. (Although on Windows it is, since we can't cross the boundary between user and kernel space.) But on a 64-bit system, modules are often mapped too far apart from each other. For instance, one module can be mapped at 0x7FF729110000 and another one at 0x7FFFE0480000, with a relative offset of 0x8B7370000 between them, which exceeds -0x80000000/+0x7FFFFFFF.
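A quick way to check whether CALL rel32 can reach a target is to compute the signed distance from the next instruction and see if it fits into 32 bits. This is a minimal sketch, and fits_rel32 is a hypothetical helper. With the two module bases above, the 0x8B7370000 distance clearly doesn't fit.

```cpp
#include <cassert>
#include <cstdint>

// Can a 5-byte "CALL rel32" at call_site reach target in 64-bit code?
// The displacement is counted from the next instruction and must fit in a
// signed 32-bit value, i.e. -0x80000000 .. +0x7FFFFFFF.
bool fits_rel32(uint64_t call_site, uint64_t target)
{
    const uint64_t next = call_site + 5;
    // Unsigned wrap-around followed by a signed reinterpretation gives the
    // signed distance for any two user-mode addresses.
    const int64_t delta = static_cast<int64_t>(target - next);
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```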

Second, even if all modules were mapped within -2 GB/+2 GB of each other (which is possible in 32-bit Windows), there's another problem: relocation.

When we make a call with CALL rel32 inside a module, the distance between the two functions (rel32) stays the same no matter what base address the module is mapped at. That distance is known at link time. But if the call goes between two modules that can be mapped at different base addresses, the distance between them can differ every time they are loaded, and it is known only at run time. That is why every CALL rel32 instruction that calls a function in another module would need its own relocation.
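The relocation argument can be modeled the same way. In this sketch, the helpers are hypothetical and addresses are given as a load base plus an RVA. The rel32 operand of an intra-module call comes out identical for any base. The operand of a cross-module call changes as soon as the callee's module moves.

```cpp
#include <cassert>
#include <cstdint>

// rel32 operand of a 5-byte "CALL rel32" at call_site that targets target.
int64_t rel32_operand(uint64_t call_site, uint64_t target)
{
    return static_cast<int64_t>(target - (call_site + 5));
}

// Intra-module call: both addresses share one load base, so the operand
// depends only on the two RVAs, never on where the module is loaded.
int64_t intra_module_rel32(uint64_t base, uint32_t call_rva, uint32_t target_rva)
{
    return rel32_operand(base + call_rva, base + target_rva);
}

// Cross-module call: caller and callee have independent load bases, so the
// operand changes whenever either module is loaded somewhere else.
int64_t cross_module_rel32(uint64_t caller_base, uint32_t call_rva,
                           uint64_t callee_base, uint32_t target_rva)
{
    return rel32_operand(caller_base + call_rva, callee_base + target_rva);
}
```

For example, a call at RVA 0x1000 to RVA 0x2000 in the same module always gets the operand 0xFFB, whether the module loads at 0x400000 or at 0x10000000.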

Of course, we will also need relocation for CALL [m] instructions. But it is needed only for the memory slot the instruction points to ([m]), not for the instruction itself. And there are usually far fewer such memory slots than call instructions.

As an example, let's take a call to the CloseHandle function. A program may call it from many different places. In other words, there are many instructions like:

x86
call [__imp_CloseHandle]

But there will be only one memory slot holding the address of the kernel32!CloseHandle function. We can also group all such memory slots into one contiguous array (usually no larger than a single memory page), so that only that page has to be modified during relocation. Such a region is called the "Import Address Table" (IAT), and it is described by the IMAGE_DIRECTORY_ENTRY_IAT directory in the PE file.
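The following portable C++ model is only an illustration, not how the loader is actually implemented. An array of function pointers plays the role of the IAT, and each call_site_* function stands for one call [__imp_CloseHandle] instruction. All the names are hypothetical. The model shows that fixing up a single slot redirects every call site at once.

```cpp
#include <cassert>
#include <cstddef>

using Fn = int (*)(int);

// Stand-ins for a function exported by another module (hypothetical names).
int close_handle_impl(int h) { return h != 0; }   // "succeeds" for nonzero handles
int unresolved_import(int)   { return -1; }       // placeholder before binding

// A toy Import Address Table: one contiguous array with one slot per imported
// function (slot 1 would belong to some other import), no matter how many
// call sites use each function.
Fn iat[2] = { unresolved_import, unresolved_import };
constexpr std::size_t kSlotCloseHandle = 0;

// Three separate call sites, each equivalent to "call [__imp_CloseHandle]":
// they all read the same slot at the moment of the call.
int call_site_a(int h) { return iat[kSlotCloseHandle](h); }
int call_site_b(int h) { return iat[kSlotCloseHandle](h); }
int call_site_c(int h) { return iat[kSlotCloseHandle](h); }

// The "loader" fixes up a single slot; the call sites are never patched.
void bind_import(std::size_t slot, Fn address) { iat[slot] = address; }
```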

Thus, functions are called in two different ways:

  • CALL rel32 - if the function is located inside the module where the call instruction is.
  • CALL [m] - if the function is located inside a different module.

Here we are not talking about virtual function calls, where the compiler most of the time uses the CALL [m] instruction, with [m] pointing to an entry in a virtual function table. That is done for flexibility and functionality, so saving space is not a concern there.

You may also sometimes notice the CALL reg instruction in well-optimized code. The compiler may emit it in a situation like this:

x86-64
call [__imp_CloseHandle]
call [__imp_CloseHandle]
; ...
call [__imp_CloseHandle]

The code above can be optimized to:

x86-64
mov rbx, [__imp_CloseHandle]     ; we read once from [__imp_CloseHandle] and place the result into the non-volatile register: RBX
call rbx
call rbx
; ...
call rbx

Thus, to call a function in another module, we can use different forms of the call instruction. But to pick one, the compiler needs to know where the called function is located. By default, the compiler assumes that it is located in the same module and generates the CALL rel32 instruction.

If we want to tell the compiler that some function is located in another module, there is a special __declspec(dllimport) attribute. When the compiler detects a function declared with it:

C++
__declspec(dllimport) type Func(...);

It implicitly declares a new variable:

C++
extern void* __imp_Func;

It then calls the function using the call [__imp_Func] instruction.

MASM also lets you write call __imp_Func, without the square brackets. I added them to make it clearer that the address is read from memory.

In a sense, this is not fundamentally different from obtaining the address of the function via GetProcAddress: we declare a variable (of type void*) that will contain the address of our function, and then fill it in:

C++
void* __imp_SomeApi;

__imp_SomeApi = GetProcAddress(hmod, "SomeApi");

After that, we can invoke it:

C++
((FuncType)__imp_SomeApi)(...);

The only difference is that with static linking, GetProcAddress (and GetModuleHandle before it) is called for us by the image loader (strictly speaking, the loader doesn't call those specific functions but their internal counterparts in ntdll), while the __imp_SomeApi variable is declared for us by the compiler.
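Continuing the analogy, here is a toy model of what the loader effectively does for a static import: for each imported name it looks up the export and stores the address into the corresponding __imp_ slot. In the model, find_export stands in for GetProcAddress and imp_SomeApi stands in for __imp_SomeApi. I dropped the leading double underscores, since such names are reserved in standard C++. Everything here is hypothetical and simplified; real export lookup also involves hints, ordinals and forwarders.

```cpp
#include <cassert>
#include <cstring>

using Proc = int (*)(int);

// Hypothetical exports of a "DLL": a name -> address table, roughly what
// GetProcAddress searches through in a real export directory.
int some_api(int x)  { return x * 2; }
int other_api(int x) { return x + 100; }

struct Export { const char* name; Proc address; };
const Export kExports[] = { { "SomeApi", some_api }, { "OtherApi", other_api } };

// Toy replacement for GetProcAddress: linear lookup by name.
Proc find_export(const char* name)
{
    for (const Export& e : kExports)
        if (std::strcmp(e.name, name) == 0)
            return e.address;
    return nullptr;
}

// The variables the compiler would declare for us as __imp_SomeApi / __imp_OtherApi.
Proc imp_SomeApi  = nullptr;
Proc imp_OtherApi = nullptr;

// One entry per imported name: which name to look up and which slot to fill.
struct ImportEntry { const char* name; Proc* slot; };
ImportEntry kImports[] = { { "SomeApi", &imp_SomeApi }, { "OtherApi", &imp_OtherApi } };

// What the image loader effectively does at load time for static imports.
// Returns false if a name can't be resolved (a real loader would fail the load).
bool resolve_imports()
{
    for (ImportEntry& i : kImports)
        if (!(*i.slot = find_export(i.name)))
            return false;
    return true;
}
```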

But if the compiler doesn't know that the function is being imported (say, if we didn't declare it with the __declspec(dllimport) attribute), then it generates a relative call:

x86
    call SomeApi

And the linker has to generate a proxy function in our module for it:

x86
SomeApi proc
    jmp [__imp_SomeApi]
SomeApi endp

In this case, it is obvious that executing a single call [__imp_SomeApi] instruction is better than executing two: call SomeApi -> jmp [__imp_SomeApi].
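For reference, on x64 such a proxy is typically a single 6-byte jmp qword ptr [rip+disp32], that is, FF 25 followed by a displacement to the IAT slot. This sketch computes those bytes for made-up addresses, and encode_import_thunk_x64 is a hypothetical helper. Every call through the proxy therefore costs a 5-byte call rel32 at the call site plus an extra jump at run time.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// x64 import thunk "jmp qword ptr [rip+disp32]": opcode FF /4 with ModRM 0x25.
// disp32 is measured from the end of this 6-byte instruction to the IAT slot
// (assumes the slot is within +/-2 GB of the thunk, as it is inside one image).
std::vector<uint8_t> encode_import_thunk_x64(uint64_t thunk_address, uint64_t iat_slot)
{
    const uint32_t disp = static_cast<uint32_t>(iat_slot - (thunk_address + 6));
    return { 0xFF, 0x25,
             uint8_t(disp),       uint8_t(disp >> 8),
             uint8_t(disp >> 16), uint8_t(disp >> 24) };
}
```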

Modern versions of the compiler and linker have two options:

Both of them must be enabled to avoid the proxy functions and to make the compiler emit the call [__imp_SomeApi] instruction instead. In that case, the __declspec(dllimport) attribute can be omitted.

But it is always better and also correct to use the __declspec(dllimport) attribute.

CL.exe Specifics

The following are some internals of how the Microsoft Visual C++ compiler (cl.exe) implements the __declspec(dllimport) attribute.

Say, we have the following piece of code:

C++
EXTERN_C
VOID
WINAPI
SendSAS(_In_ BOOL AsUser);

and then call:

C++
SendSAS(TRUE);

The linker (x64) will give us an error:

error LNK2001: unresolved external symbol SendSAS

This makes sense, since we haven't implemented a function with that name.

But if we change it to:

C++
EXTERN_C
DECLSPEC_IMPORT
VOID
WINAPI
SendSAS(_In_ BOOL AsUser);

The linker will give us a different error:

error LNK2001: unresolved external symbol __imp_SendSAS

So, what happens there?

When we declare a function with the DECLSPEC_IMPORT attribute (which winnt.h defines as __declspec(dllimport)), the compiler declares a variable named:

C++
__imp_ ## __FUNCDNAME__

In other words, it takes the decorated name of the function (note that it uses __FUNCDNAME__, and not __FUNCTION__) and prepends the __imp_ prefix to it. In general, it declares it like this:

x86
extern  __imp___FUNCDNAME__ : DWORD     ; 32bit

or

x86-64
extern  __imp___FUNCDNAME__ : QWORD     ; 64bit

For simplicity, and to avoid getting lost in all the underscores, let's use the word func instead of __FUNCDNAME__.

So we can say that the compiler declares a variable like this:

C++
extern void* __imp_func;

Then, to call that function, the compiler generates the following code:

x86
call [__imp_func]

The way __FUNCDNAME__ is translated into an actual character string is dictated by the name mangling that the compiler uses for a specific calling convention. In the case of x64 and extern "C", __FUNCDNAME__ will be equal to the name of the function itself (unless the function uses __vectorcall).
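To see what __FUNCDNAME__, and therefore the __imp_ symbol, looks like for extern "C" functions, here is a sketch that applies the documented MSVC C-name decoration rules:

  • __cdecl adds a leading underscore.
  • __stdcall adds a leading underscore, plus @ and the argument size in bytes.
  • __fastcall adds a leading @ and the same suffix.
  • __vectorcall adds @@ plus the argument size.

On x64, only __vectorcall names are decorated. The enums and function names are my own hypothetical helpers, and C++ mangled names are out of scope.

```cpp
#include <cassert>
#include <string>

enum class Arch { x86, x64 };
enum class CallConv { Cdecl, Stdcall, Fastcall, Vectorcall };

// Decorated (linker-visible) name of an extern "C" function under the
// documented MSVC C decoration rules. arg_bytes is the size of the parameter
// list in bytes (e.g. 4 for one BOOL on x86).
std::string decorate_c_name(const std::string& name, CallConv cc,
                            unsigned arg_bytes, Arch arch)
{
    const std::string size = std::to_string(arg_bytes);
    if (cc == CallConv::Vectorcall)          // decorated the same way on x86 and x64
        return name + "@@" + size;
    if (arch == Arch::x64)                   // all other conventions: no decoration on x64
        return name;
    switch (cc) {
    case CallConv::Cdecl:    return "_" + name;
    case CallConv::Stdcall:  return "_" + name + "@" + size;
    case CallConv::Fastcall: return "@" + name + "@" + size;
    default:                 return name;    // not reached: Vectorcall handled above
    }
}

// Symbol of the import slot referenced by a __declspec(dllimport) call.
std::string import_slot_symbol(const std::string& decorated)
{
    return "__imp_" + decorated;
}
```

For SendSAS (WINAPI, one BOOL argument), this gives __imp_SendSAS on x64 and __imp__SendSAS@4 on x86. The latter is the name an x86 build would report as unresolved.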

As an example of this principle, we can implement such a variable ourselves. For simplicity, I will use x64 builds, so that I don't have to deal with name mangling.

C++
EXTERN_C_START
void* __imp_SendSAS;
EXTERN_C_END

// We need this line if our code doesn't refer to __imp_SendSAS directly:
#pragma comment(linker, "/include:__imp_SendSAS")

SendSAS(TRUE);

In that case, __imp_SendSAS will obviously be 0, and the following call will crash the process:

x86-64
call [__imp_SendSAS]

So if we define __imp_SendSAS ourselves, we also need to initialize it:

C++
// Now we don't need this:
// #pragma comment(linker, "/include:__imp_SendSAS")

if (__imp_SendSAS = GetProcAddress(LoadLibraryW(L"Sas"), "SendSAS"))
{
   SendSAS(TRUE);
}

It is important to understand that in this case, the call to SendSAS() does not create a static import from sas.dll.

A better example would be some system API, say, OpenThemeDataForDpi:

C++
// OpenThemeDataForDpi
// Minimum supported client Windows 10, version 1703 [desktop apps only]
if (__imp_OpenThemeDataForDpi = GetProcAddress(LoadLibraryW(L"uxtheme"), "OpenThemeDataForDpi"))
{
	OpenThemeDataForDpi(HWND_DESKTOP, L"TaskbarPearl", GetDpiForWindow(HWND_DESKTOP));
}

If we look at the import table of our executable after compilation, it will not have the OpenThemeDataForDpi function.

It is also very important to declare it correctly. The following will lead to an error:

C++
EXTERN_C void* __imp_OpenThemeDataForDpi;		// Error

Instead, it's better to do it like this:

C++
EXTERN_C {
	void* __imp_OpenThemeDataForDpi;     // uninitialized, so it goes to .bss and takes no space in the file
}

You can also do it this way:

C++
EXTERN_C void* __imp_OpenThemeDataForDpi = 0;

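The reason is that the compiler treats this form of declaration:

C++
EXTERN_C void* __imp_OpenThemeDataForDpi;

as a mere declaration, exactly as if it were:

C++
EXTERN_C extern void* __imp_OpenThemeDataForDpi;

This is actually standard C++ behavior rather than a compiler mistake. A declaration directly contained in a single-declaration extern "C" linkage specification is treated as if it had the extern specifier, so it does not define the variable. The braced form and the form with an initializer, on the other hand, are definitions.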

The other option, instead of declaring everything manually and then initializing it, is to use import (.lib) libraries. They define the appropriate __imp_func symbols through IMPORT_OBJECT_HEADER structures (you can find its declaration in the winnt.h file). The linker turns these special import objects into import structures in the PE file.
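As an illustration of what such an import object contains, here is a sketch that parses one from a byte buffer. It assumes the layout from winnt.h and the PE/COFF specification. The object starts with a 20-byte IMPORT_OBJECT_HEADER containing Sig1 = 0, Sig2 = 0xFFFF, Version, Machine, TimeDateStamp, SizeOfData, Ordinal/Hint, and the Type/NameType bit fields. The header is followed by the NUL-terminated symbol name and DLL name. The ShortImport struct and the function names are hypothetical. Deriving the __imp_ name by simply prefixing the symbol reflects my understanding of what the linker does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Fields of interest from a "short" import object stored in an import .lib.
struct ShortImport {
    uint16_t machine;     // e.g. 0x8664 for IMAGE_FILE_MACHINE_AMD64
    std::string symbol;   // public symbol name, e.g. "SendSAS" on x64
    std::string dll;      // DLL that exports it, e.g. "sas.dll"
};

// The on-disk format is little-endian.
static uint16_t rd16(const uint8_t* p) { return uint16_t(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t* p) { return rd16(p) | (uint32_t(rd16(p + 2)) << 16); }

// Parses the 20-byte IMPORT_OBJECT_HEADER and the two NUL-terminated strings
// that follow it. Returns false if buf does not hold a well-formed short import.
bool parse_short_import(const uint8_t* buf, std::size_t size, ShortImport& out)
{
    const std::size_t kHeaderSize = 20;
    if (size < kHeaderSize) return false;
    if (rd16(buf) != 0 || rd16(buf + 2) != 0xFFFF)   // Sig1 == 0, Sig2 == 0xFFFF
        return false;
    out.machine = rd16(buf + 6);                    // Machine
    const uint32_t data_size = rd32(buf + 12);      // SizeOfData: bytes of strings
    if (data_size > size - kHeaderSize) return false;

    const char* sym = reinterpret_cast<const char*>(buf + kHeaderSize);
    const void* sym_end = std::memchr(sym, 0, data_size);
    if (!sym_end) return false;
    const std::size_t sym_len = static_cast<const char*>(sym_end) - sym;

    const char* dll = sym + sym_len + 1;
    const void* dll_end = std::memchr(dll, 0, data_size - sym_len - 1);
    if (!dll_end) return false;

    out.symbol.assign(sym, sym_len);
    out.dll.assign(dll, static_cast<const char*>(dll_end) - dll);
    return true;
}

// Assumption: the linker exposes the import as "__imp_" + symbol name.
std::string imp_symbol(const ShortImport& s) { return "__imp_" + s.symbol; }
```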

But even when we use a .lib file, we can still provide our own definitions for some of its symbols. Consider this code:

C++
#pragma warning(disable : 4100)
BOOL
WINAPI
InitializeProcThreadAttributeListXP(
	_Out_writes_bytes_to_opt_(*lpSize,*lpSize) LPPROC_THREAD_ATTRIBUTE_LIST lpAttributeList,
	_In_ DWORD dwAttributeCount,
	_Reserved_ DWORD dwFlags,
	_When_(lpAttributeList == nullptr,_Out_) _When_(lpAttributeList != nullptr,_Inout_) PSIZE_T lpSize
)
{
	SetLastError(RtlNtStatusToDosError(STATUS_NOT_IMPLEMENTED));
	return FALSE;
}
#pragma warning(default : 4100)

EXTERN_C_START
void* __imp_InitializeProcThreadAttributeList = InitializeProcThreadAttributeListXP;
EXTERN_C_END

/////////////////////////////////////////
SIZE_T s;

InitializeProcThreadAttributeList(0, 1, 0, &s);
GetLastError(); // error set by InitializeProcThreadAttributeListXP (the fallback)

if (PVOID pv = GetProcAddress(GetModuleHandleW(L"kernel32"), "InitializeProcThreadAttributeList"))
{
	__imp_InitializeProcThreadAttributeList = pv;
}

InitializeProcThreadAttributeList(0, 1, 0, &s);
GetLastError();    // ERROR_INSUFFICIENT_BUFFER for Vista+

So even though kernel32.lib defines __imp_InitializeProcThreadAttributeList and our code defines a symbol with the same name, this doesn't lead to a conflict. Our symbol is in a regular COFF object, while in kernel32.lib it is in an import object. The linker pulls members from a library only to resolve symbols that are still undefined, so our definition wins.

This nuance can be useful, for instance, when a symbol exists in the import library but the corresponding function exists only in newer versions of the operating system.
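The same trick can be modeled portably:

  • Our own slot is pre-initialized to a fallback.
  • Every call site goes through the slot.
  • At run time, the slot is upgraded only if the real function is available.

The functions below are hypothetical stand-ins, with lookup_real playing the role of GetProcAddress on a newer or an older system. Treat this as a sketch of the idea rather than of the actual Windows code.

```cpp
#include <cassert>

using InitFn = int (*)(int);

// Fallback used when the real function is missing (the "XP" path).
int init_fallback(int) { return -1; }

// Stand-in for the real implementation that only newer systems export.
int init_real(int n) { return n * 10; }

// Our own definition of the slot, pre-initialized to the fallback, like the
// __imp_InitializeProcThreadAttributeList definition above. Every call site
// compiled as "call [slot]" uses whatever the slot currently holds.
InitFn imp_Init = init_fallback;

int call_site(int n) { return imp_Init(n); }

// Hypothetical lookup: returns the real function only on a "new" OS.
InitFn lookup_real(bool os_is_new) { return os_is_new ? init_real : nullptr; }

// Upgrade the slot if the real function exists; otherwise keep the fallback.
void upgrade_slot(bool os_is_new)
{
    if (InitFn p = lookup_real(os_is_new))
        imp_Init = p;
}
```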

Delay Load Imports and __imp_load_ Prefix

There's also one intricate detail about delay-loaded imports. All such functions are also called via the call [m] instruction. But for every delay-loaded function, the linker also generates a function (and not a variable) whose name has the __imp_load_ prefix, as in __imp_load_SomeApi.

So, in other words, there is a variable:

C++
void* __imp_SomeApi;

But there's also a function:

C++
type __imp_load_SomeApi(...) {
}

Initially, __imp_SomeApi = &__imp_load_SomeApi, but after the first call it effectively becomes:

C++
__imp_SomeApi = GetProcAddress(hmod, "SomeApi");

So, if we want, we can write code like this:

C++
// Assuming that advapi32.dll is marked for delay loading

EXTERN_C_START

// This function is generated by the linker
BOOL WINAPI __imp_load_OpenProcessToken( _In_ HANDLE ProcessHandle, _In_ DWORD DesiredAccess, _Outptr_ PHANDLE TokenHandle );

// This variable is generated by the linker
extern PVOID __imp_OpenProcessToken; // = &__imp_load_OpenProcessToken

EXTERN_C_END

HANDLE hToken;
DbgPrint("%p %p\n", &__imp_load_OpenProcessToken, __imp_OpenProcessToken);
if (OpenProcessToken(NtCurrentProcess(), TOKEN_QUERY, &hToken ))
{
	DbgPrint("%p %p\n", &__imp_load_OpenProcessToken, __imp_OpenProcessToken);
	CloseHandle(hToken);
}

// If we want, we can call __imp_load_OpenProcessToken directly.
// But, to be honest, there's not much point in doing so...
if (__imp_load_OpenProcessToken(NtCurrentProcess(), TOKEN_QUERY, &hToken ))
{
	CloseHandle(hToken);
}

Then, in the debugger output window, I can see something like this:

00007FF7C46DB8AF 00007FF7C46DB8AF
'demo.exe': Loaded 'C:\Windows\System32\advapi32.dll', Symbols loaded (source information stripped).
00007FF7C46DB8AF 00007FFFFBA06220

We can see that the value of __imp_OpenProcessToken has changed: it initially coincided with __imp_load_OpenProcessToken (00007FF7C46DB8AF), and then became 00007FFFFBA06220 (advapi32!OpenProcessToken).

We can also do:

C++
DbgPrint("%p %p\n", __imp_load_OpenProcessToken, __imp_OpenProcessToken);

In other words, __imp_load_OpenProcessToken is the same as &__imp_load_OpenProcessToken, and which one to use is a matter of coding style.
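The whole delay-load mechanism can be modeled with function pointers. In this simplified, hypothetical sketch, the names map to the real mechanism as follows:

  • load_thunk plays the role of __imp_load_SomeApi.
  • imp_SomeApi plays the role of __imp_SomeApi.
  • resolve_some_api stands in for the real delay-load helper, which loads the DLL and calls GetProcAddress.

The first call goes through the thunk, which patches the slot. Later calls go straight to the real function.

```cpp
#include <cassert>

using Fn = int (*)(int);

int resolve_count = 0;                   // how many times the "helper" ran

int real_api(int x) { return x * 3; }    // stands in for advapi32!OpenProcessToken

int load_thunk(int x);                   // plays the role of __imp_load_SomeApi

// The import slot: initially points at the load thunk, like __imp_SomeApi.
Fn imp_SomeApi = load_thunk;

// Hypothetical stand-in for the delay-load helper (LoadLibrary + GetProcAddress).
Fn resolve_some_api() { ++resolve_count; return real_api; }

// The first call lands here: resolve, patch the slot, then forward the call.
int load_thunk(int x)
{
    imp_SomeApi = resolve_some_api();
    return imp_SomeApi(x);
}

// Every call site is compiled as "call [__imp_SomeApi]".
int call_site(int x) { return imp_SomeApi(x); }
```

Note that in this model, calling load_thunk directly (like calling __imp_load_OpenProcessToken above) still works, but it runs the resolution step again.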

Conclusion

These are just some internal details of how Microsoft compilers and linkers work. If I missed something, please leave a comment below.
