This article contains functions and features that are not documented by the original manufacturer. If you follow the advice in this article, you do so at your own risk. The methods presented here may rely on internal implementation details and may not work in the future.
Intro
This blog post explains how Microsoft compilers use the __imp_ and __imp_load_ prefixes internally.
Types of the CALL instruction
In the Intel architecture the compiler can generally use two types of the call CPU instruction:
- CALL rel32 - specifies a relative displacement from the next instruction.
- CALL [m] - specifies an absolute indirect address.
If we call a function inside the same module (or, more generally, if the distance to the function is within -2 GB/+2 GB, that is -0x80000000/+0x7FFFFFFF), then the CALL rel32 instruction is more efficient than the CALL [m] one. Here's why:
- 5 bytes versus 6.
- CALL rel32 is base-independent and doesn't need any additional relocation. CALL [m], on the other hand, requires an absolute address in memory in [m]. This means that every such memory slot needs to be described in the relocation table (which adds 2 bytes), and the address also has to be fixed up when the module is relocated.
That is why the compiler tries to use the CALL rel32 instruction to call functions inside the same module.
But when calling functions in another module, the situation changes.
First of all, it may not be possible to use CALL rel32, as we can't reach further than -2 GB/+2 GB from that instruction.
In 32-bit code (with a 4 GB address space) this is not a real limitation: the target address wraps around modulo 2^32, so any address is reachable, and in Windows a call never crosses the boundary between user and kernel space anyway.
But in a 64-bit system, mapped modules often sit too far apart from each other in memory. For instance, one module can be mapped at 0x7FF729110000 and another one at 0x7FFFE0480000, with a relative offset of 0x8B7370000 between them, which exceeds the -0x80000000/+0x7FFFFFFF range.
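The range check described above can be sketched in a few lines. This is only an illustration: the helper names rel32_displacement and fits_rel32 are made up for this example, and the addresses are the two module bases quoted in the text, used as if they were the call site and the target.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of a CALL rel32 instruction: opcode E8 plus a 4-byte displacement.
constexpr std::uint64_t kCallRel32Size = 5;

// Displacement that CALL rel32 would need, measured from the next instruction.
// The unsigned subtraction wraps, so the cast yields the signed 64-bit distance.
constexpr std::int64_t rel32_displacement(std::uint64_t call_site, std::uint64_t target) {
    return static_cast<std::int64_t>(target - (call_site + kCallRel32Size));
}

// True if the target can be reached with a signed 32-bit displacement.
constexpr bool fits_rel32(std::uint64_t call_site, std::uint64_t target) {
    const std::int64_t d = rel32_displacement(call_site, target);
    return d >= INT32_MIN && d <= INT32_MAX;
}

// Hypothetical helper that checks the example from the text.
static void check_article_example() {
    // The two module bases are 0x8B7370000 bytes apart: too far for rel32.
    assert(!fits_rel32(0x7FF729110000ULL, 0x7FFFE0480000ULL));
    // A target one page away inside the same module is reachable.
    assert(fits_rel32(0x7FF729110000ULL, 0x7FF729111000ULL));
}
```

Calling check_article_example() passes both assertions, which matches the conclusion above: a cross-module call in a 64-bit process cannot rely on a 32-bit relative displacement.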
Then, even if all modules were mapped no further than -2 GB/+2 GB from each other (which is possible for 32-bit Windows), there's another problem: relocation.
When we make a call with CALL rel32 inside a module, the distance between the two functions (rel32) stays the same no matter what base address the module is mapped at.
That distance is known at link time. But if the call is between two modules that can be mapped at different base addresses, the distance between them is different every time they load, and we can know it only at run time. That is why every CALL rel32 instruction that calls a function in another module would need to be described in the relocation table.
Of course, we also need relocation for the CALL [m] instructions. But it is needed only for the memory slot the instruction points to (that is, [m]) and not for the instruction itself.
There are usually fewer such memory slots than instructions that use them.
As an example, let's take a call to the CloseHandle function. A program may contain multiple calls to that function from different places.
In other words, there are many call [__imp_CloseHandle] instructions throughout the code.
But the memory slot that holds the address of the kernel32!CloseHandle function will be only one.
We can also group all the memory slots into a contiguous array (usually not larger than a single memory page) and thus we will have to modify only that page during relocation. Such a region is called the "Import Address Table" (IAT), and it is described by the IMAGE_DIRECTORY_ENTRY_IAT directory in the PE file.
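The idea of many call sites sharing one slot can be modelled in portable C++. This is a toy model, not the real loader: g_iat, bind_import, the site_* functions and the two stand-in "exports" are all hypothetical names invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

using ApiFn = int (*)(int);

// Hypothetical stand-ins for functions exported by another module.
static int real_close_handle(int h) { return h + 1; }
static int real_get_version(int) { return 10; }

// A toy "import address table": one pointer-sized slot per imported function.
enum Slot : std::size_t { kCloseHandle, kGetVersion, kSlotCount };
static ApiFn g_iat[kSlotCount] = {};

// Toy "loader": resolves a name and writes the address into a single slot.
static bool bind_import(Slot slot, const char* name) {
    struct Export { const char* name; ApiFn fn; };
    static const Export exports[] = {
        {"CloseHandle", real_close_handle},
        {"GetVersion", real_get_version},
    };
    for (const Export& e : exports) {
        if (std::strcmp(e.name, name) == 0) {
            g_iat[slot] = e.fn;
            return true;
        }
    }
    return false;  // unknown export: the slot is left untouched
}

// Three independent call sites; each is the analogue of call [m].
static int site_a(int h) { return g_iat[kCloseHandle](h); }
static int site_b(int h) { return g_iat[kCloseHandle](h); }
static int site_c(int h) { return g_iat[kCloseHandle](h); }
```

After bind_import(kCloseHandle, "CloseHandle") all three call sites reach the same target, and rewriting that one slot redirects all of them at once. None of the call sites themselves has to be patched, which is the property the text relies on.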
Thus we have to call functions in two different ways:
- CALL rel32 - if the function is located inside the module where the call instruction is.
- CALL [m] - if the function is located inside a different module.
Here we are not talking about virtual function calls, where the compiler most of the time uses the CALL [m] instruction with [m] pointing to an address inside a virtual function table. That is done for flexibility and functionality, so saving space is not a concern there.
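Also, at times, one may notice the CALL reg instruction in well-optimized code. The compiler may do this when the same imported function is called several times in one routine, for example with several call [__imp_CloseHandle] instructions.
Such code can be optimized to: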
mov rbx, [__imp_CloseHandle] ; we read once from [__imp_CloseHandle] and place the result into the non-volatile register: RBX
call rbx
call rbx
; ...
call rbx
Thus, to call a function from one module to another we can use different forms of the call instruction.
But to pick one, the compiler needs to know where the function being called is located.
By default the compiler assumes that it is located in the same module, and generates the CALL rel32 instruction.
If we want to tell the compiler that some function is located in another module, there is a special __declspec(dllimport) attribute.
When the compiler sees a function Func declared with it, it implicitly declares a pointer variable named __imp_Func and calls the function using the call [__imp_Func] instruction.
The MASM assembler also allows writing call __imp_Func, without square brackets. I added them to make it clear that the address comes from memory.
In some sense this is not fundamentally different from obtaining the address of the function via GetProcAddress: we declare a variable (of type void*) that will contain the address of our function and then fill it in with the value returned by GetProcAddress.
After that, we can invoke the function through that variable.
The only difference here is that during static linking GetProcAddress (and GetModuleHandle before that) are called for us by the image loader (it doesn't call those specific functions, but their internal counterparts from ntdll), while the __imp_SomeApi variable is declared for us by the compiler.
But if the compiler doesn't know that the function is imported (say, if we didn't declare it with the __declspec(dllimport) attribute), then it generates a relative call, call SomeApi, and the linker has to generate a proxy function in our module for it, which consists of a single jmp [__imp_SomeApi] instruction.
Obviously, executing one call [__imp_SomeApi] instruction is better than executing two: call SomeApi -> jmp [__imp_SomeApi].
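The difference between the two call paths can be illustrated with a portable C++ model. It is only an analogy: imp_SomeApi stands in for the __imp_SomeApi slot, SomeApi_thunk for the linker-generated proxy, and the g_transfers counter is an invented device that counts control transfers.

```cpp
#include <cassert>

// Hypothetical imported function and the slot that holds its address.
static int some_api_impl(int x) { return x * 2; }
static int (*imp_SomeApi)(int) = some_api_impl;  // analogue of __imp_SomeApi

static int g_transfers = 0;  // counts control transfers, for illustration only

// Analogue of the linker-generated proxy: SomeApi: jmp [__imp_SomeApi]
static int SomeApi_thunk(int x) {
    ++g_transfers;  // the jmp through the slot
    return imp_SomeApi(x);
}

// Call site compiled without dllimport: call SomeApi -> jmp [__imp_SomeApi]
static int call_via_thunk(int x) {
    ++g_transfers;  // the relative call to the proxy
    return SomeApi_thunk(x);
}

// Call site compiled with dllimport: call [__imp_SomeApi]
static int call_via_slot(int x) {
    ++g_transfers;  // the single indirect call
    return imp_SomeApi(x);
}
```

Both call_via_thunk and call_via_slot return the same result, but the first one records two transfers and the second one only one. That extra hop is what the dllimport declaration saves.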
Modern versions of the compiler and linker have two options:
- /GL (Whole program optimization) - for the compiler.
- /LTCG (Link-time code generation) - for the linker.
They both must be enabled to avoid the proxy functions and to make the compiler emit the call [__imp_SomeApi] instruction instead. In that case the __declspec(dllimport) attribute can be omitted. But it is always better, and also correct, to use the __declspec(dllimport) attribute.
CL.exe Specifics
The following are some internals of how the Microsoft Visual C++ compiler, cl.exe, implements the __declspec(dllimport) attribute.
Say, we declare the SendSAS function without the __declspec(dllimport) attribute and then call it.
The linker (x64) will give us an error:
error LNK2001: unresolved external symbol SendSAS
This is logical, since we haven't coded the function with such a name.
But if we add the DECLSPEC_IMPORT attribute to the declaration, the linker will give us a different error:
error LNK2001: unresolved external symbol __imp_SendSAS
So, what happens there?
When we declare a function with the DECLSPEC_IMPORT attribute, the compiler declares a pointer variable for it, here __imp_SendSAS.
In other words, it takes the __FUNCDNAME__ name of the function (note that it uses __FUNCDNAME__, and not __FUNCTION__) and adds the __imp_ prefix in front of it.
In general, the name of that variable is __imp_ followed by the __FUNCDNAME__ of the function.
For simplicity, and so as not to get lost in the underscores, let's use the word func instead of __FUNCDNAME__.
So we may say that the compiler declares a pointer variable named __imp_func.
The way __FUNCDNAME__ is translated into an actual character string is dictated by the name mangling used by the compiler for a specific calling convention. In the case of x64 and extern "C", __FUNCDNAME__ is equal to the name of the function itself.
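To make the naming rule concrete, here is a small sketch that builds the import symbol name for two common cases. The helper names are hypothetical. The x86 case assumes the usual extern "C" __stdcall decoration (a leading underscore plus @ and the argument size in bytes), and it assumes SendSAS takes a single 4-byte BOOL argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// x64 with extern "C": the decorated name is the function name itself.
static std::string decorate_x64_c(const std::string& name) {
    return name;
}

// 32-bit x86 __stdcall with extern "C": a leading underscore, then '@' and the
// total size of the arguments in bytes.
static std::string decorate_x86_stdcall(const std::string& name, std::size_t arg_bytes) {
    return "_" + name + "@" + std::to_string(arg_bytes);
}

// The import pointer symbol is the decorated name with the __imp_ prefix.
static std::string imp_symbol(const std::string& decorated) {
    return "__imp_" + decorated;
}
```

With these helpers, imp_symbol(decorate_x64_c("SendSAS")) gives __imp_SendSAS, the name seen in the x64 linker error, while imp_symbol(decorate_x86_stdcall("SendSAS", 4)) gives __imp__SendSAS@4. This is why the x64 build is the more convenient one for experiments.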
As an example of this principle, we can define such a variable ourselves. For simplicity I will be using x64 builds, so that I don't have to deal with name mangling.
EXTERN_C_START
void* __imp_SendSAS;
EXTERN_C_END
// We need this line if we don't directly refer to:
#pragma comment(linker, "/include:__imp_SendSAS")
SendSAS(TRUE);
In that case, __imp_SendSAS will obviously be set to 0, and the call through it will crash the process.
But if we declare __imp_SendSAS ourselves, then we also need to initialize it:
// Now we don't need this:
// #pragma comment(linker, "/include:__imp_SendSAS")
if (__imp_SendSAS = GetProcAddress(LoadLibraryW(L"Sas"), "SendSAS"))
{
SendSAS(TRUE);
}
It is important to understand that in this case the call to SendSAS() does not lead to static linking to sas.dll.
I can give a better example with a system API, say OpenThemeDataForDpi:
// OpenThemeDataForDpi
// Minimum supported client Windows 10, version 1703 [desktop apps only]
if (__imp_OpenThemeDataForDpi = GetProcAddress(LoadLibraryW(L"uxtheme"), "OpenThemeDataForDpi"))
{
OpenThemeDataForDpi(HWND_DESKTOP, L"TaskbarPearl", GetDpiForWindow(HWND_DESKTOP));
}
If we look at the import table of our executable after compilation, it will not contain the OpenThemeDataForDpi function.
It is also very important to declare it correctly. The following will lead to an error:
Thus, it's better to do it as such:
You can also do it this way:
The thing is that the compiler treats this form of declaration:
By mistake as:
Alternatively, instead of declaring everything manually and then initializing it, we can use .lib (import) libraries.
They have the appropriate __imp_func symbols defined in them in IMPORT_OBJECT_HEADER structs. (You can find the declaration in the winnt.h file.)
The use of those special import objects leads to insertion of the import structures into the PE file.
But even if we're using a .lib file, we can still define some of its symbols ourselves. Consider this code:
#pragma warning(disable : 4100)
BOOL
WINAPI
InitializeProcThreadAttributeListXP(
_Out_writes_bytes_to_opt_(*lpSize,*lpSize) LPPROC_THREAD_ATTRIBUTE_LIST lpAttributeList,
_In_ DWORD dwAttributeCount,
_Reserved_ DWORD dwFlags,
_When_(lpAttributeList == nullptr,_Out_) _When_(lpAttributeList != nullptr,_Inout_) PSIZE_T lpSize
)
{
RtlNtStatusToDosError(STATUS_NOT_IMPLEMENTED);
return FALSE;
}
#pragma warning(default : 4100)
EXTERN_C_START
void* __imp_InitializeProcThreadAttributeList = InitializeProcThreadAttributeListXP;
EXTERN_C_END
/////////////////////////////////////////
SIZE_T s;
InitializeProcThreadAttributeList(0, 1, 0, &s);
GetLastError(); // ERROR_ENVVAR_NOT_FOUND
if (PVOID pv = GetProcAddress(GetModuleHandleW(L"kernel32"), "InitializeProcThreadAttributeList"))
{
__imp_InitializeProcThreadAttributeList = pv;
}
InitializeProcThreadAttributeList(0, 1, 0, &s);
GetLastError(); // ERROR_INSUFFICIENT_BUFFER for Vista+
So even if we're using kernel32.lib, which has __imp_InitializeProcThreadAttributeList declared, and we also have the __imp_InitializeProcThreadAttributeList symbol in our code, this doesn't lead to a conflict, because our symbol is in the COFF object while in kernel32.lib it is in the import object.
This nuance can be useful, for instance, when a symbol exists in the import lib file but the corresponding function exists only in newer versions of the operating system.
Delay Load Imports and __imp_load_ Prefix
There's also one intricate detail about delay-loaded imports. All such functions are also called via the call [m] instruction.
But for every delay-loaded function the compiler creates a function (and not a variable) with a name prefixed by __imp_load_, as in __imp_load_SomeApi.
In other words, there is a variable, __imp_SomeApi, and there is also a function, __imp_load_SomeApi.
Originally __imp_SomeApi = &__imp_load_SomeApi; but after the first call it holds the address of the real SomeApi function in the delay-loaded module.
Thus, if we want, we can code it as such:
// Assuming that advapi32.dll is marked for delay loading
EXTERN_C_START
// This function is created by the compiler
BOOL WINAPI __imp_load_OpenProcessToken( _In_ HANDLE ProcessHandle, _In_ DWORD DesiredAccess, _Outptr_ PHANDLE TokenHandle );
// This variable is created by the compiler
extern PVOID __imp_OpenProcessToken; // = &__imp_load_OpenProcessToken
EXTERN_C_END
HANDLE hToken;
DbgPrint("%p %p\n", &__imp_load_OpenProcessToken, __imp_OpenProcessToken);
if (OpenProcessToken(NtCurrentProcess(), TOKEN_QUERY, &hToken ))
{
DbgPrint("%p %p\n", &__imp_load_OpenProcessToken, __imp_OpenProcessToken);
CloseHandle(hToken);
}
// If we want, we can call __imp_load_OpenProcessToken directly.
// But to do so, to be honest, there's not much sense ...
if (__imp_load_OpenProcessToken(NtCurrentProcess(), TOKEN_QUERY, &hToken ))
{
CloseHandle(hToken);
}
Then in the debugger output window I can see something as follows:
00007FF7C46DB8AF 00007FF7C46DB8AF
'demo.exe': Loaded 'C:\Windows\System32\advapi32.dll', Symbols loaded (source information stripped).
00007FF7C46DB8AF 00007FFFFBA06220
We can see that the value of __imp_OpenProcessToken has changed: it initially coincided with __imp_load_OpenProcessToken (00007FF7C46DB8AF) and then became 00007FFFFBA06220 (advapi32!OpenProcessToken).
We can also omit the address-of operator: for a function, __imp_load_OpenProcessToken is the same as &__imp_load_OpenProcessToken, and which one to choose is a matter of your coding style.
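The self-patching behaviour of the slot can be modelled without any Windows APIs. The following is a simplified sketch, not the actual delay-load helper: imp_SomeApi and imp_load_SomeApi are stand-ins for __imp_SomeApi and __imp_load_SomeApi, while real_some_api and g_resolve_count are invented for the example.

```cpp
#include <cassert>

using ApiFn = int (*)(int);

// Hypothetical "real" function living in the delay-loaded module.
static int real_some_api(int x) { return x + 100; }

static int g_resolve_count = 0;  // how many times the "module" was resolved

static int imp_load_SomeApi(int x);             // analogue of __imp_load_SomeApi
static ApiFn imp_SomeApi = imp_load_SomeApi;    // analogue of __imp_SomeApi

// The load stub: resolve the target, patch the slot, then forward the call.
static int imp_load_SomeApi(int x) {
    ++g_resolve_count;            // stands in for loading the DLL and finding the export
    imp_SomeApi = real_some_api;  // later calls bypass the stub
    return imp_SomeApi(x);
}

// Every call site goes through the slot, like call [__imp_SomeApi].
static int call_some_api(int x) {
    return imp_SomeApi(x);
}
```

Before the first call the slot equals the address of the stub; after it, the slot points to real_some_api and g_resolve_count stays at 1 no matter how many more calls are made. This mirrors the two pairs of addresses in the debugger output above.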
Conclusion
These are just some internal details of the workings of the Microsoft compilers and linkers. If I missed something please leave a comment below.