This article contains undocumented features that are not supported by the original manufacturer. By following advice in this article, you're doing so at your own risk. The methods presented in this article may rely on internal implementation and may not work in the future.
Preface
If you put some Windows function (or WinAPI) into your C or C++ code:
HANDLE hFile = ::CreateFileW(pFilePath, GENERIC_READ | GENERIC_WRITE,
FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if(hFile != INVALID_HANDLE_VALUE)
{
::CloseHandle(hFile);
}
When you compile your program, the CreateFileW and CloseHandle functions from the Kernel32.dll library will be linked to your program using load-time linking.
At times though, you may not want to, or may not be able to, rely on load-time linking. For that there's a way to link dynamically, or at run-time, using the LoadLibrary and GetProcAddress APIs:
HMODULE hKernel32 = ::LoadLibraryW(L"Kernel32.dll"); // explicit W version, since we pass a wide string
if(hKernel32)
{
HANDLE (WINAPI *pfn_CreateFileW)(
LPCWSTR lpFileName,
DWORD dwDesiredAccess,
DWORD dwShareMode,
LPSECURITY_ATTRIBUTES lpSecurityAttributes,
DWORD dwCreationDisposition,
DWORD dwFlagsAndAttributes,
HANDLE hTemplateFile
);
BOOL (WINAPI *pfn_CloseHandle)(
HANDLE hObject
);
(FARPROC&)pfn_CreateFileW = ::GetProcAddress(hKernel32, "CreateFileW");
(FARPROC&)pfn_CloseHandle = ::GetProcAddress(hKernel32, "CloseHandle");
if(pfn_CreateFileW &&
pfn_CloseHandle)
{
HANDLE hFile = pfn_CreateFileW(pFilePath, GENERIC_READ | GENERIC_WRITE,
FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if(hFile != INVALID_HANDLE_VALUE)
{
pfn_CloseHandle(hFile);
}
}
::FreeLibrary(hKernel32);
}
Notice that this technique, which resolved the addresses of the CreateFileW and CloseHandle functions dynamically, can be used to resolve any other exported function. But the original two APIs, LoadLibrary and GetProcAddress, still had to be resolved through load-time linking, that is, using the first method.
But what if you're writing shellcode and don't have the option of load-time linking to the Windows APIs? Can you still get a pointer to the GetProcAddress function?
The answer is yes. But for that we'll need to dive into the depths of Windows internals and use some assembly language to implement it.
General Technique
The general technique of obtaining a pointer to any Windows function is two-fold:
- Obtain the base address of the module (or `.dll`) that the function resides in. This is the address where the module was mapped into the process.
- Obtain the offset within that module for the function you need.
Note that the first step mentions "address where the module was mapped into". This means that the module in question has to be already loaded into the process (or "mapped".) There are three ways to ensure this:
- Call one of the LoadLibrary* class of functions.
- Use a module that is guaranteed to be loaded into the process. ntdll.dll is guaranteed to be loaded into any user-mode process, and kernel32.dll is conditionally guaranteed. By that I mean that if the process is not a native system process, it is guaranteed to have the kernel32.dll module mapped.
- Use any module that is linked for load-time linking into the process, provided that such module is not marked for delay-loading.
To determine if a binary executable was built as a native system module, check IMAGE_OPTIONAL_HEADER::Subsystem in its PE header. A native image will be marked as IMAGE_SUBSYSTEM_NATIVE (or 1).
Alternatively, you can use the WinAPI Search tool for that as well.
[Screenshot: WinAPI Search utility, displaying the "Show Info" window for a search result item within an IMAGE_SUBSYSTEM_NATIVE module.]
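The Subsystem check described above can also be done by hand on the raw bytes of a file. Here is a minimal sketch, assuming the whole file has been read into a buffer and that the code runs on a little-endian machine. The helper names ReadAt and ReadPeSubsystem are made up for this illustration; only the offsets come from the PE format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Value of IMAGE_SUBSYSTEM_NATIVE in the PE format.
constexpr uint16_t kSubsystemNative = 1;

// Hypothetical helper: copies a T out of the buffer at `offset`.
// Returns false if the read would go out of bounds.
template <typename T>
bool ReadAt(const std::vector<uint8_t>& file, size_t offset, T& out) {
    if (offset > file.size() || file.size() - offset < sizeof(T)) return false;
    std::memcpy(&out, file.data() + offset, sizeof(T));   // assumes a little-endian host
    return true;
}

// Hypothetical helper: returns true and fills `subsystem` if `file` looks like a PE image.
bool ReadPeSubsystem(const std::vector<uint8_t>& file, uint16_t& subsystem) {
    uint16_t mz = 0;
    uint32_t e_lfanew = 0, signature = 0;
    if (!ReadAt(file, 0, mz) || mz != 0x5A4D) return false;              // "MZ"
    if (!ReadAt(file, 0x3C, e_lfanew)) return false;                     // IMAGE_DOS_HEADER::e_lfanew
    if (!ReadAt(file, e_lfanew, signature) || signature != 0x00004550)   // "PE\0\0"
        return false;
    // OptionalHeader starts 0x18 bytes into IMAGE_NT_HEADERS, and Subsystem sits
    // 0x44 bytes into the optional header in both PE32 and PE32+ images.
    return ReadAt(file, static_cast<size_t>(e_lfanew) + 0x18 + 0x44, subsystem);
}
```

A file is then a native image when ReadPeSubsystem succeeds and the returned value equals kSubsystemNative.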
Techniques For The Shellcode
In shellcode, our options are quite limited at first. Since, by definition, our shellcode begins executing from an arbitrary location in the process, long after that process has finished loading and relocating, we can't rely on load-time linking in our code.
Additionally, because we may be dealing with ASLR, we don't know for sure where all the loaded modules are in the process that we injected into.
And we can't search in memory for the base addresses of the loaded modules, either. We don't have an exception handler set up for our shellcode. Any read from an unmapped address will crash the process with our shellcode. We could set one up dynamically, but for that we need to interact with the operating system, or in other words, be able to call system APIs. But to do that, we need to know where the modules are. Do you see our dilemma here?
Thus, before we can do anything at all, we need to accomplish two goals:
- Be able to find the base address of one of the guaranteed-to-be-mapped modules.
- Be able to resolve an API in that mapped module by the module's base address and by the API name.
If we can somehow accomplish both of those tasks, we can then obtain a pointer to the GetProcAddress function and use it further to resolve any other system API we need.
But how do you get a pointer to GetProcAddress? You can't just call GetProcAddress to look itself up, because that call is exactly what we don't have yet. Instead, we have to delve into some Windows internals here and approach it step by step.
Let's start by learning how to get the base address of kernel32.dll. Luckily, it's one of those DLLs that is guaranteed to be loaded into most processes.
Shellcode: Get Kernel32 Base Address
Luckily for us, whenever a user-mode thread runs in Windows, a special segment register (GS for 64-bit processes, and FS for 32-bit) points to an internal structure called TEB (or "Thread Environment Block"). It is quite poorly documented by Microsoft, so often enough you can get much better information about it by just Googling it, or by extracting it yourself from a .pdb file for one of the system native modules, such as ntoskrnl.exe or ntdll.dll:
typedef struct _TEB {
_NT_TIB NtTib;
void * EnvironmentPointer;
_CLIENT_ID ClientId;
void * ActiveRpcHandle;
void * ThreadLocalStoragePointer;
_PEB * ProcessEnvironmentBlock;
ULONG LastErrorValue;
ULONG CountOfOwnedCriticalSections;
void * CsrClientThread;
void * Win32ThreadInfo;
ULONG User32Reserved[0x1a];
ULONG UserReserved[0x5];
void * WOW32Reserved;
ULONG CurrentLocale;
ULONG FpSoftwareStatusRegister;
void * ReservedForDebuggerInstrumentation[0x10];
void * SystemReserved1[0x1e];
CHAR PlaceholderCompatibilityMode;
UCHAR PlaceholderHydrationAlwaysExplicit;
CHAR PlaceholderReserved[0xa];
ULONG ProxiedProcessId;
_ACTIVATION_CONTEXT_STACK _ActivationStack;
UCHAR WorkingOnBehalfTicket[0x8];
LONG ExceptionCode;
UCHAR Padding0[0x4];
_ACTIVATION_CONTEXT_STACK * ActivationContextStackPointer;
ULONGLONG InstrumentationCallbackSp;
ULONGLONG InstrumentationCallbackPreviousPc;
ULONGLONG InstrumentationCallbackPreviousSp;
ULONG TxFsContext;
UCHAR InstrumentationCallbackDisabled;
UCHAR UnalignedLoadStoreExceptions;
UCHAR Padding1[0x2];
_GDI_TEB_BATCH GdiTebBatch;
_CLIENT_ID RealClientId;
void * GdiCachedProcessHandle;
ULONG GdiClientPID;
ULONG GdiClientTID;
void * GdiThreadLocalInfo;
ULONGLONG Win32ClientInfo[0x3e];
void * glDispatchTable[0xe9];
ULONGLONG glReserved1[0x1d];
void * glReserved2;
void * glSectionInfo;
void * glSection;
void * glTable;
void * glCurrentRC;
void * glContext;
ULONG LastStatusValue;
UCHAR Padding2[0x4];
_UNICODE_STRING StaticUnicodeString;
WCHAR StaticUnicodeBuffer[0x105];
UCHAR Padding3[0x6];
void * DeallocationStack;
void * TlsSlots[0x40];
_LIST_ENTRY TlsLinks;
void * Vdm;
void * ReservedForNtRpc;
void * DbgSsReserved[0x2];
ULONG HardErrorMode;
UCHAR Padding4[0x4];
void * Instrumentation[0xb];
_GUID ActivityId;
void * SubProcessTag;
void * PerflibData;
void * EtwTraceData;
void * WinSockData;
ULONG GdiBatchCount;
_PROCESSOR_NUMBER CurrentIdealProcessor;
ULONG IdealProcessorValue;
UCHAR ReservedPad0;
UCHAR ReservedPad1;
UCHAR ReservedPad2;
UCHAR IdealProcessor;
ULONG GuaranteedStackBytes;
UCHAR Padding5[0x4];
void * ReservedForPerf;
void * ReservedForOle;
ULONG WaitingOnLoaderLock;
UCHAR Padding6[0x4];
void * SavedPriorityState;
ULONGLONG ReservedForCodeCoverage;
void * ThreadPoolData;
void * * TlsExpansionSlots;
void * DeallocationBStore;
void * BStoreLimit;
ULONG MuiGeneration;
ULONG IsImpersonating;
void * NlsCache;
void * pShimData;
ULONG HeapData;
UCHAR Padding7[0x4];
void * CurrentTransactionHandle;
_TEB_ACTIVE_FRAME * ActiveFrame;
void * FlsData;
void * PreferredLanguages;
void * UserPrefLanguages;
void * MergedPrefLanguages;
ULONG MuiImpersonation;
USHORT volatile CrossTebFlags;
USHORT SpareCrossTebBits : 16; // 0xffff;
USHORT SameTebFlags;
USHORT SafeThunkCall : 01; // 0x0001;
USHORT InDebugPrint : 01; // 0x0002;
USHORT HasFiberData : 01; // 0x0004;
USHORT SkipThreadAttach : 01; // 0x0008;
USHORT WerInShipAssertCode : 01; // 0x0010;
USHORT RanProcessInit : 01; // 0x0020;
USHORT ClonedThread : 01; // 0x0040;
USHORT SuppressDebugMsg : 01; // 0x0080;
USHORT DisableUserStackWalk : 01; // 0x0100;
USHORT RtlExceptionAttached : 01; // 0x0200;
USHORT InitialThread : 01; // 0x0400;
USHORT SessionAware : 01; // 0x0800;
USHORT LoadOwner : 01; // 0x1000;
USHORT LoaderWorker : 01; // 0x2000;
USHORT SkipLoaderInit : 01; // 0x4000;
USHORT SpareSameTebBits : 01; // 0x8000;
void * TxnScopeEnterCallback;
void * TxnScopeExitCallback;
void * TxnScopeContext;
ULONG LockCount;
LONG WowTebOffset;
void * ResourceRetValue;
void * ReservedForWdf;
ULONGLONG ReservedForCrt;
_GUID EffectiveContainerId;
ULONGLONG LastSleepCounter;
ULONG DelayBackoff;
UCHAR Padding8[0x4];
} TEB, *PTEB;
The important thing for us is the ProcessEnvironmentBlock member of the TEB structure, which points to another sparsely documented structure called PEB (or "Process Environment Block"):
typedef struct _PEB {
UCHAR InheritedAddressSpace;
UCHAR ReadImageFileExecOptions;
UCHAR BeingDebugged;
UCHAR BitField;
UCHAR ImageUsesLargePages : 01; // 0x01;
UCHAR IsProtectedProcess : 01; // 0x02;
UCHAR IsImageDynamicallyRelocated : 01; // 0x04;
UCHAR SkipPatchingUser32Forwarders : 01; // 0x08;
UCHAR IsPackagedProcess : 01; // 0x10;
UCHAR IsAppContainer : 01; // 0x20;
UCHAR IsProtectedProcessLight : 01; // 0x40;
UCHAR IsLongPathAwareProcess : 01; // 0x80;
UCHAR Padding0[0x4];
void * Mutant;
void * ImageBaseAddress;
_PEB_LDR_DATA * Ldr;
_RTL_USER_PROCESS_PARAMETERS * ProcessParameters;
void * SubSystemData;
void * ProcessHeap;
_RTL_CRITICAL_SECTION * FastPebLock;
_SLIST_HEADER * volatile AtlThunkSListPtr;
void * IFEOKey;
ULONG CrossProcessFlags;
ULONG ProcessInJob : 01; // 0x00000001;
ULONG ProcessInitializing : 01; // 0x00000002;
ULONG ProcessUsingVEH : 01; // 0x00000004;
ULONG ProcessUsingVCH : 01; // 0x00000008;
ULONG ProcessUsingFTH : 01; // 0x00000010;
ULONG ProcessPreviouslyThrottled : 01; // 0x00000020;
ULONG ProcessCurrentlyThrottled : 01; // 0x00000040;
ULONG ProcessImagesHotPatched : 01; // 0x00000080;
ULONG ReservedBits0 : 24; // 0xffffff00;
UCHAR Padding1[0x4];
void * KernelCallbackTable;
void * UserSharedInfoPtr;
ULONG SystemReserved;
ULONG AtlThunkSListPtr32;
void * ApiSetMap;
ULONG TlsExpansionCounter;
UCHAR Padding2[0x4];
_RTL_BITMAP * TlsBitmap;
ULONG TlsBitmapBits[0x2];
void * ReadOnlySharedMemoryBase;
void * SharedData;
void * * ReadOnlyStaticServerData;
void * AnsiCodePageData;
void * OemCodePageData;
void * UnicodeCaseTableData;
ULONG NumberOfProcessors;
ULONG NtGlobalFlag;
_LARGE_INTEGER CriticalSectionTimeout;
ULONGLONG HeapSegmentReserve;
ULONGLONG HeapSegmentCommit;
ULONGLONG HeapDeCommitTotalFreeThreshold;
ULONGLONG HeapDeCommitFreeBlockThreshold;
ULONG NumberOfHeaps;
ULONG MaximumNumberOfHeaps;
void * * ProcessHeaps;
void * GdiSharedHandleTable;
void * ProcessStarterHelper;
ULONG GdiDCAttributeList;
UCHAR Padding3[0x4];
_RTL_CRITICAL_SECTION * LoaderLock;
ULONG OSMajorVersion;
ULONG OSMinorVersion;
USHORT OSBuildNumber;
USHORT OSCSDVersion;
ULONG OSPlatformId;
ULONG ImageSubsystem;
ULONG ImageSubsystemMajorVersion;
ULONG ImageSubsystemMinorVersion;
UCHAR Padding4[0x4];
ULONGLONG ActiveProcessAffinityMask;
ULONG GdiHandleBuffer[0x3c];
void (* PostProcessInitRoutine)();
_RTL_BITMAP * TlsExpansionBitmap;
ULONG TlsExpansionBitmapBits[0x20];
ULONG SessionId;
UCHAR Padding5[0x4];
_ULARGE_INTEGER AppCompatFlags;
_ULARGE_INTEGER AppCompatFlagsUser;
void * pShimData;
void * AppCompatInfo;
_UNICODE_STRING CSDVersion;
_ACTIVATION_CONTEXT_DATA const * ActivationContextData;
_ASSEMBLY_STORAGE_MAP * ProcessAssemblyStorageMap;
_ACTIVATION_CONTEXT_DATA const * SystemDefaultActivationContextData;
_ASSEMBLY_STORAGE_MAP * SystemAssemblyStorageMap;
ULONGLONG MinimumStackCommit;
void * SparePointers[0x4];
ULONG SpareUlongs[0x3];
USHORT ActiveCodePage;
USHORT OemCodePage;
USHORT UseCaseMapping;
USHORT UnusedNlsField;
void * WerRegistrationData;
void * WerShipAssertPtr;
void * PatchLoaderData;
void * pImageHeaderHash;
ULONG TracingFlags;
ULONG HeapTracingEnabled : 01; // 0x00000001;
ULONG CritSecTracingEnabled : 01; // 0x00000002;
ULONG LibLoaderTracingEnabled : 01; // 0x00000004;
ULONG SpareTracingBits : 29; // 0xfffffff8;
UCHAR Padding6[0x4];
ULONGLONG CsrServerReadOnlySharedMemoryBase;
ULONGLONG TppWorkerpListLock;
_LIST_ENTRY TppWorkerpList;
void * WaitOnAddressHashTable[0x80];
void * TelemetryCoverageHeader;
ULONG CloudFileFlags;
ULONG CloudFileDiagFlags;
CHAR PlaceholderCompatibilityMode;
CHAR PlaceholderCompatibilityModeReserved[0x7];
_LEAP_SECOND_DATA * LeapSecondData;
ULONG LeapSecondFlags;
ULONG SixtySecondEnabled : 01; // 0x00000001;
ULONG Reserved : 31; // 0xfffffffe;
ULONG NtGlobalFlag2;
} PEB, *PPEB;
Then the PEB::Ldr member contains a pointer to the PEB_LDR_DATA struct:
typedef struct _PEB_LDR_DATA {
ULONG Length;
UCHAR Initialized;
void * SsHandle;
_LIST_ENTRY InLoadOrderModuleList;
_LIST_ENTRY InMemoryOrderModuleList;
_LIST_ENTRY InInitializationOrderModuleList;
void * EntryInProgress;
UCHAR ShutdownInProgress;
void * ShutdownThreadId;
} PEB_LDR_DATA, *PPEB_LDR_DATA;
And PEB_LDR_DATA::InMemoryOrderModuleList is the head of a doubly-linked list of LDR_DATA_TABLE_ENTRY structures for all loaded modules in the process:
typedef struct _LDR_DATA_TABLE_ENTRY {
_LIST_ENTRY InLoadOrderLinks;
_LIST_ENTRY InMemoryOrderLinks;
_LIST_ENTRY InInitializationOrderLinks;
void * DllBase;
void * EntryPoint;
ULONG SizeOfImage;
_UNICODE_STRING FullDllName;
_UNICODE_STRING BaseDllName;
UCHAR FlagGroup[0x4];
ULONG Flags;
ULONG PackagedBinary : 01; // 0x00000001;
ULONG MarkedForRemoval : 01; // 0x00000002;
ULONG ImageDll : 01; // 0x00000004;
ULONG LoadNotificationsSent : 01; // 0x00000008;
ULONG TelemetryEntryProcessed : 01; // 0x00000010;
ULONG ProcessStaticImport : 01; // 0x00000020;
ULONG InLegacyLists : 01; // 0x00000040;
ULONG InIndexes : 01; // 0x00000080;
ULONG ShimDll : 01; // 0x00000100;
ULONG InExceptionTable : 01; // 0x00000200;
ULONG ReservedFlags1 : 02; // 0x00000c00;
ULONG LoadInProgress : 01; // 0x00001000;
ULONG LoadConfigProcessed : 01; // 0x00002000;
ULONG EntryProcessed : 01; // 0x00004000;
ULONG ProtectDelayLoad : 01; // 0x00008000;
ULONG ReservedFlags3 : 02; // 0x00030000;
ULONG DontCallForThreads : 01; // 0x00040000;
ULONG ProcessAttachCalled : 01; // 0x00080000;
ULONG ProcessAttachFailed : 01; // 0x00100000;
ULONG CorDeferredValidate : 01; // 0x00200000;
ULONG CorImage : 01; // 0x00400000;
ULONG DontRelocate : 01; // 0x00800000;
ULONG CorILOnly : 01; // 0x01000000;
ULONG ChpeImage : 01; // 0x02000000;
ULONG ReservedFlags5 : 02; // 0x0c000000;
ULONG Redirected : 01; // 0x10000000;
ULONG ReservedFlags6 : 02; // 0x60000000;
ULONG CompatDatabaseProcessed : 01; // 0x80000000;
USHORT ObsoleteLoadCount;
USHORT TlsIndex;
_LIST_ENTRY HashLinks;
ULONG TimeDateStamp;
_ACTIVATION_CONTEXT * EntryPointActivationContext;
void * Lock;
_LDR_DDAG_NODE * DdagNode;
_LIST_ENTRY NodeModuleLink;
_LDRP_LOAD_CONTEXT * LoadContext;
void * ParentDllBase;
void * SwitchBackContext;
_RTL_BALANCED_NODE BaseAddressIndexNode;
_RTL_BALANCED_NODE MappingInfoIndexNode;
ULONGLONG OriginalBase;
_LARGE_INTEGER LoadTime;
ULONG BaseNameHashValue;
_LDR_DLL_LOAD_REASON LoadReason;
ULONG ImplicitPathOptions;
ULONG ReferenceCount;
ULONG DependentLoadFlags;
UCHAR SigningLevel;
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;
Another important thing about the list of loaded modules is the internal order in which modules are placed into it. The first module is always the executable of the process that we're running in. The second one is ntdll.dll, and the third one happens to be kernel32.dll. So we can use this order to our advantage to quickly traverse the list to the third module, which will be kernel32.dll.
One tricky thing to keep in mind is that each _LIST_ENTRY* in LDR_DATA_TABLE_ENTRY::InMemoryOrderLinks points to the InMemoryOrderLinks member of the next LDR_DATA_TABLE_ENTRY structure, and not to the beginning of that structure.
Then finally, we can retrieve the base address of Kernel32 from the LDR_DATA_TABLE_ENTRY::DllBase member.
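To make the pointer arithmetic less abstract, here is a small C++ sketch of the same walk. It uses simplified stand-ins (ListEntry and ToyLdrEntry are made-up names that only mirror the first few members of the real structures), so it shows the technique rather than something to run against a live PEB.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for _LIST_ENTRY and the leading members of _LDR_DATA_TABLE_ENTRY.
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

struct ToyLdrEntry {
    ListEntry InLoadOrderLinks;
    ListEntry InMemoryOrderLinks;
    ListEntry InInitializationOrderLinks;
    void*     DllBase;
};

// Converts a pointer to the embedded InMemoryOrderLinks member back into a pointer
// to the entry that owns it (the arithmetic behind the CONTAINING_RECORD macro).
inline ToyLdrEntry* EntryFromMemoryOrderLink(ListEntry* link) {
    return reinterpret_cast<ToyLdrEntry*>(
        reinterpret_cast<uint8_t*>(link) - offsetof(ToyLdrEntry, InMemoryOrderLinks));
}

// Returns DllBase of the n-th module (0-based) in memory order, or nullptr if the
// list is shorter than that. `head` plays the role of InMemoryOrderModuleList.
inline void* NthModuleBase(ListEntry* head, unsigned n) {
    ListEntry* link = head->Flink;                 // first module: the executable itself
    for (unsigned i = 0; i < n && link != head; ++i)
        link = link->Flink;                        // each step lands on the next InMemoryOrderLinks
    return link == head ? nullptr : EntryFromMemoryOrderLink(link)->DllBase;
}
```

With this layout, NthModuleBase(head, 2) corresponds to the kernel32.dll lookup described above. The assembly versions skip the subtraction and simply fold the difference between the two member offsets into the final read.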
Implementation In Assembly
The assembly function that implements what I outlined above is much less wordy. Also, because it's low-level assembly language, we must write two versions of it, one for each bitness.
The C++ declaration for our assembly function should look like this:
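A declaration along the following lines would match the assembly implementations below. The void* return type is an assumption (an HMODULE would serve equally well), and extern "C" is there to keep the C++ compiler from mangling the name that the assembler exports.

```cpp
// Implemented in a separate .asm file. Takes no parameters and returns the base
// address of kernel32.dll in RAX (64-bit) or EAX (32-bit).
extern "C" void* GetKernel32ModuleHandle(void);
```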
64-bit Implementation
The 64-bit implementation is very simple. We can get away with literally using just one register, RAX, to do all the calculations and to return the result:
GetKernel32ModuleHandle PROC
mov rax, gs:[60h] ; PEB
mov rax, [rax + 18h] ; Ldr
mov rax, [rax + 20h] ; InMemoryOrderModuleList
mov rax, [rax] ; Skip 'this' module and get to ntdll
mov rax, [rax] ; Skip ntdll module and get to kernel32
mov rax, [rax + 20h] ; DllBase for kernel32 --- size_t offset = offsetof(LDR_DATA_TABLE_ENTRY, DllBase) - sizeof(LIST_ENTRY);
ret
GetKernel32ModuleHandle ENDP
Also note that this function should never fail and always return a valid result if called from within a non-native process.
32-bit Implementation
The 32-bit implementation is slightly more complex. We need to use the ASSUME directive to tell the MASM assembler not to get upset over our use of the FS segment register. The rest is very similar to the 64-bit version, with the exception of the struct offsets.
GetKernel32ModuleHandle PROC
ASSUME FS:NOTHING
mov eax, fs:[30h] ; PEB
mov eax, [eax + 0Ch] ; Ldr
mov eax, [eax + 14h] ; InMemoryOrderModuleList
mov eax, [eax] ; Skip 'this' module and get to ntdll
mov eax, [eax] ; Skip ntdll module and get to kernel32
mov eax, [eax + 10h] ; DllBase for kernel32 --- size_t offset = offsetof(LDR_DATA_TABLE_ENTRY, DllBase) - sizeof(LIST_ENTRY);
ret
GetKernel32ModuleHandle ENDP
And just as I said for the 64-bit version, this function should never fail either, and always return a valid result if called from a non-native process.
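If you want to double-check the hard-coded offsets used in both versions, one way is to model the leading members of each structure and let the compiler do the math. The sketch below is only that: the *T struct names are made up, P is an integer standing in for a pointer of the given width, and it assumes a compiler that uses natural alignment (as x64 compilers do). The TEB offsets 60h and 30h are not covered by it.

```cpp
#include <cstddef>
#include <cstdint>

// P stands in for a pointer: uint64_t models a 64-bit process, uint32_t a 32-bit one.
template <typename P> struct ListEntryT { P Flink; P Blink; };

// Leading members of _PEB, up to and including Ldr.
template <typename P> struct PebHeadT {
    uint8_t InheritedAddressSpace;
    uint8_t ReadImageFileExecOptions;
    uint8_t BeingDebugged;
    uint8_t BitField;
    P       Mutant;            // natural alignment supplies Padding0 in the 64-bit case
    P       ImageBaseAddress;
    P       Ldr;
};

// Leading members of _PEB_LDR_DATA.
template <typename P> struct PebLdrDataT {
    uint32_t      Length;
    uint8_t       Initialized;
    P             SsHandle;
    ListEntryT<P> InLoadOrderModuleList;
    ListEntryT<P> InMemoryOrderModuleList;
};

// Leading members of _LDR_DATA_TABLE_ENTRY.
template <typename P> struct LdrEntryHeadT {
    ListEntryT<P> InLoadOrderLinks;
    ListEntryT<P> InMemoryOrderLinks;
    ListEntryT<P> InInitializationOrderLinks;
    P             DllBase;
};

// Distance from an entry's InMemoryOrderLinks member to its DllBase member.
template <typename P> constexpr size_t DllBaseFromMemoryOrderLink() {
    return offsetof(LdrEntryHeadT<P>, DllBase) - offsetof(LdrEntryHeadT<P>, InMemoryOrderLinks);
}

// 64-bit offsets used by the first listing.
static_assert(offsetof(PebHeadT<uint64_t>, Ldr) == 0x18, "PEB::Ldr");
static_assert(offsetof(PebLdrDataT<uint64_t>, InMemoryOrderModuleList) == 0x20, "list head");
static_assert(DllBaseFromMemoryOrderLink<uint64_t>() == 0x20, "DllBase");

// 32-bit offsets used by the second listing.
static_assert(offsetof(PebHeadT<uint32_t>, Ldr) == 0x0C, "PEB::Ldr");
static_assert(offsetof(PebLdrDataT<uint32_t>, InMemoryOrderModuleList) == 0x14, "list head");
static_assert(DllBaseFromMemoryOrderLink<uint32_t>() == 0x10, "DllBase");
```

If any of these numbers disagreed with the assembly, the block would simply fail to compile.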
Shellcode: Get Address Of GetProcAddress Function
Now that we know the base address of the kernel32 module, we can use it to traverse its PE header and retrieve the address of the GetProcAddress function. This process is fairly straightforward.
First get to the IMAGE_NT_HEADERS, then get to the IMAGE_OPTIONAL_HEADER. In it, we need the first IMAGE_DATA_DIRECTORY struct in the DataDirectory array, at index IMAGE_DIRECTORY_ENTRY_EXPORT (or 0). It describes the export directory:
typedef struct _IMAGE_DATA_DIRECTORY {
DWORD VirtualAddress;
DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
VirtualAddress will give us the mapped offset from the base to the IMAGE_EXPORT_DIRECTORY that we will need to traverse:
typedef struct _IMAGE_EXPORT_DIRECTORY {
DWORD Characteristics;
DWORD TimeDateStamp;
WORD MajorVersion;
WORD MinorVersion;
DWORD Name;
DWORD Base;
DWORD NumberOfFunctions;
DWORD NumberOfNames;
DWORD AddressOfFunctions;
DWORD AddressOfNames;
DWORD AddressOfNameOrdinals;
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
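The path from the module base to the export data directory, as described above, can be sketched in C++ like this. It is an illustration under a few assumptions: `base` points to an image that is already mapped, the host is little-endian, and the helper names Load and GetExportDataDirectory are made up. Like the shellcode, it does no bounds checking.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors IMAGE_DATA_DIRECTORY.
struct DataDirectory {
    uint32_t VirtualAddress;
    uint32_t Size;
};

// Hypothetical helper: reads a T from `base + offset` without assuming alignment.
template <typename T>
T Load(const uint8_t* base, size_t offset) {
    T value;
    std::memcpy(&value, base + offset, sizeof(T));
    return value;
}

// Hypothetical helper: finds the export IMAGE_DATA_DIRECTORY of a mapped image.
bool GetExportDataDirectory(const uint8_t* base, DataDirectory& out) {
    const uint32_t e_lfanew = Load<uint32_t>(base, 0x3C);            // IMAGE_DOS_HEADER::e_lfanew
    if (Load<uint32_t>(base, e_lfanew) != 0x00004550) return false;  // "PE\0\0"
    const size_t optionalHeader = static_cast<size_t>(e_lfanew) + 0x18;
    const uint16_t magic = Load<uint16_t>(base, optionalHeader);
    size_t dataDirectory = 0;
    if (magic == 0x20B)      dataDirectory = optionalHeader + 0x70;  // PE32+ (64-bit image)
    else if (magic == 0x10B) dataDirectory = optionalHeader + 0x60;  // PE32 (32-bit image)
    else return false;
    // IMAGE_DIRECTORY_ENTRY_EXPORT is index 0, so the export entry is the first one.
    out = Load<DataDirectory>(base, dataDirectory);
    return true;
}
```

The IMAGE_EXPORT_DIRECTORY itself then lives at base + out.VirtualAddress.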
NumberOfNames will contain, unsurprisingly, the number of function names that are exported from the module, and AddressOfNames will contain the mapped offset to an array of DWORD offsets to function names in memory. After that, all we need to do is to traverse that array, for the number of function names that we determined earlier, and compare each name to our needed GetProcAddress.
Once it is found, use the index of the matching name to read the function's ordinal index from the AddressOfNameOrdinals array (an array of WORDs), and then use that value as an index into another array, called AddressOfFunctions, to locate the function address offset. And that is it!
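Before we go to assembly, the three-array lookup may be easier to follow in C++. The sketch below works on a tiny fake "mapped image" built in memory, so that it can be tried anywhere. ExportDirectory mirrors the struct above, while Load, FindExportRva and BuildToyImage are names invented for this example. It assumes a little-endian host and does not handle forwarded exports.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Mirrors IMAGE_EXPORT_DIRECTORY.
struct ExportDirectory {
    uint32_t Characteristics, TimeDateStamp;
    uint16_t MajorVersion, MinorVersion;
    uint32_t Name, Base, NumberOfFunctions, NumberOfNames;
    uint32_t AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals;
};

template <typename T>
T Load(const uint8_t* p) { T v; std::memcpy(&v, p, sizeof(T)); return v; }

// Returns the offset (RVA) of the export called `name`, or 0 if there is no such name.
// `base` is where the image is mapped; `exportRva` is the VirtualAddress taken from
// the export IMAGE_DATA_DIRECTORY.
uint32_t FindExportRva(const uint8_t* base, uint32_t exportRva, const char* name) {
    const ExportDirectory dir = Load<ExportDirectory>(base + exportRva);
    for (uint32_t i = 0; i < dir.NumberOfNames; ++i) {
        const uint32_t nameRva = Load<uint32_t>(base + dir.AddressOfNames + 4 * i);
        if (std::strcmp(reinterpret_cast<const char*>(base + nameRva), name) != 0)
            continue;
        // The i-th name pairs with the i-th WORD in AddressOfNameOrdinals, and that
        // WORD is the index into AddressOfFunctions.
        const uint16_t index = Load<uint16_t>(base + dir.AddressOfNameOrdinals + 2 * i);
        return Load<uint32_t>(base + dir.AddressOfFunctions + 4 * index);
    }
    return 0;
}

// Builds a fake image with three functions, two of them named. Illustration only.
std::vector<uint8_t> BuildToyImage() {
    std::vector<uint8_t> img(0x200, 0);
    ExportDirectory dir{};
    dir.NumberOfFunctions = 3;
    dir.NumberOfNames = 2;
    dir.AddressOfFunctions = 0x40;
    dir.AddressOfNames = 0x50;
    dir.AddressOfNameOrdinals = 0x60;
    std::memcpy(&img[0x10], &dir, sizeof(dir));              // export directory at offset 0x10
    const uint32_t functions[3] = {0x1000, 0x2000, 0x3000};
    const uint32_t names[2]     = {0x70, 0x80};
    const uint16_t ordinals[2]  = {2, 0};                    // "Alpha" -> slot 2, "GetProcAddress" -> slot 0
    std::memcpy(&img[0x40], functions, sizeof(functions));
    std::memcpy(&img[0x50], names, sizeof(names));
    std::memcpy(&img[0x60], ordinals, sizeof(ordinals));
    std::memcpy(&img[0x70], "Alpha", 6);
    std::memcpy(&img[0x80], "GetProcAddress", 15);
    return img;
}
```

Note that the position of a name in AddressOfNames is not the position of its function in AddressOfFunctions. In the toy image, "GetProcAddress" is the second name but resolves through slot 0 of the function table.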
This all sounds way more complicated when you describe it in words. So let's put it in asm instead.
There's one caveat that we need to address here before settling on the function address that we found with the logic that I described above. The PE format lets a module forward an exported function to another module, something that system DLLs have been doing more and more since about Windows 7. To distinguish such a function, its offset in the AddressOfFunctions array will point inside the bounds of the export directory, that is, inside the range given by VirtualAddress and Size in the IMAGE_DATA_DIRECTORY, where it refers to a forwarder string instead of code.
This case greatly complicates our simple example, and thus we won't cover it here. But we will have to check for it and fail if GetProcAddress happens to become a forwarded function in the future.
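Per the PE format, an entry of AddressOfFunctions is a forwarder when its offset falls within the range covered by the export data directory, from VirtualAddress up to (but not including) VirtualAddress + Size. A minimal sketch of that test, with a made-up function name, could look like this:

```cpp
#include <cstdint>

// Returns true if `functionRva` (an entry of AddressOfFunctions) refers to a forwarder
// string rather than to code. `dirRva` and `dirSize` are the VirtualAddress and Size
// fields of the export IMAGE_DATA_DIRECTORY.
constexpr bool IsForwardedExport(uint32_t functionRva, uint32_t dirRva, uint32_t dirSize) {
    // Widen to 64 bits so that dirRva + dirSize cannot wrap around.
    return functionRva >= dirRva &&
           static_cast<uint64_t>(functionRva) < static_cast<uint64_t>(dirRva) + dirSize;
}
```

For example, with an export directory at offset 0x90000 and 0x2000 bytes long, an entry of 0x1000 is a regular function, while an entry of 0x90010 is a forwarder.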
So the C++ declaration for our assembly function will look like this:
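A declaration like the following would fit the assembly below. The void* types are an assumption (HMODULE and FARPROC would work just as well), and for the 32-bit build it relies on the default cdecl convention, since the routine reads its argument from the stack and returns with a plain ret.

```cpp
// Implemented in a separate .asm file. Takes the base address of kernel32.dll and
// returns the address of GetProcAddress, or a null pointer on failure.
extern "C" void* GetAddressOf_GetProcAddress(void* kernel32Base);
```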
On the input, we will have to pass the base address of the Kernel32 module that we obtained earlier from our call to GetKernel32ModuleHandle. And on the output, it will return the non-zero address of the GetProcAddress function if it locates it, or NULL if it fails.
64-bit Implementation
I need to point out that this assembly code is deliberately left unoptimized to keep it readable.
GetAddressOf_GetProcAddress PROC
; RCX = base address of kernel32.dll
test rcx, rcx
jz @nothing
mov eax, [rcx + 3Ch] ; e_lfanew
add rax, rcx ; rax = IMAGE_NT_HEADERS64
lea rax, [rax + 18h] ; rax = IMAGE_OPTIONAL_HEADER64 --- size_t offset = offsetof(IMAGE_NT_HEADERS64, OptionalHeader);
lea rax, [rax + 70h] ; rax = IMAGE_DATA_DIRECTORY --- size_t offset = offsetof(IMAGE_OPTIONAL_HEADER64, DataDirectory);
lea rax, [rax + 0h] ; rax = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT
mov edx, [rax] ; rdx = VirtualAddress
lea rax, [rcx + rdx] ; rax = IMAGE_EXPORT_DIRECTORY
mov edx, [rax + 18h] ; rdx = NumberOfNames
mov r8d, [rax + 20h] ; r8 = AddressOfNames
lea r8, [rcx + r8]
mov r10, 41636f7250746547h ; GetProcA
mov r11, 0073736572646441h ; Address\0
test rdx, rdx
jz @nothing
@@1:
mov r9d, [r8]
lea r9, [rcx + r9] ; function name
cmp r10, [r9]
jnz @@2
cmp r11, [r9 + 7]
jnz @@2
; Found our function
neg rdx
mov r10d, [rax + 18h] ; r10 = NumberOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, NumberOfNames);
lea rdx, [r10 + rdx] ; rdx = function index
mov r10d, [rax + 24h] ; r10 = AddressOfNameOrdinals
lea r10, [rcx + r10]
movzx rdx, word ptr [r10 + rdx * 2] ; rdx = index in the function table
mov r10d, [rax + 1Ch] ; r10 = AddressOfFunctions
lea r10, [rcx + r10]
mov r10d, [r10 + rdx * 4] ; r10 = offset of possible func addr
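; Check for forwarded function: a forwarder's offset falls inside the export data directory range
mov r9d, [rcx + 3Ch] ; e_lfanew
lea r9, [rcx + r9 + 88h] ; r9 = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT (18h + 70h)
mov edx, [r9] ; rdx = VirtualAddress
cmp r10, rdx
jb @@3 ; below the export directory, so not forwarded
mov r11d, [r9 + 4] ; r11 = Size
add r11, rdx
cmp r10, r11
jb @nothing ; inside the export directory, so forwarded
@@3:
lea rax, [rcx + r10] ; Got our func addr!
ret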
@@2:
add r8, 4
dec rdx
jnz @@1
@nothing:
xor eax, eax
ret
GetAddressOf_GetProcAddress ENDP
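The two magic constants loaded into R10 and R11 are just the function name packed the way a little-endian 8-byte read sees it. A short sketch to reproduce them follows; PackLittleEndian and kName are names invented here, and the constexpr loop needs C++14 or later.

```cpp
#include <cstddef>
#include <cstdint>

// Packs the first `count` bytes (at most 8) of `text` into an integer the way a
// little-endian load would see them: text[0] ends up in the lowest byte.
constexpr uint64_t PackLittleEndian(const char* text, size_t count) {
    uint64_t value = 0;
    for (size_t i = 0; i < count; ++i)
        value |= static_cast<uint64_t>(static_cast<unsigned char>(text[i])) << (8 * i);
    return value;
}

constexpr char kName[] = "GetProcAddress";   // 14 characters plus the terminating zero

// Bytes 0..7 are "GetProcA". Bytes 7..14 are "Address" plus the terminator, so the
// two 8-byte compares overlap on the letter 'A' and together cover all 15 bytes.
static_assert(PackLittleEndian(kName, 8)     == 0x41636f7250746547ull, "first 8 bytes");
static_assert(PackLittleEndian(kName + 7, 8) == 0x0073736572646441ull, "last 8 bytes");
```

This is why the second compare reads from [r9 + 7] and not from [r9 + 8]: the name is 15 bytes long with its terminating zero, so two 8-byte reads have to overlap by one byte.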
32-bit Implementation
And a similar assembly code for the 32-bit implementation.
GetAddressOf_GetProcAddress PROC
ASSUME FS:NOTHING
;[esp + 04h] = base address of kernel32.dll
mov ecx, [esp + 04h]
push ebx
push esi
test ecx, ecx
jz @nothing
mov eax, [ecx + 3Ch] ; e_lfanew
lea eax, [eax + ecx + 78h] ; eax = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT
mov edx, [eax] ; edx = VirtualAddress
lea eax, [ecx + edx] ; eax = IMAGE_EXPORT_DIRECTORY
mov edx, [eax + 18h] ; edx = NumberOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, NumberOfNames);
test edx, edx
jz @nothing
mov ebx, [eax + 20h] ; ebx = AddressOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, AddressOfNames);
lea ebx, [ecx + ebx]
@@1:
mov esi, [ebx]
lea esi, [ecx + esi] ; function name
cmp dword ptr [esi], 50746547h ; GetP
jnz @@2
cmp dword ptr [esi + 4], 41636f72h ; rocA
jnz @@2
cmp dword ptr [esi + 8], 65726464h ; ddre
jnz @@2
cmp dword ptr [esi + 11], 00737365h ; ess\0 (overlaps the previous compare by one byte)
jnz @@2
; Found our function
neg edx
mov esi, [eax + 18h] ; esi = NumberOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, NumberOfNames);
lea edx, [esi + edx] ; edx = function index
mov esi, [eax + 24h] ; esi = AddressOfNameOrdinals ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, AddressOfNameOrdinals);
lea esi, [ecx + esi]
movzx edx, word ptr [esi + edx * 2] ; edx = index in the function table
mov esi, [eax + 1Ch] ; esi = AddressOfFunctions ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, AddressOfFunctions);
lea esi, [ecx + esi]
mov esi, [esi + edx * 4] ; esi = offset of possible func addr
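; Check for forwarded function: a forwarder's offset falls inside the export data directory range
mov ebx, [ecx + 3Ch] ; e_lfanew
lea ebx, [ebx + ecx + 78h] ; ebx = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT
mov edx, [ebx] ; edx = VirtualAddress ---- size_t offset = offsetof(IMAGE_DATA_DIRECTORY, VirtualAddress);
cmp esi, edx
jb @@3 ; below the export directory, so not forwarded
mov ebx, [ebx + 4] ; ebx = Size ---- size_t offset = offsetof(IMAGE_DATA_DIRECTORY, Size);
add ebx, edx
cmp esi, ebx
jb @nothing ; inside the export directory, so forwarded
@@3:
lea eax, [ecx + esi] ; Got our func addr!
pop esi
pop ebx
ret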
@@2:
add ebx, 4
dec edx
jnz @@1
@nothing:
xor eax, eax
pop esi
pop ebx
ret
GetAddressOf_GetProcAddress ENDP
Conclusion
As a final word, I'm assuming that you can see that it's pretty easy to combine the two functions that I showed above into one, if all you need to get is the address of the GetProcAddress function.
Otherwise the steps for obtaining an address to pretty much any API in the system from a shellcode could be as follows:
- Call GetKernel32ModuleHandle and remember the base address that it returns.
- Call GetAddressOf_GetProcAddress on the base address that you got above, to get the address of GetProcAddress.
- Call the actual GetProcAddress, using the pointer that you got above, on the base address from the first step to obtain the address of the LoadLibrary function.
- Now you have dynamically resolved addresses of the LoadLibrary and GetProcAddress functions, which you can use to resolve the address of any other API in the system.
For even more compactness you may also inline both functions into your shellcode.