
This article contains undocumented features that are not supported by the original manufacturer. By following advice in this article, you're doing so at your own risk. The methods presented in this article may rely on internal implementation and may not work in the future.
Preface
If you put some Windows function (or WinAPI) into your C or C++ code:
HANDLE hFile = ::CreateFileW(pFilePath, GENERIC_READ | GENERIC_WRITE,
FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if(hFile != INVALID_HANDLE_VALUE)
{
::CloseHandle(hFile);
}
When you compile your program, the CreateFileW and CloseHandle functions from the Kernel32.dll library will be linked to your program using load-time linking.
At times though, you may not want to, or may not be able to, rely on load-time linking. In that case you can resolve the functions dynamically, at run-time, using the LoadLibrary and GetProcAddress APIs:
HMODULE hKernel32 = ::LoadLibrary(L"Kernel32.dll");
if(hKernel32)
{
HANDLE (WINAPI *pfn_CreateFileW)(
LPCWSTR lpFileName,
DWORD dwDesiredAccess,
DWORD dwShareMode,
LPSECURITY_ATTRIBUTES lpSecurityAttributes,
DWORD dwCreationDisposition,
DWORD dwFlagsAndAttributes,
HANDLE hTemplateFile
);
BOOL (WINAPI *pfn_CloseHandle)(
HANDLE hObject
);
(FARPROC&)pfn_CreateFileW = ::GetProcAddress(hKernel32, "CreateFileW");
(FARPROC&)pfn_CloseHandle = ::GetProcAddress(hKernel32, "CloseHandle");
if(pfn_CreateFileW &&
pfn_CloseHandle)
{
HANDLE hFile = pfn_CreateFileW(pFilePath, GENERIC_READ | GENERIC_WRITE,
FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if(hFile != INVALID_HANDLE_VALUE)
{
pfn_CloseHandle(hFile);
}
}
::FreeLibrary(hKernel32);
}
Notice that this technique, which resolved the addresses of the CreateFileW and CloseHandle functions dynamically, can in principle be used to resolve any other exported function. But the two original APIs, LoadLibrary and GetProcAddress, still had to be resolved through load-time linking, that is, using the first method.
But what if you're writing shellcode and don't have the option of load-time linking any of the Windows APIs? Can you still get a pointer to the GetProcAddress function?
The answer is yes. But for that we'll need to dive into the depths of Windows internals and use some assembly language to implement it.
General Technique
The general technique of obtaining a pointer to any Windows function is two-fold:
- Obtain the base address of the module (or `.dll`) that the function resides in. This is the address where the module was mapped into the process.
- Obtain the offset within that module for the function you need.
Note that the first step mentions "address where the module was mapped into". This means that the module in question has to be already loaded (or "mapped") into the process. There are three ways to ensure this:
- Call one of the LoadLibrary* class of functions.
- Use a module that is guaranteed to be loaded into the process. ntdll.dll is guaranteed to be loaded into any user-mode process, and kernel32.dll is conditionally guaranteed. By that I mean that if the process is not a native system process, it is guaranteed to have the kernel32.dll module mapped.
- Use any module that is linked into the process through load-time linking, provided that such module is not marked for delay-loading.
To determine if a binary executable was built as a native system module, check IMAGE_OPTIONAL_HEADER::Subsystem in its PE header. A native image will be marked as IMAGE_SUBSYSTEM_NATIVE (or 1).
Alternatively, you can use the WinAPI Search tool for that as well:
WinAPI Search utility, displaying "Show Info" window for a search result item within the IMAGE_SUBSYSTEM_NATIVE module.
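The Subsystem check described above can also be done by hand on the raw bytes of a file. The following is a minimal sketch, assuming a little-endian host and a file that has already been read into memory. The names read_pe_subsystem and make_fake_pe are hypothetical helpers made up for this illustration, and the fake header is only good for experiments, not a loadable image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Value of IMAGE_SUBSYSTEM_NATIVE from the PE format.
constexpr std::uint16_t kSubsystemNative = 1;

// Reads IMAGE_OPTIONAL_HEADER::Subsystem from a PE file held in memory.
// Returns false if the buffer does not look like a PE file.
inline bool read_pe_subsystem(const std::vector<std::uint8_t>& file, std::uint16_t& subsystem)
{
    if (file.size() < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return false;

    std::uint32_t e_lfanew = 0;                          // IMAGE_DOS_HEADER::e_lfanew
    std::memcpy(&e_lfanew, &file[0x3C], sizeof(e_lfanew));

    // Signature (4) + IMAGE_FILE_HEADER (20) + offset of Subsystem (68).
    // The Subsystem offset is the same in PE32 and PE32+ optional headers.
    const std::size_t subsystem_offset = std::size_t(e_lfanew) + 4 + 20 + 68;
    if (subsystem_offset + sizeof(subsystem) > file.size())
        return false;
    if (std::memcmp(&file[e_lfanew], "PE\0\0", 4) != 0)
        return false;

    std::memcpy(&subsystem, &file[subsystem_offset], sizeof(subsystem));
    return true;
}

// Hypothetical helper: builds a minimal fake header with the given Subsystem value.
inline std::vector<std::uint8_t> make_fake_pe(std::uint16_t subsystem)
{
    std::vector<std::uint8_t> file(0x200, 0);
    file[0] = 'M';
    file[1] = 'Z';
    const std::uint32_t e_lfanew = 0x80;
    std::memcpy(&file[0x3C], &e_lfanew, sizeof(e_lfanew));
    std::memcpy(&file[e_lfanew], "PE\0\0", 4);
    std::memcpy(&file[e_lfanew + 4 + 20 + 68], &subsystem, sizeof(subsystem));
    return file;
}
```

A result equal to kSubsystemNative means the image was built as a native module, so kernel32.dll should not be assumed to be mapped into a process started from it.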
Techniques For The Shellcode
In shellcode, our options are initially quite limited. By definition, shellcode begins executing at an arbitrary location in the process, long after the process has finished loading and relocating, so we can't rely on load-time linking in our code.
Additionally, because we may be dealing with ASLR, we don't know for sure where all the loaded modules are in the process that we injected into.
And we can't search in memory for the base addresses of the loaded modules, either. We don't have an exception handler set up for our shellcode. Any read from an unmapped address will crash the process with our shellcode. We could set one up dynamically, but for that we need to interact with the operating system, or in other words, be able to call system APIs. But to do that, we need to know where the modules are. Do you see our dilemma here?
Thus, before we can do anything at all, we need to accomplish two goals:
- Be able to find the base address of one of the guaranteed-to-be-mapped modules.
- Be able to resolve an API in that mapped module by the module's base address and by the API name.
If we can somehow get to do both of those tasks, we can then obtain a pointer to the GetProcAddress function and use it further to resolve any other system API we need.
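But how do you get a pointer to GetProcAddress? You can't just call GetProcAddress to look up its own address, since that call already requires the very pointer we're after.
Instead we have to delve into some Windows internals here, and approach it step-by-step.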
Let's start by learning how to get the base address of kernel32.dll. Luckily, it's one of those DLLs that is guaranteed to be loaded into most processes.
Shellcode: Get Kernel32 Base Address
Luckily for us, whenever a user-mode thread runs in Windows, a special segment register (GS for 64-bit processes, and FS for 32-bit) points to an internal structure called the TEB (or "Thread Environment Block"). It is quite poorly documented by Microsoft, so you can often get much better information about it by searching the web, or by extracting it yourself from a .pdb file for one of the native system modules, such as ntoskrnl.exe or ntdll.dll:
typedef struct _TEB {
_NT_TIB NtTib;
void * EnvironmentPointer;
_CLIENT_ID ClientId;
void * ActiveRpcHandle;
void * ThreadLocalStoragePointer;
_PEB * ProcessEnvironmentBlock;
ULONG LastErrorValue;
ULONG CountOfOwnedCriticalSections;
void * CsrClientThread;
void * Win32ThreadInfo;
ULONG User32Reserved[0x1a];
ULONG UserReserved[0x5];
void * WOW32Reserved;
ULONG CurrentLocale;
ULONG FpSoftwareStatusRegister;
void * ReservedForDebuggerInstrumentation[0x10];
void * SystemReserved1[0x1e];
CHAR PlaceholderCompatibilityMode;
UCHAR PlaceholderHydrationAlwaysExplicit;
CHAR PlaceholderReserved[0xa];
ULONG ProxiedProcessId;
_ACTIVATION_CONTEXT_STACK _ActivationStack;
UCHAR WorkingOnBehalfTicket[0x8];
LONG ExceptionCode;
UCHAR Padding0[0x4];
_ACTIVATION_CONTEXT_STACK * ActivationContextStackPointer;
ULONGLONG InstrumentationCallbackSp;
ULONGLONG InstrumentationCallbackPreviousPc;
ULONGLONG InstrumentationCallbackPreviousSp;
ULONG TxFsContext;
UCHAR InstrumentationCallbackDisabled;
UCHAR UnalignedLoadStoreExceptions;
UCHAR Padding1[0x2];
_GDI_TEB_BATCH GdiTebBatch;
_CLIENT_ID RealClientId;
void * GdiCachedProcessHandle;
ULONG GdiClientPID;
ULONG GdiClientTID;
void * GdiThreadLocalInfo;
ULONGLONG Win32ClientInfo[0x3e];
void * glDispatchTable[0xe9];
ULONGLONG glReserved1[0x1d];
void * glReserved2;
void * glSectionInfo;
void * glSection;
void * glTable;
void * glCurrentRC;
void * glContext;
ULONG LastStatusValue;
UCHAR Padding2[0x4];
_UNICODE_STRING StaticUnicodeString;
WCHAR StaticUnicodeBuffer[0x105];
UCHAR Padding3[0x6];
void * DeallocationStack;
void * TlsSlots[0x40];
_LIST_ENTRY TlsLinks;
void * Vdm;
void * ReservedForNtRpc;
void * DbgSsReserved[0x2];
ULONG HardErrorMode;
UCHAR Padding4[0x4];
void * Instrumentation[0xb];
_GUID ActivityId;
void * SubProcessTag;
void * PerflibData;
void * EtwTraceData;
void * WinSockData;
ULONG GdiBatchCount;
_PROCESSOR_NUMBER CurrentIdealProcessor;
ULONG IdealProcessorValue;
UCHAR ReservedPad0;
UCHAR ReservedPad1;
UCHAR ReservedPad2;
UCHAR IdealProcessor;
ULONG GuaranteedStackBytes;
UCHAR Padding5[0x4];
void * ReservedForPerf;
void * ReservedForOle;
ULONG WaitingOnLoaderLock;
UCHAR Padding6[0x4];
void * SavedPriorityState;
ULONGLONG ReservedForCodeCoverage;
void * ThreadPoolData;
void * * TlsExpansionSlots;
void * DeallocationBStore;
void * BStoreLimit;
ULONG MuiGeneration;
ULONG IsImpersonating;
void * NlsCache;
void * pShimData;
ULONG HeapData;
UCHAR Padding7[0x4];
void * CurrentTransactionHandle;
_TEB_ACTIVE_FRAME * ActiveFrame;
void * FlsData;
void * PreferredLanguages;
void * UserPrefLanguages;
void * MergedPrefLanguages;
ULONG MuiImpersonation;
USHORT volatile CrossTebFlags;
USHORT SpareCrossTebBits : 16; // 0xffff;
USHORT SameTebFlags;
USHORT SafeThunkCall : 01; // 0x0001;
USHORT InDebugPrint : 01; // 0x0002;
USHORT HasFiberData : 01; // 0x0004;
USHORT SkipThreadAttach : 01; // 0x0008;
USHORT WerInShipAssertCode : 01; // 0x0010;
USHORT RanProcessInit : 01; // 0x0020;
USHORT ClonedThread : 01; // 0x0040;
USHORT SuppressDebugMsg : 01; // 0x0080;
USHORT DisableUserStackWalk : 01; // 0x0100;
USHORT RtlExceptionAttached : 01; // 0x0200;
USHORT InitialThread : 01; // 0x0400;
USHORT SessionAware : 01; // 0x0800;
USHORT LoadOwner : 01; // 0x1000;
USHORT LoaderWorker : 01; // 0x2000;
USHORT SkipLoaderInit : 01; // 0x4000;
USHORT SpareSameTebBits : 01; // 0x8000;
void * TxnScopeEnterCallback;
void * TxnScopeExitCallback;
void * TxnScopeContext;
ULONG LockCount;
LONG WowTebOffset;
void * ResourceRetValue;
void * ReservedForWdf;
ULONGLONG ReservedForCrt;
_GUID EffectiveContainerId;
ULONGLONG LastSleepCounter;
ULONG DelayBackoff;
UCHAR Padding8[0x4];
} TEB, *PTEB;
The important thing for us is the ProcessEnvironmentBlock member of the TEB structure, which points to another undocumented structure called the PEB (or "Process Environment Block"). It is another badly documented system component:
typedef struct _PEB {
UCHAR InheritedAddressSpace;
UCHAR ReadImageFileExecOptions;
UCHAR BeingDebugged;
UCHAR BitField;
UCHAR ImageUsesLargePages : 01; // 0x01;
UCHAR IsProtectedProcess : 01; // 0x02;
UCHAR IsImageDynamicallyRelocated : 01; // 0x04;
UCHAR SkipPatchingUser32Forwarders : 01; // 0x08;
UCHAR IsPackagedProcess : 01; // 0x10;
UCHAR IsAppContainer : 01; // 0x20;
UCHAR IsProtectedProcessLight : 01; // 0x40;
UCHAR IsLongPathAwareProcess : 01; // 0x80;
UCHAR Padding0[0x4];
void * Mutant;
void * ImageBaseAddress;
_PEB_LDR_DATA * Ldr;
_RTL_USER_PROCESS_PARAMETERS * ProcessParameters;
void * SubSystemData;
void * ProcessHeap;
_RTL_CRITICAL_SECTION * FastPebLock;
_SLIST_HEADER * volatile AtlThunkSListPtr;
void * IFEOKey;
ULONG CrossProcessFlags;
ULONG ProcessInJob : 01; // 0x00000001;
ULONG ProcessInitializing : 01; // 0x00000002;
ULONG ProcessUsingVEH : 01; // 0x00000004;
ULONG ProcessUsingVCH : 01; // 0x00000008;
ULONG ProcessUsingFTH : 01; // 0x00000010;
ULONG ProcessPreviouslyThrottled : 01; // 0x00000020;
ULONG ProcessCurrentlyThrottled : 01; // 0x00000040;
ULONG ProcessImagesHotPatched : 01; // 0x00000080;
ULONG ReservedBits0 : 24; // 0xffffff00;
UCHAR Padding1[0x4];
void * KernelCallbackTable;
void * UserSharedInfoPtr;
ULONG SystemReserved;
ULONG AtlThunkSListPtr32;
void * ApiSetMap;
ULONG TlsExpansionCounter;
UCHAR Padding2[0x4];
_RTL_BITMAP * TlsBitmap;
ULONG TlsBitmapBits[0x2];
void * ReadOnlySharedMemoryBase;
void * SharedData;
void * * ReadOnlyStaticServerData;
void * AnsiCodePageData;
void * OemCodePageData;
void * UnicodeCaseTableData;
ULONG NumberOfProcessors;
ULONG NtGlobalFlag;
_LARGE_INTEGER CriticalSectionTimeout;
ULONGLONG HeapSegmentReserve;
ULONGLONG HeapSegmentCommit;
ULONGLONG HeapDeCommitTotalFreeThreshold;
ULONGLONG HeapDeCommitFreeBlockThreshold;
ULONG NumberOfHeaps;
ULONG MaximumNumberOfHeaps;
void * * ProcessHeaps;
void * GdiSharedHandleTable;
void * ProcessStarterHelper;
ULONG GdiDCAttributeList;
UCHAR Padding3[0x4];
_RTL_CRITICAL_SECTION * LoaderLock;
ULONG OSMajorVersion;
ULONG OSMinorVersion;
USHORT OSBuildNumber;
USHORT OSCSDVersion;
ULONG OSPlatformId;
ULONG ImageSubsystem;
ULONG ImageSubsystemMajorVersion;
ULONG ImageSubsystemMinorVersion;
UCHAR Padding4[0x4];
ULONGLONG ActiveProcessAffinityMask;
ULONG GdiHandleBuffer[0x3c];
void (* PostProcessInitRoutine)();
_RTL_BITMAP * TlsExpansionBitmap;
ULONG TlsExpansionBitmapBits[0x20];
ULONG SessionId;
UCHAR Padding5[0x4];
_ULARGE_INTEGER AppCompatFlags;
_ULARGE_INTEGER AppCompatFlagsUser;
void * pShimData;
void * AppCompatInfo;
_UNICODE_STRING CSDVersion;
_ACTIVATION_CONTEXT_DATA const * ActivationContextData;
_ASSEMBLY_STORAGE_MAP * ProcessAssemblyStorageMap;
_ACTIVATION_CONTEXT_DATA const * SystemDefaultActivationContextData;
_ASSEMBLY_STORAGE_MAP * SystemAssemblyStorageMap;
ULONGLONG MinimumStackCommit;
void * SparePointers[0x4];
ULONG SpareUlongs[0x3];
USHORT ActiveCodePage;
USHORT OemCodePage;
USHORT UseCaseMapping;
USHORT UnusedNlsField;
void * WerRegistrationData;
void * WerShipAssertPtr;
void * PatchLoaderData;
void * pImageHeaderHash;
ULONG TracingFlags;
ULONG HeapTracingEnabled : 01; // 0x00000001;
ULONG CritSecTracingEnabled : 01; // 0x00000002;
ULONG LibLoaderTracingEnabled : 01; // 0x00000004;
ULONG SpareTracingBits : 29; // 0xfffffff8;
UCHAR Padding6[0x4];
ULONGLONG CsrServerReadOnlySharedMemoryBase;
ULONGLONG TppWorkerpListLock;
_LIST_ENTRY TppWorkerpList;
void * WaitOnAddressHashTable[0x80];
void * TelemetryCoverageHeader;
ULONG CloudFileFlags;
ULONG CloudFileDiagFlags;
CHAR PlaceholderCompatibilityMode;
CHAR PlaceholderCompatibilityModeReserved[0x7];
_LEAP_SECOND_DATA * LeapSecondData;
ULONG LeapSecondFlags;
ULONG SixtySecondEnabled : 01; // 0x00000001;
ULONG Reserved : 31; // 0xfffffffe;
ULONG NtGlobalFlag2;
} PEB, *PPEB;
Then the PEB::Ldr member contains a pointer to the PEB_LDR_DATA struct:
typedef struct _PEB_LDR_DATA {
ULONG Length;
UCHAR Initialized;
void * SsHandle;
_LIST_ENTRY InLoadOrderModuleList;
_LIST_ENTRY InMemoryOrderModuleList;
_LIST_ENTRY InInitializationOrderModuleList;
void * EntryInProgress;
UCHAR ShutdownInProgress;
void * ShutdownThreadId;
} PEB_LDR_DATA, *PPEB_LDR_DATA;
And PEB_LDR_DATA::InMemoryOrderModuleList is the head of a doubly-linked list of LDR_DATA_TABLE_ENTRY structures for all loaded modules in the process:
typedef struct _LDR_DATA_TABLE_ENTRY {
_LIST_ENTRY InLoadOrderLinks;
_LIST_ENTRY InMemoryOrderLinks;
_LIST_ENTRY InInitializationOrderLinks;
void * DllBase;
void * EntryPoint;
ULONG SizeOfImage;
_UNICODE_STRING FullDllName;
_UNICODE_STRING BaseDllName;
UCHAR FlagGroup[0x4];
ULONG Flags;
ULONG PackagedBinary : 01; // 0x00000001;
ULONG MarkedForRemoval : 01; // 0x00000002;
ULONG ImageDll : 01; // 0x00000004;
ULONG LoadNotificationsSent : 01; // 0x00000008;
ULONG TelemetryEntryProcessed : 01; // 0x00000010;
ULONG ProcessStaticImport : 01; // 0x00000020;
ULONG InLegacyLists : 01; // 0x00000040;
ULONG InIndexes : 01; // 0x00000080;
ULONG ShimDll : 01; // 0x00000100;
ULONG InExceptionTable : 01; // 0x00000200;
ULONG ReservedFlags1 : 02; // 0x00000c00;
ULONG LoadInProgress : 01; // 0x00001000;
ULONG LoadConfigProcessed : 01; // 0x00002000;
ULONG EntryProcessed : 01; // 0x00004000;
ULONG ProtectDelayLoad : 01; // 0x00008000;
ULONG ReservedFlags3 : 02; // 0x00030000;
ULONG DontCallForThreads : 01; // 0x00040000;
ULONG ProcessAttachCalled : 01; // 0x00080000;
ULONG ProcessAttachFailed : 01; // 0x00100000;
ULONG CorDeferredValidate : 01; // 0x00200000;
ULONG CorImage : 01; // 0x00400000;
ULONG DontRelocate : 01; // 0x00800000;
ULONG CorILOnly : 01; // 0x01000000;
ULONG ChpeImage : 01; // 0x02000000;
ULONG ReservedFlags5 : 02; // 0x0c000000;
ULONG Redirected : 01; // 0x10000000;
ULONG ReservedFlags6 : 02; // 0x60000000;
ULONG CompatDatabaseProcessed : 01; // 0x80000000;
USHORT ObsoleteLoadCount;
USHORT TlsIndex;
_LIST_ENTRY HashLinks;
ULONG TimeDateStamp;
_ACTIVATION_CONTEXT * EntryPointActivationContext;
void * Lock;
_LDR_DDAG_NODE * DdagNode;
_LIST_ENTRY NodeModuleLink;
_LDRP_LOAD_CONTEXT * LoadContext;
void * ParentDllBase;
void * SwitchBackContext;
_RTL_BALANCED_NODE BaseAddressIndexNode;
_RTL_BALANCED_NODE MappingInfoIndexNode;
ULONGLONG OriginalBase;
_LARGE_INTEGER LoadTime;
ULONG BaseNameHashValue;
_LDR_DLL_LOAD_REASON LoadReason;
ULONG ImplicitPathOptions;
ULONG ReferenceCount;
ULONG DependentLoadFlags;
UCHAR SigningLevel;
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;
Another important thing about the list of loaded modules is the order in which modules appear in it. The first module is the executable of the process that we're running in. The second one is ntdll.dll, and the third one happens to be kernel32.dll. This order is an implementation detail rather than a documented guarantee, but we can use it to our advantage to quickly traverse the list to the third module, which will be kernel32.dll.
One tricky thing to keep in mind is that each _LIST_ENTRY* in LDR_DATA_TABLE_ENTRY::InMemoryOrderLinks points to the InMemoryOrderLinks member of the next LDR_DATA_TABLE_ENTRY structure, not to the beginning of that structure.
Then finally, we can retrieve the base address of Kernel32 from the LDR_DATA_TABLE_ENTRY::DllBase member.
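The pointer adjustment needed when walking the in-memory-order list is easier to see in C++ than in words. Below is a minimal sketch that uses simplified stand-ins for LIST_ENTRY and for the first few members of LDR_DATA_TABLE_ENTRY. The names ListEntry, LdrEntry, nth_module_base and FakeModuleList are hypothetical and exist only for this illustration. The fixture fakes a three-module list so that the walk can be tried outside of a real process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for LIST_ENTRY and the start of LDR_DATA_TABLE_ENTRY.
struct ListEntry { ListEntry* Flink; ListEntry* Blink; };

struct LdrEntry {
    ListEntry InLoadOrderLinks;
    ListEntry InMemoryOrderLinks;
    ListEntry InInitializationOrderLinks;
    void*     DllBase;
};

// Converts a pointer to an InMemoryOrderLinks member back to its owning entry,
// the same adjustment that the CONTAINING_RECORD macro performs.
inline LdrEntry* entry_from_memory_link(ListEntry* link)
{
    return reinterpret_cast<LdrEntry*>(
        reinterpret_cast<std::uint8_t*>(link) - offsetof(LdrEntry, InMemoryOrderLinks));
}

// Returns DllBase of the n-th module (0-based), or nullptr if the list is shorter.
inline void* nth_module_base(ListEntry* head, unsigned n)
{
    for (ListEntry* link = head->Flink; link != head; link = link->Flink, --n) {
        if (n == 0)
            return entry_from_memory_link(link)->DllBase;
    }
    return nullptr;
}

// Hypothetical fixture: links three fake entries into a circular list.
struct FakeModuleList {
    ListEntry head;
    LdrEntry  mods[3];

    FakeModuleList()
    {
        ListEntry* prev = &head;
        for (int i = 0; i < 3; ++i) {
            mods[i] = LdrEntry{};
            mods[i].DllBase = reinterpret_cast<void*>(std::uintptr_t(0x1000) * (i + 1));
            ListEntry* cur = &mods[i].InMemoryOrderLinks;
            prev->Flink = cur;
            cur->Blink = prev;
            prev = cur;
        }
        prev->Flink = &head;   // close the circle
        head.Blink = prev;
    }
};
```

With this layout, nth_module_base(&list.head, 2) plays the role of "skip two modules and read DllBase of the third", which is what the shellcode relies on to reach kernel32.dll.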
Implementation In Assembly
The assembly function that implements what I outlined above is much less wordy. And because assembly is a low-level language, we must write two versions of it, one for each bitness.
The C++ declaration for our assembly function is simple: it takes no parameters and returns the base address of kernel32.dll.
64-bit Implementation
The 64-bit implementation is very simple. We can get away with using just one register, RAX, both to do all the calculations and to return the result:
GetKernel32ModuleHandle PROC
mov rax, gs:[60h] ; PEB
mov rax, [rax + 18h] ; Ldr
mov rax, [rax + 20h] ; InMemoryOrderModuleList
mov rax, [rax] ; Skip 'this' module and get to ntdll
mov rax, [rax] ; Skip ntdll module and get to kernel32
mov rax, [rax + 20h] ; DllBase for kernel32 --- size_t offset = offsetof(LDR_DATA_TABLE_ENTRY, DllBase) - sizeof(LIST_ENTRY);
ret
GetKernel32ModuleHandle ENDP
Also note that this function should never fail and should always return a valid result if called from within a non-native process.
32-bit Implementation
The 32-bit implementation is slightly more complex. We need to use the ASSUME directive to tell the MASM assembler not to complain about our use of the FS segment register. The rest is very similar to the 64-bit version, with the exception of the struct offsets.
GetKernel32ModuleHandle PROC
ASSUME FS:NOTHING
mov eax, fs:[30h] ; PEB
mov eax, [eax + 0Ch] ; Ldr
mov eax, [eax + 14h] ; InMemoryOrderModuleList
mov eax, [eax] ; Skip 'this' module and get to ntdll
mov eax, [eax] ; Skip ntdll module and get to kernel32
mov eax, [eax + 10h] ; DllBase for kernel32 --- size_t offset = offsetof(LDR_DATA_TABLE_ENTRY, DllBase) - sizeof(LIST_ENTRY);
ret
GetKernel32ModuleHandle ENDP
And just as I said for the 64-bit version, this function should never fail either, and should always return a valid result if called from a non-native process.
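The offsets hard-coded in the two listings above can be double-checked without a Windows machine. The sketch below rebuilds only the leading members of PEB, PEB_LDR_DATA and LDR_DATA_TABLE_ENTRY, with the pointer size as a template parameter. All of the type and constant names here are hypothetical, and the layouts assume natural alignment (as on x86-64 compilers), so treat it as a sanity check and not as a replacement for the real headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Ptr is the pointer-sized integer of the target: uint32_t or uint64_t.
template <typename Ptr> struct ListEntryT { Ptr Flink; Ptr Blink; };

template <typename Ptr> struct PebPrefixT {          // start of PEB, up to Ldr
    std::uint8_t InheritedAddressSpace;
    std::uint8_t ReadImageFileExecOptions;
    std::uint8_t BeingDebugged;
    std::uint8_t BitField;
    Ptr Mutant;                                      // alignment supplies Padding0 on 64-bit
    Ptr ImageBaseAddress;
    Ptr Ldr;
};

template <typename Ptr> struct PebLdrDataT {         // start of PEB_LDR_DATA
    std::uint32_t Length;
    std::uint8_t  Initialized;
    Ptr SsHandle;
    ListEntryT<Ptr> InLoadOrderModuleList;
    ListEntryT<Ptr> InMemoryOrderModuleList;
};

template <typename Ptr> struct LdrEntryPrefixT {     // start of LDR_DATA_TABLE_ENTRY
    ListEntryT<Ptr> InLoadOrderLinks;
    ListEntryT<Ptr> InMemoryOrderLinks;
    ListEntryT<Ptr> InInitializationOrderLinks;
    Ptr DllBase;
};

using Peb32 = PebPrefixT<std::uint32_t>;       using Peb64 = PebPrefixT<std::uint64_t>;
using Ldr32 = PebLdrDataT<std::uint32_t>;      using Ldr64 = PebLdrDataT<std::uint64_t>;
using Entry32 = LdrEntryPrefixT<std::uint32_t>; using Entry64 = LdrEntryPrefixT<std::uint64_t>;
using Link32 = ListEntryT<std::uint32_t>;      using Link64 = ListEntryT<std::uint64_t>;

// Offsets used by the second, third and last memory reads in each listing.
constexpr std::size_t kPebLdr32 = offsetof(Peb32, Ldr);
constexpr std::size_t kPebLdr64 = offsetof(Peb64, Ldr);
constexpr std::size_t kMemOrderList32 = offsetof(Ldr32, InMemoryOrderModuleList);
constexpr std::size_t kMemOrderList64 = offsetof(Ldr64, InMemoryOrderModuleList);

// The list pointer refers to InMemoryOrderLinks, one LIST_ENTRY into the entry,
// hence offsetof(DllBase) - sizeof(LIST_ENTRY).
constexpr std::size_t kDllBaseFromLink32 = offsetof(Entry32, DllBase) - sizeof(Link32);
constexpr std::size_t kDllBaseFromLink64 = offsetof(Entry64, DllBase) - sizeof(Link64);

static_assert(kPebLdr64 == 0x18 && kPebLdr32 == 0x0C, "PEB::Ldr offset");
static_assert(kMemOrderList64 == 0x20 && kMemOrderList32 == 0x14, "list head offset");
static_assert(kDllBaseFromLink64 == 0x20 && kDllBaseFromLink32 == 0x10, "DllBase offset");
```

If the file compiles, the constants agree with the 18h/20h/20h and 0Ch/14h/10h displacements used in the 64-bit and 32-bit versions of GetKernel32ModuleHandle.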
Shellcode: Get Address Of GetProcAddress Function
Now that we know the base address of the kernel32 module, we can use it to traverse its PE header and retrieve the address of the GetProcAddress function. This process is fairly straightforward.
First get to the IMAGE_NT_HEADERS, then to the IMAGE_OPTIONAL_HEADER inside it. There we need the first IMAGE_DATA_DIRECTORY struct in the DataDirectory array, at index IMAGE_DIRECTORY_ENTRY_EXPORT (or 0).
It describes the location of the export directory:
typedef struct _IMAGE_DATA_DIRECTORY {
DWORD VirtualAddress;
DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
VirtualAddress will give us the offset from the module base to the mapped IMAGE_EXPORT_DIRECTORY that we will need to traverse:
typedef struct _IMAGE_EXPORT_DIRECTORY {
DWORD Characteristics;
DWORD TimeDateStamp;
WORD MajorVersion;
WORD MinorVersion;
DWORD Name;
DWORD Base;
DWORD NumberOfFunctions;
DWORD NumberOfNames;
DWORD AddressOfFunctions;
DWORD AddressOfNames;
DWORD AddressOfNameOrdinals;
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
NumberOfNames will contain, unsurprisingly, the number of function names that are exported from the module, and AddressOfNames will contain the offset from the module base to an array of DWORD offsets to function names in memory. After that, all we need to do is traverse that array, for the number of function names that we determined earlier, and compare each name to the one we need, GetProcAddress.
Once it is found, take the index of the matching name and use it to read the function's index from the AddressOfNameOrdinals array. That value, in turn, locates the function address offset in another array, called AddressOfFunctions. And that is it!
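Before going down to assembly, here is the same three-array lookup as a minimal C++ sketch. It works on any buffer laid out like a mapped module and does no bounds checking, which a hardened version would need. The names find_export_rva, rd32, rd16 and make_fake_exports are hypothetical, and the fake image with its RVAs is made up purely so the lookup can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Offsets of the IMAGE_EXPORT_DIRECTORY fields used by the lookup.
constexpr std::size_t kNumberOfNames = 0x18, kAddressOfFunctions = 0x1C,
                      kAddressOfNames = 0x20, kAddressOfNameOrdinals = 0x24;

inline std::uint32_t rd32(const std::uint8_t* p) { std::uint32_t v; std::memcpy(&v, p, 4); return v; }
inline std::uint16_t rd16(const std::uint8_t* p) { std::uint16_t v; std::memcpy(&v, p, 2); return v; }

// Returns the RVA stored in AddressOfFunctions for `name`, or 0 if not found.
// `base` is the mapped image; `export_dir_rva` comes from the export data directory.
inline std::uint32_t find_export_rva(const std::uint8_t* base,
                                     std::uint32_t export_dir_rva, const char* name)
{
    const std::uint8_t* dir       = base + export_dir_rva;
    const std::uint32_t count     = rd32(dir + kNumberOfNames);
    const std::uint8_t* names     = base + rd32(dir + kAddressOfNames);
    const std::uint8_t* ordinals  = base + rd32(dir + kAddressOfNameOrdinals);
    const std::uint8_t* functions = base + rd32(dir + kAddressOfFunctions);

    for (std::uint32_t i = 0; i < count; ++i) {
        const char* candidate = reinterpret_cast<const char*>(base + rd32(names + i * 4));
        if (std::strcmp(candidate, name) == 0) {
            const std::uint16_t index = rd16(ordinals + i * 2);  // index into AddressOfFunctions
            return rd32(functions + index * 4);
        }
    }
    return 0;
}

// Hypothetical fixture: a fake "mapped image" that holds only an export
// directory at RVA 0x100 with three functions, two of them named.
inline std::vector<std::uint8_t> make_fake_exports()
{
    std::vector<std::uint8_t> img(0x400, 0);
    auto wr32 = [&](std::size_t off, std::uint32_t v) { std::memcpy(&img[off], &v, 4); };
    auto wr16 = [&](std::size_t off, std::uint16_t v) { std::memcpy(&img[off], &v, 2); };
    auto wrstr = [&](std::size_t off, const char* s) { std::memcpy(&img[off], s, std::strlen(s) + 1); };

    wr32(0x100 + 0x14, 3);                        // NumberOfFunctions
    wr32(0x100 + kNumberOfNames, 2);
    wr32(0x100 + kAddressOfFunctions, 0x140);
    wr32(0x100 + kAddressOfNames, 0x150);
    wr32(0x100 + kAddressOfNameOrdinals, 0x160);

    wr32(0x140, 0x1111); wr32(0x144, 0x2222); wr32(0x148, 0x3333);  // function RVAs
    wr32(0x150, 0x170);  wr32(0x154, 0x180);                        // name RVAs
    wr16(0x160, 2);      wr16(0x162, 0);          // name index -> function index
    wrstr(0x170, "CloseHandle");
    wrstr(0x180, "GetProcAddress");
    return img;
}
```

Note how the name index and the function index differ in the fixture: the second name maps to the first function. That indirection through AddressOfNameOrdinals is the step that is easiest to get wrong.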
This all sounds way more complicated when you describe it in words. So let's put it in asm instead.
There's one caveat that we need to address before settling on the function address found by the logic described above. An export can be forwarded from one module to another, and since about Windows 7 many kernel32.dll exports are such forwarders. To distinguish such a function, its offset in the AddressOfFunctions array points inside the export directory, that is, within the range given by VirtualAddress and Size of the export IMAGE_DATA_DIRECTORY, where it refers to a forwarder string instead of code.
This case greatly complicates our simple example, and thus we won't cover it here. But we will have to check for it and fail if GetProcAddress happens to become a forwarded function in the future.
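As far as I can tell from the PE format specification, the forwarder test is a plain range check: an AddressOfFunctions entry is a forwarder when it falls inside the export directory, as described by VirtualAddress and Size of the export IMAGE_DATA_DIRECTORY. A minimal sketch of that check follows. The name is_forwarded_export is hypothetical, and the numbers in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>

// True if `function_rva` (an entry read from AddressOfFunctions) lies inside
// the export directory range [export_dir_rva, export_dir_rva + export_dir_size).
// Such an entry refers to a forwarder string like "OTHERDLL.FunctionName", not to code.
inline bool is_forwarded_export(std::uint32_t function_rva,
                                std::uint32_t export_dir_rva,
                                std::uint32_t export_dir_size)
{
    return function_rva >= export_dir_rva &&
           function_rva - export_dir_rva < export_dir_size;
}
```

For example, with an export directory at RVA 0x9A000 that is 0xE000 bytes long, an entry of 0x9A120 would be treated as a forwarder, while 0x1B5A0 would be treated as a regular function.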
The C++ declaration for our assembly function takes a single input parameter: the base address of the Kernel32 module that we obtained earlier from our call to GetKernel32ModuleHandle.
On output, it returns the address of the GetProcAddress function if it locates it, or NULL if it fails.
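The assembly listings that follow compare the export name against integer constants instead of calling a string function. The sketch below shows where those constants come from, assuming the little-endian byte order of x86 and x64. The helper name pack_le is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packs up to `bytes` (at most 8) characters of `text`, starting at `offset`,
// into an integer the way a little-endian load from memory would see them.
// Packing stops after the terminating NUL, so the remaining bytes stay zero.
inline std::uint64_t pack_le(const char* text, std::size_t offset, std::size_t bytes)
{
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < bytes && i < 8; ++i) {
        const unsigned char c = static_cast<unsigned char>(text[offset + i]);
        value |= static_cast<std::uint64_t>(c) << (8 * i);
        if (c == 0)
            break;
    }
    return value;
}
```

Note the overlapping read in the 64-bit version: "GetProcAddress" is 14 characters plus the NUL, so the second 8-byte compare starts at offset 7 and covers "Address" and the terminator. In the 32-bit version the last 4-byte compare starts at offset 11 for the same reason.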
64-bit Implementation
I need to point out that this assembly code is left unoptimized to keep it readable.
GetAddressOf_GetProcAddress PROC
; RCX = base address of kernel32.dll
test rcx, rcx
jz @nothing
mov eax, [rcx + 3Ch] ; e_lfanew
add rax, rcx ; rax = IMAGE_NT_HEADERS64
lea rax, [rax + 18h] ; rax = IMAGE_OPTIONAL_HEADER64 --- size_t offset = offsetof(IMAGE_NT_HEADERS64, OptionalHeader);
lea rax, [rax + 70h] ; rax = IMAGE_DATA_DIRECTORY --- size_t offset = offsetof(IMAGE_OPTIONAL_HEADER64, DataDirectory);
lea rax, [rax + 0h] ; rax = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT
mov edx, [rax] ; rdx = VirtualAddress
lea rax, [rcx + rdx] ; rax = IMAGE_EXPORT_DIRECTORY
mov edx, [rax + 18h] ; rdx = NumberOfNames
mov r8d, [rax + 20h] ; r8 = AddressOfNames
lea r8, [rcx + r8]
mov r10, 41636f7250746547h ; GetProcA
mov r11, 0073736572646441h ; Address\0
test rdx, rdx
jz @nothing
@@1:
mov r9d, [r8]
lea r9, [rcx + r9] ; function name
cmp r10, [r9]
jnz @@2
cmp r11, [r9 + 7]
jnz @@2
; Found our function
neg rdx
mov r10d, [rax + 18h] ; r10 = NumberOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, NumberOfNames);
lea rdx, [r10 + rdx] ; rdx = function index
mov r10d, [rax + 24h] ; r10 = AddressOfNameOrdinals
lea r10, [rcx + r10]
movzx rdx, word ptr [r10 + rdx * 2] ; rdx = index in the function table
mov r10d, [rax + 1Ch] ; r10 = AddressOfFunctions
lea r10, [rcx + r10]
mov r10d, [r10 + rdx * 4] ; r10 = offset of possible func addr
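; Check for forwarded function: a forwarder RVA points inside the export directory
mov edx, [rcx + 3Ch] ; e_lfanew
lea rdx, [rcx + rdx + 88h] ; rdx = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT (18h + 70h)
mov r11d, [rdx + 4] ; r11 = Size
mov edx, [rdx] ; rdx = VirtualAddress
cmp r10, rdx
jb @@3 ; below the export directory: a regular function
add r11, rdx
cmp r10, r11
jb @nothing ; inside the export directory: forwarded, so fail
@@3:
lea rax, [rcx + r10] ; Got our func addr!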
ret
@@2:
add r8, 4
dec rdx
jnz @@1
@nothing:
xor eax, eax
ret
GetAddressOf_GetProcAddress ENDP
32-bit Implementation
And here is similar assembly code for the 32-bit implementation.
GetAddressOf_GetProcAddress PROC
ASSUME FS:NOTHING
;[esp + 04h] = base address of kernel32.dll
mov ecx, [esp + 04h]
push ebx
push esi
test ecx, ecx
jz @nothing
mov eax, [ecx + 3Ch] ; e_lfanew
lea eax, [eax + ecx + 78h] ; eax = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT
mov edx, [eax] ; edx = VirtualAddress
lea eax, [ecx + edx] ; eax = IMAGE_EXPORT_DIRECTORY
mov edx, [eax + 18h] ; edx = NumberOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, NumberOfNames);
test edx, edx
jz @nothing
mov ebx, [eax + 20h] ; ebx = AddressOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, AddressOfNames);
lea ebx, [ecx + ebx]
@@1:
mov esi, [ebx]
lea esi, [ecx + esi] ; function name
cmp dword ptr [esi], 50746547h ; GetP
jnz @@2
cmp dword ptr [esi + 4], 41636f72h ; rocA
jnz @@2
cmp dword ptr [esi + 8], 65726464h ; ddre
jnz @@2
cmp dword ptr [esi + 11], 00737365h ; ess\0
jnz @@2
; Found our function
neg edx
mov esi, [eax + 18h] ; esi = NumberOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, NumberOfNames);
lea edx, [esi + edx] ; edx = function index
mov esi, [eax + 24h] ; esi = AddressOfNameOrdinals ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, AddressOfNameOrdinals);
lea esi, [ecx + esi]
movzx edx, word ptr [esi + edx * 2] ; edx = index in the function table
mov esi, [eax + 1Ch] ; esi = AddressOfFunctions ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, AddressOfFunctions);
lea esi, [ecx + esi]
mov esi, [esi + edx * 4] ; esi = offset of possible func addr
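; Check for forwarded function: a forwarder RVA points inside the export directory
mov edx, [ecx + 3Ch] ; e_lfanew
lea ebx, [ecx + edx + 78h] ; ebx = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT
mov edx, [ebx] ; edx = VirtualAddress ---- size_t offset = offsetof(IMAGE_DATA_DIRECTORY, VirtualAddress);
cmp esi, edx
jb @@3 ; below the export directory: a regular function
mov ebx, [ebx + 4] ; ebx = Size ---- size_t offset = offsetof(IMAGE_DATA_DIRECTORY, Size);
add ebx, edx
cmp esi, ebx
jb @nothing ; inside the export directory: forwarded, so fail
@@3:
lea eax, [ecx + esi] ; Got our func addr!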
pop esi
pop ebx
ret
@@2:
add ebx, 4
dec edx
jnz @@1
@nothing:
xor eax, eax
pop esi
pop ebx
ret
GetAddressOf_GetProcAddress ENDP
Conclusion
As a final word, I assume you can see that it's pretty easy to combine the two functions shown above into one, if all you need is the address of the GetProcAddress function.
Otherwise, the steps for obtaining the address of pretty much any API in the system from shellcode could be as follows:
- Call GetKernel32ModuleHandle and remember the base address that it returns.
- Call GetAddressOf_GetProcAddress on the base address that you got above, to get the address of GetProcAddress.
- Call the actual GetProcAddress, using the pointer that you got above, on the base address from the first step to obtain the address of the LoadLibrary function.
- Now you have dynamically resolved addresses of the LoadLibrary and GetProcAddress functions, which you can use to resolve the address of any other API in the system.
For even more compactness you may also inline both functions into your shellcode.
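The last two steps of the list above can be sketched in portable C++ as follows. This is only an outline of the control flow. The type aliases are simplified, hypothetical stand-ins for HMODULE, FARPROC and the WINAPI function pointer types, the export name "LoadLibraryA" is my choice of the ANSI variant, and the two fake_ functions are test doubles and not real APIs.

```cpp
#include <cassert>
#include <cstring>

// Simplified, hypothetical stand-ins for the Windows types so that the
// sketch compiles anywhere.
using ModuleBase = void*;
using AnyProc = void (*)();
using GetProcAddressFn = AnyProc (*)(ModuleBase module, const char* name);
using LoadLibraryFn = ModuleBase (*)(const char* file_name);

struct ResolvedApis {
    GetProcAddressFn get_proc_address = nullptr;
    LoadLibraryFn    load_library = nullptr;
};

// Steps 3 and 4: given the kernel32 base (step 1) and a GetProcAddress
// pointer (step 2), resolve LoadLibraryA through that pointer.
inline bool bootstrap(ModuleBase kernel32, GetProcAddressFn gpa, ResolvedApis& out)
{
    if (!kernel32 || !gpa)
        return false;
    AnyProc load = gpa(kernel32, "LoadLibraryA");
    if (!load)
        return false;
    out.get_proc_address = gpa;
    out.load_library = reinterpret_cast<LoadLibraryFn>(load);
    return true;
}

// Hypothetical test doubles standing in for the real kernel32 exports.
inline ModuleBase fake_load_library(const char*)
{
    static int fake_module;
    return &fake_module;
}

inline AnyProc fake_get_proc_address(ModuleBase, const char* name)
{
    return std::strcmp(name, "LoadLibraryA") == 0
        ? reinterpret_cast<AnyProc>(&fake_load_library)
        : nullptr;
}
```

In real shellcode, the first two arguments of bootstrap would come from GetKernel32ModuleHandle and GetAddressOf_GetProcAddress, and every further API would then be resolved through the two pointers stored in ResolvedApis.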


