Blog Post

Deep Dive Into Assembly Language - Windows Shellcode - GetProcAddress

How to implement GetProcAddress in shellcode using x86-64 and x86 assembly language.

This article contains undocumented features that are not supported by the original manufacturer. By following advice in this article, you're doing so at your own risk. The methods presented in this article may rely on internal implementation and may not work in the future.

Preface

Suppose you call some Windows function (or WinAPI) from your C or C++ code:

C++
HANDLE hFile = ::CreateFileW(pFilePath, GENERIC_READ | GENERIC_WRITE, 
					FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if(hFile != INVALID_HANDLE_VALUE)
{
	::CloseHandle(hFile);
}

When you compile your program, the CreateFileW and CloseHandle functions from the Kernel32.dll library will be linked to your program using load-time linking.

At times, though, you may not want to, or may not be able to, rely on load-time linking. For those cases there's a way to do it dynamically, at run time, using the LoadLibrary and GetProcAddress APIs:

C++
HMODULE hKernel32 = ::LoadLibrary(L"Kernel32.dll");
if(hKernel32)
{
	HANDLE (WINAPI *pfn_CreateFileW)(
		LPCWSTR               lpFileName,
		DWORD                 dwDesiredAccess,
		DWORD                 dwShareMode,
		LPSECURITY_ATTRIBUTES lpSecurityAttributes,
		DWORD                 dwCreationDisposition,
		DWORD                 dwFlagsAndAttributes,
		HANDLE                hTemplateFile
	);
	BOOL (WINAPI *pfn_CloseHandle)(
		HANDLE hObject
	);

	(FARPROC&)pfn_CreateFileW = ::GetProcAddress(hKernel32, "CreateFileW");
	(FARPROC&)pfn_CloseHandle = ::GetProcAddress(hKernel32, "CloseHandle");

	if(pfn_CreateFileW &&
		pfn_CloseHandle)
	{
		HANDLE hFile = pfn_CreateFileW(pFilePath, GENERIC_READ | GENERIC_WRITE, 
						FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
		if(hFile != INVALID_HANDLE_VALUE)
		{
			pfn_CloseHandle(hFile);
		}
	}

	::FreeLibrary(hKernel32);
}

Notice that this technique, which resolves the addresses of the CreateFileW and CloseHandle functions dynamically, can be used to resolve any other exported function. But the original two APIs, LoadLibrary and GetProcAddress, still had to be resolved through load-time linking, that is, using the first method.

But what if you're writing a shellcode and don't have an option of doing any load-time linking of the Windows APIs? Can you still get a pointer to the GetProcAddress function?

The answer is yes. But for that we'll need to dive into the depths of Windows internals and use some assembly language to implement it.

General Technique

The general technique of obtaining a pointer to any Windows function is two-fold:

  1. Obtain the base address of the module (or `.dll`) that the function resides in. This is the address where the module was mapped into the process.
  2. Obtain the offset within that module for the function you need.

Note that the first step mentions "address where the module was mapped into". This means that the module in question has to be already loaded into the process (or "mapped".) There are three ways to ensure this:

  • Call one of the LoadLibrary* functions.
  • Use a module that is guaranteed to be loaded into the process. There's ntdll.dll that is guaranteed to be loaded into any user-mode process, and kernel32.dll that is conditionally guaranteed. By that I mean, if the process is not a native system process, it is guaranteed to have kernel32.dll module mapped.
  • Use any module that is linked for load-time linking into the process, provided that such module is not marked for delay-loading.
To determine if a binary executable was built as a native system module, check IMAGE_OPTIONAL_HEADER::Subsystem in its PE header. A native image will be marked as IMAGE_SUBSYSTEM_NATIVE (or 1).

Alternatively you can use the WinAPI Search tool for that as well:
WinAPI Search utility, displaying "Show Info" window for a search result item within the IMAGE_SUBSYSTEM_NATIVE module.
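
The Subsystem check mentioned above can also be done by hand on the raw bytes of a file. The following is a minimal sketch, assuming a little-endian host and a file that has already been read into memory. The names `read_at` and `is_native_image` are hypothetical helpers made up for this illustration. The offsets are the standard PE ones: `e_lfanew` at 3Ch, the optional header 18h bytes into the NT headers, and `Subsystem` 44h bytes into the optional header for both PE32 and PE32+.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Value of IMAGE_SUBSYSTEM_NATIVE in winnt.h.
constexpr uint16_t kSubsystemNative = 1;

// Hypothetical helper: copies a T out of the buffer at `offset`.
// Returns false if the read would run past the end of the buffer.
template <typename T>
bool read_at(const std::vector<uint8_t>& file, std::size_t offset, T& out) {
    if (offset > file.size() || file.size() - offset < sizeof(T)) return false;
    std::memcpy(&out, file.data() + offset, sizeof(T));  // assumes a little-endian host
    return true;
}

// Hypothetical helper: true only if the buffer looks like a PE image whose
// optional header declares the native subsystem.
bool is_native_image(const std::vector<uint8_t>& file) {
    uint16_t mz = 0, subsystem = 0;
    uint32_t e_lfanew = 0, signature = 0;
    if (!read_at(file, 0, mz) || mz != 0x5A4D) return false;       // "MZ"
    if (!read_at(file, 0x3C, e_lfanew)) return false;              // IMAGE_DOS_HEADER::e_lfanew
    if (!read_at(file, e_lfanew, signature) || signature != 0x00004550) return false;  // "PE\0\0"
    // OptionalHeader begins 0x18 bytes into IMAGE_NT_HEADERS;
    // Subsystem sits 0x44 bytes into the optional header.
    if (!read_at(file, std::size_t(e_lfanew) + 0x18 + 0x44, subsystem)) return false;
    return subsystem == kSubsystemNative;
}
```

Every read is bounds-checked because, unlike a module mapped by the loader, a file picked off the disk may be truncated or not a PE image at all.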

Techniques For The Shellcode

In shellcode, our options are initially quite limited. By definition, shellcode begins executing at an arbitrary location in the process, long after the process has finished loading and relocating, so we can't rely on load-time linking in our code.

Additionally, because we may be dealing with ASLR, we don't know for sure where all the loaded modules are in the process that we injected into.

And we can't search in memory for the base addresses of the loaded modules, either. We don't have an exception handler set up for our shellcode. Any read from an unmapped address will crash the process with our shellcode. We could set one up dynamically, but for that we need to interact with the operating system, or in other words, be able to call system APIs. But to do that, we need to know where the modules are. Do you see our dilemma here?

Thus, before we can do anything at all, we need to accomplish two goals:

  1. Be able to find the base address of one of the guaranteed-to-be-mapped modules.
  2. Be able to resolve an API in that mapped module by the module's base address and by the API name.

If we can accomplish both of those tasks, we can then obtain a pointer to the GetProcAddress function and use it to resolve any other system API we need.

But how do you get a pointer to GetProcAddress? You can't just call:

C++
FARPROC pfn_GetProcAddress = ::GetProcAddress(hKernel32, "GetProcAddress");

Instead we have to delve into some Windows internals here, and approach it step-by-step.

Let's start by learning how to get the base address of kernel32.dll. Luckily, it's one of those DLLs that is guaranteed to be loaded into most processes.

Shellcode: Get Kernel32 Base Address

Luckily for us, whenever a user-mode thread runs in Windows, a special segment register (GS for 64-bit processes, and FS for 32-bit ones) points to an internal structure called the TEB (or "Thread Environment Block"). It is quite poorly documented by Microsoft, so you can often get much better information about it by just Googling it, or by extracting it yourself from a .pdb file for one of the native system modules, such as ntoskrnl.exe or ntdll.dll:

C++
typedef struct _TEB {
	_NT_TIB NtTib;
	void * EnvironmentPointer;
	_CLIENT_ID ClientId;
	void * ActiveRpcHandle;
	void * ThreadLocalStoragePointer;
	_PEB * ProcessEnvironmentBlock;
	ULONG LastErrorValue;
	ULONG CountOfOwnedCriticalSections;
	void * CsrClientThread;
	void * Win32ThreadInfo;
	ULONG User32Reserved[0x1a];
	ULONG UserReserved[0x5];
	void * WOW32Reserved;
	ULONG CurrentLocale;
	ULONG FpSoftwareStatusRegister;
	void * ReservedForDebuggerInstrumentation[0x10];
	void * SystemReserved1[0x1e];
	CHAR PlaceholderCompatibilityMode;
	UCHAR PlaceholderHydrationAlwaysExplicit;
	CHAR PlaceholderReserved[0xa];
	ULONG ProxiedProcessId;
	_ACTIVATION_CONTEXT_STACK _ActivationStack;
	UCHAR WorkingOnBehalfTicket[0x8];
	LONG ExceptionCode;
	UCHAR Padding0[0x4];
	_ACTIVATION_CONTEXT_STACK * ActivationContextStackPointer;
	ULONGLONG InstrumentationCallbackSp;
	ULONGLONG InstrumentationCallbackPreviousPc;
	ULONGLONG InstrumentationCallbackPreviousSp;
	ULONG TxFsContext;
	UCHAR InstrumentationCallbackDisabled;
	UCHAR UnalignedLoadStoreExceptions;
	UCHAR Padding1[0x2];
	_GDI_TEB_BATCH GdiTebBatch;
	_CLIENT_ID RealClientId;
	void * GdiCachedProcessHandle;
	ULONG GdiClientPID;
	ULONG GdiClientTID;
	void * GdiThreadLocalInfo;
	ULONGLONG Win32ClientInfo[0x3e];
	void * glDispatchTable[0xe9];
	ULONGLONG glReserved1[0x1d];
	void * glReserved2;
	void * glSectionInfo;
	void * glSection;
	void * glTable;
	void * glCurrentRC;
	void * glContext;
	ULONG LastStatusValue;
	UCHAR Padding2[0x4];
	_UNICODE_STRING StaticUnicodeString;
	WCHAR StaticUnicodeBuffer[0x105];
	UCHAR Padding3[0x6];
	void * DeallocationStack;
	void * TlsSlots[0x40];
	_LIST_ENTRY TlsLinks;
	void * Vdm;
	void * ReservedForNtRpc;
	void * DbgSsReserved[0x2];
	ULONG HardErrorMode;
	UCHAR Padding4[0x4];
	void * Instrumentation[0xb];
	_GUID ActivityId;
	void * SubProcessTag;
	void * PerflibData;
	void * EtwTraceData;
	void * WinSockData;
	ULONG GdiBatchCount;
	_PROCESSOR_NUMBER CurrentIdealProcessor;
	ULONG IdealProcessorValue;
	UCHAR ReservedPad0;
	UCHAR ReservedPad1;
	UCHAR ReservedPad2;
	UCHAR IdealProcessor;
	ULONG GuaranteedStackBytes;
	UCHAR Padding5[0x4];
	void * ReservedForPerf;
	void * ReservedForOle;
	ULONG WaitingOnLoaderLock;
	UCHAR Padding6[0x4];
	void * SavedPriorityState;
	ULONGLONG ReservedForCodeCoverage;
	void * ThreadPoolData;
	void * * TlsExpansionSlots;
	void * DeallocationBStore;
	void * BStoreLimit;
	ULONG MuiGeneration;
	ULONG IsImpersonating;
	void * NlsCache;
	void * pShimData;
	ULONG HeapData;
	UCHAR Padding7[0x4];
	void * CurrentTransactionHandle;
	_TEB_ACTIVE_FRAME * ActiveFrame;
	void * FlsData;
	void * PreferredLanguages;
	void * UserPrefLanguages;
	void * MergedPrefLanguages;
	ULONG MuiImpersonation;
	USHORT volatile CrossTebFlags;
	USHORT SpareCrossTebBits : 16; // 0xffff;
	USHORT SameTebFlags;
	USHORT SafeThunkCall : 01; // 0x0001;
	USHORT InDebugPrint : 01; // 0x0002;
	USHORT HasFiberData : 01; // 0x0004;
	USHORT SkipThreadAttach : 01; // 0x0008;
	USHORT WerInShipAssertCode : 01; // 0x0010;
	USHORT RanProcessInit : 01; // 0x0020;
	USHORT ClonedThread : 01; // 0x0040;
	USHORT SuppressDebugMsg : 01; // 0x0080;
	USHORT DisableUserStackWalk : 01; // 0x0100;
	USHORT RtlExceptionAttached : 01; // 0x0200;
	USHORT InitialThread : 01; // 0x0400;
	USHORT SessionAware : 01; // 0x0800;
	USHORT LoadOwner : 01; // 0x1000;
	USHORT LoaderWorker : 01; // 0x2000;
	USHORT SkipLoaderInit : 01; // 0x4000;
	USHORT SpareSameTebBits : 01; // 0x8000;
	void * TxnScopeEnterCallback;
	void * TxnScopeExitCallback;
	void * TxnScopeContext;
	ULONG LockCount;
	LONG WowTebOffset;
	void * ResourceRetValue;
	void * ReservedForWdf;
	ULONGLONG ReservedForCrt;
	_GUID EffectiveContainerId;
	ULONGLONG LastSleepCounter;
	ULONG DelayBackoff;
	UCHAR Padding8[0x4];
} TEB, *PTEB;

The important thing for us is the ProcessEnvironmentBlock member of the TEB structure. It points to another badly documented structure, called the PEB (or "Process Environment Block"):

C++
typedef struct _PEB {
	UCHAR InheritedAddressSpace;
	UCHAR ReadImageFileExecOptions;
	UCHAR BeingDebugged;
	UCHAR BitField;
	UCHAR ImageUsesLargePages : 01; // 0x01;
	UCHAR IsProtectedProcess : 01; // 0x02;
	UCHAR IsImageDynamicallyRelocated : 01; // 0x04;
	UCHAR SkipPatchingUser32Forwarders : 01; // 0x08;
	UCHAR IsPackagedProcess : 01; // 0x10;
	UCHAR IsAppContainer : 01; // 0x20;
	UCHAR IsProtectedProcessLight : 01; // 0x40;
	UCHAR IsLongPathAwareProcess : 01; // 0x80;
	UCHAR Padding0[0x4];
	void * Mutant;
	void * ImageBaseAddress;
	_PEB_LDR_DATA * Ldr;
	_RTL_USER_PROCESS_PARAMETERS * ProcessParameters;
	void * SubSystemData;
	void * ProcessHeap;
	_RTL_CRITICAL_SECTION * FastPebLock;
	_SLIST_HEADER * volatile AtlThunkSListPtr;
	void * IFEOKey;
	ULONG CrossProcessFlags;
	ULONG ProcessInJob : 01; // 0x00000001;
	ULONG ProcessInitializing : 01; // 0x00000002;
	ULONG ProcessUsingVEH : 01; // 0x00000004;
	ULONG ProcessUsingVCH : 01; // 0x00000008;
	ULONG ProcessUsingFTH : 01; // 0x00000010;
	ULONG ProcessPreviouslyThrottled : 01; // 0x00000020;
	ULONG ProcessCurrentlyThrottled : 01; // 0x00000040;
	ULONG ProcessImagesHotPatched : 01; // 0x00000080;
	ULONG ReservedBits0 : 24; // 0xffffff00;
	UCHAR Padding1[0x4];
	void * KernelCallbackTable;
	void * UserSharedInfoPtr;
	ULONG SystemReserved;
	ULONG AtlThunkSListPtr32;
	void * ApiSetMap;
	ULONG TlsExpansionCounter;
	UCHAR Padding2[0x4];
	_RTL_BITMAP * TlsBitmap;
	ULONG TlsBitmapBits[0x2];
	void * ReadOnlySharedMemoryBase;
	void * SharedData;
	void * * ReadOnlyStaticServerData;
	void * AnsiCodePageData;
	void * OemCodePageData;
	void * UnicodeCaseTableData;
	ULONG NumberOfProcessors;
	ULONG NtGlobalFlag;
	_LARGE_INTEGER CriticalSectionTimeout;
	ULONGLONG HeapSegmentReserve;
	ULONGLONG HeapSegmentCommit;
	ULONGLONG HeapDeCommitTotalFreeThreshold;
	ULONGLONG HeapDeCommitFreeBlockThreshold;
	ULONG NumberOfHeaps;
	ULONG MaximumNumberOfHeaps;
	void * * ProcessHeaps;
	void * GdiSharedHandleTable;
	void * ProcessStarterHelper;
	ULONG GdiDCAttributeList;
	UCHAR Padding3[0x4];
	_RTL_CRITICAL_SECTION * LoaderLock;
	ULONG OSMajorVersion;
	ULONG OSMinorVersion;
	USHORT OSBuildNumber;
	USHORT OSCSDVersion;
	ULONG OSPlatformId;
	ULONG ImageSubsystem;
	ULONG ImageSubsystemMajorVersion;
	ULONG ImageSubsystemMinorVersion;
	UCHAR Padding4[0x4];
	ULONGLONG ActiveProcessAffinityMask;
	ULONG GdiHandleBuffer[0x3c];
	void (* PostProcessInitRoutine)();
	_RTL_BITMAP * TlsExpansionBitmap;
	ULONG TlsExpansionBitmapBits[0x20];
	ULONG SessionId;
	UCHAR Padding5[0x4];
	_ULARGE_INTEGER AppCompatFlags;
	_ULARGE_INTEGER AppCompatFlagsUser;
	void * pShimData;
	void * AppCompatInfo;
	_UNICODE_STRING CSDVersion;
	_ACTIVATION_CONTEXT_DATA const * ActivationContextData;
	_ASSEMBLY_STORAGE_MAP * ProcessAssemblyStorageMap;
	_ACTIVATION_CONTEXT_DATA const * SystemDefaultActivationContextData;
	_ASSEMBLY_STORAGE_MAP * SystemAssemblyStorageMap;
	ULONGLONG MinimumStackCommit;
	void * SparePointers[0x4];
	ULONG SpareUlongs[0x3];
	USHORT ActiveCodePage;
	USHORT OemCodePage;
	USHORT UseCaseMapping;
	USHORT UnusedNlsField;
	void * WerRegistrationData;
	void * WerShipAssertPtr;
	void * PatchLoaderData;
	void * pImageHeaderHash;
	ULONG TracingFlags;
	ULONG HeapTracingEnabled : 01; // 0x00000001;
	ULONG CritSecTracingEnabled : 01; // 0x00000002;
	ULONG LibLoaderTracingEnabled : 01; // 0x00000004;
	ULONG SpareTracingBits : 29; // 0xfffffff8;
	UCHAR Padding6[0x4];
	ULONGLONG CsrServerReadOnlySharedMemoryBase;
	ULONGLONG TppWorkerpListLock;
	_LIST_ENTRY TppWorkerpList;
	void * WaitOnAddressHashTable[0x80];
	void * TelemetryCoverageHeader;
	ULONG CloudFileFlags;
	ULONG CloudFileDiagFlags;
	CHAR PlaceholderCompatibilityMode;
	CHAR PlaceholderCompatibilityModeReserved[0x7];
	_LEAP_SECOND_DATA * LeapSecondData;
	ULONG LeapSecondFlags;
	ULONG SixtySecondEnabled : 01; // 0x00000001;
	ULONG Reserved : 31; // 0xfffffffe;
	ULONG NtGlobalFlag2;
} PEB, *PPEB;

The PEB::Ldr member, in turn, contains a pointer to the PEB_LDR_DATA struct:

C++
typedef struct _PEB_LDR_DATA {
	ULONG Length;
	UCHAR Initialized;
	void * SsHandle;
	_LIST_ENTRY InLoadOrderModuleList;
	_LIST_ENTRY InMemoryOrderModuleList;
	_LIST_ENTRY InInitializationOrderModuleList;
	void * EntryInProgress;
	UCHAR ShutdownInProgress;
	void * ShutdownThreadId;
} PEB_LDR_DATA, *PPEB_LDR_DATA;

And PEB_LDR_DATA::InMemoryOrderModuleList points to a doubly-linked list of LDR_DATA_TABLE_ENTRY structures for all loaded modules in the process:

C++
typedef struct _LDR_DATA_TABLE_ENTRY {
	_LIST_ENTRY InLoadOrderLinks;
	_LIST_ENTRY InMemoryOrderLinks;
	_LIST_ENTRY InInitializationOrderLinks;
	void * DllBase;
	void * EntryPoint;
	ULONG SizeOfImage;
	_UNICODE_STRING FullDllName;
	_UNICODE_STRING BaseDllName;
	UCHAR FlagGroup[0x4];
	ULONG Flags;
	ULONG PackagedBinary : 01; // 0x00000001;
	ULONG MarkedForRemoval : 01; // 0x00000002;
	ULONG ImageDll : 01; // 0x00000004;
	ULONG LoadNotificationsSent : 01; // 0x00000008;
	ULONG TelemetryEntryProcessed : 01; // 0x00000010;
	ULONG ProcessStaticImport : 01; // 0x00000020;
	ULONG InLegacyLists : 01; // 0x00000040;
	ULONG InIndexes : 01; // 0x00000080;
	ULONG ShimDll : 01; // 0x00000100;
	ULONG InExceptionTable : 01; // 0x00000200;
	ULONG ReservedFlags1 : 02; // 0x00000c00;
	ULONG LoadInProgress : 01; // 0x00001000;
	ULONG LoadConfigProcessed : 01; // 0x00002000;
	ULONG EntryProcessed : 01; // 0x00004000;
	ULONG ProtectDelayLoad : 01; // 0x00008000;
	ULONG ReservedFlags3 : 02; // 0x00030000;
	ULONG DontCallForThreads : 01; // 0x00040000;
	ULONG ProcessAttachCalled : 01; // 0x00080000;
	ULONG ProcessAttachFailed : 01; // 0x00100000;
	ULONG CorDeferredValidate : 01; // 0x00200000;
	ULONG CorImage : 01; // 0x00400000;
	ULONG DontRelocate : 01; // 0x00800000;
	ULONG CorILOnly : 01; // 0x01000000;
	ULONG ChpeImage : 01; // 0x02000000;
	ULONG ReservedFlags5 : 02; // 0x0c000000;
	ULONG Redirected : 01; // 0x10000000;
	ULONG ReservedFlags6 : 02; // 0x60000000;
	ULONG CompatDatabaseProcessed : 01; // 0x80000000;
	USHORT ObsoleteLoadCount;
	USHORT TlsIndex;
	_LIST_ENTRY HashLinks;
	ULONG TimeDateStamp;
	_ACTIVATION_CONTEXT * EntryPointActivationContext;
	void * Lock;
	_LDR_DDAG_NODE * DdagNode;
	_LIST_ENTRY NodeModuleLink;
	_LDRP_LOAD_CONTEXT * LoadContext;
	void * ParentDllBase;
	void * SwitchBackContext;
	_RTL_BALANCED_NODE BaseAddressIndexNode;
	_RTL_BALANCED_NODE MappingInfoIndexNode;
	ULONGLONG OriginalBase;
	_LARGE_INTEGER LoadTime;
	ULONG BaseNameHashValue;
	_LDR_DLL_LOAD_REASON LoadReason;
	ULONG ImplicitPathOptions;
	ULONG ReferenceCount;
	ULONG DependentLoadFlags;
	UCHAR SigningLevel;
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;

Another important thing about the list of loaded modules is the order in which modules are placed into it. The first module is always the executable image of the process that we're running in. The second one is ntdll.dll, and the third one happens to be kernel32.dll. So we can use this order to our advantage and quickly traverse the list to the third module, which will be kernel32.dll.

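One tricky thing to keep in mind is that each _LIST_ENTRY* in LDR_DATA_TABLE_ENTRY::InMemoryOrderLinks points to the InMemoryOrderLinks member of the next LDR_DATA_TABLE_ENTRY structure, not to the beginning of that structure. So the offset of any other member has to be reduced by the offset of InMemoryOrderLinks before it is applied to such a pointer.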

Then finally, we can retrieve the base address of Kernel32 from the LDR_DATA_TABLE_ENTRY::DllBase member.
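
The pointer adjustment described above is easier to see in C++ than in words. The following is a minimal sketch that uses simplified stand-in structures, not the real Windows types. `ListEntry`, `LdrEntry`, `entry_from_memory_links` and `nth_module_base` are hypothetical names made up for this illustration. The walk follows `Flink` pointers from the list head and then steps back from the `InMemoryOrderLinks` member to the start of the owning entry, which is the same arithmetic as the `CONTAINING_RECORD` macro.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for _LIST_ENTRY and the first members of
// _LDR_DATA_TABLE_ENTRY (illustrative only).
struct ListEntry { ListEntry* Flink; ListEntry* Blink; };

struct LdrEntry {
    ListEntry InLoadOrderLinks;
    ListEntry InMemoryOrderLinks;
    ListEntry InInitializationOrderLinks;
    void*     DllBase;
};

// Converts a pointer to InMemoryOrderLinks back into a pointer to the entry
// that contains it.
inline LdrEntry* entry_from_memory_links(ListEntry* link) {
    return reinterpret_cast<LdrEntry*>(
        reinterpret_cast<char*>(link) - offsetof(LdrEntry, InMemoryOrderLinks));
}

// Returns DllBase of the n-th module (0-based) in memory order, or nullptr if
// the list is shorter than that. `head` plays the role of
// PEB_LDR_DATA::InMemoryOrderModuleList.
inline void* nth_module_base(ListEntry* head, unsigned n) {
    ListEntry* link = head->Flink;
    for (unsigned i = 0; i < n && link != head; ++i) link = link->Flink;
    return link == head ? nullptr : entry_from_memory_links(link)->DllBase;
}
```

With the module order described above, `nth_module_base(head, 2)` corresponds to kernel32.dll. Unlike a hard-coded walk, this sketch also stops when it wraps around to the list head.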

Implementation In Assembly

The assembly function that implements what I outlined above is much less wordy. And because assembly language is low-level, we must write two versions of it, one for each bitness.

The C++ declaration for our assembly function should look like this:

C++
extern "C" {
	HMODULE GetKernel32ModuleHandle();
};

64-bit Implementation

The 64-bit implementation is very simple. We can get away with literally using just one register, RAX, both to do all the calculations and to return the result:

x86-64
GetKernel32ModuleHandle PROC
	mov		rax, gs:[60h]       ; PEB
	mov		rax, [rax + 18h]    ; Ldr
	mov		rax, [rax + 20h]    ; InMemoryOrderModuleList
	mov		rax, [rax]          ; Skip 'this' module and get to ntdll
	mov		rax, [rax]          ; Skip ntdll module and get to kernel32
	mov		rax, [rax + 20h]    ; DllBase for kernel32 --- size_t offset = offsetof(LDR_DATA_TABLE_ENTRY, DllBase) - sizeof(LIST_ENTRY);

	ret
GetKernel32ModuleHandle ENDP

Also note that this function should never fail and always return a valid result if called from within a non-native process.

32-bit Implementation

The 32-bit implementation is slightly more complex. We need to use the ASSUME directive to tell the MASM assembler not to complain about our use of the FS segment register. The rest is very similar to the 64-bit version, except for the struct offsets.

x86
GetKernel32ModuleHandle PROC
	ASSUME FS:NOTHING

	mov		eax, fs:[30h]       ; PEB
	mov		eax, [eax + 0Ch]    ; Ldr
	mov		eax, [eax + 14h]    ; InMemoryOrderModuleList
	mov		eax, [eax]          ; Skip 'this' module and get to ntdll
	mov		eax, [eax]          ; Skip ntdll module and get to kernel32
	mov		eax, [eax + 10h]    ; DllBase for kernel32 --- size_t offset = offsetof(LDR_DATA_TABLE_ENTRY, DllBase) - sizeof(LIST_ENTRY);

	ret
GetKernel32ModuleHandle ENDP

And just as I said for the 64-bit version, this function should never fail either, and always return a valid result if called from a non-native process.
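
If you wonder where the hard-coded offsets in both versions come from, they can be derived from the structure layouts shown earlier. The following is a minimal sketch that mirrors only the leading members of each structure and models pointers as fixed-width integers, so both layouts can be checked with any compiler. All the type names ending in `T` and the `dllbase_delta` helper are hypothetical, made up for this illustration. It assumes the usual natural alignment of `uint32_t` and `uint64_t`.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative mirrors, not the real Windows definitions.
// Ptr is uint64_t for the 64-bit layout and uint32_t for the 32-bit one.
template <typename Ptr> struct ListEntryT { Ptr Flink, Blink; };

template <typename Ptr> struct TebT {
    Ptr NtTib[7];                  // _NT_TIB holds seven pointer-sized members
    Ptr EnvironmentPointer;
    Ptr ClientId[2];               // UniqueProcess, UniqueThread
    Ptr ActiveRpcHandle;
    Ptr ThreadLocalStoragePointer;
    Ptr ProcessEnvironmentBlock;
};

template <typename Ptr> struct PebT {
    uint8_t InheritedAddressSpace, ReadImageFileExecOptions, BeingDebugged, BitField;
    Ptr Mutant;
    Ptr ImageBaseAddress;
    Ptr Ldr;
};

template <typename Ptr> struct PebLdrDataT {
    uint32_t Length;
    uint8_t  Initialized;
    Ptr      SsHandle;
    ListEntryT<Ptr> InLoadOrderModuleList;
    ListEntryT<Ptr> InMemoryOrderModuleList;
};

template <typename Ptr> struct LdrEntryT {
    ListEntryT<Ptr> InLoadOrderLinks;
    ListEntryT<Ptr> InMemoryOrderLinks;
    ListEntryT<Ptr> InInitializationOrderLinks;
    Ptr DllBase;
};

// Displacement from an InMemoryOrderLinks pointer to DllBase.
template <typename Ptr> constexpr std::size_t dllbase_delta() {
    return offsetof(LdrEntryT<Ptr>, DllBase) - offsetof(LdrEntryT<Ptr>, InMemoryOrderLinks);
}

// 64-bit version: gs:[60h], then +18h, +20h and finally +20h.
static_assert(offsetof(TebT<uint64_t>, ProcessEnvironmentBlock) == 0x60, "TEB to PEB");
static_assert(offsetof(PebT<uint64_t>, Ldr) == 0x18, "PEB to Ldr");
static_assert(offsetof(PebLdrDataT<uint64_t>, InMemoryOrderModuleList) == 0x20, "Ldr to list");
static_assert(dllbase_delta<uint64_t>() == 0x20, "links to DllBase");

// 32-bit version: fs:[30h], then +0Ch, +14h and finally +10h.
static_assert(offsetof(TebT<uint32_t>, ProcessEnvironmentBlock) == 0x30, "TEB to PEB");
static_assert(offsetof(PebT<uint32_t>, Ldr) == 0x0C, "PEB to Ldr");
static_assert(offsetof(PebLdrDataT<uint32_t>, InMemoryOrderModuleList) == 0x14, "Ldr to list");
static_assert(dllbase_delta<uint32_t>() == 0x10, "links to DllBase");
```

Because the checks are `static_assert`s, a wrong offset shows up as a compile error rather than as a crash at run time.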

Shellcode: Get Address Of GetProcAddress Function

Now that we know the base address of the kernel32 module we can use it to traverse through its PE header to retrieve the address of the GetProcAddress function. This process is somewhat straightforward.

First get to the IMAGE_NT_HEADERS, whose offset from the module base is stored in the e_lfanew member at offset 3Ch of the IMAGE_DOS_HEADER, then get to IMAGE_OPTIONAL_HEADER. In it, we need the first IMAGE_DATA_DIRECTORY struct in the DataDirectory array, at index IMAGE_DIRECTORY_ENTRY_EXPORT (or 0). It describes the export directory:

C++
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;
    DWORD   Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

VirtualAddress will give us the mapped offset from the base to the IMAGE_EXPORT_DIRECTORY that we will need to traverse:

C++
typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Name;
    DWORD   Base;
    DWORD   NumberOfFunctions;
    DWORD   NumberOfNames;
    DWORD   AddressOfFunctions;
    DWORD   AddressOfNames;
    DWORD   AddressOfNameOrdinals;
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

The NumberOfNames will contain, unsurprisingly, the number of function names that are exported from the module, and AddressOfNames will contain the mapped offset to an array of DWORD offsets to function names in memory. After that all we need to do is to traverse through that array, for the number of function names that we determined earlier, and compare each name to our needed GetProcAddress.

Once the name is found at some index in the AddressOfNames array, read the 16-bit value at the same index in the AddressOfNameOrdinals array. That value is the index into another array of DWORD offsets, called AddressOfFunctions, which holds the offset of the function itself. And that is it!
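
Before going to assembly, the lookup through the three arrays can be sketched in C++. The following is a minimal sketch, assuming a little-endian host and a well-formed image. `rd32`, `rd16` and `find_export_rva` are hypothetical helpers made up for this illustration. The hard-coded offsets are those of the IMAGE_EXPORT_DIRECTORY members shown above. The function returns the raw value from AddressOfFunctions and does not check for forwarded exports.

```cpp
#include <cstdint>
#include <cstring>

// Read a 32-bit or 16-bit value at base + rva (assumes a little-endian host).
static uint32_t rd32(const uint8_t* base, uint32_t rva) {
    uint32_t v;
    std::memcpy(&v, base + rva, sizeof v);
    return v;
}
static uint16_t rd16(const uint8_t* base, uint32_t rva) {
    uint16_t v;
    std::memcpy(&v, base + rva, sizeof v);
    return v;
}

// Returns the offset stored in AddressOfFunctions for `name`, or 0 if the
// module does not export that name. `base` is the module base address and
// `export_dir_rva` is the VirtualAddress from the export IMAGE_DATA_DIRECTORY.
// Like the assembly version, this trusts the image and does no bounds checks.
uint32_t find_export_rva(const uint8_t* base, uint32_t export_dir_rva, const char* name) {
    const uint32_t number_of_names = rd32(base, export_dir_rva + 0x18);  // NumberOfNames
    const uint32_t functions_rva   = rd32(base, export_dir_rva + 0x1C);  // AddressOfFunctions
    const uint32_t names_rva       = rd32(base, export_dir_rva + 0x20);  // AddressOfNames
    const uint32_t ordinals_rva    = rd32(base, export_dir_rva + 0x24);  // AddressOfNameOrdinals

    for (uint32_t i = 0; i < number_of_names; ++i) {
        const char* candidate =
            reinterpret_cast<const char*>(base + rd32(base, names_rva + i * 4));
        if (std::strcmp(candidate, name) == 0) {
            // The name index selects a 16-bit entry, which in turn
            // selects the entry in the function table.
            const uint16_t index = rd16(base, ordinals_rva + i * 2);
            return rd32(base, functions_rva + index * 4u);
        }
    }
    return 0;
}
```

Adding the returned offset to the module base gives the function address, provided that the export is not forwarded.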

This all sounds way more complicated when you describe it in words. So let's put it in asm instead.

There's one caveat that we need to address here before settling on the function address found by the logic described above. The PE format allows an export to be forwarded from one module to another, and such forwarded exports became much more common in kernel32.dll starting around Windows 7. For a forwarded export, the entry in the AddressOfFunctions array does not point to code. Instead it points inside the export directory (the range described by the VirtualAddress and Size members of the export IMAGE_DATA_DIRECTORY), where a string such as "NTDLL.RtlAllocateHeap" names the real target.

This case greatly complicates our simple example, and thus we won't cover it here. But we will have to check for it, and fail if GetProcAddress ever becomes a forwarded function in a future version of Windows.
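
The forwarder check itself is a simple range test. According to the PE format specification, an entry in AddressOfFunctions is a forwarder when its offset falls inside the export directory range, that is, inside the VirtualAddress and Size pair taken from the export IMAGE_DATA_DIRECTORY. Any offset outside of that range is an ordinary function. A minimal sketch, with the hypothetical name `is_forwarded_export`:

```cpp
#include <cstdint>

// True if `function_rva` lies in [dir_rva, dir_rva + dir_size), which means
// that it refers to a forwarder string and not to code. `dir_rva` and
// `dir_size` come from the export IMAGE_DATA_DIRECTORY.
constexpr bool is_forwarded_export(uint32_t function_rva, uint32_t dir_rva, uint32_t dir_size) {
    // The subtraction form avoids overflow in dir_rva + dir_size.
    return function_rva >= dir_rva && function_rva - dir_rva < dir_size;
}
```

Note that both values must be read from the data directory in the optional header. The IMAGE_EXPORT_DIRECTORY structure itself begins with the Characteristics and TimeDateStamp members, which are unrelated to this check.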

So the C++ declaration for our assembly function will look like this:

C++
extern "C" {
	FARPROC GetAddressOf_GetProcAddress(HMODULE hKernel32);
};

As you can see, on input we have to pass the base address of the Kernel32 module that we obtained earlier from our call to GetKernel32ModuleHandle. On output, the function returns the address of the GetProcAddress function if it locates it, or NULL if it fails.

64-bit Implementation

I need to point out that this assembly code is deliberately left unoptimized to keep it readable.

x86-64
GetAddressOf_GetProcAddress PROC
	; RCX = base address of kernel32.dll
	test	rcx, rcx
	jz		@nothing

	mov		eax, [rcx + 3Ch]    ; e_lfanew
	add		rax, rcx            ; rax = IMAGE_NT_HEADERS64
	lea		rax, [rax + 18h]    ; rax = IMAGE_OPTIONAL_HEADER64  --- size_t offset = offsetof(IMAGE_NT_HEADERS64, OptionalHeader);
	lea		rax, [rax + 70h]    ; rax = IMAGE_DATA_DIRECTORY	 --- size_t offset = offsetof(IMAGE_OPTIONAL_HEADER64, DataDirectory);
	lea		rax, [rax + 0h]     ; rax = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT

	mov		edx, [rax]          ; rdx = VirtualAddress
	lea		rax, [rcx + rdx]    ; rax = IMAGE_EXPORT_DIRECTORY

	mov		edx, [rax + 18h]    ; rdx = NumberOfNames
	mov		r8d, [rax + 20h]    ; r8 = AddressOfNames
	lea		r8, [rcx + r8]

	mov		r10, 41636f7250746547h   ;	GetProcA
	mov		r11, 0073736572646441h   ;	Address\0

	test	rdx, rdx
	jz		@nothing

@@1:
	mov		r9d, [r8]
	lea		r9, [rcx + r9]      ; function name

	cmp		r10, [r9]
	jnz		@@2
	cmp		r11, [r9 + 7]
	jnz		@@2
	
	; Found our function
	neg		rdx
	mov		r10d, [rax + 18h]   ; r10 = NumberOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, NumberOfNames);
	lea		rdx, [r10 + rdx]    ; rdx = function index

	mov		r10d, [rax + 24h]   ; r10 = AddressOfNameOrdinals
	lea		r10, [rcx + r10]
	movzx	rdx, word ptr [r10 + rdx * 2]   ; rdx = index in the function table

	mov		r10d, [rax + 1Ch]   ; r10 = AddressOfFunctions
	lea		r10, [rcx + r10]

	mov		r10d, [r10 + rdx * 4]   ; r10 = offset of possible func addr

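	; Check for forwarded function: a forwarder offset points inside the export directory.
	; RAX now holds IMAGE_EXPORT_DIRECTORY, so locate the IMAGE_DATA_DIRECTORY again.
	mov		r8d, [rcx + 3Ch]        ; e_lfanew
	lea		r8, [rcx + r8 + 88h]    ; r8 = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT (18h + 70h)
	mov		edx, [r8]               ; rdx = VirtualAddress
	cmp		r10, rdx
	jb		@@3                     ; below the export directory: a regular function

	mov		r11d, [r8 + 4]          ; r11 = Size
	add		r11, rdx
	cmp		r10, r11
	jb		@nothing                ; inside the export directory: a forwarded function

@@3:
	lea		rax, [rcx + r10]        ; Got our func addr!

	ret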

@@2:
	add		r8, 4
	dec		rdx
	jnz		@@1

@nothing:
	xor		eax, eax
	ret
GetAddressOf_GetProcAddress ENDP
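
The two 64-bit constants in the code above are just the ASCII bytes of the name, read as little-endian integers. The second comparison starts at offset 7, so it overlaps the first one by one byte and covers the terminating null as well. The following is a minimal sketch of the same comparison in C++, assuming a little-endian host. `load_qword` and `is_GetProcAddress` are hypothetical names made up for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Loads 8 bytes starting at `p` as one integer, the way `cmp r10, [r9]` reads
// them (assumes a little-endian host).
inline uint64_t load_qword(const char* p) {
    uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Same two overlapping comparisons as in the 64-bit routine: bytes 0..7 and
// bytes 7..14. `name` must have at least 15 readable bytes.
inline bool is_GetProcAddress(const char* name) {
    return load_qword(name)     == 0x41636f7250746547ULL   // "GetProcA"
        && load_qword(name + 7) == 0x0073736572646441ULL;  // "Address\0"
}
```

Including the null terminator in the second constant is what rejects longer names that merely begin with "GetProcAddress".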

32-bit Implementation

And a similar assembly code for the 32-bit implementation.

x86
GetAddressOf_GetProcAddress PROC
	ASSUME FS:NOTHING
	;[esp + 04h] = base address of kernel32.dll

	mov		ecx, [esp + 04h]

	push	ebx
	push	esi

	test	ecx, ecx
	jz		@nothing

	mov		eax, [ecx + 3Ch]        ; e_lfanew
	lea		eax, [eax + ecx + 78h]  ; eax = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT

	mov		edx, [eax]              ; edx = VirtualAddress
	lea		eax, [ecx + edx]        ; eax = IMAGE_EXPORT_DIRECTORY

	mov		edx, [eax + 18h]        ; edx = NumberOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, NumberOfNames);
	test	edx, edx
	jz		@nothing

	mov		ebx, [eax + 20h]        ; ebx = AddressOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, AddressOfNames);
	lea		ebx, [ecx + ebx]

@@1:
	mov		esi, [ebx]
	lea		esi, [ecx + esi]        ; function name

	cmp		dword ptr [esi], 50746547h          ; GetP
	jnz		@@2
	cmp		dword ptr [esi + 4], 41636f72h      ; rocA
	jnz		@@2
	cmp		dword ptr [esi + 8], 65726464h      ; ddre
	jnz		@@2
	cmp		dword ptr [esi + 11], 00737365h     ; ess\0 (overlaps the previous dword by one byte)
	jnz		@@2

	; Found our function
	neg		edx
	mov		esi, [eax + 18h]        ; esi = NumberOfNames ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, NumberOfNames);
	lea		edx, [esi + edx]        ; edx = function index

	mov		esi, [eax + 24h]        ; esi = AddressOfNameOrdinals ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, AddressOfNameOrdinals);

	lea		esi, [ecx + esi]
	movzx	edx, word ptr [esi + edx * 2]   ; edx = index in the function table

	mov		esi, [eax + 1Ch]        ; esi = AddressOfFunctions ---- size_t offset = offsetof(IMAGE_EXPORT_DIRECTORY, AddressOfFunctions);
	lea		esi, [ecx + esi]

	mov		esi, [esi + edx * 4]    ; esi = offset of possible func addr

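	; Check for forwarded function: a forwarder offset points inside the export directory.
	; EAX now holds IMAGE_EXPORT_DIRECTORY, so locate the IMAGE_DATA_DIRECTORY again.
	mov		ebx, [ecx + 3Ch]        ; e_lfanew
	lea		ebx, [ebx + ecx + 78h]  ; ebx = IMAGE_DATA_DIRECTORY for IMAGE_DIRECTORY_ENTRY_EXPORT
	mov		edx, [ebx]              ; edx = VirtualAddress ---- size_t offset = offsetof(IMAGE_DATA_DIRECTORY, VirtualAddress);
	cmp		esi, edx
	jb		@@3                     ; below the export directory: a regular function

	add		edx, [ebx + 4]          ; edx = VirtualAddress + Size ---- size_t offset = offsetof(IMAGE_DATA_DIRECTORY, Size);
	cmp		esi, edx
	jb		@nothing                ; inside the export directory: a forwarded function

@@3:
	lea		eax, [ecx + esi]        ; Got our func addr!

	pop		esi
	pop		ebx
	ret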

@@2:
	add		ebx, 4
	dec		edx
	jnz		@@1


@nothing:
	xor		eax, eax

	pop		esi
	pop		ebx
	ret
GetAddressOf_GetProcAddress ENDP

Conclusion

As a final word, I'm assuming that you can see that it's pretty easy to combine the two functions that I showed above into one, if all you need to get is the address of the GetProcAddress function.

Otherwise the steps for obtaining an address to pretty much any API in the system from a shellcode could be as follows:

  • Call GetKernel32ModuleHandle and remember the base address that it returns.
  • Call GetAddressOf_GetProcAddress on the base address that you got above, to get the address of GetProcAddress.
  • Call the actual GetProcAddress, using the pointer that you got above, on the base address from the first step to obtain the address of LoadLibrary function.
  • Now you have dynamically resolved addresses of LoadLibrary and GetProcAddress functions, that you can use to resolve an address of any other API in the system.

For even more compactness you may also inline both functions into your shellcode.
