The (Anti-)EDR Compendium
EDR functionality and bypasses in 2024, with a focus on undetected shellcode loaders.
Currently, there is a big focus on memory encryption for implants:
- SWAPPALA / SLE(A)PING
- Thread Pool / Pool Party
- Gargoyle
- Ekko
- Cronos
- Foliage
Also, there is a lot of work involving call stack spoofing:
- ThreadStackSpoofer
- CallStackSpoofer
- AceLdr
- CallStackMasker
- Unwinder
- TitanLdr
This is cool, but what if I told you it is not strictly necessary? Read on.
This is part of a three-article series:
- See Supermega for an introduction on how to use the SuperMega loader laboratory
- See How EDR works for a discussion of EDR detection principles (this)
- See Cordyceps EXE Injection for a discussion of Cordyceps approaches
The target audience is confused Red Teamers. Basic knowledge in anti-EDR and maldev is recommended.
I am not an EDR expert. I’ve just read “Evading EDR” by Matt Hand and Elastic Security-Labs and you should too.
This article gets updated regularly, and is not mobile friendly. Last update 01.10.2024.
I mentioned parts of this in a talk at HITB BKK 2024: My First and Last Shellcode Loader. Shortened, but with more background.
Intro
What's an EDR
EDR stands for “Endpoint Detection and Response”. It is an agent deployed on each machine, which observes events generated by the OS to identify attacks. If it detects something, it generates an alert and sends it to the SIEM or SOAR, where human analysts look at it. “Response” means the actions performed after a threat has been identified, like isolating the host, and is not covered in this article. An EPP (Endpoint Protection Platform) will attempt to interrupt attacks instead of just detecting them.
[Screenshot: the UI of MDE (Microsoft Defender for Endpoint)]
We can see that the EDR detected something and attempts to give the analyst more information about the incident: involved processes, their arguments and hashes, child processes etc. In the end, the analyst has to decide whether it is a false positive or an active attack. But generally the RedTeam wants to avoid raising any alarms, and tries to stay under the radar.
EDR attempts to implement detections higher up on the pyramid of pain, mostly on TTPs: Tactics, Techniques, and Procedures.
Idealized EDR
Knowing and understanding even just one EDR is hard, and knowing all of them is impossible. The EDR described here is an abstract version of an ideal EDR: not so much what is being done today, but what is theoretically possible with the available Windows sensor/telemetry infrastructure. The closest inspiration is Microsoft Defender for Endpoint (MDE), which I used for testing.
I will not teach you how to bypass a specific EDR, but how to think conceptually about the attack surface to implement your own techniques. The actual inner workings of an EDR are mostly unknown (except in the case of Elastic), and are considered a black box. While we mostly know what kind of information an EDR receives, it is not so clear how the information is being used and correlated internally.
As hackers, we are interested in the input and output of a system. This article should give an overview of the input.
Shellcode Loader
A loader will load a shellcode. The shellcode is usually our beacon, like CobaltStrike, Sliver, or Metasploit.
The loader contains the encrypted shellcode, loads it into memory, and executes it.
┌───────────┐ ┌────────────┐ ┌────────┐
│ │ │ │ │ │
│ Loader ├──►│ C2 Beacon ├───►│ Profit │
│ │ │ Shellcode │ │ │
│ │ │ │ │ │
└───────────┘ └────────────┘ └────────┘
The goal is to keep this process undetected by the EDR for Initial Access (IA).
Shellcode Loader Example
When executing shellcode, the usual steps are:
- Allocate a memory region with read-write permissions
- Copy shellcode into that region (decrypt it too)
- Change permissions of memory region to read-execute
- Execute the shellcode
Which looks like this in C, but is similar in most languages:
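// Needs <windows.h> and <string.h>
unsigned char shellcode[] = "\xAA\xBB...";  // payload bytes (elided)
DWORD oldProtect;
// 0x3000 = MEM_COMMIT | MEM_RESERVE
char *dest = VirtualAlloc(NULL, 0x1234, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
memcpy(dest, shellcode, 0x1234);
VirtualProtect(dest, 0x1234, PAGE_EXECUTE_READ, &oldProtect);
(*(void(*)())(dest))(); // jump to dest: execute shellcode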
┌──────────┐ ┌───────────────┐
│ │ ┌─────────────────┐ │ Memory Region │
│ │ │ Alloc │ │ │
│ │ │ ├────────►│ │
│ │ └─────────┬───────┘ │ │
│ │ │ │ │
│ │ ┌─────────▼───────┐ │ │
│ Payload │ │ Copy & Decrypt ├─────────► │
│ ├─────►│ │ │ │
│ │ └─────────┬───────┘ │ │
│ │ │ │ │
│ │ ┌─────────▼───────┐ │ │
│ │ │ Make Executable ├────────►│ │
│ │ │ │ │ │
│ │ └─────────┬───────┘ │ │
│ │ │ │ │
│ │ ┌─────────▼───────┐ │ │
│ │ │ Execute ├─────────► │
│ │ │ │ │ │
│ │ └─────────────────┘ │ │
└──────────┘ └───────────────┘
There are many variations of this simple recipe. Some of them focus on shellcode injection into remote processes, which works the same way: call OpenProcess() on the destination process, and use the returned handle as the hProcess argument for function calls like VirtualAllocEx(hProcess, ...) and WriteProcessMemory(hProcess, ...).
Cross-process access using hProcess is scrutinized more closely by the EDR.
Another typical approach is to call the shellcode by creating a new thread, be it with CreateThread() in your own address space, or CreateRemoteThread() for process injection or module stomping.
The copying itself, here performed by the userspace function memcpy(), can also be done with RtlCopyMemory() or others.
EDR Detection
Bubbles Of Bane
There are three main techniques for detection (of loaders):
- File scanning: Signatures (“yara”) are matched against files
- Memory scanning: Signatures (“yara”) are matched against process memory
- Telemetry/Behaviour: Actions performed by the process (mostly via OS)
For example, Microsoft Defender Antivirus implements the AV scanning, while Microsoft Defender for Endpoint (MDE) is an EDR which heavily depends on telemetry to perform behaviour analysis. If it feels the need, it will scan the memory of processes too.
I call this the “Bubbles of Bane”:
┌───────────────────┐
│ Memory │
┌───────────┼─────┐ Scanning │
│ AV │ │ │
│ Signature │ │ │
│ Scanning │ │ │
│ ┌───┼─────┼────────┐ │
│ │ │ │ │ │
│ │ └─────┼────────┼────┘
│ │ │ │
└───────┼─────────┘ │
│ │
│ Telemetry │
│ Behaviour │
│ Analysis │
│ │
└──────────────────┘
Most .exe file implants generated out of the box by C2 frameworks are signatured, and therefore not useful. The first option is to obfuscate the code, which is hard. For an example, see Harnessing the Power of Cobalt Strike Profiles for EDR Evasion.
The alternative is to use a loader, which carries the implant as payload and loads it when executed. Most often this technique uses shellcode generated by the C2. Alternatively, the DLL or EXE output of the C2 can be used and converted into shellcode or a DLL, for example with Donut. The advantage of using a loader is that the payload can be encrypted, so the only thing which needs to be hidden from AV file signature scanning is the actual loader itself.
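To make the loader advantage concrete, here is a minimal sketch in Python. It assumes a hypothetical five-byte signature and a simple repeating-key XOR instead of real encryption; the names `SIGNATURE`, `file_scan` and `xor` are made up for illustration and do not come from any AV product.

```python
from itertools import cycle

# Hypothetical signature: a byte pattern the AV associates with a known implant.
SIGNATURE = b"\xfc\x48\x83\xe4\xf0"


def file_scan(file_bytes: bytes) -> bool:
    """Return True if the static file content contains the signature."""
    return SIGNATURE in file_bytes


def xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (the same call encrypts and decrypts)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


payload = SIGNATURE + b"\x90" * 16      # stand-in for the C2 shellcode
key = b"\x13\x37\xc0\xde"               # arbitrary demo key
loader_file = b"LOADER_STUB" + xor(payload, key)

print(file_scan(payload))               # plain implant on disk
print(file_scan(loader_file))           # loader carrying the encrypted payload
print(xor(xor(payload, key), key) == payload)  # the loader can recover it at runtime
```

The file scan only ever sees the loader stub and the encrypted blob. The plaintext payload exists only in memory after decryption, which is exactly where memory scanning comes in.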
Public loaders are usually signatured sooner or later. But they are easy to write in basically all languages Windows understands (C, .NET C#, VBA, VBS, PowerShell, JScript…). Simple self-written loaders are surprisingly effective, as this article will show.
Instead of scanning a file, the EDR can also scan the memory of processes. This defeats loaders, as the payload code has to be unencrypted in memory to be executed. To avoid detection in memory, the process needs to encrypt its memory regions when sleeping. Then at the time the EDR scans the process, nothing suspicious should be in memory. Memory scanning is a performance-intensive operation, and is only done if the EDR thinks it is worthwhile. This decision is based on the telemetry collected (or scans happen in regular intervals “on-demand”, like once a day).
Typical memory scanners are pe-sieve and Moneta.
Most of the detection use cases depend on telemetry: important function calls into Windows generate events which are processed, correlated and analysed by the EDR, such as changing the permissions of memory regions, creating processes and threads, copying memory and similar.
For example, if we use a loader to bypass AV, and simply allocate a memory region for our shellcode, we don't generate much telemetry for the EDR. But the payload will be detectable by a memory scanner. If we introduce memory encryption to bypass the memory scanner, then we generate more telemetry, which in turn can be used to detect the memory encryption.
Bubbles of Bane with Ekko memory encryption:
┌───────────────────┐
│ Memory │
┌───────────┼─────┐ Scanning │
│ AV │ │ │
│ Signature │ │ │
│ Scanning │ │ │
│ ┌───┼─────┼────────┐ │
│ │ │ │ [EKKO] │ │
│ │ └─────┼────────┼────┘
│ │ │ │
└───────┼─────────┘ │
│ │
│ Telemetry │
│ Behaviour │
│ Analysis │
│ │
└──────────────────┘
AV Signature Scanning
When a file is being written to disk, it will be scanned by the AV. The AV has a database of signatures of known-bad malware (like yara rules). File write events are generated by the OS and delivered to the AV via AMSI or kernel minifilter.
The signature scanning is based on the static content of the file. The PE headers are parsed, and the content of the PE sections is scanned. This happens before the EXE is executed. Upon positive detection, the file is removed before execution.
A signature will look similar to a yara rule:
// https://github.com/Yara-Rules/rules/blob/master/malware/APT_APT17.yar (shortened)
rule APT17_Sample_FXSST_DLL
{
    meta:
        ...
    strings:
        $x1 = "Microsoft? Windows? Operating System" fullword wide
        $x2 = "fxsst.dll" fullword ascii
        $y1 = "DllRegisterServer" fullword ascii
        $y2 = ".cSV" fullword ascii
        $s1 = "VirtualProtect"
        $s2 = "Sleep"
        $s3 = "GetModuleFileName"
    condition:
        uint16(0) == 0x5a4d and filesize < 800KB and ( 1 of ($x*) or all of ($y*) ) and all of ($s*)
}
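To show what the condition of such a rule boils down to, here is a hand-written evaluation of the rule above in Python. It is a simplified sketch and not how the yara engine works internally: the `fullword` modifier is ignored, `wide` is approximated as UTF-16LE, and the function name `matches_apt17_rule` is made up.

```python
import struct


def matches_apt17_rule(data: bytes) -> bool:
    """Evaluate the condition of the rule above by hand (simplified)."""

    def has_ascii(s: str) -> bool:
        return s.encode("ascii") in data

    def has_wide(s: str) -> bool:
        # "wide" strings are two bytes per character; approximated as UTF-16LE
        return s.encode("utf-16-le") in data

    x = [has_wide("Microsoft? Windows? Operating System"), has_ascii("fxsst.dll")]
    y = [has_ascii("DllRegisterServer"), has_ascii(".cSV")]
    s = [has_ascii("VirtualProtect"), has_ascii("Sleep"), has_ascii("GetModuleFileName")]

    # uint16(0) == 0x5a4d: the file starts with "MZ" (read as little-endian)
    is_pe = len(data) >= 2 and struct.unpack_from("<H", data, 0)[0] == 0x5A4D
    small = len(data) < 800 * 1024
    return is_pe and small and (any(x) or all(y)) and all(s)


sample = b"MZ" + b"\x00" * 64 + b"fxsst.dll\x00VirtualProtect\x00Sleep\x00GetModuleFileNameA\x00"
print(matches_apt17_rule(sample))
print(matches_apt17_rule(sample.replace(b"Sleep", b"Sl33p")))
```

Changing a single required string is enough to break this particular match, which is why precise signatures are easy to bypass once you know them.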
A general solution would be code obfuscation, which I will not cover in this article. It generally cannot be reliably applied on compiled code, but needs to be incorporated into the compiling process. That means each tool needs to implement it by itself.
It would solve all our problems: No signatures on-disk or in-memory, and no need to load it, therefore no telemetry.
┌───────────────────┐
│ Memory │
┌───────────┼─────┐ Scanning │
│ AV │ │ │
│ Signature │ │ │
│ Scanning │ │ │
│ ┌───┼─────┼────────┐ │
│ │ │Obfus│ │ │
│ │ │catio│ │ │
│ │ │n │ │ │
│ │ └─────┼────────┼────┘
│ │ │ │
└───────┼─────────┘ │
│ │
│ Telemetry │
│ Behaviour │
│ Analysis │
│ │
└──────────────────┘
References:
- https://retooling.io/blog/an-unexpected-journey-into-microsoft-defenders-signature-world
- https://avred.r00ted.ch
AV Emulation
The AV component will also perform emulation of the target binary.
Emulation means that the AV will read and interpret the ASM instructions in the .text section by itself. It does not execute them natively, it is not virtualized execution, and it is also not qemu/bochs full-system emulation. It is a CPU emulation, including common Windows syscalls and subsystems.
In pseudocode:
asm_bytes = [
    0xB8, 0x04, 0x00, 0x00, 0x00,  # mov eax, 4
    0xBB, 0x06, 0x00, 0x00, 0x00,  # mov ebx, 6
    0x01, 0xD8                     # add eax, ebx
]
asm_instructions = disassembler.disasm(asm_bytes)
# asm_instructions = [
#   { name = "mov", src = "4", dst="eax" }
#   { name = "mov", src = "6", dst="ebx" }
#   { name = "add", src = "ebx", dst="eax" }
# ]
for instruction in asm_instructions:
    if instruction.name == "add":
        register[instruction.dst] += register[instruction.src]
    if instruction.name == "mov":
        ...
AV emulation implements its own “interpreter” for x86 assembly, and re-implements parts of the Windows OS: syscalls, and with them a virtual file system (for calls like CreateFile()), a virtual registry (for RegOpenKeyEx()), fake processes etc. The advapi32.dll function GetUserNameA() may be implemented to always return “JohnDoe”.
Example experience for a RedTeamer:
- Write a loader
- Insert Metasploit shellcode
- The file is detected when dropped on disk
Then:
- Write a second loader
- Encrypt the Metasploit shellcode with strong AES
- It is still detected when dropped on disk
The AV emulator will execute/emulate the loader. After a while execution stops, and the Metasploit shellcode is found unencrypted in the emulated memory. The AV will then detect its signatures there.
There are countless ways to detect an emulator. But generally the emulation does not run forever, and is restricted by:
What | Typical Limit |
---|---|
Time | ? |
Number of instructions | ? |
Number of API calls | ? |
Amount of memory used | ? |
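The effect of such a limit can be sketched with a toy interpreter in Python. This is only an illustration: the instruction format follows the decoded list from the pseudocode above, the `nop` instruction and the budget of 100 are arbitrary assumptions, and real limits are unknown (hence the question marks in the table).

```python
def emulate(instructions, max_instructions):
    """Interpret decoded instructions until done or the budget is used up."""
    registers = {"eax": 0, "ebx": 0}
    executed = 0
    for ins in instructions:
        if executed >= max_instructions:
            return registers, "budget exhausted"
        if ins["name"] == "mov":
            registers[ins["dst"]] = int(ins["src"])
        elif ins["name"] == "add":
            # keep the result in 32 bits, like a real register
            registers[ins["dst"]] = (registers[ins["dst"]] + registers[ins["src"]]) & 0xFFFFFFFF
        # "nop" and unknown instructions only consume budget
        executed += 1
    return registers, "finished"


program = [
    {"name": "mov", "src": "4", "dst": "eax"},
    {"name": "mov", "src": "6", "dst": "ebx"},
    {"name": "add", "src": "ebx", "dst": "eax"},
]
# A loader that burns instructions before doing anything interesting.
delayed = [{"name": "nop"}] * 1000 + program

print(emulate(program, max_instructions=100))
print(emulate(delayed, max_instructions=100))
```

With the delay in front, the emulator gives up before the interesting part runs, so whatever the program would have placed in memory is never seen.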
Receive Events
The EDR receives events of stuff processes are doing via the OS:
Process
┌────────────────┐ ┌─────────────┐
│ │ │ │
│ │ │ Windows │
│ │ │ kernel │
├────────────────┤ Syscalls │ │
│ (Hooked) ├───────────────────►│ │
│ │ │ │
│ ntdll.dll ├─────────────────┐ │ │
│ NtApi │ Usermode │ │ │
├────────────────┤ Hooks │ └──────┬──────┘
│ │ │ │
│ │ │ │ kernel
│ │ │ │ callbacks
│ │ │ │
│ │ ▼ ▼
│ │ ┌────────────────────────┐
│ │ │ EDR │
│ │ └────────────────────────┘
└────────────────┘
There are two main channels to receive data:
- Usermode (hooked API)
- Kernel callbacks (ETW, ETW-TI, kernel-mode driver)
These sensors will create events about what is happening in the system, when something is added/removed/changed like:
- Files
- Registry Keys
- Processes, Threads
- Memory Regions
The EDR will contain rules to match the events for malicious behaviour. Rules can be either:
- Precise/Brittle: Detect one specific thing well (low False-Positive FP), easy to bypass
- Robust: More generic detection, harder to bypass, higher FP, more exceptions
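The difference between the two rule types can be sketched as follows. The event format, the tool name, the allowlist and both rules are hypothetical examples, not taken from any real EDR.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str          # e.g. "process_open"
    process: str       # image name of the acting process
    target: str        # object acted upon
    cmdline: str = ""


def brittle_rule(e: Event) -> bool:
    """Precise: matches one known tool by name and argument."""
    return e.process == "mimikatz.exe" and "sekurlsa::logonpasswords" in e.cmdline


def robust_rule(e: Event) -> bool:
    """Generic: any process outside the allowlist opening lsass.exe."""
    allowlist = {"MsMpEng.exe", "csrss.exe"}   # the exceptions a robust rule needs
    return e.kind == "process_open" and e.target == "lsass.exe" and e.process not in allowlist


renamed = Event("process_open", "notes.exe", "lsass.exe")
benign = Event("process_open", "taskmgr.exe", "lsass.exe")

print(brittle_rule(renamed), robust_rule(renamed))   # renaming the tool defeats the brittle rule
print(brittle_rule(benign), robust_rule(benign))     # the robust rule also fires on a benign tool
```

The second line is the false positive the robust rule pays for being generic, and the reason such rules accumulate exceptions.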
Note that the EDR does not see data modification inside the process by itself. In other words, a process calling the function RtlCopyMemory() of ntdll.dll will potentially generate telemetry, as ntdll.dll can be hooked. Doing the same with a byte-wise copy in a for-loop will not result in any telemetry.
Telemetry is gained from both the hooked ntdll.dll and from the kernel. Usermode hooks can be trivially removed, but this generates telemetry. The kernelspace events are more trustworthy, and cannot be removed.
Note that the main execution unit for Windows is the thread, not the process. But to keep it simple, I will mostly use process.
The graphic is a bit oversimplified, and can be extended with more sensors, which are the input of an EDR:
┌──────────────┐
│ │
┌─────────────┐ EtwWrite() ┌──────────┐ Kernel callbacks │ │
│ Process ├───────────►│ ├─────────────────────►│ │
│ │ │ │ │ │
│ │ │ │ │ │
├─────────────┤ │ OS │ ETW │ │
┌───────┤ ntdll.dll │ │ ├─────────────────────►│ │
│ │ │ syscall │ │ │ │
│ ┌───►│ ├───────────►│ │ ETW-TI │ EDR │
│ │ ├─────────────┤ │ ├─────────────────────►│ │
│ │ │ │ └──────────┘ │ │
│ │ ├─────────────┤ │ │
│ │ │ amsi.dll │ pipe AMSI │ │
│ └────┤ ├─────────────────────────────────────────────►│ │
│ │ │ │ │
└──────►│ │ │ │
├─────────────┤ │ │
│ │ │ │
│ │ │ │
│ │ │ │
│ │ │ │
└─────────────┘ └──────────────┘
EDR input is therefore:
- Usermode hooks / AMSI
- Kernel callbacks
- ETW
- ETW-TI
And I will discuss each of them individually.
Usermode Hooks
While the official kernel interface for Linux is the syscall, for Windows it is ntdll.dll. This is called the Native API (NtAPI). ntdll.dll will call the correct syscall for us. The Windows Application Programming Interface (WinAPI), meaning the other DLLs like kernel32.dll, all use or call the NtAPI (ntdll.dll) in the end. Note that syscall numbers may change between Windows versions, and therefore hardcoding them is not reliable.
WinAPI NtApi Kernel
┌─────────────────────────────────────────┐ ┌───────────────────────────────────┐
│ │ │ │
│ │ │ │
│ ┌────────────────┐ ┌────────────────┐ │ │ ┌─────────────────────────┐ │ ┌───────────────────────┐
│ │ │ │ │ │ │ │ │syscall│ │ │
│ │ kernel32.dll ├──►│ kernelbase.dll ├─┼──┤►│ ntdll.dll ├───────┤►│Kernel │
│ │ OpenProcess │ │ OpenProcess │ │ │ │ NtOpenProcess │ │ │NtOpenProcess │
│ │ │ │ │ │ │ │ │ │ │ │
│ └────────────────┘ └────────────────┘ │ │ └─────────────────────────┘ │ └───────────────────────┘
│ │ │ │
│ │ │ │
│ ┌────────────────┐ ┌────────────────┐ │ │ ┌─────────────────────────┐ │ ┌───────────────────────┐
│ │ │ │ │ │ │ │ │syscall│ │ │
│ │ kernel32.dll ├──►│ kernelbase.dll ├─┼──┤►│ ntdll.dll ├───────┼─►Kernel │
│ │ VirtualAllocEx │ │ VirtualAllocEx │ │ │ │ NtAllocateVirtualMemory │ │ │NtAllocateVirtualMemory│
│ │ │ │ │ │ │ │ │ │ │ │
│ └────────────────┘ └────────────────┘ │ │ └─────────────────────────┘ │ └───────────────────────┘
│ │ │ │
│ │ │ │
└─────────────────────────────────────────┘ └───────────────────────────────────┘
▲ ▲ ▲
│ │ │
│ │ │
Usermode Hooks Usermode Hooks Kernel
Specific Generic Callbacks
Example NtAPI function in ntdll.dll, performing a syscall with the ASM instruction syscall:
SysNtCreateFile proc
mov r10, rcx
mov eax, 55h
syscall
ret
SysNtCreateFile endp
Typical WinAPI call, with a hook:
┌─────────────────┐
│ │
┌───────────────────┐ ┌─────────────────┐ ┌───────────────────┐ │ │
│ │ │ │ │ │ │ OS │
│ Application.exe │ │ kernel32.dll │ │ ntdll.dll │ syscall │ │
│ ├──►│ ├──►│ ├────────────►│ │
│ .text │ │ CreateFile() │ │ NtCreateFile() │ │ kernel │
│ │ │ │ │ │ │ │
└───────────────────┘ └─────────────────┘ └─────────┬─────────┘ │ │
│hook │ │
│ │ │
┌────────▼────────────────┐ │ │
│ │ │ │
│ amsi.dll │ │ │
│ │ │ │
│ NtCreateFile_Hook() │ │ │
└─────────────────────────┘ │ │
│ └─────────────────┘
▼
EDR
Userspace hooks are just patches in the exported functions of ntdll.dll, which call into another DLL before the function is executed. Windows provides functionality to directly hook functions.
Original Function On-Disk: EDR Hooked Function In-Memory:
---------------------- -----------------------
mov r10, rcx mov r10, rcx
>mov eax, 50h jmp 0x7ffaeadea621
test byte ptr [0x7FFE0h], 1 test byte ptr [0x7FFE0h], 1
jne 0x17e76540ea5 jne 0x17e76540ea5
syscall syscall
ret ret
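The comparison above can be turned into a simple byte check. The sketch below assumes the usual x64 stub layout shown in the listing (`mov r10, rcx` followed by `mov eax, <syscall number>`), works on hard-coded example bytes rather than a real ntdll.dll, and the jump target bytes are made up.

```python
CLEAN_PROLOGUE = bytes.fromhex("4c8bd1b8")   # mov r10, rcx ; mov eax, imm32


def looks_hooked(stub: bytes) -> bool:
    """Heuristic: an unpatched x64 syscall stub starts with mov r10,rcx; mov eax,<nr>."""
    return not stub.startswith(CLEAN_PROLOGUE)


def syscall_number(stub: bytes):
    """Read the syscall number from a clean stub, or None if it is patched."""
    if looks_hooked(stub):
        return None
    return int.from_bytes(stub[4:8], "little")


#                         mov r10,rcx  mov eax,50h   syscall ret
on_disk = bytes.fromhex("4c8bd1" "b850000000" "0f05" "c3")
#                         mov r10,rcx  jmp rel32     syscall ret
in_memory = bytes.fromhex("4c8bd1" "e9aabbccdd" "0f05" "c3")

print(looks_hooked(on_disk), syscall_number(on_disk))
print(looks_hooked(in_memory), syscall_number(in_memory))
```

The same check works in both directions: an attacker can use it to find hooks, and an EDR can use it to notice that its hooks have been removed.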
Examples of commonly hooked ntdll.dll functions:
Function name | Related attacker techniques |
---|---|
NtOpenProcess | Process Injection |
NtAllocateVirtualMemory | Process Injection |
NtWriteVirtualMemory | Process Injection |
NtCreateThreadEx | Process Injection |
NtSuspendThread | APC Shellcode Injection |
NtResumeThread | APC Shellcode Injection |
NtQueueApcThread | APC Shellcode Injection |
The EDR receives the function call names and their parameters as telemetry. This is accomplished by using kernel callbacks (PsSetCreateProcessNotifyRoutine) to get notified at an early stage whenever a new process is created, and then injecting a DLL into the process (like amsi.dll) by using Asynchronous Procedure Calls (KAPC injection). The DLL patches the original ntdll.dll functions to take a detour into amsi.dll. After ntdll.dll is patched, each call to a hooked function is therefore intercepted by amsi.dll.
EDR function hooking with KAPC will create an APC which performs the hooking. The technique “Early Bird APC injection” uses the same APC mechanism, and can therefore run before the KAPC hooking has been performed.
Usermode hooks can be bypassed with:
- Direct syscalls (avoid calling ntdll.dll)
- Indirect syscalls (calling ntdll.dll functions, but after the hook)
- Patching / restoring ntdll.dll (removing the hooks completely)
Usermode hooks are easy to bypass, as they are located completely in “our own” memory space, where we can freely mess with them. But restoring ntdll.dll itself would generate telemetry, which is the reason why direct syscalls are used instead.
An EDR should not depend solely on usermode hooks, but only use them for auxiliary telemetry. But they provide more information than kernel callbacks. Kernel callbacks only “see” the syscall/ntdll.dll function, not the higher-level function which was originally called. This is useful, as it makes detections more generic, without depending on hooking all the weird and unusual DLL functions. But it may generate more false positives, as it is more difficult to identify “non-malicious” behaviour with just the syscalls.
For example, CreateFileA(), CreateFileW(), OpenFile() and CreateFileTransacted() will all call NtCreateFile() in the end.
Note that the callstack can show which function in the chain was called initially. Usermode hooks are used less and less, and not by all EDRs.
Kernel telemetry
The Windows OS provides information about processes in the form of notification callback routines, especially about process, thread and image creation. These are generated by the kernel itself, so unlike usermode hooks there is no way to suppress them (without kernel privileges).
These callbacks are initiated in the context of the relevant process and thread. Therefore the events have information about the origin process.
There are various sources of kernel-mode instrumentation:
- ETW (Event Tracing for Windows)
- ETW-TI (Threat Intelligence)
- Kernel callbacks (PsSetCreateProcessNotifyRoutine etc.)
- NDIS (network) / minifilter drivers (filesystem)
Kernel callbacks are:
- PsSetCreateProcessNotifyRoutine: Process creation, termination
- PsSetCreateThreadNotifyRoutine: Thread creation, deletion
- PsSetLoadImageNotifyRoutine: Windows image loader
- ObRegisterCallbacks: Object Manager callbacks on handle operations for process, thread and desktop objects, e.g. triggered by NtOpenProcess or NtOpenThread
An example event is the PS_CREATE_NOTIFY_INFO structure passed to the process creation callback, which gives the EDR different pieces of information:
Field | Notes |
---|---|
ParentProcessId | |
CreatingThreadId | |
*FileObject | The .exe on disk |
ImageFileName | Parameter of created process |
CommandLine | Parameter of created process |
CreationStatus | |
Sysmon can capture this event from the kernel, and will produce the following:
Process Create:
RuleName: -
UtcTime: 2024-04-28 22:08:22.025
ProcessGuid: {a23eae89-bd56-5903-0000-0010e9d95e00}
ProcessId: 6228
Image: C:\Windows\System32\wbem\WmiPrvSE.exe
FileVersion: 10.0.22621.1 (WinBuild.160101.0800)
Description: WMI Provider Host
Product: Microsoft® Windows® Operating System
Company: Microsoft Corporation
OriginalFileName: Wmiprvse.exe
CommandLine: C:\Windows\system32\wbem\wmiprvse.exe -secured -Embedding
CurrentDirectory: C:\Windows\system32\
User: NT AUTHORITY\NETWORK SERVICE
LogonGuid: {a23eae89-b357-5903-0000-002005eb0700}
LogonId: 0x7EB05
TerminalSessionId: 1
IntegrityLevel: System
Hashes: SHA1=91180ED89976D16353404AC982A422A707F2AE37,MD5=7528CCABACCD5C1748E63E192097472A,SHA256=196CABED59111B6C4BBF78C84A56846D96CBBC4F06935A4FD4E6432EF0AE4083,IMPHASH=144C0DFA3875D7237B37631C52D608CB
ParentProcessGuid: {a23eae89-bd28-5903-0000-00102f345d00}
ParentProcessId: 580
ParentImage: C:\Windows\System32\svchost.exe
ParentCommandLine: C:\Windows\system32\svchost.exe -k DcomLaunch -p
ParentUser: NT AUTHORITY\SYSTEM
Note that only the fields ImageFileName, CommandLine, ParentProcessId of the kernel event translate directly to the Image, CommandLine, ParentProcessId of the Sysmon event. Most of the other information is gathered additionally by Sysmon.
This additional information is gathered by querying the kernel, for example by calling GetProcessInformation on the ProcessId, or in other ways, like parsing the PEB of the process. Not all information provided is equally trustworthy.
An ETW ImageLoad event from Microsoft-Windows-kernel-Process recorded with SilkETW:
{
ProviderGuid: "22fb2cd6-0e7b-422b-a0c7-2fad1fd0e716",
ProviderName: "Microsoft-Windows-kernel-Process",
EventName: "ImageLoad",
ThreadID: 9584,
ProcessID: 7536,
ProcessName: "notepad",
YaraMatch: [],
Opcode: 0,
OpcodeName: "Info",
TimeStamp: "2024-07-08T19:06:10.8845667+01:00",
PointerSize: 8,
EventDataLength: 142,
XmlEventData: {
ProviderName: "Microsoft-Windows-kernel-Process",
FormattedMessage: "Process 7’536 had an image loaded with name \Device\HarddiskVolume2\Windows\System32\notepad.exe. ",
EventName: "ImageLoad"
ProcessID: "7’536",
PID: "7536",
TID: "9584",
PName: "",
DefaultBase: "0x7ff631650000",
ImageName: "\Device\HarddiskVolume2\Windows\System32\notepad.exe",
ImageBase: "0x7ff631650000",
ImageCheckSum: "265’248",
ImageSize: "0x38000",
MSec: "9705.0646",
TimeDateStamp: "1’643’917’504",
}
}
Memory Regions
Upon starting an .exe, the sections in the PE file get copied into memory, each completely as a block. .text contains the assembly code, while .data and similar sections contain data for the program. New memory regions can be created using VirtualAlloc() or similar.
EXE
Program Process
┌──────────┐ ┌──────────────┐
│ │ │ │
│ Header ├───────────►│ Header │
│ │ │ │
├──────────┤ ├──────────────┤
│ │ │ │
│ │ ├──────────────┤
│ .text ├─────┐ │ │ Backed
│ │ │ │ │ RX
│ │ └─────►│ .text │
├──────────┤ │ │
│ │ │ │
│ .data ├────┐ ├──────────────┤
│ │ │ │ │
│ │ │ │ │
└──────────┘ │ ├──────────────┤
│ │ │ Backed
│ │ │ RW
└──────►│ .data │
│ │
├──────────────┤
│ │
│ │
├──────────────┤
│ │
│ Virtual │ Unbacked
│ Alloc() │ RW
│ │
└──────────────┘
The memory regions coming from the PE image are called backed regions. They are trustworthy, as they are 1:1 copies from the PE file, which is scanned on-disk by the AV. The memory regions are “backed” by the file on-disk. They are also called IMAGE regions.
If the process allocates additional memory, it is “unbacked”, also called USER or PRIVATE memory. There is no file behind it.
Generally, memory regions can be thought of as having one of these properties:
- USER/PRIVATE/Unbacked: Bad, potentially malicious, shellcode
- IMAGE/Backed: Good, pretty trusted
This is mainly because shellcode from exploits or process injection usually lives in PRIVATE memory. Threads should also start from backed regions. PRIVATE RWX memory is even more suspicious.
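As a rough illustration of this classification, here is a toy scoring function in Python. The `Region` type, the protection strings and the score values are assumptions made for this sketch; real scanners like Moneta and pe-sieve use far more signals.

```python
from dataclasses import dataclass


@dataclass
class Region:
    base: int
    mem_type: str      # "IMAGE" (backed) or "PRIVATE" (unbacked)
    protect: str       # e.g. "R", "RW", "RX", "RWX"


def suspicion(region: Region) -> int:
    """Toy score: 0 = unremarkable, higher = more worth a closer look."""
    executable = "X" in region.protect
    score = 0
    if region.mem_type == "PRIVATE" and executable:
        score += 2     # executable code without a file behind it
    if executable and "W" in region.protect:
        score += 1     # writable and executable at the same time
    return score


regions = [
    Region(0x7FF631650000, "IMAGE", "RX"),     # .text of the EXE
    Region(0x000001F000000000, "PRIVATE", "RW"),   # ordinary heap-like data
    Region(0x000001F000100000, "PRIVATE", "RX"),   # typical loader result
    Region(0x000001F000200000, "PRIVATE", "RWX"),  # even more suspicious
]
for r in regions:
    print(hex(r.base), r.mem_type, r.protect, suspicion(r))
```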
[Screenshot: some trustworthy memory regions of type IMG (IMAGE, backed)]
[Screenshot: some untrustworthy memory regions of type PRV (PRIVATE, unbacked)]
One property of memory pages is Copy-On-Write (COW). A memory scanner is able to check if a memory page was written to, which is unusual for read-only .text sections and others, as these should be shared between processes. This is used by Moneta via the PSAPI_WORKING_SET_EX_BLOCK from the PSAPI_WORKING_SET_EX_INFORMATION structure.
Data-only attacks, e.g. for an AMSI patch or ETW patch, are therefore preferred.
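The shared-page check can be sketched by decoding the flags value of a PSAPI_WORKING_SET_EX_BLOCK. The bit positions below follow the structure layout as I understand it from Microsoft's documentation (Valid: 1 bit, ShareCount: 3 bits, Win32Protection: 11 bits, Shared: 1 bit); treat them as an assumption and verify against psapi.h. The flag values are made-up examples, not output of QueryWorkingSetEx().

```python
def decode_ws_block(flags: int) -> dict:
    """Decode the low bits of a PSAPI_WORKING_SET_EX_BLOCK flags value."""
    return {
        "valid": bool(flags & 0x1),                 # bit 0
        "share_count": (flags >> 1) & 0x7,          # bits 1-3
        "win32_protection": (flags >> 4) & 0x7FF,   # bits 4-14
        "shared": bool((flags >> 15) & 0x1),        # bit 15
    }


def is_modified_image_page(flags: int) -> bool:
    """For a page of an IMAGE region: valid but no longer shared hints at a write (COW)."""
    block = decode_ws_block(flags)
    return block["valid"] and not block["shared"]


untouched = 0x8203   # example: valid, share count 1, shared
patched = 0x0201     # example: valid, private copy after a write

print(decode_ws_block(untouched), is_modified_image_page(untouched))
print(decode_ws_block(patched), is_modified_image_page(patched))
```

The check only makes sense for pages that belong to an IMAGE region; private memory is never shared in the first place.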
References:
- https://www.trustedsec.com/blog/windows-processes-nefarious-anomalies-and-you-memory-regions
- https://www.arashparsa.com/bypassing-pesieve-and-moneta-the-easiest-way-i-could-find/
- https://www.outflank.nl/blog/2023/10/05/solving-the-unhooking-problem/
- https://www.ired.team/offensive-security/code-injection-process-injection/ntcreatesection-+-ntmapviewofsection-code-injection
Memory Scanning
Memory signature scanning will detect malicious code in-memory, in either .text or data sections (stack, heap, .data etc.).
Event
│
Process ▼
┌───────────┐ ┌───────────┐
│ │ │ │
│ │ │ │
│ │ │ │
├───────────┤ │ │
│ │ Read │ │
│ .text ◄────────┤ EDR │
│ (bad) │ Scan │ │
├───────────┤ │ │
│ │ │ │
│ ◄────────┤ │
│ .data │ │ │
│ (bad) │ └───────────┘
│ │
└───────────┘
It is basically the same as AV signature scanning: grep or yara the memory content against known malicious signatures.
Memory scanning is performance intensive. It is not done constantly, but depends on a trigger.
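A trigger-driven scan can be sketched like this. The signature, the trigger event names and the region names are hypothetical; which events actually trigger a scan in a given EDR is not publicly known.

```python
SIGNATURES = {"demo_beacon": b"BEACON_CONFIG"}                     # hypothetical signature database
SCAN_TRIGGERS = {"protect_rx_private", "remote_thread_created"}   # hypothetical trigger events


def scan_memory(regions: dict) -> list:
    """Return (region name, signature name) pairs for every hit."""
    return [
        (name, sig)
        for name, data in regions.items()
        for sig, pattern in SIGNATURES.items()
        if pattern in data
    ]


def on_event(event_name: str, process_memory: dict):
    """Only pay for a memory scan when the event is considered worth it."""
    if event_name not in SCAN_TRIGGERS:
        return None
    return scan_memory(process_memory)


memory = {".text": b"\x90" * 8, "private_rx": b"\x00BEACON_CONFIG\x00"}

print(on_event("file_read", memory))            # no trigger, no scan
print(on_event("protect_rx_private", memory))   # trigger, scan finds the payload
```

This is why generating less telemetry matters even for memory detections: without a trigger, the expensive scan may never happen.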
Query Process Information
The EDR, upon receiving events, will also attempt to enrich them:
- Process information (like executable name and command line arguments)
- Memory scan (possibly)
- Process image file scan (rarely)
┌──────────────┐
│ │
┌─────────────┐ EtwWrite() ┌──────────┐ Kernel callbacks │ │
│ Process ├───────────►│ ├─────────────────────►│ │
│ │ │ │ │ │
│ │ │ │ │ │
├─────────────┤ │ OS │ ETW │ │
┌───────┤ ntdll.dll │ │ ├─────────────────────►│ │
│ │ │ syscall │ │ │ │
│ ┌───►│ ├───────────►│ │ ETW-TI │ EDR │
│ │ ├─────────────┤ │ ├─────────────────────►│ │
│ │ │ │ └──────────┘ │ │
│ │ ├─────────────┤ │ │
│ │ │ amsi.dll │ pipe AMSI │ │
│ └────┤ ├─────────────────────────────────────────────►│ │
│ │ │ │ │
└──────►│ │ │ │
├─────────────┤ │ │
│ │ │ │
│ │ │ │
│ ┌──────────┤ Process Info │ │
│ │ │◄─────────────────────────────────────────────┤ │
│ │ PEB │ │ │
│ │ Eprocess │ │ │
│ │ │ └──┬──┬────────┘
│ │ │ │ │
│ └──────────┤ Memory Scan │ │
│ │◄────────────────────────────────────────────────┘ │
└───────▲─────┘ │
│ │
File │ │
┌──────┴────┐ File Scan │
│ │◄────────────────────────────────────────────────────┘
│ │
│ │
│ │
└───────────┘
The EDR does not only receive events, but will also actively query the OS for more information. For example, when receiving a process creation notification (PS_CREATE_NOTIFY_INFO), the EDR will gather more information about the process which caused the event, for example by using GetProcessInformation() or OpenProcess() to access the PEB, arguments, or memory regions, or by taking the ImageFileName and scanning the original EXE image file.
Note that the EDR is a normal process, even if it runs as SYSTEM or is PPL'd and has its own dedicated kernel driver. With its SYSTEM privileges it can gather information about pretty much all other processes.
Here is an example of a PsSetCreateProcessNotifyRoutine handler function:
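// Registered with PsSetCreateProcessNotifyRoutine(CreateProcessNotifyRoutine, FALSE)
void CreateProcessNotifyRoutine(HANDLE ppid, HANDLE pid, BOOLEAN create) {
    UNREFERENCED_PARAMETER(ppid);
    if (create) {
        PEPROCESS process = NULL;
        PUNICODE_STRING processName = NULL;
        // Look up the EPROCESS object of the new process (takes a reference)
        if (!NT_SUCCESS(PsLookupProcessByProcessId(pid, &process)))
            return;
        // Retrieve the process name via the EPROCESS structure (allocates processName)
        if (NT_SUCCESS(SeLocateProcessImageName(process, &processName))) {
            DbgPrint("MyDumbEDR: %lu (%wZ) launched\n", HandleToULong(pid), processName);
            ExFreePool(processName);
        }
        ObDereferenceObject(process);
    }
}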
The handler function only receives the pid of the process. To also display the image name, a few functions have to be called, which access the PEB or EPROCESS structure.
The following data is stored in the PEB (Process Environment Block, at GS:[0x60]). It lives in usermode, and can be manipulated freely:
- ImageBase Address
- loaded DLLs
- process parameters:
  - image name
  - arguments
  - environment variables
  - working directory
EPROCESS is a kernel data structure, and cannot be manipulated directly (sometimes indirectly):
- process create and exit time
- process id
- parent process id
- address of PEB
- image filename
  - similar to process parameters image name in the PEB
  - also available in the SectionObject
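Because the image name exists both in the PEB and in kernel structures, the two can be compared. The following sketch shows the idea; the `ProcessView` type and the check are assumptions for illustration, and I do not claim that any particular EDR implements it this way.

```python
from dataclasses import dataclass


@dataclass
class ProcessView:
    peb_image_path: str        # read from usermode memory, attacker-writable
    eprocess_image_path: str   # kernel-side record of the image


def basename(path: str) -> str:
    """Last path component, lowercased, for both C:\\... and \\Device\\... paths."""
    return path.replace("/", "\\").rsplit("\\", 1)[-1].lower()


def peb_looks_spoofed(p: ProcessView) -> bool:
    """Flag a process whose usermode image name disagrees with the kernel's view."""
    return basename(p.peb_image_path) != basename(p.eprocess_image_path)


honest = ProcessView(
    r"C:\Windows\System32\notepad.exe",
    r"\Device\HarddiskVolume2\Windows\System32\notepad.exe",
)
spoofed = ProcessView(
    r"C:\Windows\System32\svchost.exe",
    r"\Device\HarddiskVolume2\Users\Public\loader.exe",
)

print(peb_looks_spoofed(honest))
print(peb_looks_spoofed(spoofed))
```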
Process Information Data Structures
The PEB:
typedef struct _PEB {
BYTE Reserved1[2];
BYTE BeingDebugged;
BYTE Reserved2[1];
PVOID Reserved3[2];
PPEB_LDR_DATA Ldr;
PRTL_USER_PROCESS_PARAMETERS ProcessParameters;
PVOID Reserved4[3];
PVOID AtlThunkSListPtr;
PVOID Reserved5;
ULONG Reserved6;
PVOID Reserved7;
ULONG Reserved8;
ULONG AtlThunkSListPtr32;
PVOID Reserved9[45];
BYTE Reserved10[96];
PPS_POST_PROCESS_INIT_ROUTINE PostProcessInitRoutine;
BYTE Reserved11[128];
PVOID Reserved12[1];
ULONG SessionId;
} PEB, *PPEB;
Where ProcessParameters is:
typedef struct _RTL_USER_PROCESS_PARAMETERS {
BYTE Reserved1[16];
PVOID Reserved2[10];
UNICODE_STRING ImagePathName;
UNICODE_STRING CommandLine;
} RTL_USER_PROCESS_PARAMETERS, *PRTL_USER_PROCESS_PARAMETERS;
Callstack Analysis
When a process calls a Windows function, it is possible to find out the chain of calling functions which led to this call. This is called the callstack.
The EDR can choose to inspect the process initiating a function or API call, and analyze the call stack for suspicious things:
Process
┌──────────────────────────────────────────────────────────────────────┐ ┌─────────────────┐
│ │ │ OS kernel │
│ ┌───────────────────┐ ┌─────────────────┐ ┌───────────────────┐ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ Application.exe │ │ kernel32.dll │ │ ntdll.dll │ │syscall │ │
│ │ ├──►│ ├──►│ ├─┼──────────►│ NtCreateFile() │
│ │ .text │ │ CreateFile() │ │ NtCreateFile() │ │ │ │
│ │ │ │ │ │ │ │ └────┬────────────┘
│ └───────────────────┘ └─────────────────┘ └───────────────────┘ │ │
│ │ │Notify
│ Stack │ │
│ ┌──────────────────────────────────┐ │ ▼
│ │ Application.exe: SomeFunction() │ │ Inspect ┌─────────────────┐
│ │ kernel32.dll: CreateFile() │◄─────┼───────────┤ │
│ │ ntdll.dll: NtCreateFile() │ │ │ │
│ └──────────────────────────────────┘ │ │ │
│ │ │ EDR │
│ │ │ │
│ │ │ │
└──────────────────────────────────────────────────────────────────────┘ └─────────────────┘
It is possible to detect a wide variety of attacks and bypasses with this technique. But it is somewhat performance-intensive.
A callstack should originate from a backed memory region, go through a supporting DLL (e.g. user32.dll), then ntdll.dll, where finally the actual syscall instruction is executed.
Elastic has callstack analysis rules to identify:
- Direct syscalls
- Callback-based evasion
- Module Stomping
- Library loading from unbacked region
- Process created from unbacked region
If a call comes from an unbacked region, it most likely originates from shellcode.
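The rule of thumb above can be sketched from the defender's side. This is a minimal, portable C sketch under my own assumptions, not how any specific EDR does it: the `frame_t` type, the verdict names and `classify_callstack()` are hypothetical, and the frames are assumed to be already resolved against the module list of the process (which a real EDR would do kernel-side).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one resolved stack frame. */
typedef struct {
    const char *module;  /* name of the backing image, NULL if the memory is unbacked */
    const char *symbol;  /* resolved function name, informational only */
} frame_t;

typedef enum {
    STACK_OK = 0,
    STACK_DIRECT_SYSCALL,  /* innermost frame is not inside ntdll.dll */
    STACK_UNBACKED_CALLER  /* a caller lives in memory not backed by a file */
} stack_verdict_t;

/* frames[0] is the innermost frame (where the syscall instruction ran),
 * frames[count - 1] is the outermost one.
 * Simplification: module names are compared case-sensitively. */
stack_verdict_t classify_callstack(const frame_t *frames, size_t count)
{
    if (count == 0)
        return STACK_OK;  /* nothing to judge */

    /* The syscall instruction is expected to sit inside ntdll.dll. */
    if (frames[0].module == NULL || strcmp(frames[0].module, "ntdll.dll") != 0)
        return STACK_DIRECT_SYSCALL;

    /* Every caller further up should come from an image-backed region. */
    for (size_t i = 1; i < count; i++) {
        if (frames[i].module == NULL)
            return STACK_UNBACKED_CALLER;
    }
    return STACK_OK;
}
```

A stack of `ntdll.dll` / `kernel32.dll` / `Application.exe` passes, while a stack whose outermost frame has no backing module is flagged. Real rules are more nuanced (JIT engines legitimately run from unbacked memory), so treat the verdict as a signal, not proof.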
Call stack analysis is usually not applied to all API functions. Elastic mentions the following:
- VirtualAlloc, VirtualProtect
- MapViewOfFile, MapViewOfFile2
- VirtualAllocEx, VirtualProtectEx
- QueueUserAPC
- SetThreadContext
- WriteProcessMemory, ReadProcessMemory
Reference:
- https://www.elastic.co/security-labs/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks
- https://www.elastic.co/security-labs/doubling-down-etw-callstacks
Thread State Analysis
Threads can be sleeping for different reasons. By investigating the state of a thread, and how it got there according to its callstack, we can find indicators for sleeping beacons or memory encryption.
Clean (spoofed) callstack for NtDelayExecution():
If memory encryption is being used, the thread is usually put to sleep by calling either:
- Kernelbase.dll!SleepEx
- ntdll.dll!NtDelayExecution
Suspicious things for calls to these sleep functions:
- Calls to virtual memory in the callstack
- Source in non-backed memory regions
Performance Impact
Performance of the EDR is of utmost importance. If developer machines are slow when installing 10'000 NPM packages, people will move to Apple where protections are weaker, and Microsoft can't allow that. This is such a problem that Microsoft introduced asynchronous Dev Drive scanning.
The least performance-intensive case is a detection that can be applied directly to a rare event (let's say, opening a process handle to lsass.exe). Memory scans can involve iterating over or YARA-scanning megabytes of .text sections, which is very expensive. Scanning files is the most expensive, even with SSDs.
Most detections are in between those: one or multiple events with suspicious information, which lead to some more correlation. These may then kick off the memory scanning.
Relative performance impact | What |
---|---|
1 | Event |
3 | Event Correlation |
10 | Query process |
100 | Memory Scan |
1000 | File Scan |
What could trigger a memory scan?
What | Triggers scan? | Notes |
---|---|---|
VirtualAlloc() | No | too common, except when RWX |
WriteProcessMemory() | No | very common |
memcpy() | No | Not visible for EDR |
VirtualProtect() | No? | RWX or RW->RX may be a trigger |
CreateRemoteThread() | Yes | Should trigger memory scan |
VirtualAlloc() and WriteProcessMemory() are very commonly called functions. CreateRemoteThread() is not only called less often, it is also a clearer indicator of potentially malicious behaviour.
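The trigger table above can be written down as a small decision function. This is only a sketch of the table as I presented it, assuming a simplified event model: `event_t`, the `PERM_*` bits and `should_trigger_memory_scan()` are hypothetical names, and real products will weigh these events differently.

```c
#include <stdbool.h>

/* Hypothetical event kinds, one per row of the table above
 * (memcpy() is missing on purpose: the EDR never sees it). */
typedef enum {
    EV_VIRTUAL_ALLOC,
    EV_WRITE_PROCESS_MEMORY,
    EV_VIRTUAL_PROTECT,
    EV_CREATE_REMOTE_THREAD
} event_kind_t;

/* Simplified page permission bits (not the real PAGE_* constants). */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

typedef struct {
    event_kind_t kind;
    unsigned old_perm;  /* only meaningful for EV_VIRTUAL_PROTECT */
    unsigned new_perm;  /* requested protection for alloc/protect events */
} event_t;

/* Decide whether a cheap event is worth an expensive memory scan. */
bool should_trigger_memory_scan(const event_t *ev)
{
    bool rwx = (ev->new_perm & (PERM_W | PERM_X)) == (PERM_W | PERM_X);

    switch (ev->kind) {
    case EV_VIRTUAL_ALLOC:
        return rwx;  /* too common, except when RWX */
    case EV_VIRTUAL_PROTECT:
        /* RWX, or a writable page becoming executable (RW -> RX) */
        return rwx || ((ev->old_perm & PERM_W) && (ev->new_perm & PERM_X));
    case EV_CREATE_REMOTE_THREAD:
        return true;  /* rare and a clear indicator */
    case EV_WRITE_PROCESS_MEMORY:
    default:
        return false; /* very common */
    }
}
```

The point of the design is that the cheap check (cost 1 in the performance table) gates the expensive one (cost 100), so the scan only runs for a small fraction of events.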
EDR Attacks
The EDR receives events from a large number of sensors of varying trustworthiness. Also, much of the required information is not available in the event itself, but has to be accessed in or via the kernel (KPROCESS, EPROCESS) or the process memory space itself (e.g. the PEB including command line arguments, or the parent process ID).
Many attacks exploit a TOCTOU (time of check, time of use) weakness: the data the EDR checks at one point in time is no longer the data that is used later.
Command Line Spoofing
EDRs can check newly spawned processes for potentially malicious command line arguments, for example when using mimikatz: mimikatz.exe "privilege::debug" "lsadump::sam". Even if we rename mimikatz.exe, the argument privilege::debug is a pretty clear indicator with a low false positive rate.
But in Windows, it's possible to spoof command line arguments. The command line arguments of a process are stored in the PEB of that process. Additionally, when we create a new process, the call to the process-creation function also contains the initial arguments (of the exe to be started).
So we have basically two places for command line arguments:
- In the PEB of the child process
- In the process-creation call of the parent: CreateProcessW(..., "command line args", ...)
In the PEB:
typedef struct _PEB {
...
PRTL_USER_PROCESS_PARAMETERS ProcessParameters;
...
}
typedef struct _RTL_USER_PROCESS_PARAMETERS {
...
UNICODE_STRING ImagePathName;
UNICODE_STRING CommandLine;
} *PRTL_USER_PROCESS_PARAMETERS;
As the PEB is modifiable by its process, data in it cannot be trusted.
EDR queries an existing process for its command line, and usually trusts it blindly:
┌────────────────────┐ ┌─────────────────┐
│ Process │ │ │
│ │ │ │
│ PEB │ │ │
│ ┌──────────────┤ │ │
│ │ │ │ EDR │
│ │ ImageName │◄─────────────────┤ │
│ │ CommandLine │ │ │
│ │ │ │ │
│ └──────────────┤ │ │
│ │ │ │
└────────────────────┘ └─────────────────┘
But it can be verified. When a parent process calls CreateProcess() to create a child process:
┌─────────┐ ┌──────────┐ ┌───────────┐
│ Process │ │ │ │ Child │
│ │ CreateProcess() │ OS │ Spawns │ Process │
│ ├─────────────────►│ ├──────────►│ │
│ │ ▲ │ │ │ │
│ │ │ └──────────┘ │PEB │
│ │ │ ├─────────┐ │
│ │ │ ┌───────┐ │ Command │ │
│ │ │ │ │ ┌────►│ Line │ │
│ │ └────────┤ EDR ├───────┘ ├─────────┘ │
│ │ │ │ │ │
└─────────┘ └───────┘ └───────────┘
The EDR can compare the command line passed to CreateProcess() with the one in the PEB of the resulting child process, and alert if they don't match.
But intercepting the function call arguments in CreateProcessW(..., "command line args", ...) does not really help much either, as we can create the process in a suspended state with fake arguments, overwrite them remotely with the real ones, and then resume the process:
- Parent: Create new suspended process with fake arguments
- EDR: receives event with fake arguments
- Parent: Overwrite PEB of child with real arguments
- Parent: Continue (start) child process (using real arguments)
- Child process: Overwrite its PEB with fake arguments again
- EDR: querying the process gets the fake arguments
If the EDR thinks the child process is malicious in the future, it will provide information to the analyst, including the process' command line arguments, taken from the PEB. So the child process needs to overwrite the PEB again, as a “cleanup”.
Command line arguments for processes are therefore pretty untrustworthy.
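To make the TOCTOU problem concrete, here is a sketch of the comparison an EDR could do: the command line seen at creation versus what it later reads from the PEB. It is a simplified model under my own assumptions: `cmdline_changed_since_creation()` is a hypothetical name, the snapshots are plain C strings instead of `UNICODE_STRING`, and how and when the EDR samples the PEB is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if any PEB snapshot differs from the command line that
 * was reported in the process-creation event.
 *
 * at_creation:   command line as seen in the CreateProcess() event
 * peb_snapshots: command lines read from the child's PEB, in time order
 * count:         number of snapshots taken */
bool cmdline_changed_since_creation(const char *at_creation,
                                    const char *const *peb_snapshots,
                                    size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(at_creation, peb_snapshots[i]) != 0)
            return true;
    }
    return false;
}
```

With the step list above, the PEB holds the real arguments only between the overwrite by the parent and the cleanup by the child. If the EDR samples before or after that window, every snapshot matches the fake creation-time arguments and nothing is flagged.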
PPID Spoofing
In Windows, unlike Linux, there is no dependency between parent and child process, as there is (was) no fork().
The child gets certain attributes from the parent, including the PID of the parent. It will also be stored in the EPROCESS structure of the process.
The function CreateProcessW() can be instructed to use caller-provided attributes, including the parent process of the child, via the STARTUPINFOEX structure.
So already upon creation, we can give the child a wrong parent PID.
CreateProcessW() interface:
BOOL CreateProcessW(
[in, optional] LPCWSTR lpApplicationName,
[in, out, optional] LPWSTR lpCommandLine,
[in, optional] LPSECURITY_ATTRIBUTES lpProcessAttributes,
[in, optional] LPSECURITY_ATTRIBUTES lpThreadAttributes,
[in] BOOL bInheritHandles,
[in] DWORD dwCreationFlags,
[in, optional] LPVOID lpEnvironment,
[in, optional] LPCWSTR lpCurrentDirectory,
[in] LPSTARTUPINFOW lpStartupInfo, // PPID spoofing here
[out] LPPROCESS_INFORMATION lpProcessInformation
);
The actual PPID spoofing is just setting attributes in the STARTUPINFOEX structure and passing it as the lpStartupInfo parameter:
{
STARTUPINFOEXA si;
HANDLE fakeParent = OpenProcess(.., <pid of fake parent process>);
..
UpdateProcThreadAttribute(si.lpAttributeList, 0, PROC_THREAD_ATTRIBUTE_PARENT_PROCESS, &fakeParent, ..);
CreateProcessA(NULL, (LPSTR)"notepad", .., EXTENDED_STARTUPINFO_PRESENT, .., &si.StartupInfo, ..);
}
Where:
typedef struct _STARTUPINFOEXA {
STARTUPINFOA StartupInfo;
LPPROC_THREAD_ATTRIBUTE_LIST lpAttributeList; // attributes, one is the ppid
} STARTUPINFOEXA, *LPSTARTUPINFOEXA;
It will be stored in the EPROCESS kernel structure:
typedef struct _EPROCESS
{
KPROCESS Pcb;
...
HANDLE InheritedFromUniqueProcessId; // PPID
...
}
This can be retrieved by the EDR with NtQueryInformationProcess():
__kernel_entry NTSTATUS NtQueryInformationProcess(
[in] HANDLE ProcessHandle,
[in] PROCESSINFOCLASS ProcessInformationClass,
[out] PVOID ProcessInformation, // PROCESS_BASIC_INFORMATION
[in] ULONG ProcessInformationLength,
[out, optional] PULONG ReturnLength
);
typedef struct _PROCESS_BASIC_INFORMATION {
NTSTATUS ExitStatus;
PPEB PebBaseAddress;
ULONG_PTR AffinityMask;
KPRIORITY BasePriority;
ULONG_PTR UniqueProcessId;
ULONG_PTR InheritedFromUniqueProcessId; // PPID
} PROCESS_BASIC_INFORMATION;
PPID spoofing can be detected: upon process creation, an event about the new process is delivered to the EDR. This event is usually raised in the context of the originating process, or the originating process is referenced in it. The EDR can then compare the content of the STARTUPINFOEX structure with the process the event comes from (e.g. by just comparing the PID of both). Here the EDR sees the CreateProcess() call with PPID=y (2), and the effective PID of the process initiating this call (1), which is PID=x.
┌─────────┐ ┌──────────┐ ┌───────────┐
│ Process │ CreateProcess() │ │ │ Child │
│ │ PPID=y │ OS │ Spawns │ Process │
│ ├─────────────────►│ ├──────────►│ │
│ │ ▲ │ │ │ │
│ │ │ └──────────┘ │EPROCESS │
│ ┌───────┤ 1 │2 ├─────────┐ │
│ │PID=x │◄─────────┤ ┌───────┐ 3 │ PPID=y │ │
│ │ │ │ │ │ ┌────►│ │ │
│ └───────┤ └────────┤ EDR ├───────┘ ├─────────┘ │
│ │ │ │ │ │
└─────────┘ └───────┘ └───────────┘
So the EDR has:
- (1) Parent: its PID
- (2) Parent: the PPID in its issued CreateProcess() call destined for the child
- (3) Child: its PPID
It can compare those, especially (1) and (2), or later (1)/(2) and (3). For the events received, it is not always completely clear where the origin PID comes from (for example with ETW).
Note that InheritedFromUniqueProcessId is stored in EPROCESS, but still cannot be trusted, as it can be set from userspace.
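The comparison of the three values can be sketched as follows. This is a simplified model, not the logic of any particular EDR: the `process_create_event_t` struct and `ppid_is_spoofed()` are hypothetical, and I assume the EDR already managed to collect all three PIDs for the same creation event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record the EDR assembles for one process creation. */
typedef struct {
    uint32_t creator_pid;    /* (1) PID of the process that issued CreateProcess() */
    uint32_t requested_ppid; /* (2) parent PID requested via the attribute list, 0 if none */
    uint32_t child_ppid;     /* (3) InheritedFromUniqueProcessId read from the child */
} process_create_event_t;

bool ppid_is_spoofed(const process_create_event_t *ev)
{
    /* (1) vs (2): the creator explicitly asked for a different parent. */
    if (ev->requested_ppid != 0 && ev->requested_ppid != ev->creator_pid)
        return true;

    /* (1) vs (3): the child reports a parent that is not its creator. */
    return ev->child_ppid != ev->creator_pid;
}
```

Legitimate software also sets an explicit parent process, so a mismatch is a reason to look closer rather than proof of an attack.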
ETW-patch
An ETW patch will overwrite EtwEventWrite() in ntdll.dll, so the process will not emit any ETW events by itself anymore. This is mostly useful for Powershell and .NET related events. It usually involves:
- VirtualProtect .text: RX -> RW
- Overwrite memory (replace function body with a return 0)
- VirtualProtect .text: RW -> RX
Process
┌──────────────────────┐
│ │
│ │
├──────────────────────┤
│ │ ntdll.dll RW -> patch -> RX
│ .text ├──────────────┐
│ │ │
├──────────────────────┤ │ ┌─────────┐
│ │ │ │ │
│ │ │ ◄─────┤ EDR │
│ │ │ │ sus? │
├──────────────────────┤ │ │ │
│ ntdll.dll │ │ └─────────┘
│ │ │
│ - EtwEventWrite() │◄─────────────┘
│ │
│ │
├──────────────────────┤
│ │
│ │
│ │
└──────────────────────┘
Changing the permissions of ntdll.dll to modify it will probably generate more telemetry than patching ETW avoids. Its memory permissions need to be changed from RX to RW and then back to RX again.
Note that this will only affect the events generated by the patched process. ETW cannot be deactivated globally.
ETW events are mostly used for managed processes (DotNet, C#) and Powershell. ETW was used a lot by Sysmon before, so ETW-patch was anti-Sysmon.
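Besides the permission changes, the patch itself is visible in memory. The following sketch shows two checks a defender could run on the first bytes of a function like EtwEventWrite(): a comparison against the bytes from the file on disk, and a heuristic for a function that returns immediately. Both function names are hypothetical, and reading the bytes from the target process and from the DLL file is assumed to have happened already.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the in-memory function start differs from the on-disk reference. */
bool prologue_is_patched(const uint8_t *in_memory, const uint8_t *on_disk, size_t len)
{
    return memcmp(in_memory, on_disk, len) != 0;
}

/* Heuristic: does the function begin with an immediate return?
 * Only two common x86/x64 patterns are checked here. */
bool starts_with_early_return(const uint8_t *code, size_t len)
{
    /* ret */
    if (len >= 1 && code[0] == 0xC3)
        return true;

    /* xor eax, eax ; ret  (xor eax, eax encodes as 33 C0 or 31 C0) */
    if (len >= 3 && (code[0] == 0x33 || code[0] == 0x31) &&
        code[1] == 0xC0 && code[2] == 0xC3)
        return true;

    return false;
}
```

The byte comparison catches any modification but needs the file as reference, the heuristic is cheaper but only catches the obvious "return 0" style of patch.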
References:
- https://jsecurity101.medium.com/understanding-etw-patching-9f5af87f9d7b
- https://jsecurity101.medium.com/refining-detection-new-perspectives-on-etw-patching-telemetry-e6c94e55a9ad
AMSI-AV patching
AMSI will scan scripts executed in supported Windows interpreters, like Powershell, MS Office VBA runtime, or .NET. Or in other words, the application itself asks the OS to perform an AV scan via AMSI on some file or buffer it intends to execute.
To disable AMSI runtime code scanning, patch for example amsi.dll!AmsiOpenSession so it no longer produces telemetry. Alternatives are AmsiScanString() / AmsiScanBuffer().
The process is identical to ETW-patch: make the code section writable, break the functions, then restore the original permissions.
Process
┌──────────────────────┐
│ │
│ │
├──────────────────────┤
│ │ amsi.dll RW -> patch -> RX
│ .text ├──────────────┐
│ │ │
├──────────────────────┤ │ ┌─────────┐
│ │ │ │ │
│ │ │ ◄─────┤ EDR │
│ │ │ │ sus? │
├──────────────────────┤ │ │ │
│ amsi.dll │ │ └─────────┘
│ │ │
│ - AmsiOpenSession() │◄─────────────┘
│ │
│ │
├──────────────────────┤
│ │
│ │
│ │
└──────────────────────┘
Disabling the AMSI-AV function is usually done by a loader, before executing well-signatured malicious managed code or Powershell scripts. The loader is being scanned, but the .NET/Powershell code loaded at runtime won't be.
This is useful for when loading a signatured malicious powershell script in powershell, which otherwise would be scanned by the AMSI interface. A famous site to generate obfuscated AMSI-AV patches is https://amsi.fail.
AMSI-hooks patching
AMSI-hook patching (or AMSI patching) is just removing the EDR’s ntdll.dll patches which call into amsi.dll.
It is basically identical to ETW-patch or AMSI-AV patch, as it just modifies ntdll.dll again. It can generate additional telemetry, for example when loading a clean version of ntdll.dll from disk.
Process
┌──────────────────────┐
│ │
│ │
├──────────────────────┤
│ │ ntdll.dll RW -> patch -> RX
│ .text ├──────────────┐
│ │ │
├──────────────────────┤ │ ┌─────────┐
│ │ │ │ │
│ │ │ ◄─────┤ EDR │
│ │ │ │ sus? │
├──────────────────────┤ │ │ │
│ ntdll.dll │ │ └─────────┘
│ │ │
│ │◄─────────────┘
│ │
│ │
├──────────────────────┤
│ │
│ │
│ │
└──────────────────────┘
AMSI Bypass
AMSI bypass can either mean to bypass the AMSI-AV interface as described above, or to call OS kernel functions without invoking the hooks in ntdll.dll.
The latter can be done by using direct syscalls: if you know the correct syscall number, you can invoke it directly, without involving ntdll.dll.
Or with indirect syscalls: re-use parts of the ntdll.dll functions, AFTER the hook invocation.
In both cases, the AMSI-hooks are bypassed, and the EDR will not get any telemetry from them.
If this is the normal function call graph with hooked ntdll.dll:
┌─────────────┐
│ │
┌───────────────────┐ ┌─────────────────┐ ┌───────────────────┐ │ │
│ │ │ │ │ ntdll.dll: │ │ OS │
│ Application.exe │ │ kernel32.dll │ │ NtCreateFile() │ │ │
│ ├──►│ ├──►│ │ │ │
│ │ │ CreateFile() │ │ │ │ Kernel │
│ │ │ │ │ │ │ │
└───────────────────┘ └─────────────────┘ │ │ │ │
│ │ │ │
┌────────┼───jmp callback │ │ │
│ │ │ syscall │ │
│ ┌──────┼──►syscall ├─────────────────► │ │
│ │ │ │ │ │
│ │ │ │ │ │
│ │ └───────────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────────┐ │ │
│ └─┤ │ │ │
│ │ amsi.dll: │ └─────────────┘
└──►│ HookedNtCreateFile() │
└──────────┬──────────────┘
│ notify
▼
┌────────────┐
│ EDR │
│ :-) │
└────────────┘
Here with:
- Direct syscall: Just do the syscall yourself (with the correct syscall number)
- Indirect syscall: Re-use parts of the hooked ntdll.dll, invoking the syscall but not the hook
direct
syscall
┌────────────────────────────────────────────────────────┐ ┌─────────────┐
│ │ │ │
┌───────────────┴───┐ ┌─────────────────┐ ┌───────────────────┐ │ │ │
│ │ │ │ │ ntdll.dll: │ │ │ OS │
│ Application.exe │ │ kernel32.dll │ │ NtCreateFile(): │ │ │ │
│ ├──►│ ├──►│ │ │ │ │
│ │ │ CreateFile() │ │ │ │ │ Kernel │
│ │ │ │ │ │ │ │ │
└──────────────┬────┘ └─────────────────┘ │ │ │ syscall │ │
│ │ │ └──────────► │ │
│ │ jmp callback │ │ │
│ │ │ syscall │ │
└──────────────────────────────┼──►syscall ├─────────────────► │ │
indirect │ │ │ │
syscall │ │ │ │
└───────────────────┘ │ │
│ │
┌────────────────────────┐ │ │
│amsi.dll │ └─────────────┘
│ │
│HookedNtCreateFile() │
└────────────────────────┘
no notify
┌────────────┐
│ EDR │
│ :-( │
└────────────┘
Or replace ntdll.dll completely with an unhooked version from disk, like in RefleXXion.
References:
- https://eversinc33.com/posts/avoiding-direct-syscalls.html
- https://www.outflank.nl/blog/2019/06/19/red-team-tactics-combining-direct-system-calls-and-srdi-to-bypass-av-edr/
- https://passthehashbrowns.github.io/hiding-your-syscalls
- https://github.com/JustasMasiulis/inline_syscall
- https://github.com/jthuraisamy/SysWhispers2
- https://github.com/klezVirus/SysWhispers3
- https://alice.climent-pommeret.red/posts/direct-syscalls-hells-halos-syswhispers2
Image Spoofing
Similar to spoofing arguments, an attacker may also want to “spoof” the exe: Start a non-malicious exe like notepad.exe, which the EDR records, then replace the content of the process with a malicious one like mimikatz. This attempts to trick the EDR into thinking something non-malicious has been started. This bypasses simple EDRs.
The source .exe file is called the Image for a process.
Process hollowing:
Event: CreateProcess("notepad.exe")
▲
│
│
│ notepad.exe
┌───────────┐ │ ┌───────────┐
│ │ Start │ │ │
│ │ Suspended │ │ │
│ ├───────────┴─►│ │
│ │ │ │
│ │ ├───────────┤
│ │ Overwrite │ .text │
│ │ Memory │ │
│ ├──────────────┤► │
│ │ ├───────────┤
│ │ │ │
│ │ │ │
│ │ │ │
│ │ Resume │ │
│ ├─────────────►│ │
│ │ │ │
└───────────┘ └───────────┘
There are several techniques:
- Process Hollowing: Overwrite process memory of a suspended process with WriteProcessMemory()
- Process Doppelgänging: Overwrite a file with Transactional NTFS (TxF), start the process, then roll back the transaction so the original file is restored
- Process Herpaderping: Write malicious code to an exe, create process, quickly replace malicious content with non-malicious one before it gets scanned
- Process Ghosting: Create empty file, semi-delete it (put it into a delete-pending state), write malicious data, create process from it
Image spoofing can be detected by memory scanning, which scans the memory of processes using signatures, like an AV. Therefore malicious code like CobaltStrike can still be identified, even if injected into a genuine process.
Or by comparing the process memory content with the exe file content. The original exe name is stored in the PEB (peb.ProcessParameters.ImagePathName), or the kernel’s EPROCESS structure (eprocess.ImageFileName[15], eprocess.SeAuditProcessCreationInfo.ImageFileName). Comparing the content of memory with that of a file is performance intensive.
Alternatively, the EDR can gather telemetry which identifies the manipulations, or the supporting techniques like direct syscalls, e.g. with call stack analysis.
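The memory/file comparison mentioned above is conceptually simple, which also shows why it is expensive: every byte of the section has to be read twice and compared. A minimal sketch, assuming both buffers hold the same .text section (one read from process memory, one from the file on disk) and that a few differing bytes are tolerated. The function names and the percentage threshold are my own illustrative choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the bytes that differ between the mapped .text section
 * and the .text section of the image file. Cost is linear in len. */
size_t count_modified_bytes(const uint8_t *mem_text, const uint8_t *file_text, size_t len)
{
    size_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        if (mem_text[i] != file_text[i])
            diff++;
    }
    return diff;
}

/* Returns 1 if more than threshold_percent of the bytes changed, else 0.
 * The threshold is an assumption, meant to tolerate small legitimate
 * differences between file and memory. */
int image_looks_replaced(const uint8_t *mem_text, const uint8_t *file_text,
                         size_t len, unsigned threshold_percent)
{
    if (len == 0)
        return 0;
    size_t diff = count_modified_bytes(mem_text, file_text, len);
    return diff * 100 > (size_t)threshold_percent * len;
}
```

A hollowed process would show a .text section that has almost nothing in common with the file, while a single inline hook changes only a handful of bytes and would need a stricter threshold to be caught.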
Technique | Used API |
---|---|
Hollowing | CreateProcess, NtUnmapViewOfSection, VirtualAllocEx, WriteProcessMemory, SetThreadContext, ResumeThread |
Doppelgänging | CreateTransaction, CreateFileTransacted, NtCreateProcessEx |
Herpaderping | NtCreateSection, NtCreateProcessEx, NtCreateThreadEx |
Ghosting | CreateFileA, NtOpenFile, NtSetInformationFile, NtCreateSection, NtCreateProcess, WriteRemoteMem, NtCreateThreadEx |
Hollowing references:
- https://www.ired.team/offensive-security/code-injection-process-injection/process-hollowing-and-pe-image-relocations
- https://github.com/m0n0ph1/Process-Hollowing
- https://www.darkrelay.com/post/demystifying-hollow-process-injection
Module Stomping
This is similar to Image Spoofing, but with DLL’s.
Module stomping writes the shellcode into the .text section of an unused DLL in a remote process, and creates a new thread starting there.
Event: LoadLibrary("genuine.dll")
▲
│
│
│ genuine.dll
┌───────────┐ │ ┌───────────┐
│ │ Load │ │ │
│ │ DLL │ │ │
│ ├───────────┴─►│ │
│ │ │ │
│ │ ├───────────┤
│ │ Overwrite │ .text │
│ │ Memory │ │
│ ├──────────────┤► │
│ │ ├───────────┤
│ │ │ │
│ │ │ │
│ │ │ │
│ │ Start │ │
│ ├─────────────►│ │
│ │ │ │
└───────────┘ └───────────┘
Same as Image Spoofing, it can be detected by:
- Memory signature scanning
- Memory/file comparison of .text section
- Telemetry of the stomping
- Identifying supporting techniques like direct/indirect syscalls with telemetry
References:
- https://www.blackhillsinfosec.com/dll-jmping/
- https://blog.f-secure.com/hiding-malicious-code-with-module-stomping/
- https://blog.f-secure.com/hiding-malicious-code-with-module-stomping-part-2/
- https://trustedsec.com/blog/loading-dlls-reflections
- https://williamknowles.io/living-dangerously-with-module-stomping-leveraging-code-coverage-analysis-for-injecting-into-legitimately-loaded-dlls/
- https://notes.dobin.ch/#root/PBXfEsTWGbEg/yFUsQJlBd3r0/iMYKnoX7AZ7w/W5TwpJ5or9DW-dRWk
Memory Encryption
It is possible to encrypt all suspicious regions before sleeping, and decrypt it again when the process resumes. This is not trivial, and requires great care, weird Windows functionality, and support from the payload (e.g. the beacon itself). It can create a lot of telemetry, but much of it is not well capturable by the EDR.
Event
│
│
│
Process Process ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ │ │ │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
├───────────┤ ├───────────┤ │ │
│ │ │ │ Read │ │
│ .text ├─────────►│ .text ◄────────┤ EDR │
│ │ │ Encrypted│ Scan │ │
├───────────┤ ├───────────┤ │ │
│ │ │ │ │ │
│ │ │ ◄────────┤ │
│ .data │ │ .data │ │ │
│ │ │ Encrypted│ └───────────┘
│ │ │ │
└───────────┘ └───────────┘
A beacon usually calls Sleep() for a certain amount of time. If it uses memory encryption, any scans performed during this time will just see encrypted memory.
Callstack spoofing
The callstack is basically a function call hierarchy: a list of functions, each called by the one before it. When a process invokes a syscall (or a hooked ntdll.dll function), this list can be retrieved by the EDR and analyzed.
When using direct syscalls, indirect syscalls, or other shenanigans, the callstack looks “wrong” by default, which can be identified by the EDR.
Callstack spoofing makes sure that the callstack looks genuine again. It is a supporting technique: e.g. an AMSI-bypass can be detected by using callstacks, so we need to improve the AMSI-bypass so the callstack looks more natural.
The actual callstack spoofing usually doesn't generate telemetry, and can be implemented pretty safely. But re-used existing callstack-spoofing implementations can be identified by signature scanning (be it on-disk or in-memory).
Suspicious callstack for NtDelayExecution():
Clean (spoofed) callstack for NtDelayExecution():
Anti-detection depends on faking the callstack, copying a clean one, or just hiding the malicious callstack. Many techniques exist to check the integrity of the callstack, often by correlating with other information. For example, the thread start address should originate from a reasonable location.
In a normal thread, the user mode start address is typically the third function call in the thread’s stack – after ntdll!RtlUserThreadStart and kernel32!BaseThreadInitThunk. So, when the thread has been hijacked, this is going to be obvious in the call stack. For “early bird” APC injection, the base of the call stack will be ntdll!LdrInitializeThunk, ntdll!NtTestAlert, ntdll!KiUserApcDispatcher and then the injected code.
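The check on the base of the callstack described above can be sketched like this. It is a simplified illustration under my own assumptions: the frames are assumed to be already symbolized into `module!function` strings, and `has_expected_thread_base()` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* frames[0] is the innermost frame, frames[count - 1] the outermost one
 * (the base of the stack). Returns true if the thread looks like it was
 * started the normal way: RtlUserThreadStart -> BaseThreadInitThunk ->
 * user mode start address. */
bool has_expected_thread_base(const char *const *frames, size_t count)
{
    /* Need the two loader frames plus at least the start address itself. */
    if (count < 3)
        return false;

    return strcmp(frames[count - 1], "ntdll!RtlUserThreadStart") == 0 &&
           strcmp(frames[count - 2], "kernel32!BaseThreadInitThunk") == 0;
}
```

A sleeping worker thread with `ntdll!NtDelayExecution` on top and the two expected frames at the bottom passes, while the “early bird” stack that ends in `ntdll!LdrInitializeThunk` does not. A spoofed callstack is built to pass exactly this kind of check, which is why it is usually correlated with other information.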
References:
- https://sabotagesec.com/gotta-catch-em-all-catching-your-favorite-c2-in-memory-using-stack-thread-telemetry/
- https://trustedsec.com/blog/windows-processes-nefarious-anomalies-and-you-threads
- https://www.mdsec.co.uk/2022/07/part-1-how-i-met-your-beacon-overview/
- https://gist.github.com/jaredcatkinson/23905d34537ce4b5b1818c3e6405c1d2
- https://whiteknightlabs.com/2024/04/30/sleeping-safely-in-thread-pools/
- https://oldboy21.github.io/posts/2024/06/sleaping-issues-swappala-and-reflective-dll-friends-forever/
- https://oldboy21.github.io/posts/2024/05/swappala-why-change-when-you-can-hide/
- https://kyleavery.com/posts/avoiding-memory-scanners/
Remote Processes
The attacker can choose whether to mess with their own process, or with another one on the system. Most of the Windows functions described here can also be used on another process, just by calling OpenProcess() first.
This is mostly used for process injection. It is very useful to migrate into another process, like teams.exe: the C2 traffic can be hidden in the normal communication of the application, and as it runs JavaScript, it does a lot of RW->RX allocations anyway.
Messing with remote processes is scrutinized more by the EDR, so it is safer to just stay in your own process. For migration, use DLL sideloading instead, or other techniques which do not depend on OpenProcess().
This includes:
- VirtualAllocEx() / VirtualFreeEx()
- ReadProcessMemory() / WriteProcessMemory()
- CreateRemoteThread()
- NtQueryInformationProcess()
Process Child Process
┌──────────────┐ ┌─────────────┐
│ │ │ │
│ │ OpenProcess() │ │
│ ├────────────────────►│ │
│ │ handle │ │
│ HANDLE │◄────────────────────┤ │
│ │ │ │
│ │ VirtualAlloc(handle)│ │
│ ├────────────────────►│ │
└──────────────┘ └─────────────┘
Suspended processes
A very common approach is to create a suspended process with the CREATE_SUSPENDED flag, then mess with it, then let it execute/resume.
CreateProcessA("C:\\Windows\\System32\\calc.exe", NULL, NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si, &pi);
...
ResumeThread(pi.hThread); // resume the main thread of the new process
Many techniques depend on this functionality. Currently using suspended processes doesn't seem to bother the EDR much, but this may change in the future.
For example we can create a new process in suspended state, and queue an APC to execute our shellcode, which may make it invisible to an EDR (as it may be executed before KAPC injection).
Process Child Process
┌──────────────┐ ┌─────────────┐
│ │ │ │
│ │ CreateProcessA(suspended) │ │
│ ├────────────────────────────►│ │
│ │ │ │
│ HANDLE │◄────────────────────────────┤ │
│ │ │ │
│ │ VirtualAllocEx() │ │
│ │ WriteProcessMemory() │ │
│ │ QueueUserApc() │ │
│ ├────────────────────────────►│ │
│ │ │ │
│ │ │ │
│ │ ResumeThread() │ │
│ ├─────────────────────────────┤ │
└──────────────┘ └─────────────┘
Outro
EDR Wisdoms
- Use threatcheck or avred to identify which part of your stuff gets identified by AV, and patch it
- Memory scanning is performance intensive, and usually requires a trigger to be performed
- Usermode AMSI is less and less relevant, and therefore AMSI-hooks patching too
Mistakes writing loaders
- Using function calls to copy memory
- Putting more than a minimal amount of effort into handling entropy
- Putting more than a minimal amount of effort into handling encryption
- Generating too much telemetry
- Threads not starting in backed memory
- Marking RX pages RW again
- Having unclean callstacks
Proposed Loader
Proposed loader layout:
┌──────────┐
│ encrypted│
│ Payload │
│ │
└────┬─────┘
│
│
▼
┌───────────┐ ┌──────────────┐ ┌─────────────┐ ┌───────────┐ ┌──────────┐ ┌────────────┐
│ EXE │ │ Execution │ │ Anti │ │EDR │ │ Alloc RW │ │ Payload │
│ File ├───►│ Guardrails ├───►│ Emulation ├───►│conditioner├──►│ Decode/Cp├────►│ Execution │
│ │ │ │ │ │ │ │ │ RX │ │ │
│ │ │ │ │ │ │ │ │ Exec │ │ │
└───────────┘ └──────────────┘ └─────────────┘ └───────────┘ └──────────┘ └────────────┘
- EXE File: All code should be contained in the .text section (IMAGE)
- Execution Guardrails: Only let it execute on the intended target (Anti-Middleboxes)
- Anti-Emulation: Stop AV emulating our binary (mem usage, cpu cycles count, time trickery…)
- EDR Feng-Shui: Condition EDR by doing a lot of our Alloc/Copy/VirtualProtect loop with nonmalicious data and free
- Payload: Encrypted (how doesn't matter)
- Alloc/Decode/VirtualProtect/Exec: As normal as possible (avoid using DLL functions here). Avoid RWX.
- Payload Execution: As normal as possible (jmp to payload, avoid creating new threads)
Not part of this article
Detections based on:
- File access
- Registry access
- Network access
Low level techniques which are not discussed:
- Software breakpoints
- Hardware breakpoints
- VEH
- APC injection