The (Anti-)EDR Compendium

EDR functionality and bypasses in 2024, with a focus on undetected shellcode loaders.

Currently, there is a big focus on memory encryption for implants:

SWAPPALA / SLE(A)PING
Thread Pool / Pool Party
Gargoyle
Ekko
Cronos
Foliage

Also, there is a lot of work involving call stack spoofing:

ThreadStackSpoofer
CallStackSpoofer
AceLdr
CallStackMasker
Unwinder
TitanLdr

This is cool, but what if I told you it is not strictly necessary? Read on.

This is part of a three-article series:

See Supermega for an introduction on how to use the SuperMega loader laboratory
See How EDR works for a discussion of EDR detection principles (this)
See Cordyceps EXE Injection for a discussion of Cordyceps approaches

The target audience is confused Red Teamers. Basic knowledge of anti-EDR and maldev is recommended.

I am not an EDR expert. I’ve just read Evading EDR by Matt Hand and Elastic Security-Labs and you should too. An absolutely great resource for offensive techniques is Evasive Malware by Kyle Cucci.

This article gets updated regularly, and is not mobile friendly. Last updates 20.01.2025, 01.10.2024.

I mentioned parts of this in a talk at HITB BKK 2024: My First and Last Shellcode Loader. Shortened, but with more background.

Intro

What’s an EDR

EDR is “Endpoint Detection and Response”. It’s an agent deployed on each machine, which observes events generated by the OS to identify attacks. If it detects something, it will generate an alert and send it to the SIEM or SOAR, where it will be looked at by human analysts. “Response” means the actions performed after having identified a threat, like isolating the host, which is not part of this article. EPP is “Endpoint Protection Platform”, and will attempt to interrupt attacks instead of just detecting them.

The UI of MDE (Microsoft Defender for Endpoint): MDE UI Overview

We can see the EDR detected something, and attempts to give the analyst more information about the incident: involved processes, their arguments and hashes, child processes etc. In the end, the analyst has to decide whether it’s a false positive or an active attack. But generally the Red Team wants to avoid raising any alarms, and tries to stay under the radar.

EDR attempts to implement detections higher up on the pyramid of pain, mostly on TTPs: Tactics, Techniques, and Procedures.

Pyramid Of Pain

Idealized EDR

Knowing and understanding even just one EDR is hard, and all EDRs impossible. The EDR written about here is an abstract version of an ideal EDR. Not so much what is being done today, but what is theoretically possible with the available Windows sensor/telemetry infrastructure. The closest inspiration is Microsoft Defender for Endpoint (MDE), which I used for testing.

I will not teach you how to bypass a specific EDR, but how to think conceptually about the attack surface to implement your own techniques. The actual inner workings of an EDR are mostly unknown (except in the case of Elastic), and are considered a blackbox. While we mostly know what kind of information an EDR receives, it is not so clear how the information is being used and correlated internally.

As hackers, we are interested in the input and output of a system. This article should give an overview of the input.

Shellcode Loader

A loader will load a shellcode. The shellcode is usually our beacon, like CobaltStrike, Sliver, or Metasploit.

The loader contains the encrypted shellcode, loads it into memory, and executes it.

┌───────────┐   ┌────────────┐    ┌────────┐
│           │   │            │    │        │
│  Loader   ├──►│ C2 Beacon  ├───►│ Profit │
│           │   │ Shellcode  │    │        │
│           │   │            │    │        │
└───────────┘   └────────────┘    └────────┘

The goal is for this process to go undetected by the EDR during Initial Access (IA).

Shellcode Loader Example

When executing shellcode, the usual steps are:

Allocate a memory region with read-write permissions
Copy shellcode into that region (decrypt it too)
Change permissions of memory region to read-execute
Execute the shellcode

Which looks like this in C, but is similar in most languages:

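    char *shellcode = "\xAA\xBB...";   // placeholder payload bytes
    DWORD old_protect;
    // 0x3000 == MEM_COMMIT | MEM_RESERVE
    char *dest = VirtualAlloc(NULL, 0x1234, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    memcpy(dest, shellcode, 0x1234);
    VirtualProtect(dest, 0x1234, PAGE_EXECUTE_READ, &old_protect);
    (*(void(*)())(dest))();  // jump to dest: execute shellcode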

┌──────────┐                                  ┌───────────────┐
│          │      ┌─────────────────┐         │ Memory Region │
│          │      │ Alloc           │         │               │
│          │      │                 ├────────►│               │
│          │      └─────────┬───────┘         │               │
│          │                │                 │               │
│          │      ┌─────────▼───────┐         │               │
│ Payload  │      │ Copy & Decrypt  ├─────────►               │
│          ├─────►│                 │         │               │
│          │      └─────────┬───────┘         │               │
│          │                │                 │               │
│          │      ┌─────────▼───────┐         │               │
│          │      │ Make Executable ├────────►│               │
│          │      │                 │         │               │
│          │      └─────────┬───────┘         │               │
│          │                │                 │               │
│          │      ┌─────────▼───────┐         │               │
│          │      │ Execute         ├─────────►               │
│          │      │                 │         │               │
│          │      └─────────────────┘         │               │
└──────────┘                                  └───────────────┘

There are many variations of this simple recipe. Some of them focus on shellcode injection into remote processes, which works the same way: call OpenProcess() on the destination process, and use the returned handle as the hProcess argument for function calls like VirtualAllocEx(hProcess, ...) and WriteProcessMemory(hProcess, ...). Cross-process access using hProcess is scrutinized more closely by the EDR.

Another typical thing being done is to call the shellcode by creating a new thread. Be it with CreateThread() in your own address space, or CreateRemoteThread() for process injection or module stomping.

The copying itself, here performed by the userspace function memcpy(), can also be done with RtlCopyMemory() or others.

EDR Detection

Bubbles Of Bane

There are three main techniques for detection (of loaders):

File scanning: Signatures (“yara”) applied to files
Memory scanning: Signatures (“yara”) applied to process memory
Telemetry/Behaviour: Actions performed by the process (mostly via OS)

For example, Windows Defender Antivirus implements the AV scanning, while Microsoft Defender for Endpoint (MDE) is an EDR which heavily depends on telemetry to perform behaviour analysis. If it feels the need, it will scan the memory of processes too.

I call this the “Bubbles of Bane”:

            ┌───────────────────┐
            │         Memory    │
┌───────────┼─────┐   Scanning  │
│ AV        │     │             │
│ Signature │     │             │
│ Scanning  │     │             │
│       ┌───┼─────┼────────┐    │
│       │   │     │        │    │
│       │   └─────┼────────┼────┘
│       │         │        │     
└───────┼─────────┘        │     
        │                  │     
        │    Telemetry     │     
        │    Behaviour     │     
        │    Analysis      │     
        │                  │     
        └──────────────────┘

Most .exe file implants generated out of the box by C2 frameworks are signatured, and therefore not useful. One option is to obfuscate the code inside the exe so that the signatures don’t trigger, which is hard. For an example, see Harnessing the Power of Cobalt Strike Profiles for EDR Evasion.

The alternative is to use a loader, which carries the implant as an encrypted payload, and decrypts and loads it when executed. Most often this technique uses shellcode generated by the C2, but the payload can also be a DLL or an EXE. It is possible to convert shellcode, .exe and .dll into each other, for example with Donut. The advantage of using a loader is that the payload can be encrypted, so the only thing which needs to be obfuscated from AV file signature scanning is the loader itself.

Public loaders are usually signatured sooner or later. But they are easy to write in basically all languages Windows understands (C, .NET C#, VBA, VBS, PowerShell, JScript…). Simple self-written loaders are surprisingly effective, as this article will show.

Instead of scanning a file, the EDR can also scan the memory of processes. This defeats loaders, as the payload code has to be unencrypted in memory to be executed. To avoid detection in memory, the process needs to encrypt its memory regions when sleeping. Then at the time the EDR scans the process, nothing suspicious should be in memory. Memory scanning is a performance-intensive operation, and is only done if the EDR thinks it’s worthwhile. This is based on the telemetry collected (or done at regular intervals “on-demand”, like once a day).

Typical memory scanners are pe-sieve and moneta.

Most of the detection use cases depend on telemetry: important function calls into Windows generate events which are processed, correlated and analysed by the EDR. Examples are changing the permissions of memory regions, creating processes and threads, copying memory and similar.

For example, if we use a loader to bypass AV, and simply allocate a memory region for our shellcode, we don’t generate much telemetry for the EDR. But the payload will be detectable by a memory scanner. If we introduce memory encryption to bypass memory scanners, then we generate more telemetry, which in turn can be used to detect the memory encryption. Solving one of the three banes impacts the others, and you just turn in circles around the bubbles of bane.

Bubbles of Bane with Ekko memory encryption:

            ┌───────────────────┐
            │         Memory    │
┌───────────┼─────┐   Scanning  │
│ AV        │     │             │
│ Signature │     │             │
│ Scanning  │     │             │
│       ┌───┼─────┼────────┐    │
│       │   │     │ [EKKO] │    │
│       │   └─────┼────────┼────┘
│       │         │        │     
└───────┼─────────┘        │     
        │                  │     
        │    Telemetry     │     
        │    Behaviour     │     
        │    Analysis      │     
        │                  │     
        └──────────────────┘

In the case of Ekko, it solves the memory scanning, but creates telemetry (e.g. CreateTimerQueueTimer()). The implementation on GitHub may be signatured too, which is solvable by obfuscating the code (for example with the help of avred) or by re-implementing the technique yourself.

AV Signature Scanning

When a file is being written to disk, it will be scanned by the AV. The AV has a database of signatures of known-bad malware (like yara rules). File write events are generated by the OS and delivered to the AV via AMSI or a kernel minifilter.

The signature scanning is based on the static content of the file. The PE headers will be parsed, and the content of the PE sections scanned. It happens before the EXE is executed. Upon positive detection, the file will be removed before execution.

A signature will look similar to a yara rule:

// https://github.com/Yara-Rules/rules/blob/master/malware/APT_APT17.yar (shortened)
rule APT17_Sample_FXSST_DLL 
{
    meta:
        ...        
    strings:
        $x1 = "Microsoft? Windows? Operating System" fullword wide
        $x2 = "fxsst.dll" fullword ascii
        $y1 = "DllRegisterServer" fullword ascii
        $y2 = ".cSV" fullword ascii
        $s1 = "VirtualProtect"
        $s2 = "Sleep"
        $s3 = "GetModuleFileName"
   
   condition:
        uint16(0) == 0x5a4d and filesize < 800KB and ( 1 of ($x*) or all of ($y*) ) and all of ($s*)
}
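
To make the condition line concrete, here is a minimal Python sketch of how such a rule could be evaluated against a byte buffer. It is not how the yara engine or any AV works internally; the function name `matches_fxsst_rule` and both sample buffers are made up for illustration, and the `fullword` modifier is ignored for brevity.

```python
import struct

def wide(s: str) -> bytes:
    """Encode a string as UTF-16LE, which is what yara's `wide` modifier searches for."""
    return s.encode("utf-16-le")

def matches_fxsst_rule(data: bytes) -> bool:
    """Evaluate the rule's condition on a raw byte buffer (simplified, no `fullword`)."""
    x = [wide("Microsoft? Windows? Operating System"), b"fxsst.dll"]
    y = [b"DllRegisterServer", b".cSV"]
    s = [b"VirtualProtect", b"Sleep", b"GetModuleFileName"]

    is_pe = len(data) >= 2 and struct.unpack_from("<H", data, 0)[0] == 0x5A4D  # "MZ"
    small = len(data) < 800 * 1024
    x_or_y = any(p in data for p in x) or all(p in data for p in y)
    return is_pe and small and x_or_y and all(p in data for p in s)

# Hypothetical sample buffers: a fake "PE" containing the strings, and a renamed variant.
sample = b"MZ" + b"\x00" * 64 + b"fxsst.dll\x00VirtualProtect\x00Sleep\x00GetModuleFileNameA\x00"
renamed = sample.replace(b"fxsst.dll", b"abcde.dll")

print(matches_fxsst_rule(sample), matches_fxsst_rule(renamed))
```

The second buffer differs in a single string and no longer matches, which is why purely static signatures are considered brittle.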

A general solution would be code obfuscation, which I will not cover in this article. It generally cannot be reliably applied to shellcode, but usually needs to be incorporated into the compilation process. That means each tool needs to implement it by itself.

It would solve all our problems: No signatures on-disk or in-memory, and no need to load it, therefore no telemetry.

            ┌───────────────────┐
            │         Memory    │
┌───────────┼─────┐   Scanning  │
│ AV        │     │             │
│ Signature │     │             │
│ Scanning  │     │             │
│       ┌───┼─────┼────────┐    │
│       │   │Obfus│        │    │
│       │   │catio│        │    │
│       │   │n    │        │    │
│       │   └─────┼────────┼────┘
│       │         │        │     
└───────┼─────────┘        │     
        │                  │     
        │    Telemetry     │     
        │    Behaviour     │     
        │    Analysis      │     
        │                  │     
        └──────────────────┘

Reference:

https://retooling.io/blog/an-unexpected-journey-into-microsoft-defenders-signature-world
https://i.blackhat.com/EU-21/Wednesday/EU-21-Mougey-Windows-Defender-demystifying-and-bypassing-asr-by-understanding-the-avs-signatures.pdf

Note: Reliable obfuscation of shellcode seems to be achievable, but usually depends on using RWX memory regions, and these are suspicious. A good example is Shikata-ga-nai (go implementation). An obfuscator that works with RX memory is deoptimizer.

AV Emulation

The AV component will also perform emulation of the target binary. This happens before execution, probably after the AV signature scan, and is part of the antivirus (not the EDR).

Emulation means that the AV will read and interpret the ASM instructions in the .text section by itself. It does not execute them natively, it is not virtualized execution, and also not qemu/bochs full emulation. It’s a CPU emulation, including common Windows syscalls, and subsystems like the filesystem or registry.

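As a toy example in Python (a real emulator supports far more instructions):

    # Toy x86 interpreter: decodes and "executes" three instructions in software.
    asm_bytes = bytes([
        0xB8, 0x04, 0x00, 0x00, 0x00,   # mov eax, 4
        0xBB, 0x06, 0x00, 0x00, 0x00,   # mov ebx, 6
        0x01, 0xD8,                     # add eax, ebx
    ])

    REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

    def disasm(code):
        """Minimal decoder: only mov r32, imm32 (B8+r) and add r/m32, r32 (01 /r, register form)."""
        i, out = 0, []
        while i < len(code):
            op = code[i]
            if 0xB8 <= op <= 0xBF:
                imm = int.from_bytes(code[i + 1:i + 5], "little")
                out.append({"name": "mov", "src": imm, "dst": REGS[op - 0xB8]})
                i += 5
            elif op == 0x01 and code[i + 1] >> 6 == 3:
                modrm = code[i + 1]
                out.append({"name": "add", "src": REGS[(modrm >> 3) & 7], "dst": REGS[modrm & 7]})
                i += 2
            else:
                raise ValueError(f"unsupported opcode {op:#x}")
        return out

    # The "CPU" is just a dictionary of register values.
    register = dict.fromkeys(REGS, 0)
    for instruction in disasm(asm_bytes):
        if instruction["name"] == "mov":
            register[instruction["dst"]] = instruction["src"]
        elif instruction["name"] == "add":
            total = register[instruction["dst"]] + register[instruction["src"]]
            register[instruction["dst"]] = total & 0xFFFFFFFF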

AV emulators implement their own “interpreter” for x86 assembly, and re-implement parts of the Windows API and syscalls, and with it a virtual file system (for CreateFile()), a virtual registry (for RegOpenKeyEx()), fake processes etc. For example, the advapi32.dll function GetUserNameA() may be implemented to always return “JohnDoe” in the emulator.

Example experience for a RedTeamer:

Write a loader
Insert Metasploit shellcode
The file is detected when dropped on disk

Then:

Write a second loader
Encrypt the Metasploit shellcode with strong AES
It’s still detected when dropped on disk

The AV emulator will execute/emulate the loader. After a while execution stops, and the Metasploit shellcode is found unencrypted in memory. The AV will then detect its signatures in memory.

There are countless ways to detect an emulator. But generally the emulation does not run forever, and is restricted by:

What	Typical Limit
Time	?
Number of instructions	?
Number of API calls	?
Amount of memory used	?
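
The idea of a bounded emulation run can be sketched in a few lines of Python. This is only a conceptual model: the `Budget` class, the `emulate` function, the tuple-based "program" format and all numeric limits are hypothetical, since the real limits are unknown (hence the question marks in the table).

```python
from dataclasses import dataclass

@dataclass
class Budget:
    max_instructions: int   # hypothetical limit, real values are unknown
    max_api_calls: int      # hypothetical limit, real values are unknown

def emulate(program, budget):
    """Run a list of ("op", ...) tuples until it ends or a budget is exhausted.

    Returns (stop_reason, instructions_executed, api_calls_made).
    """
    executed = api_calls = 0
    for op in program:
        if executed >= budget.max_instructions:
            return "instruction budget exhausted", executed, api_calls
        if op[0] == "call_api":
            if api_calls >= budget.max_api_calls:
                return "api budget exhausted", executed, api_calls
            api_calls += 1
        executed += 1
    return "program finished", executed, api_calls

short_program = [("nop",)] * 10 + [("call_api", "CreateFileW")]
long_program = [("nop",)] * 1000 + [("call_api", "CreateFileW")]
budget = Budget(max_instructions=100, max_api_calls=5)

print(emulate(short_program, budget))
print(emulate(long_program, budget))
```

Whatever the emulator has not reached when a budget runs out is never seen by the subsequent in-memory signature scan.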

Reference:

Windows Offender: Reverse Engineering Windows Defender’s Antivirus Emulator (video)

Receive Events

Once the .exe binary has passed the AV signature scan and the AV emulator (in-memory) scan, it is allowed to execute.

┌─────────────────┐   ┌──────────────────┐    ┌──────────────────────┐    ┌───────────┐
│                 │   │  Antivirus Scan  │    │  Antivirus Emulation │    │ Execution │
│   File Write /  │   │                  │    │                      │    │           │
│   File Read     ├──►│  File            ├───►│  Memory              ├───►│ Make      │
│                 │   │  Signature       │    │  Signature           │    │ Process   │
│                 │   │  Scanning        │    │  Scanning            │    │           │
└─────────────────┘   └──────────────────┘    └──────────────────────┘    └───────────┘

For a process to do anything useful like opening files, creating network connections, or displaying a window, it needs to interact with the Windows operating system kernel. There is no way around that. Microsoft enabled the Windows kernel to create data about what it’s doing, and for software to consume this data as events. The EDR is such a consumer.

The EDR receives events about what processes are doing via the OS:

 Process                                                 
┌────────────────┐                    ┌─────────────┐    
│                │                    │             │    
│                │                    │  Windows    │    
│                │                    │  kernel     │    
├────────────────┤  Syscalls          │             │    
│ (Hooked)       ├───────────────────►│             │    
│                │                    │             │    
│ ntdll.dll      ├─────────────────┐  │             │    
│ NtApi          │   Usermode      │  │             │    
├────────────────┤   Hooks         │  └──────┬──────┘    
│                │                 │         │           
│                │                 │         │ kernel
│                │                 │         │ ETW     
│                │                 │         │           
│                │                 ▼         ▼           
│                │          ┌────────────────────────┐   
│                │          │      EDR               │   
│                │          └────────────────────────┘   
└────────────────┘

There are two main channels to receive data:

Usermode (hooked API)
Kernel (ETW, ETW-TI, kernel-mode callbacks)

These sensors will create events about what is happening in the system, when something is added/removed/changed like:

Files
Registry Keys
Processes, Threads
Memory Regions

The EDR will contain rules to match the events for malicious behaviour. Rules can be either:

Precise/Brittle: Detect one specific thing well (low False-Positive FP), easy to bypass
Robust: More generic detection, harder to bypass, higher FP, more exceptions
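
The trade-off between the two rule types can be illustrated with a small Python sketch. The event format, the process names and both rule functions are invented for this example and do not reflect the rule language of any real EDR.

```python
# Hypothetical telemetry of a simple loader, one dict per event.
events = [
    {"process": "loader.exe", "api": "NtAllocateVirtualMemory", "protect": "RW"},
    {"process": "loader.exe", "api": "NtProtectVirtualMemory", "protect": "RX"},
    {"process": "loader.exe", "api": "NtCreateThreadEx", "remote": False},
]

def brittle_rule(evts):
    """Precise: fires only on one exact, ordered API sequence."""
    wanted = ["NtAllocateVirtualMemory", "NtProtectVirtualMemory", "NtCreateThreadEx"]
    return [e["api"] for e in evts] == wanted

def robust_rule(evts):
    """Generic: fires whenever any memory region is switched to an executable protection."""
    return any(
        e["api"] == "NtProtectVirtualMemory" and "X" in e.get("protect", "")
        for e in evts
    )

# Same behaviour without a new thread, and a benign JIT compiler doing something similar.
variant = [e for e in events if e["api"] != "NtCreateThreadEx"]
jit = [{"process": "browser.exe", "api": "NtProtectVirtualMemory", "protect": "RX"}]

for name, evts in [("loader", events), ("variant", variant), ("jit", jit)]:
    print(name, brittle_rule(evts), robust_rule(evts))
```

The brittle rule misses the slightly changed variant, while the robust rule catches it but also fires on the benign JIT case, which then needs an exception.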

Note that the EDR does not see data modification inside the process by itself. Or in other words, a process calling a function RtlCopyMemory() of ntdll.dll will potentially generate telemetry, as ntdll.dll can be hooked. Doing the same with a byte-wise copy in a for-loop will not result in any telemetry at all.

Telemetry is gained from both hooked ntdll.dll and from the kernel, mostly via ETW. Usermode hooks can be trivially removed, but this generates telemetry by itself. The kernel events are more trustworthy, and cannot be removed or bypassed from usermode.

Note that the main execution unit on Windows is the thread, not the process. But to keep it simple, I will mostly talk about processes.

The graphic is a bit oversimplified, and can be extended with more sensors, which are the input of an EDR:

                                                                     ┌──────────────┐          
                                                                     │              │          
        ┌─────────────┐ EtwWrite() ┌──────────┐   Kernel callbacks   │              │          
        │ Process     ├───────────►│          ├─────────────────────►│              │          
        │             │            │          │                      │              │          
        │             │            │          │                      │              │          
        ├─────────────┤            │   OS     │   ETW                │              │          
┌───────┤  ntdll.dll  │            │          ├─────────────────────►│              │          
│       │             │ syscall    │          │                      │              │          
│  ┌───►│             ├───────────►│          │   ETW-TI             │    EDR       │          
│  │    ├─────────────┤            │          ├─────────────────────►│              │          
│  │    │             │            └──────────┘                      │              │          
│  │    ├─────────────┤                                              │              │          
│  │    │ amsi.dll    │ pipe                      AMSI               │              │          
│  └────┤             ├─────────────────────────────────────────────►│              │          
│       │             │                                              │              │          
└──────►│             │                                              │              │          
        ├─────────────┤                                              │              │          
        │             │                                              │              │          
        │             │                                              │              │          
        │             │                                              │              │          
        │             │                                              │              │          
        └─────────────┘                                              └──────────────┘

EDR input is therefore:

Usermode hooks / AMSI
Kernel callbacks
ETW
ETW-TI

And I will discuss each of them individually.

Usermode Hooks

While the official kernel interface for Linux is the syscall, for Windows it’s ntdll.dll, which issues the correct syscall for us. This is called the Native API (NtAPI). The Windows Application Programming Interface (WinAPI), meaning the other DLLs like kernel32.dll, all call the NtAPI (ntdll.dll) in the end. Note that syscall numbers may change between Windows versions, and therefore hardcoding them is not reliable.

 WinAPI                                       NtApi                                 Kernel                  
┌─────────────────────────────────────────┐  ┌───────────────────────────────────┐                          
│                                         │  │                                   │                          
│                                         │  │                                   │                          
│ ┌────────────────┐   ┌────────────────┐ │  │ ┌─────────────────────────┐       │ ┌───────────────────────┐
│ │                │   │                │ │  │ │                         │syscall│ │                       │
│ │ kernel32.dll   ├──►│ kernelbase.dll ├─┼──┤►│ ntdll.dll               ├───────┤►│Kernel                 │
│ │ OpenProcess    │   │ OpenProcess    │ │  │ │ NtOpenProcess           │       │ │NtOpenProcess          │
│ │                │   │                │ │  │ │                         │       │ │                       │
│ └────────────────┘   └────────────────┘ │  │ └─────────────────────────┘       │ └───────────────────────┘
│                                         │  │                                   │                          
│                                         │  │                                   │                          
│ ┌────────────────┐   ┌────────────────┐ │  │ ┌─────────────────────────┐       │ ┌───────────────────────┐
│ │                │   │                │ │  │ │                         │syscall│ │                       │
│ │ kernel32.dll   ├──►│ kernelbase.dll ├─┼──┤►│ ntdll.dll               ├───────┼─►Kernel                 │
│ │ VirtualAllocEx │   │ VirtualAllocEx │ │  │ │ NtAllocateVirtualMemory │       │ │NtAllocateVirtualMemory│
│ │                │   │                │ │  │ │                         │       │ │                       │
│ └────────────────┘   └────────────────┘ │  │ └─────────────────────────┘       │ └───────────────────────┘
│                                         │  │                                   │                          
│                                         │  │                                   │                          
└─────────────────────────────────────────┘  └───────────────────────────────────┘                          
       ▲                                         ▲                                      ▲                   
       │                                         │                                      │                   
       │                                         │                                      │                   
    Usermode Hooks                            Usermode Hooks                         Kernel                 
    Specific                                  Generic                                Callbacks

Example NtAPI function in ntdll.dll, performing a syscall with ASM instruction syscall:

	SysNtCreateFile proc
			mov r10, rcx
			mov eax, 55h
			syscall
			ret
	SysNtCreateFile endp

Typical WinAPI call, with a hook:

                                                                                ┌─────────────────┐
                                                                                │                 │
┌───────────────────┐   ┌─────────────────┐   ┌───────────────────┐             │                 │
│                   │   │                 │   │                   │             │      OS         │
│  Application.exe  │   │ kernel32.dll    │   │  ntdll.dll        │  syscall    │                 │
│                   ├──►│                 ├──►│                   ├────────────►│                 │
│  .text            │   │ CreateFile()    │   │  NtCreateFile()   │             │      kernel     │
│                   │   │                 │   │                   │             │                 │
└───────────────────┘   └─────────────────┘   └─────────┬─────────┘             │                 │
                                                        │hook                   │                 │
                                                        │                       │                 │
                                               ┌────────▼────────────────┐      │                 │
                                               │                         │      │                 │
                                               │ amsi.dll                │      │                 │
                                               │                         │      │                 │
                                               │ NtCreateFile_Hook()     │      │                 │
                                               │                         │      │                 │
                                               └─────────────────────────┘      │                 │
                                                         │                      └─────────────────┘
                                                         ▼
                                                        EDR

Userspace hooks are just patches in ntdll.dll exported functions, which call into another DLL function, before the actual function is executed. Windows provides functionality to directly hook functions / IAT.

 Original Function On-Disk:              EDR Hooked Function In-Memory:
 ----------------------                  -----------------------

 mov     r10, rcx                        mov     r10, rcx
>mov     eax, 50h                        jmp     0x7ffaeadea621
 test    byte ptr [0x7FFE0h], 1          test    byte ptr [0x7FFE0h], 1
 jne     0x17e76540ea5                   jne     0x17e76540ea5
 syscall                                 syscall
 ret                                     ret
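
The difference between the two listings can also be checked on the byte level. The following Python sketch works on plain byte strings only; the function name `describe_stub` and both example buffers are made up, and the check assumes the common x64 stub layout where `mov r10, rcx` (4C 8B D1) is followed by `mov eax, imm32` (B8 ...).

```python
CLEAN_PREFIX = bytes([0x4C, 0x8B, 0xD1, 0xB8])  # mov r10, rcx ; mov eax, imm32

def describe_stub(stub: bytes) -> str:
    """Classify the first 8 bytes of an ntdll.dll syscall stub (simplified heuristic)."""
    if stub[:4] == CLEAN_PREFIX:
        number = int.from_bytes(stub[4:8], "little")
        return f"clean, syscall number {number:#x}"
    if 0xE9 in (stub[0], stub[3]):  # E9 = jmp rel32, either first or right after mov r10, rcx
        return "hooked (jmp rel32 near start of stub)"
    return "unknown"

# Hypothetical buffers mirroring the two listings above.
on_disk = bytes([0x4C, 0x8B, 0xD1, 0xB8, 0x50, 0x00, 0x00, 0x00])
in_memory = bytes([0x4C, 0x8B, 0xD1, 0xE9, 0x10, 0x20, 0x30, 0x00])

print(describe_stub(on_disk))
print(describe_stub(in_memory))
```

Real hooks come in more shapes than a single `jmp`, so this only demonstrates the principle of comparing the expected prologue against what is actually in memory.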

Examples of commonly hooked ntdll.dll functions:

Function name	Related attacker techniques
NtOpenProcess	Process Injection
NtAllocateVirtualMemory	Process Injection
NtWriteVirtualMemory	Process Injection
NtCreateThreadEx	Process Injection
NtSuspendThread	APC Shellcode Injection
NtResumeThread	APC Shellcode Injection
NtQueueApcThread	APC Shellcode Injection

The EDR receives the function call names and their parameters as telemetry.

This is accomplished by using kernel callbacks (PsSetCreateProcessNotifyRoutine) to get notified at an early stage whenever a new process is created. The EDR then injects a DLL into the process (like amsi.dll) by queueing an Asynchronous Procedure Call (KAPC injection). The APC calls a function of the DLL which patches the original ntdll.dll functions (or usually, the DLL Import Address Table) to take a detour into amsi.dll.

After ntdll.dll is patched, each function call will therefore be intercepted by amsi.dll.

EDR function hooking with KAPC will create an APC which performs the hooking. The technique “Early Bird APC injection” uses the same APC mechanism, and can therefore run before the KAPC hooking has been performed.

These usermode hooks can be bypassed with:

Direct syscalls (avoid calling ntdll.dll)
Indirect syscalls (calling ntdll.dll functions, but after the hook)
Patching / restoring ntdll.dll (removing the hooks completely)

Usermode hooks are easy to bypass, as they are completely located in “our own” memory space, where we can freely mess with them. But restoring ntdll.dll generates telemetry of its own, and avoiding that requires the use of, for example, direct syscalls.

An EDR should not depend solely on usermode hooks, but only use them for auxiliary telemetry. They provide more information than kernel telemetry like ETW: the kernel only “sees” the syscall/ntdll.dll function, not the higher-level function which was originally called. Relying on the kernel view is useful, as it yields more generic detections without depending on hooking all the weird and unusual DLL functions. But it may generate more false positives, as it is more difficult to identify “non-malicious” behaviour with just the syscalls, because less information is available.

For example, CreateFileA(), CreateFileW(), OpenFile() and CreateFileTransacted() will all call NtCreateFile() in the end.

Note that the callstack can show which function in the chain has been initially called. Usermode hooks are used less and less, and not by all EDRs (source):

EDR Usermode Hooks

Kernel telemetry

The Windows OS provides information about processes in the form of notification callback routines, especially about process, thread and image creation. It is generated by the kernel itself, so there is no way to suppress it like with usermode hooks (without kernel privileges).
These callbacks are initiated in the context of the relevant process and thread. Therefore the events have information about the origin process.

There are various sources of kernel mode instrumentation:

ETW (Event Tracing for Windows)
ETW-TI (Threat Intelligence)
Kernel Callbacks (PsSetCreateProcessNotifyRoutine etc.)
NDIS (network) / Minifilter drivers (filesystem)

Kernel callbacks are:

PsSetCreateProcessNotifyRoutine: Process creation, termination
PsSetCreateThreadNotifyRoutine: Thread creation, deletion
PsSetLoadImageNotifyRoutine: Windows image loader
ObRegisterCallbacks: Object Manager callbacks for handle operations, like those triggered by NtOpenProcess, NtOpenThread, …

Reference:

https://blog.whiteflag.io/blog/from-windows-drivers-to-a-almost-fully-working-edr/

An example is the PS_CREATE_NOTIFY_INFO structure passed to the process creation callback, which gives the EDR different pieces of information:

Field	Notes
ParentProcessId
CreatingThreadId
*FileObject	The .exe on disk
ImageFileName	Parameter of created process
CommandLine	Parameter of created process
CreationStatus

Sysmon can capture this event from the kernel, and will produce the following:

Process Create:
RuleName: - 
UtcTime: 2024-04-28 22:08:22.025

ProcessGuid: {a23eae89-bd56-5903-0000-0010e9d95e00}
ProcessId: 6228
Image: C:\Windows\System32\wbem\WmiPrvSE.exe
FileVersion: 10.0.22621.1 (WinBuild.160101.0800)
Description: WMI Provider Host
Product: Microsoft® Windows® Operating System
Company: Microsoft Corporation
OriginalFileName: Wmiprvse.exe
CommandLine: C:\Windows\system32\wbem\wmiprvse.exe -secured -Embedding
CurrentDirectory: C:\Windows\system32\

User: NT AUTHORITY\NETWORK SERVICE
LogonGuid: {a23eae89-b357-5903-0000-002005eb0700}
LogonId: 0x7EB05
TerminalSessionId: 1
IntegrityLevel: System
Hashes: SHA1=91180ED89976D16353404AC982A422A707F2AE37,MD5=7528CCABACCD5C1748E63E192097472A,SHA256=196CABED59111B6C4BBF78C84A56846D96CBBC4F06935A4FD4E6432EF0AE4083,IMPHASH=144C0DFA3875D7237B37631C52D608CB

ParentProcessGuid: {a23eae89-bd28-5903-0000-00102f345d00}
ParentProcessId: 580
ParentImage: C:\Windows\System32\svchost.exe
ParentCommandLine: C:\Windows\system32\svchost.exe -k DcomLaunch -p
ParentUser: NT AUTHORITY\SYSTEM

Note that only the kernel event fields ImageFileName, CommandLine and ParentProcessId translate directly to the Image, CommandLine and ParentProcessId fields of the Sysmon event. Most of the other information is gathered by Sysmon additionally, by querying the kernel, for example by issuing GetProcessInformation on the ProcessId, or in other ways, like parsing the PEB of the process. Not all information provided is equally trustworthy.

An ETW ImageLoad event from Microsoft-Windows-Kernel-Process recorded with SilkETW:

{
  ProviderGuid: "22fb2cd6-0e7b-422b-a0c7-2fad1fd0e716",
  ProviderName: "Microsoft-Windows-kernel-Process",
  EventName: "ImageLoad",
  ThreadID: 9584,
  ProcessID: 7536,
  ProcessName: "notepad",

  YaraMatch: [],
  Opcode: 0,
  OpcodeName: "Info",
  TimeStamp: "2024-07-08T19:06:10.8845667+01:00",
  PointerSize: 8,
  EventDataLength: 142,

  XmlEventData: {
    ProviderName: "Microsoft-Windows-kernel-Process",
    FormattedMessage: "Process 7’536 had an image loaded with name \Device\HarddiskVolume2\Windows\System32\notepad.exe. ",
    
    EventName: "ImageLoad"
    ProcessID: "7’536",
    PID: "7536",
    TID: "9584",
    
    PName: "",
    DefaultBase: "0x7ff631650000",
    ImageName: "\Device\HarddiskVolume2\Windows\System32\notepad.exe",
    ImageBase: "0x7ff631650000",
    ImageCheckSum: "265’248",
    ImageSize: "0x38000",

    MSec: "9705.0646",
    TimeDateStamp: "1’643’917’504",
  }
}

Memory Regions

Upon starting an .exe, the sections of the PE file get mapped into memory as whole blocks.

.text contains the machine code, while .data and similar sections contain the data of the program.

New memory regions can be created using VirtualAlloc() or similar.

 EXE                                               
 Program                 Process                   
                                                   
┌──────────┐            ┌──────────────┐           
│          │            │              │           
│  Header  ├───────────►│ Header       │           
│          │            │              │           
├──────────┤            ├──────────────┤           
│          │            │              │           
│          │            ├──────────────┤           
│  .text   ├─────┐      │              │     Backed 
│          │     │      │              │     RX    
│          │     └─────►│ .text        │           
├──────────┤            │              │           
│          │            │              │           
│  .data   ├────┐       ├──────────────┤           
│          │    │       │              │           
│          │    │       │              │           
└──────────┘    │       ├──────────────┤           
                │       │              │    Backed  
                │       │              │    RW     
                └──────►│ .data        │           
                        │              │           
                        ├──────────────┤           
                        │              │           
                        │              │           
                        ├──────────────┤           
                        │              │           
                        │ Virtual      │    Unbacked
                        │ Alloc()      │    RW     
                        │              │           
                        └──────────────┘

The memory regions coming from the PE file image are called backed regions. They are trustworthy, as they are 1:1 copies from the PE file, which is scanned on-disk by the AV. The memory regions are “backed” by the file on-disk. They are also called IMAGE regions.

If the process wants additional memory by allocating it, it is “unbacked”. Also called USER memory or PRIVATE. There is no file “backing” this memory region.

So memory regions are rated by these properties:

USER/PRIVATE/Unbacked: Bad, potentially malicious, shellcode
IMAGE/Backed: Good, pretty trusted

This is mainly because shellcode from exploits or process injection usually lives in PRIVATE memory. Threads should also start from backed regions. PRIVATE RWX memory is even more suspicious.
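The rating above can be sketched as a small scoring function. This is a minimal model, not how any specific EDR does it: the `region` struct, the `region_suspicion()` name and the score values are all made up for illustration, and a real scanner would work on the output of something like VirtualQueryEx(). It assumes that only executable regions are interesting, as non-executable PRIVATE memory is just normal heap or stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memory region (not a Windows structure). */
typedef enum { REGION_IMAGE, REGION_PRIVATE } region_type;

typedef struct {
    region_type type;   /* IMAGE = backed by a PE file, PRIVATE = unbacked */
    bool writable;
    bool executable;
} region;

/* Returns an invented suspicion score: 0 = unremarkable, 3 = worst. */
int region_suspicion(const region *r)
{
    assert(r != NULL);

    if (!r->executable)
        return 0;                     /* data only: heap, stack, .data */

    if (r->type == REGION_IMAGE)
        return r->writable ? 1 : 0;   /* backed code; writable code is odd */

    /* Executable, but no file on disk backs it. */
    return r->writable ? 3 : 2;       /* PRIVATE RWX is worse than RX */
}
```

With this model, a backed RX .text section scores 0, PRIVATE RX scores 2 and PRIVATE RWX scores 3.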

Here are some trustworthy memory regions of type IMG (IMAGE, backed): Memory Regions Good

Here are some untrustworthy memory regions of type PRV (PRIVATE, unbacked): Memory Regions Bad

One cool property of memory pages is Copy-On-Write (COW). A memory scanner is able to check if a memory page was written to, which is unusual for read-only .text sections and others, as these should be shared between processes. This is used by Moneta via PSAPI_WORKING_SET_EX_BLOCK from the PSAPI_WORKING_SET_EX_INFORMATION structure. For this reason, data-only attacks, e.g. for AMSI-patch or ETW-patch, are preferred, as they do not write to code pages.

Memory Scanning

Memory signature scanning will detect malicious code in-memory, in either .text or data sections (stack, heap, .data etc.).

                        Event       
                          │       
 Process                  ▼       
┌───────────┐        ┌───────────┐
│           │        │           │
│           │        │           │
│           │        │           │
├───────────┤        │           │
│           │  Read  │           │
│ .text     ◄────────┤    EDR    │
│  (bad)    │  Scan  │           │
├───────────┤        │           │
│           │        │           │
│           ◄────────┤           │
│ .data     │        │           │
│  (bad)    │        └───────────┘
│           │                     
└───────────┘

It is basically the same as AV signature scanning: grep or yara the memory content against known malicious signatures.
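To make the "grep the memory" idea concrete, here is a minimal sketch of a byte-signature scan with wildcards. It is a naive linear search, assuming the memory content was already copied into a buffer. Real engines like yara are far more elaborate. The names `scan_buffer()` and `WILDCARD` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WILDCARD (-1)   /* pattern position that matches any byte */

/* Returns the offset of the first match of pattern in buf, or -1. */
long scan_buffer(const uint8_t *buf, size_t buf_len,
                 const int *pattern, size_t pat_len)
{
    assert(buf != NULL && pattern != NULL);

    if (pat_len == 0 || pat_len > buf_len)
        return -1;

    for (size_t i = 0; i + pat_len <= buf_len; i++) {
        size_t j = 0;
        /* Advance while the pattern byte is a wildcard or equals memory. */
        while (j < pat_len &&
               (pattern[j] == WILDCARD || pattern[j] == buf[i + j]))
            j++;
        if (j == pat_len)
            return (long)i;
    }
    return -1;
}
```

A signature like `{0x48, WILDCARD, 0xC0}` then matches any three bytes starting with 0x48 and ending with 0xC0, wherever they appear in the scanned region.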

Memory scanning is performance intensive. It is not done constantly, but only when a trigger fires.

Query Process Information

The EDR, upon receiving events, will also attempt to enrich them with:

Process information (like executable name and command line arguments)
Memory scan (possibly)
Process image file scan (rarely)

                                                                                              
                                                                     ┌──────────────┐          
                                                                     │              │          
        ┌─────────────┐ EtwWrite() ┌──────────┐   Kernel callbacks   │              │          
        │ Process     ├───────────►│          ├─────────────────────►│              │          
        │             │            │          │                      │              │          
        │             │            │          │                      │              │          
        ├─────────────┤            │   OS     │   ETW                │              │          
┌───────┤  ntdll.dll  │            │          ├─────────────────────►│              │          
│       │             │ syscall    │          │                      │              │          
│  ┌───►│             ├───────────►│          │   ETW-TI             │    EDR       │          
│  │    ├─────────────┤            │          ├─────────────────────►│              │          
│  │    │             │            └──────────┘                      │              │          
│  │    ├─────────────┤                                              │              │          
│  │    │ amsi.dll    │ pipe                      AMSI               │              │          
│  └────┤             ├─────────────────────────────────────────────►│              │          
│       │             │                                              │              │          
└──────►│             │                                              │              │          
        ├─────────────┤                                              │              │          
        │             │                                              │              │          
        │             │                                              │              │          
        │  ┌──────────┤                          Process Info        │              │          
        │  │          │◄─────────────────────────────────────────────┤              │          
        │  │ PEB      │                                              │              │          
        │  │ Eprocess │                                              │              │          
        │  │          │                                              └──┬──┬────────┘          
        │  │          │                                                 │  │                   
        │  └──────────┤                          Memory Scan            │  │                   
        │             │◄────────────────────────────────────────────────┘  │                   
        └───────▲─────┘                                                    │                   
                │                                                          │                   
         File   │                                                          │                   
         ┌──────┴────┐                            File Scan                │                   
         │           │◄────────────────────────────────────────────────────┘                   
         │           │                                                                         
         │           │                                                                         
         │           │                                                                         
         └───────────┘

The EDR does not only receive events, but will also actively query the OS for more information. For example, when receiving a PS_CREATE_NOTIFY event, the EDR will gather more information about the process that created the event, for example by using GetProcessInformation() or OpenProcess() to obtain the executable name, access the PEB, read the arguments, or list memory regions. It may also read the ImageFileName and scan the original EXE PE file.

Note that the EDR is a normal process, even if it runs as SYSTEM or is PPL’d and has its own dedicated kernel driver. With its SYSTEM privileges it can gather information about pretty much all other processes. An EDR will probably mostly depend on querying processes from its userspace agent part, as doing it in kernel would be… dangerous (see the CrowdStrike incident).

Here is an example of a PsSetCreateProcessNotifyRoutine handler function:

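void CreateProcessNotifyRoutine(HANDLE ppid, HANDLE pid, BOOLEAN create) {
    UNREFERENCED_PARAMETER(ppid);

    if (create) {
        PEPROCESS process = NULL;
        PUNICODE_STRING processName = NULL;

        // Resolve the pid to its EPROCESS object (takes a reference on success)
        if (!NT_SUCCESS(PsLookupProcessByProcessId(pid, &process))) {
            return;
        }

        // Retrieve the process image name via the EPROCESS structure
        if (NT_SUCCESS(SeLocateProcessImageName(process, &processName))) {
            DbgPrint("MyDumbEDR: %lu (%wZ) launched\n", HandleToULong(pid), processName);
            ExFreePool(processName);  // the caller owns the returned buffer
        }

        ObDereferenceObject(process);  // release the reference from the lookup
    }
}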

The handler function only receives the pid of the process. To also display the image name, a few functions have to be called, which access the PEB or the EPROCESS structure.

The following data is stored in the PEB (Process Environment Block, at GS:[0x60] on x64). It lives in usermode and can be manipulated freely by the process:

ImageBase Address
loaded DLLs
process parameters:
- image name
- arguments
- environment variables
- working directory

EPROCESS is a kernel data structure, and cannot be manipulated directly (sometimes indirectly):

process create and exit time
process id
parent process id
address of PEB
image filename
- similar to process parameters image name in the PEB
- also available in the SectionObject

Process Information Data Structures

The PEB:

typedef struct _PEB
{
    BOOLEAN InheritedAddressSpace;
    BOOLEAN ReadImageFileExecOptions;
    BOOLEAN BeingDebugged;
    union
    {
        BOOLEAN BitField;
        struct
        {
            BOOLEAN ImageUsesLargePages : 1;
            BOOLEAN IsProtectedProcess : 1;
            BOOLEAN IsImageDynamicallyRelocated : 1;
            BOOLEAN SkipPatchingUser32Forwarders : 1;
            BOOLEAN IsPackagedProcess : 1;
            BOOLEAN IsAppContainer : 1;
            BOOLEAN IsProtectedProcessLight : 1;
            BOOLEAN IsLongPathAwareProcess : 1;
        };
    };

    HANDLE Mutant;

    PVOID ImageBaseAddress;
    PPEB_LDR_DATA Ldr;
    PRTL_USER_PROCESS_PARAMETERS ProcessParameters;
    PVOID SubSystemData;
    PVOID ProcessHeap;
    PRTL_CRITICAL_SECTION FastPebLock;
    PSLIST_HEADER AtlThunkSListPtr;
    PVOID IFEOKey;

    union
    {
        ULONG CrossProcessFlags;
        struct
        {
            ULONG ProcessInJob : 1;
            ULONG ProcessInitializing : 1;
            ULONG ProcessUsingVEH : 1;
            ULONG ProcessUsingVCH : 1;
            ULONG ProcessUsingFTH : 1;
            ULONG ProcessPreviouslyThrottled : 1;
            ULONG ProcessCurrentlyThrottled : 1;
            ULONG ProcessImagesHotPatched : 1; // REDSTONE5
            ULONG ReservedBits0 : 24;
        };
    };
    union
    {
        PVOID KernelCallbackTable;
        PVOID UserSharedInfoPtr;
    };
    ULONG SystemReserved;
    ULONG AtlThunkSListPtr32;
    PAPI_SET_NAMESPACE ApiSetMap;
    ULONG TlsExpansionCounter;
    PRTL_BITMAP TlsBitmap;
    ULONG TlsBitmapBits[2]; // TLS_MINIMUM_AVAILABLE

    PVOID ReadOnlySharedMemoryBase;
    PSILO_USER_SHARED_DATA SharedData; // HotpatchInformation
    PVOID* ReadOnlyStaticServerData;

    PVOID AnsiCodePageData; // PCPTABLEINFO
    PVOID OemCodePageData; // PCPTABLEINFO
    PVOID UnicodeCaseTableData; // PNLSTABLEINFO

    ULONG NumberOfProcessors;
    ULONG NtGlobalFlag;

    ULARGE_INTEGER CriticalSectionTimeout;
    SIZE_T HeapSegmentReserve;
    SIZE_T HeapSegmentCommit;
    SIZE_T HeapDeCommitTotalFreeThreshold;
    SIZE_T HeapDeCommitFreeBlockThreshold;

    ULONG NumberOfHeaps;
    ULONG MaximumNumberOfHeaps;
    PVOID* ProcessHeaps; // PHEAP

    PVOID GdiSharedHandleTable; // PGDI_SHARED_MEMORY
    PVOID ProcessStarterHelper;
    ULONG GdiDCAttributeList;

    PRTL_CRITICAL_SECTION LoaderLock;

    ULONG OSMajorVersion;
    ULONG OSMinorVersion;
    USHORT OSBuildNumber;
    USHORT OSCSDVersion;
    ULONG OSPlatformId;
    ULONG ImageSubsystem;
    ULONG ImageSubsystemMajorVersion;
    ULONG ImageSubsystemMinorVersion;
    KAFFINITY ActiveProcessAffinityMask;
    GDI_HANDLE_BUFFER GdiHandleBuffer;
    PVOID PostProcessInitRoutine;

    PRTL_BITMAP TlsExpansionBitmap;
    ULONG TlsExpansionBitmapBits[32]; // TLS_EXPANSION_SLOTS

    ULONG SessionId; 

    [...]
}

Whereas ProcessParameters is:

typedef struct _RTL_USER_PROCESS_PARAMETERS {
  BYTE           Reserved1[16];
  PVOID          Reserved2[10];
  UNICODE_STRING ImagePathName;
  UNICODE_STRING CommandLine;
} RTL_USER_PROCESS_PARAMETERS, *PRTL_USER_PROCESS_PARAMETERS;

It contains the .exe path and the command line.

Callstack Analysis

When a process calls a function, it is possible to find out the parent functions which led to this call, recursively. This is called the callstack.

Example callstack

The EDR can choose to inspect the process initiating a function or API call, and analyze the call stack for suspicious things:

 Process
┌──────────────────────────────────────────────────────────────────────┐           ┌─────────────────┐ 
│                                                                      │           │  OS  kernel     │ 
│  ┌───────────────────┐   ┌─────────────────┐   ┌───────────────────┐ │           │                 │ 
│  │                   │   │                 │   │                   │ │           │                 │ 
│  │  Application.exe  │   │ kernel32.dll    │   │  ntdll.dll        │ │syscall    │                 │ 
│  │                   ├──►│                 ├──►│                   ├─┼──────────►│ NtCreateFile()  │ 
│  │  .text            │   │ CreateFile()    │   │  NtCreateFile()   │ │           │                 │ 
│  │                   │   │                 │   │                   │ │           └────┬────────────┘ 
│  └───────────────────┘   └─────────────────┘   └───────────────────┘ │                │              
│                                                                      │                │Notify        
│                             Stack                                    │                │              
│                            ┌──────────────────────────────────┐      │                ▼              
│                            │ Application.exe: SomeFunction()  │      │  Inspect  ┌─────────────────┐ 
│                            │ kernel32.dll: CreateFile()       │◄─────┼───────────┤                 │ 
│                            │ ntdll.dll: NtCreateFile()        │      │           │                 │ 
│                            └──────────────────────────────────┘      │           │                 │ 
│                                                                      │           │     EDR         │ 
│                                                                      │           │                 │ 
│                                                                      │           │                 │ 
└──────────────────────────────────────────────────────────────────────┘           └─────────────────┘

It is possible to detect a wide variety of attacks and bypasses with this technique.

A callstack should originate in a backed memory region, go through a supporting DLL (e.g. user32.dll), then ntdll.dll, where finally the actual syscall instruction is executed.

If a call comes from an unbacked region, it is most likely from shellcode (or, non-maliciously, maybe from a JIT like JavaScript).
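The two checks described above can be sketched in a few lines. This is a simplified model: it assumes the stack walk and the address-to-module resolution have already happened, so each frame just carries the name of its backing module, or NULL for unbacked memory. The `frame` struct, the `verdict` values and `check_callstack()` are hypothetical names. It also assumes the syscall instruction always lives in ntdll.dll, which ignores other legitimate syscall stubs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, already-resolved stack frame. module is NULL if the
 * return address points into unbacked (PRIVATE) memory. */
typedef struct {
    const char *module;
    const char *symbol;
} frame;

typedef enum { STACK_OK, STACK_UNBACKED_FRAME, STACK_DIRECT_SYSCALL } verdict;

/* frames[0] is the innermost frame, where the syscall instruction ran. */
verdict check_callstack(const frame *frames, size_t count)
{
    assert(frames != NULL && count > 0);

    /* The syscall itself is expected to be issued from ntdll.dll. */
    if (frames[0].module == NULL ||
        strcmp(frames[0].module, "ntdll.dll") != 0)
        return STACK_DIRECT_SYSCALL;

    /* Every caller further up should sit in backed memory. */
    for (size_t i = 1; i < count; i++) {
        if (frames[i].module == NULL)
            return STACK_UNBACKED_FRAME;
    }
    return STACK_OK;
}
```

The stack from the diagram (ntdll.dll, kernel32.dll, Application.exe) passes as STACK_OK, while the same chain with a NULL module at the bottom is flagged as STACK_UNBACKED_FRAME.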

Elastic has callstack analysis rules to identify:

Direct syscalls
Callback-based evasion
Module Stomping
Library loading from unbacked region
Process created from unbacked region

Call stack analysis is usually not applied to all API functions. Elastic mentions the following:

VirtualAlloc, VirtualProtect
MapViewOfFile, MapViewOfFile2
VirtualAllocEx, VirtualProtectEx
QueueUserAPC
SetThreadContext
WriteProcessMemory, ReadProcessMemory

Thread State Analysis

Threads can be sleeping for different reasons. By investigating the thread state, and how the thread got there according to its callstack, we can find indicators for sleeping beacons or memory encryption.

Clean (spoofed) callstack for NtDelayExecution(): Sleep Callstack Spoofed

If memory encryption is being used, the thread is usually put to sleep by calling either:

Kernelbase.dll!SleepEx
ntdll.dll!NtDelayExecution

Suspicious things for calls to these sleep functions:

Calls to virtual memory in the callstack
Source in non-backed memory regions

Reference:

https://www.mdsec.co.uk/2022/07/part-1-how-i-met-your-beacon-overview/

Performance Impact

Performance of the EDR is of utmost importance. If developer machines are slow when installing 10'000 NPM packages, people will move to Apple where protections are less, and Microsoft can't allow that. This is such a problem that Microsoft introduced asynchronous Dev Drive scanning.

The least performance intensive operation would be if the detection can be applied directly to a rare event (let's say, opening of a process handle to lsass.exe). Memory scans can involve yara-scanning megabytes of .text sections, or many alloc’ed memory regions, which is very expensive. Scanning files is the most expensive, even with SSDs.

Most detections are in between those: One or multiple events with suspicious information, which leads to some more correlation. These then may kick off the memory scanning.

Performance Impact	What
1	Event
3	Event Correlation
10	Query process
100	Memory Scan
1000	File Scan

What could trigger a memory scan?

What	Triggers scan?	Notes
VirtualAlloc()	No	too common, except when RWX
WriteProcessMemory()	No	very common
memcpy()	No	Not visible for EDR
VirtualProtect()	No?	RWX or RW->RX may be a trigger
CreateRemoteThread()	Yes	Should trigger memory scan

VirtualAlloc() and WriteProcessMemory() are very commonly called functions. CreateRemoteThread() is not only less often called, it is also a more clear indicator of potentially malicious behaviour.
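The trigger table can be written down as a decision function. This is only a sketch of the table above, not the logic of a real product: the `event` struct, the `P_*` protection flags and `should_scan_memory()` are invented for this example, and the "maybe" entries of the table are treated as triggers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented protection flags, not the Windows PAGE_* constants. */
enum { P_READ = 1, P_WRITE = 2, P_EXEC = 4 };

typedef enum {
    EV_VIRTUAL_ALLOC,
    EV_WRITE_PROCESS_MEMORY,
    EV_VIRTUAL_PROTECT,
    EV_CREATE_REMOTE_THREAD
} event_kind;

typedef struct {
    event_kind kind;
    unsigned old_prot;   /* only meaningful for EV_VIRTUAL_PROTECT */
    unsigned new_prot;   /* requested protection */
} event;

/* Decides whether the expensive memory scan is worth starting. */
bool should_scan_memory(const event *e)
{
    assert(e != NULL);

    switch (e->kind) {
    case EV_CREATE_REMOTE_THREAD:
        return true;                      /* rare and a clear indicator */
    case EV_VIRTUAL_ALLOC:                /* too common, except RWX */
        return (e->new_prot & (P_WRITE | P_EXEC)) == (P_WRITE | P_EXEC);
    case EV_VIRTUAL_PROTECT:              /* RW->RX, or anything to RWX */
        return (e->new_prot & P_EXEC) &&
               ((e->old_prot & P_WRITE) || (e->new_prot & P_WRITE));
    case EV_WRITE_PROCESS_MEMORY:
        return false;                     /* very common */
    }
    return false;
}
```

The cheap check on the event itself decides whether the cost of roughly 100 from the performance table is spent at all.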

EDR Attacks

The EDR receives events from a large number of sensors, with varying trustworthiness. Also much of the information required from the process is not available in the event itself, but has to be accessed in or via the kernel (KPROCESS, EPROCESS) or the process memory space itself (e.g. PEB including command line arguments, parent process id).

Many attacks exploit a TOCTOU (time of check, time of use) weakness: the data the EDR checks is not the data that is used.

Command Line Spoofing

EDRs can check for potentially malicious command line arguments of newly spawned processes, for example when using mimikatz: mimikatz.exe "privilege::debug" "lsadump::sam". Even if we rename mimikatz.exe, the argument privilege::debug is a pretty clear indicator with a low false positive rate.

But in Windows, it's possible to spoof command line arguments. The process' command line arguments are stored in the PEB of the respective process. Additionally, when we create a new process, the process-creation call will also contain the initial arguments (of the exe to be started).

So we have basically two places for command line arguments:

Child process: Data in PEB
Parent process: Event from the child create function: CreateProcessW(..., "command line args", ...)

In the PEB:

typedef struct _PEB {
  ...
    PRTL_USER_PROCESS_PARAMETERS  ProcessParameters;
  ...
}

typedef struct _RTL_USER_PROCESS_PARAMETERS {
  ...
  UNICODE_STRING ImagePathName;
  UNICODE_STRING CommandLine;
} *PRTL_USER_PROCESS_PARAMETERS;

As the PEB is modifiable by its process, data in it cannot be trusted.

The EDR queries an existing process for its command line, and usually trusts it blindly:

┌────────────────────┐                  ┌─────────────────┐    
│ Process            │                  │                 │    
│                    │                  │                 │    
│      PEB           │                  │                 │    
│     ┌──────────────┤                  │                 │    
│     │              │                  │    EDR          │    
│     │ ImageName    │◄─────────────────┤                 │    
│     │ CommandLine  │                  │                 │    
│     │              │                  │                 │    
│     └──────────────┤                  │                 │    
│                    │                  │                 │    
└────────────────────┘                  └─────────────────┘

But it can be verified by correlating it with the CreateProcess() event including its parameters.

When a parent process calls CreateProcess() to create a child process:

   ┌─────────┐                  ┌──────────┐           ┌───────────┐
   │ Process │                  │          │           │ Child     │
   │         │  CreateProcess() │   OS     │ Spawns    │ Process   │
   │         ├─────────────────►│          ├──────────►│           │
   │         │          ▲       │          │           │           │
   │         │          │       └──────────┘           │PEB        │
   │         │          │                              ├─────────┐ │
   │         │          │        ┌───────┐             │ Command │ │
   │         │          │        │       │       ┌────►│ Line    │ │
   │         │          └────────┤  EDR  ├───────┘     ├─────────┘ │
   │         │                   │       │             │           │
   └─────────┘                   └───────┘             └───────────┘

The EDR can compare the command line in CreateProcess() with the PEB of the resulting child process, and alert if they don't match.

Intercepting the function call arguments in CreateProcessW(..., "command line args", ...) does not really help much either, as we can create the process in a suspended state with fake arguments, overwrite them with the correct ones remotely, and then resume the process.

1) Parent: Create new suspended process with fake arguments (“fakearg.txt”)
2) EDR: receives event with fake arguments (“fakearg.txt”)
3) Parent: Overwrite PEB of child with real arguments (“privilege::debug”)
4) Parent: Continue (start) child process (using real arguments) (“privilege::debug”)
5) Child process: Overwrite its PEB with fake arguments again (“fakearg.txt”)
6) EDR: querying the process gets the fake arguments (“fakearg.txt”)

The two EDR events in 2) and 6) are still innocent when correlated, but the process was started with malicious arguments, which was not visible to the EDR (without additional effort).
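The timeline can be replayed in a toy model to see why the correlation check passes. Nothing here touches a real process: `child_process` is a made-up struct whose `peb_cmdline` field stands in for the user-writable CommandLine string in the PEB, and `spoof_goes_unnoticed()` simply runs the six steps in order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CMD_MAX 64

/* Hypothetical stand-in for the child process. */
typedef struct {
    char peb_cmdline[CMD_MAX];    /* models PEB->ProcessParameters->CommandLine */
    char executed_with[CMD_MAX];  /* what the program actually parsed at start */
} child_process;

static void copy_cmd(char *dst, const char *src)
{
    snprintf(dst, CMD_MAX, "%s", src);   /* bounded copy, always terminated */
}

/* Runs steps 1-6 and returns true if both EDR observations agree. */
bool spoof_goes_unnoticed(child_process *c, const char *fake, const char *real)
{
    char seen_at_create[CMD_MAX];
    char seen_at_query[CMD_MAX];

    assert(c != NULL && fake != NULL && real != NULL);

    copy_cmd(c->peb_cmdline, fake);              /* 1: create suspended    */
    copy_cmd(seen_at_create, c->peb_cmdline);    /* 2: EDR creation event  */
    copy_cmd(c->peb_cmdline, real);              /* 3: parent patches PEB  */
    copy_cmd(c->executed_with, c->peb_cmdline);  /* 4: child resumes       */
    copy_cmd(c->peb_cmdline, fake);              /* 5: child cleans up     */
    copy_cmd(seen_at_query, c->peb_cmdline);     /* 6: EDR queries PEB     */

    return strcmp(seen_at_create, seen_at_query) == 0;
}
```

After the run, `executed_with` holds the real arguments while both EDR observations hold the fake ones, which is the time of check versus time of use gap.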

If the EDR later considers the child process malicious, it wants to provide information to the analyst, including the process' command line arguments, taken from the PEB. This is why the child process overwrites its PEB again in step 5, as a “cleanup”.

Command line arguments for processes are therefore pretty untrustworthy.

PPID Spoofing

In Windows, unlike Linux, there is no dependency between parent and child process, as there is (was) no fork(). The child gets certain attributes from the parent, including the PID of the parent. This parent PID is also stored in the EPROCESS structure of the child process.

The caller of CreateProcessW() can supply its own attributes, including the parent process of the child, in the STARTUPINFOEX structure. So already upon creation, we can give the child a wrong parent PID.

CreateProcessW() interface:

BOOL CreateProcessW(
  [in, optional]      LPCWSTR               lpApplicationName,
  [in, out, optional] LPWSTR                lpCommandLine,
  [in, optional]      LPSECURITY_ATTRIBUTES lpProcessAttributes,
  [in, optional]      LPSECURITY_ATTRIBUTES lpThreadAttributes,
  [in]                BOOL                  bInheritHandles,
  [in]                DWORD                 dwCreationFlags,
  [in, optional]      LPVOID                lpEnvironment,
  [in, optional]      LPCWSTR               lpCurrentDirectory,
  [in]                LPSTARTUPINFOW        lpStartupInfo,  // PPID spoofing here
  [out]               LPPROCESS_INFORMATION lpProcessInformation
);

The actual PPID spoofing is just setting attributes in the struct STARTUPINFOEX and passing it as the lpStartupInfo parameter:

{
  STARTUPINFOEXA si;
  HANDLE fakeParent = OpenProcess(.., <pid of fake parent process>);
  ..
  UpdateProcThreadAttribute(si.lpAttributeList, 0, PROC_THREAD_ATTRIBUTE_PARENT_PROCESS, &fakeParent, ..);
  CreateProcessA(NULL, (LPSTR)"notepad", .., EXTENDED_STARTUPINFO_PRESENT, .., &si.StartupInfo, ..);
}

Where:

typedef struct _STARTUPINFOEXA {
  STARTUPINFOA                 StartupInfo;
  LPPROC_THREAD_ATTRIBUTE_LIST lpAttributeList; // attributes, one is the ppid
} STARTUPINFOEXA, *LPSTARTUPINFOEXA;

It will be stored in the EPROCESS kernel structure:

typedef struct _EPROCESS
{
    KPROCESS Pcb;
    ...
    HANDLE InheritedFromUniqueProcessId; // PPID
    ...
}

This can be retrieved by the EDR with NtQueryInformationProcess():

__kernel_entry NTSTATUS NtQueryInformationProcess(
  [in]            HANDLE           ProcessHandle,
  [in]            PROCESSINFOCLASS ProcessInformationClass,
  [out]           PVOID            ProcessInformation,  // PROCESS_BASIC_INFORMATION
  [in]            ULONG            ProcessInformationLength,
  [out, optional] PULONG           ReturnLength
);

typedef struct _PROCESS_BASIC_INFORMATION {
    NTSTATUS ExitStatus;
    PPEB PebBaseAddress;
    ULONG_PTR AffinityMask;
    KPRIORITY BasePriority;
    ULONG_PTR UniqueProcessId;
    ULONG_PTR InheritedFromUniqueProcessId; // PPID
} PROCESS_BASIC_INFORMATION;

PPID spoofing can be detected, as upon process creation, an event is delivered to the EDR about the new process. This event is usually in the context of the origin process, or the process is referenced in it. The EDR can then compare the content of the STARTUPINFOEX structure with the process the event comes from (e.g. by just comparing the PID of both). Here the EDR sees the CreateProcess() call with PPID=y (2), and the effective PID of the process initiating this call (1) having PID=x.

  ┌─────────┐                  ┌──────────┐           ┌───────────┐
  │ Process │  CreateProcess() │          │           │ Child     │
  │         │  PPID=y          │   OS     │ Spawns    │ Process   │
  │         ├─────────────────►│          ├──────────►│           │
  │         │          ▲       │          │           │           │
  │         │          │       └──────────┘           │EPROCESS   │
  │ ┌───────┤    1     │2                             ├─────────┐ │
  │ │PID=x  │◄─────────┤        ┌───────┐          3  │ PPID=y  │ │
  │ │       │          │        │       │       ┌────►│         │ │
  │ └───────┤          └────────┤  EDR  ├───────┘     ├─────────┘ │
  │         │                   │       │             │           │
  └─────────┘                   └───────┘             └───────────┘

So the EDR has:

1) Parent: PID
2) Parent: PPID in its issued CreateProcess() call destined for the child
3) Child: Its PPID

It can compare those, especially 1) and 2), or later 1) and 2) with 3). It is not always completely clear for the events received where the origin PID comes from (for example with ETW).
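Expressed as code, the comparison is tiny. The sketch below assumes the EDR already collected the three values into one record. The `process_create_record` struct and `ppid_looks_spoofed()` are hypothetical names, and a real EDR would also have to handle legitimate uses of the parent-process attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record assembled by the EDR for one process creation. */
typedef struct {
    unsigned long creator_pid;        /* PID of the process issuing the call */
    unsigned long requested_ppid;     /* PPID given in CreateProcess() */
    unsigned long child_ppid;         /* PPID later read from the child */
} process_create_record;

/* True if the real creator is not the parent the child claims to have. */
bool ppid_looks_spoofed(const process_create_record *r)
{
    assert(r != NULL);

    /* Compare the creator with the requested parent at creation time. */
    if (r->creator_pid != r->requested_ppid)
        return true;

    /* Or later: compare the creator with what the child reports. */
    return r->creator_pid != r->child_ppid;
}
```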

Note that InheritedFromUniqueProcessId is stored in EPROCESS in kernelspace, but still cannot be trusted, as it can be set from userspace.

ETW-patch

An ETW patch will overwrite EtwEventWrite() in ntdll.dll, so the process will not emit any ETW events by itself anymore. The usefulness of this is small, as not many common ETW events are generated by the process in userspace.

ETW-patch usually involves:

VirtualProtect .text: RX -> RW
Overwrite memory (replace function body with a return 0)
VirtualProtect .text: RW -> RX

   Process                                                    
  ┌──────────────────────┐                                    
  │                      │                                    
  │                      │                                    
  ├──────────────────────┤                                    
  │                      │   ntdll.dll RW -> patch -> RX      
  │ .text                ├──────────────┐                     
  │                      │              │                     
  ├──────────────────────┤              │       ┌─────────┐   
  │                      │              │       │         │   
  │                      │              │ ◄─────┤  EDR    │   
  │                      │              │       │  sus?   │   
  ├──────────────────────┤              │       │         │   
  │ ntdll.dll            │              │       └─────────┘   
  │                      │              │                     
  │  - EtwEventWrite()   │◄─────────────┘                     
  │                      │                                    
  │                      │                                    
  ├──────────────────────┤                                    
  │                      │                                    
  │                      │                                    
  │                      │                                    
  └──────────────────────┘

Probably changing permissions of ntdll.dll to modify it will generate more telemetry than patching ETW is avoiding. Its memory permissions need to be changed from RX to RW and then back to RX again.
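Besides the permission changes, the patch itself is easy to spot for a scanner that compares the in-memory function start with a clean copy (e.g. from the file on disk). The sketch below shows that comparison on plain byte buffers. The function names are invented, and the stub check only knows the x86/x64 encodings of `ret` (C3) and `xor eax, eax; ret` (31 C0 C3 or 33 C0 C3), which is an assumption about what a typical "return 0" patch looks like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first byte that differs from the clean
 * reference copy, or -1 if the function start is unmodified. */
long first_patched_byte(const uint8_t *in_memory, const uint8_t *reference,
                        size_t len)
{
    assert(in_memory != NULL && reference != NULL);

    for (size_t i = 0; i < len; i++) {
        if (in_memory[i] != reference[i])
            return (long)i;
    }
    return -1;
}

/* True if the code starts with a bare "ret" or "xor eax, eax; ret". */
bool looks_like_return_stub(const uint8_t *code, size_t len)
{
    assert(code != NULL);

    if (len >= 1 && code[0] == 0xC3)
        return true;                               /* ret */
    if (len >= 3 && (code[0] == 0x31 || code[0] == 0x33) &&
        code[1] == 0xC0 && code[2] == 0xC3)
        return true;                               /* xor eax, eax; ret */
    return false;
}
```

A difference at offset 0 combined with a return stub is a strong hint that the function was neutered rather than hooked.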

Note that this patch only affects the events generated by the patched process. ETW cannot be deactivated globally (without kernel access).

There are data-only attacks for ETW-patch, which don’t need to patch an RX section.

AMSI-AV patching

AMSI scans scripts executed in supported Windows interpreters, like PowerShell, the MS Office VBA runtime, or .NET / C#. In other words, the application itself asks the OS to perform an AV scan via AMSI on some file or buffer it intends to execute.

To disable AMSI runtime code scanning, patch for example amsi.dll!AmsiOpenSession so that no scan takes place. Alternative targets are AmsiScanString() / AmsiScanBuffer().

The process is identical to ETW-patch: make the code section writable, break the functions, restore the original permissions.

     Process                                                   
  ┌──────────────────────┐                                   
  │                      │                                   
  │                      │                                   
  ├──────────────────────┤                                   
  │                      │   amsi.dll  RW -> patch -> RX     
  │ .text                ├──────────────┐                    
  │                      │              │                    
  ├──────────────────────┤              │       ┌─────────┐  
  │                      │              │       │         │  
  │                      │              │ ◄─────┤  EDR    │  
  │                      │              │       │  sus?   │  
  ├──────────────────────┤              │       │         │  
  │ amsi.dll             │              │       └─────────┘  
  │                      │              │                    
  │  - AmsiOpenSession() │◄─────────────┘                    
  │                      │                                   
  │                      │                                   
  ├──────────────────────┤                                   
  │                      │                                   
  │                      │                                   
  │                      │                                   
  └──────────────────────┘

Disabling the AMSI-AV function is usually done by a loader, before executing well-signatured malicious managed code or PowerShell scripts. The loader itself is scanned, but the .NET/PowerShell code loaded at runtime won’t be.

This is useful when loading a signatured malicious PowerShell script in PowerShell, which would otherwise be scanned through the AMSI interface. A famous site to generate obfuscated AMSI-AV patches is https://amsi.fail.

Executing malware in a managed language like C# or PowerShell, and starting it with a loader doing data-only ETW-patch, is probably pretty effective. Remember to avoid signatured code in your loader, and don’t forget to implement anti-emulation.

AMSI-hooks patching

AMSI-hook patching (or AMSI patching) just removes the EDR’s ntdll.dll patches which call into amsi.dll. It is basically identical to ETW-patch or AMSI-AV patch, as it just modifies ntdll.dll again. It can generate additional telemetry, for example when loading a clean version of ntdll.dll from disk.

       Process                                                      
      ┌──────────────────────┐                                      
      │                      │                                      
      │                      │                                      
      ├──────────────────────┤                                      
      │                      │   ntdll.dll RW -> patch -> RX        
      │ .text                ├──────────────┐                       
      │                      │              │                       
      ├──────────────────────┤              │       ┌─────────┐     
      │                      │              │       │         │     
      │                      │              │ ◄─────┤  EDR    │     
      │                      │              │       │  sus?   │     
      ├──────────────────────┤              │       │         │     
      │ ntdll.dll            │              │       └─────────┘     
      │                      │              │                       
      │                      │◄─────────────┘                       
      │                      │                                      
      │                      │                                      
      ├──────────────────────┤                                      
      │                      │                                      
      │                      │                                      
      │                      │                                      
      └──────────────────────┘

EDRs which depend mostly on hooking ntdll.dll can be severely blinded with this technique. Such an EDR will likely have extra detections for it, trying to catch it before the unhooking/blinding succeeds.

AMSI Bypass

“AMSI bypass” can mean bypassing the AMSI-AV interface as described above, or the AMSI-hooks patching. It can also mean calling OS kernel functions without invoking the hooks in ntdll.dll, which also “bypasses” the EDR.

This can be done by using direct syscalls: if you know the correct syscall number, you can invoke the syscall directly, without involving ntdll.dll.

Or by using indirect syscalls: reuse the part of the ntdll.dll function located AFTER the hook invocation.

In both cases the AMSI-hooks are bypassed, and the EDR will not get any telemetry from them. Often this technique is only used to undo the AMSI-hooking by the EDR. The second stage can then be compiled normally, using the clean ntdll.dll.

If this is the normal function call graph with hooked ntdll.dll:

                                                                                      ┌─────────────┐       
                                                                                      │             │       
┌───────────────────┐   ┌─────────────────┐   ┌───────────────────┐                   │             │       
│                   │   │                 │   │ ntdll.dll:        │                   │    OS       │       
│  Application.exe  │   │ kernel32.dll    │   │ NtCreateFile()    │                   │             │       
│                   ├──►│                 ├──►│                   │                   │             │       
│                   │   │ CreateFile()    │   │                   │                   │    Kernel   │       
│                   │   │                 │   │                   │                   │             │       
└───────────────────┘   └─────────────────┘   │                   │                   │             │       
                                              │                   │                   │             │       
                                     ┌────────┼───jmp callback    │                   │             │       
                                     │        │                   │          syscall  │             │       
                                     │ ┌──────┼──►syscall         ├─────────────────► │             │       
                                     │ │      │                   │                   │             │       
                                     │ │      │                   │                   │             │       
                                     │ │      └───────────────────┘                   │             │       
                                     │ │                                              │             │       
                                     │ │ ┌─────────────────────────┐                  │             │       
                                     │ └─┤                         │                  │             │       
                                     │   │ amsi.dll:               │                  └─────────────┘       
                                     └──►│ HookedNtCreateFile()    │                                        
                                         └──────────┬──────────────┘                                        
                                                    │ notify                                                
                                                    ▼                                                       
                                              ┌────────────┐                                                
                                              │    EDR     │                                                
                                              │    :-)     │                                                
                                              └────────────┘

And here with two techniques:

Direct syscall: Just do the syscall yourself (with the correct syscall number)
Indirect syscall: Re-use parts of the hooked ntdll.dll, invoking the syscall but not the hook

                  direct                                                                           
                  syscall                                                                        
                ┌────────────────────────────────────────────────────────┐            ┌─────────────┐   
                │                                                        │            │             │   
┌───────────────┴───┐   ┌─────────────────┐   ┌───────────────────┐      │            │             │   
│                   │   │                 │   │ ntdll.dll:        │      │            │    OS       │   
│  Application.exe  │   │ kernel32.dll    │   │ NtCreateFile():   │      │            │             │   
│                   ├──►│                 ├──►│                   │      │            │             │   
│                   │   │ CreateFile()    │   │                   │      │            │    Kernel   │   
│                   │   │                 │   │                   │      │            │             │   
└──────────────┬────┘   └─────────────────┘   │                   │      │   syscall  │             │   
               │                              │                   │      └──────────► │             │   
               │                              │   jmp callback    │                   │             │   
               │                              │                   │          syscall  │             │   
               └──────────────────────────────┼──►syscall         ├─────────────────► │             │   
                indirect                      │                   │                   │             │   
                syscall                       │                   │                   │             │   
                                              └───────────────────┘                   │             │   
                                                                                      │             │   
                                          ┌────────────────────────┐                  │             │   
                                          │amsi.dll                │                  └─────────────┘   
                                          │                        │                             
                                          │HookedNtCreateFile()    │                       
                                          └────────────────────────┘                                        
                                                       no notify                                                
                                                                                                             
                                              ┌────────────┐                                                
                                              │    EDR     │                                                
                                              │    :-(     │                                                
                                              └────────────┘
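Seen from the defender's side, both variants change which modules appear around the syscall. The following is a rough sketch of such a check, assuming the EDR can resolve each stack frame to a module name. The `WIN32_LAYER` set and `classify_syscall()` are hypothetical names for this example, and the heuristic is deliberately naive: legitimate software may also call ntdll.dll directly.

```python
# Modules that normally sit between application code and ntdll.dll.
# This set is an assumption for the sketch, not a complete list.
WIN32_LAYER = {"kernel32.dll", "kernelbase.dll"}

def classify_syscall(stack):
    """stack: module names of the frames, innermost (syscall site) first."""
    if not stack or stack[0] != "ntdll.dll":
        # The syscall instruction was not executed from within ntdll.dll.
        return "direct syscall"
    if len(stack) < 2 or stack[1] not in WIN32_LAYER:
        # ntdll.dll was entered straight from application code.
        return "possible indirect syscall"
    return "normal"

print(classify_syscall(["ntdll.dll", "kernel32.dll", "Application.exe"]))
print(classify_syscall(["Application.exe"]))
print(classify_syscall(["ntdll.dll", "Application.exe"]))
```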

Or replace ntdll.dll completely with an unhooked version from disk, like in RefleXXion.

Image Spoofing

Similar to spoofing arguments, an attacker may also want to “spoof” the exe: start a non-malicious exe like notepad.exe, which the EDR records, then replace the content of the process with a malicious one like mimikatz. This attempts to trick the EDR into thinking something non-malicious has been started, and it bypasses simple EDRs. Another name is “Process Image Manipulation”.

The source .exe file is called the Image for a process.

Process hollowing:

                   Event: CreateProcess("notepad.exe")
                        ▲                             
                        │                             
                        │                             
                        │   notepad.exe               
┌───────────┐           │  ┌───────────┐              
│           │ Start     │  │           │              
│           │ Suspended │  │           │              
│           ├───────────┴─►│           │              
│           │              │           │              
│           │              ├───────────┤              
│           │ Overwrite    │ .text     │              
│           │ Memory       │           │              
│           ├──────────────┤►          │              
│           │              ├───────────┤              
│           │              │           │              
│           │              │           │              
│           │              │           │              
│           │ Resume       │           │              
│           ├─────────────►│           │              
│           │              │           │              
└───────────┘              └───────────┘

There are several variants:

Process Hollowing: Overwrite process memory of a suspended process with WriteProcessMemory()
Process Doppelgänging: Overwrite a file inside a Transactional NTFS (TxF) transaction, create a section from it, roll back the transaction so the original file is restored, then create the process from the section
Process Herpaderping: Write malicious code to an exe, create the process, quickly replace the malicious content with non-malicious content before it gets scanned
Process Ghosting: Create an empty file, semi-delete it (delete pending), write malicious data, create a process from it

For a SOC analyst, if the image spoofing is not detected by the EDR, the process looks trustworthy, as it’s from a non-malicious binary.

However, memory scanning will scan the memory of processes using signatures, like an AV. Therefore malicious code like CobaltStrike can still be identified, even if injected into a genuine process.

Another detection is to compare the process memory content with the exe file content. The original exe name is stored in the PEB (peb.ProcessParameters.ImagePathName), or in the kernel’s EPROCESS structure (eprocess.ImageFileName[15], eprocess.SeAuditProcessCreationInfo.ImageFileName). Comparing the content of memory with that of a file is performance intensive.
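A minimal sketch of the memory/file comparison, working on two byte strings that stand in for the .text section on disk and in memory. Both helper names are made up for this example. A real implementation would first have to account for relocations and other legitimate load-time changes, which this sketch ignores.

```python
import hashlib

def text_section_modified(file_text: bytes, memory_text: bytes) -> bool:
    """Cheap first check: do the two copies of the section hash differently?"""
    return hashlib.sha256(file_text).digest() != hashlib.sha256(memory_text).digest()

def first_difference(file_text: bytes, memory_text: bytes):
    """Offset of the first differing byte, or None if both are identical."""
    for offset, (a, b) in enumerate(zip(file_text, memory_text)):
        if a != b:
            return offset
    if len(file_text) != len(memory_text):
        return min(len(file_text), len(memory_text))
    return None

on_disk = b"\x55\x48\x89\xe5\x90\xc3"    # stand-in for the section in the exe file
in_memory = b"\x55\x48\x89\xe5\xcc\xc3"  # same section with one byte overwritten

print(text_section_modified(on_disk, in_memory))
print(first_difference(on_disk, in_memory))
```

Hashing both copies still means reading the whole section twice, which is where the performance cost mentioned above comes from.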

Alternatively, the EDR can gather telemetry which identifies the manipulations. Or detect the supporting techniques like direct syscalls, e.g. with call stack analysis.

Technique	Used API
Hollowing	CreateProcess, NtUnmapViewOfSection, VirtualAllocEx, WriteProcessMemory, SetThreadContext, ResumeThread
Doppelgänging	CreateTransaction, CreateFileTransacted, NtCreateProcessEx
Herpaderping	NtCreateSection, NtCreateProcessEx, NtCreateThreadEx
Ghosting	CreateFileA, NtOpenFile, NtSetInformationFile, NtCreateSection, NtCreateProcess, WriteRemoteMem, NtCreateThreadEx
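As a rough illustration of how such a table could be used for triage, the sketch below scores a list of observed API calls against each row. The API names are copied verbatim from the table; the `rank_techniques()` helper and the scoring (fraction of a technique's APIs that were seen) are assumptions made for this example, not how any particular EDR works.

```python
# API sets copied verbatim from the table above.
TECHNIQUE_APIS = {
    "Hollowing": ["CreateProcess", "NtUnmapViewOfSection", "VirtualAllocEx",
                  "WriteProcessMemory", "SetThreadContext", "ResumeThread"],
    "Doppelgänging": ["CreateTransaction", "CreateFileTransacted", "NtCreateProcessEx"],
    "Herpaderping": ["NtCreateSection", "NtCreateProcessEx", "NtCreateThreadEx"],
    "Ghosting": ["CreateFileA", "NtOpenFile", "NtSetInformationFile", "NtCreateSection",
                 "NtCreateProcess", "WriteRemoteMem", "NtCreateThreadEx"],
}

def rank_techniques(observed):
    """Return (technique, fraction of its APIs observed), best match first."""
    seen = set(observed)
    scores = []
    for name, apis in TECHNIQUE_APIS.items():
        hits = sum(1 for api in apis if api in seen)
        scores.append((name, hits / len(apis)))
    return sorted(scores, key=lambda item: item[1], reverse=True)

observed = ["CreateTransaction", "CreateFileTransacted",
            "NtCreateSection", "NtCreateProcessEx"]
for name, score in rank_techniques(observed):
    print(f"{name}: {score:.2f}")
```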

Module Stomping

This is similar to Image Spoofing, but with DLL’s.

Module stomping writes the shellcode into the .text section of an unused DLL in a remote process, and creates a new thread starting there.

                   Event: LoadLibrary("genuine.dll")           
                        ▲                                      
                        │                                      
                        │                                      
                        │   genuine.dll                        
┌───────────┐           │  ┌───────────┐                       
│           │ Load      │  │           │                       
│           │ DLL       │  │           │                       
│           ├───────────┴─►│           │                       
│           │              │           │                       
│           │              ├───────────┤                       
│           │ Overwrite    │ .text     │                       
│           │ Memory       │           │                       
│           ├──────────────┤►          │                       
│           │              ├───────────┤                       
│           │              │           │                       
│           │              │           │                       
│           │              │           │                       
│           │ Start        │           │                       
│           ├─────────────►│           │                       
│           │              │           │                       
└───────────┘              └───────────┘

Same as Image Spoofing, it can be detected by:

Memory signature scanning
Memory/file comparison of .text section
Telemetry of the stomping
Identifying supporting techniques like direct/indirect syscalls with telemetry

Reflective DLL Loading

You can load additional DLLs into your own address space, or into another process. The latter is called “DLL injection”. This usually uses the function LoadLibrary() with a path to the DLL. Windows will correctly load the DLL into the process, and generate events about it.

DLL loading is useful, as a standard C/C++ compiler can generate DLLs natively in the compilation process. DLLs are also relocatable, so they can be loaded at any address. RedTeaming tools can therefore easily be compiled and loaded.

With LoadLibrary(), the DLL needs to be loaded from disk, and will therefore be scanned by the AV. Alternatively you can perform the DLL loading steps yourself, which is called reflective DLL loading. It’s basically a normal DLL, with loading code positioned somewhere (either at the beginning of the DLL, in the PE header, or in an exported function). The loading code is some shellcode which performs the DLL loading without involving Windows.

So with reflective DLL loaders, DLLs can be used as shellcode. And there won’t be any DLL loading events from the OS.

Memory Encryption

It is possible to encrypt all suspicious regions before sleeping, and decrypt them again when the process resumes. This is not trivial, and requires great care, weird Windows functionality, and support from the payload (e.g. the beacon itself). It can create a lot of telemetry, but much of it is not easily captured by the EDR.

                                             Event           
                                                 │           
                                                 │           
                                                 │           
 Process                Process                  ▼           
┌───────────┐          ┌───────────┐        ┌───────────┐    
│           │          │           │        │           │    
│           │          │           │        │           │    
│           │          │           │        │           │    
├───────────┤          ├───────────┤        │           │    
│           │  sleep   │           │  Read  │           │    
│ .text     ├─────────►│ .text     ◄────────┤    EDR    │    
│           │          │  Encrypted│  Scan  │           │    
├───────────┤          ├───────────┤        │           │    
│           │          │           │        │           │    
│           │          │           ◄────────┤           │    
│ .data     │          │ .data     │        │           │    
│           │          │  Encrypted│        └───────────┘    
│           │          │           │                         
└───────────┘          └───────────┘

A beacon usually calls Sleep() for a certain amount of time. If it uses memory encryption, any scans performed during this time will just see encrypted memory.
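A toy model of what the scanner sees, using plain byte strings instead of real process memory. `SIGNATURE`, `xor_mask()` and `scan()` are stand-ins invented for this example, and the repeating-key XOR is only a placeholder for whatever encryption an implant would really use.

```python
SIGNATURE = b"KNOWN_BAD"  # stand-in for a byte pattern a memory scanner looks for

def xor_mask(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for memory encryption: XOR with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def scan(memory: bytes) -> bool:
    """Toy signature scan: is the pattern present anywhere in the buffer?"""
    return SIGNATURE in memory

key = b"\x5a\xa7\x13"
awake = b"\x90\x90" + SIGNATURE + b"\xc3"  # region while the code is running
asleep = xor_mask(awake, key)                # same region during the sleep

print(scan(awake))
print(scan(asleep))
print(xor_mask(asleep, key) == awake)
```

The scan only matches during the short window in which the region is decrypted, which is why the timing of a memory scan matters so much.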

Callstack spoofing

The callstack is basically a function call hierarchy: a list of functions, each called by the one before it. When a process calls a syscall (or a hooked ntdll.dll function), this list can be retrieved by the EDR and analyzed.

When using direct syscalls, indirect syscalls, or other shenanigans, the callstack looks “wrong” by default, which can be identified by the EDR.

Callstack spoofing makes sure that the callstack looks genuine again. It is a supporting technique: e.g. an AMSI-bypass can be detected by using callstacks, so we need to improve the AMSI-bypass so the callstack looks more natural.

The actual callstack spoofing usually doesn’t generate telemetry, and can be implemented pretty safely. But when re-using existing callstack-spoofing implementations, it can be identified by signature scanning again (be it on-disk or in-memory).

Suspicious callstack for NtDelayExecution(): Sleep Callstack Not Spoofed

Clean (spoofed) callstack for NtDelayExecution(): Sleep Callstack Spoofed

Anti-detection depends on faking the callstack, copying a clean one, or just hiding the malicious callstack. Many techniques exist to check the integrity of the callstack, often by correlating it with other information. For example, the thread start address should originate from a reasonable location, usually something like loader.exe -> kernel32.dll -> ntdll.dll.

In a normal thread, the user mode start address is typically the third function call in the thread’s stack – after ntdll!RtlUserThreadStart and kernel32!BaseThreadInitThunk. So, when the thread has been hijacked, this is going to be obvious in the call stack. For “early bird” APC injection, the base of the call stack will be ntdll!LdrInitializeThunk, ntdll!NtTestAlert, ntdll!KiUserApcDispatcher and then the injected code.
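The two stack bases described above can be written down as a small check. This is a minimal sketch: it assumes the frames are already symbolized as module!function strings, and `classify_thread_base()` is a hypothetical helper for this example.

```python
# Base frames as described in the text, outermost (thread start) first.
NORMAL_BASE = ["ntdll!RtlUserThreadStart", "kernel32!BaseThreadInitThunk"]
EARLY_BIRD_BASE = ["ntdll!LdrInitializeThunk", "ntdll!NtTestAlert",
                   "ntdll!KiUserApcDispatcher"]

def classify_thread_base(stack):
    """stack: frames as 'module!function', outermost (thread start) first."""
    if stack[:len(NORMAL_BASE)] == NORMAL_BASE:
        return "normal thread start"
    if stack[:len(EARLY_BIRD_BASE)] == EARLY_BIRD_BASE:
        return "early bird APC injection"
    return "unusual thread base"

print(classify_thread_base(NORMAL_BASE + ["loader!main"]))
print(classify_thread_base(EARLY_BIRD_BASE + ["<unbacked memory>"]))
print(classify_thread_base(["<unbacked memory>"]))
```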

Remote Processes

The attacker can choose whether to mess with their own process, or with another one on the system. Most of the Windows functions described here can also be used on another process, just by calling OpenProcess() first.

This is mostly used for process injection. It is very useful to migrate into another process, like teams.exe: the C2 communication can be hidden in the normal traffic of the application. Also, Teams is JavaScript-based, so the JIT of its JavaScript interpreter potentially performs a lot of RW->RX allocations.

This includes:

VirtualAllocEx() / VirtualFreeEx()
ReadProcessMemory() / WriteProcessMemory()
CreateRemoteThread()
NtQueryInformationProcess()

 Process                              Child Process
┌──────────────┐                     ┌─────────────┐ 
│              │                     │             │ 
│              │ OpenProcess()       │             │ 
│              ├────────────────────►│             │ 
│              │              handle │             │ 
│       HANDLE │◄────────────────────┤             │ 
│              │                     │             │ 
│              │ VirtualAlloc(handle)│             │ 
│              ├────────────────────►│             │ 
└──────────────┘                     └─────────────┘

Messing with remote processes is scrutinized more by the EDR; it is safer to just stay in your own process. For migration, use DLL sideloading instead, or other techniques which do not depend on OpenProcess().

Suspended processes

A very common approach is to create a suspended process with the CREATE_SUSPENDED flag, then mess with it, then let it execute/resume.

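STARTUPINFOA si = { sizeof(si) };
PROCESS_INFORMATION pi = { 0 };
CreateProcessA("C:\\Windows\\System32\\calc.exe", NULL, NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si, &pi);
...
ResumeThread(pi.hThread); // takes the handle of the suspended main thread, not pi.hProcess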

Many RedTeam techniques depend on this functionality. Currently, using suspended processes doesn’t seem to bother the EDR much, but this may change in the future. The reason may be that creating a suspended process looks very similar to creating a normal process: the “suspended” flag is not visible to the EDR. Also, it’s not really the process which is suspended, but the main thread - the process is created normally.

For example we can create a new process in suspended state, and queue an APC to execute our shellcode, which may make it invisible to an EDR (as it may be executed before KAPC injection and hooking of ntdll.dll):

 Process                                      Child Process
┌──────────────┐                             ┌─────────────┐  
│              │                             │             │  
│              │ CreateProcessA(suspended)   │             │  
│              ├────────────────────────────►│             │  
│       HANDLE │◄────────────────────────────┤             │  
│              │                             │             │  
│              │                             │             │  
│              │ VirtualAllocEx()            │             │  
│              │ WriteProcessMemory()        │             │  
│              │ QueueUserAPC()              │             │  
│              ├────────────────────────────►│             │  
│              │                             │             │  
│              │                             │             │  
│              │ ResumeThread()              │             │  
│              ├────────────────────────────►│             │  
└──────────────┘                             └─────────────┘

Outro

EDR Wisdoms

Use threatcheck or avred to identify which part of your stuff gets identified by AV, and patch it
Memory scanning is performance intensive, and usually requires a trigger to be performed
Usermode AMSI is less and less relevant, and therefore AMSI-hooks patching too
Always look out for timing / TOCTOU bypasses

Mistakes writing loaders

Using function calls to copy memory
Putting more than minimal amount of effort into handling entropy
Putting more than minimal amount of effort into handling encryption
Generating too much telemetry
Threads not starting in backed memory
Marking RX pages RW again
Having unclean callstacks
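For context on the entropy item: the value usually meant is the Shannon entropy of the file's bytes, where encrypted or compressed data ends up close to 8 bits per byte. A minimal sketch of the calculation, with `shannon_entropy()` being a name made up for this example:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    # Sum of -p * log2(p) over the relative frequency p of each byte value.
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())

print(shannon_entropy(b"\x00" * 64))       # constant bytes
print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```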

Proposed Loader

Proposed loader layout:

                                                                 ┌──────────┐                  
                                                                 │ encrypted│                  
                                                                 │ Payload  │                  
                                                                 └────┬─────┘                  
                                                                      │                        
                                                                      │                        
                                                                      ▼                        
┌──────┐    ┌────────────┐    ┌───────────┐    ┌─────────────┐   ┌──────────┐   ┌────────────┐
│ EXE  │    │ Execution  │    │ Anti      │    │EDR          │   │ Alloc RW │   │  Payload   │
│ File ├───►│ Guardrails ├───►│ Emulation ├───►│deconditioner├──►│ Decode/Cp├──►│  Execution │
│      │    │            │    │           │    │             │   │ RX       │   │            │
│      │    │            │    │           │    │             │   │ Exec     │   │            │
└──────┘    └────────────┘    └───────────┘    └─────────────┘   └──────────┘   └────────────┘

EXE File: All code should be contained in the .text section (IMAGE)
Execution Guardrails: Only let it execute on the intended target (Anti-Middleboxes)
Anti-Emulation: Stop AV emulating our binary (mem usage, cpu cycles count, time trickery…)
EDR Deconditioner: Decondition the EDR by doing a lot of our Alloc/Copy/VirtualProtect loop with nonmalicious data and free
Payload: Encrypted (how doesn’t matter)
Alloc/Decode/Virtualprotect/Exec: As normal as possible (avoid using DLL functions here). Avoid RWX.
Payload Execution: As normal as possible (jmp to payload, avoid creating new threads)

Not part of this article

Detections based on:

File access
Registry access
Network access

Low level techniques which are not discussed:

Software breakpoints
Hardware breakpoints
VEH
APC injection
Anti debugging