What CISOs Need to Know and Do Right Now
TL;DR: Anthropic’s Claude Mythos Preview signals that AI can now discover and exploit vulnerabilities at scale, shrinking the gap between defenders and attackers. This accelerates risk discovery but overwhelms traditional remediation models, making continuous, AI-augmented testing and strong human-led prioritization essential for modern security programs. It also places new urgency on how organizations evaluate providers, scale remediation, and move toward continuous validation.
What Happened
On April 7, 2026, Anthropic announced Claude Mythos Preview, a general-purpose frontier AI model that was not designed specifically for cybersecurity, but whose advanced coding and reasoning capabilities proved so effective at finding and exploiting software vulnerabilities that Anthropic chose not to release it publicly.
According to Anthropic, the model has identified thousands of previously unknown vulnerabilities. Specifically, it demonstrated the ability to:
- Identify previously unknown vulnerabilities across major operating systems and browsers with limited human input.
- Surface long-standing vulnerabilities in widely deployed software, including OpenBSD and FFmpeg.
- Combine multiple vulnerabilities into more complex exploit chains.
Access is currently limited to approximately 50 organizations through a controlled industry initiative called Project Glasswing. This includes 12 founding partners who are using the model as part of their defensive security work and sharing learnings with the broader industry, and more than 40 additional organizations responsible for building and maintaining critical software infrastructure who are using it to scan and secure their own systems and open-source code.
The announcement reflects what offensive security practitioners have anticipated: AI is increasingly capable of identifying and exploiting software vulnerabilities with limited human involvement. These capabilities are currently being deployed in controlled environments but are expected to become more widely accessible over time.
Why This Matters for Your Security Program
The threat model just changed
For years, the security industry operated on the assumption that identifying complex, chained vulnerabilities required rare, highly specialized human expertise. That assumption is being challenged.
Mythos didn't just scan for vulnerabilities faster. It autonomously chained together multiple vulnerabilities into sophisticated exploits, found a 27-year-old flaw in one of the most security-hardened operating systems in existence, and did so with minimal human steering.
The implication is not that human expertise is obsolete; it’s that the standard for what constitutes a complete security assessment has changed. Assessments that don't incorporate AI-augmented vulnerability discovery are incomplete against the threat landscape that exists today.
The defender advantage is real, but temporary
Project Glasswing represents an effort to provide defenders with early access to these capabilities.
Alex Stamos, former CSO of Facebook, current CSO at AI security firm Corridor, and a Bishop Fox advisor, has noted that comparable capabilities are likely to become more broadly available, including through open-weight models, on relatively short timelines. Any advantage defenders gain likely won’t last long.
As vulnerability discovery accelerates, the gap between what is testable and what is tested will continue to widen. If your security program is still relying on annual or quarterly penetration tests as its primary assurance mechanism, it is operating on a cadence designed for a different threat environment.
The question is not whether to modernize your approach but how quickly you can move toward a model that supports continuous validation.
What this means for evaluating security providers
Mythos Preview has implications not only for defenders, but for how organizations evaluate their security service providers. It is both an opportunity and a pressure test: the ability to scale discovery is increasing, while expectations for validation and outcomes are rising alongside it.
| Risk Factor | Implication for Your Program |
| --- | --- |
| Discovery scale | AI is expanding the scale and depth of vulnerability discovery. Providers that do not incorporate AI-augmented techniques may have more limited coverage relative to what is now possible. |
| Triage and prioritization | As discovery accelerates, the volume of findings increases. Without structured triage, business-risk context, and clear remediation guidance, organizations risk accumulating backlog rather than improving security outcomes. |
| Attacker parity | Capabilities used for defensive testing are likely to become more broadly accessible quickly, including to adversaries. This increases the importance of continuous monitoring and timely remediation, not just periodic assessments. |
| Workforce and capacity | AI-generated findings will outpace the human capacity to triage, validate, and act on them. Security and engineering teams that are already stretched face a compounding burden. Organizations that don't account for this in staffing and workflows risk burning out the people responsible for turning findings into outcomes. |
The Bishop Fox Perspective
The Anthropic announcement reinforces a shift that has already been underway: AI is changing how vulnerabilities are discovered. What hasn’t changed is what matters most: knowing which risks are real and what to do about them.
1. AI increases discovery. Human expertise determines what matters.
Vulnerability discovery has never been the hardest part. Security teams have always had more candidate findings than they can act on. What has changed is the volume and speed at which AI can identify them. As AI accelerates discovery, the problem doesn’t get solved; it compounds. More findings without context create more noise, not better security outcomes.
The question a CISO should be asking is not 'how many vulnerabilities did we find?' It's 'which ones represent real, exploitable risk in our specific environment, and in what order do we address them?' Answering that question requires understanding your application architecture, your business context, and your actual attack surface, not just a list of CVEs. That is where human offensive security expertise remains not just relevant but irreplaceable.
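As an illustration, the triage logic described above can be sketched as a simple scoring function. This is a hypothetical example, not a Bishop Fox methodology: the field names, weights, and thresholds are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cve_id: str
    exploitable: bool        # validated by a human tester, not just flagged by a scanner
    asset_criticality: int   # hypothetical scale: 1 (low) .. 5 (crown-jewel system)
    internet_facing: bool
    exploit_public: bool     # working exploit code is circulating

def priority_score(f: Finding) -> int:
    """Illustrative scoring: validated exploitability and business context
    dominate raw finding volume."""
    if not f.exploitable:
        return 0             # unvalidated findings go to a review queue, not the sprint
    score = f.asset_criticality * 10
    if f.internet_facing:
        score += 25
    if f.exploit_public:
        score += 25
    return score

# Example queue: the internet-facing, actively exploited flaw on a critical
# asset sorts first; the unvalidated finding sorts last despite its severity.
findings = [
    Finding("CVE-0000-0001", True, 5, True, True),
    Finding("CVE-0000-0002", True, 2, False, False),
    Finding("CVE-0000-0003", False, 5, True, True),
]
for f in sorted(findings, key=priority_score, reverse=True):
    print(f.cve_id, priority_score(f))
```

The design point is the early return: in this sketch, a finding no human has validated as exploitable never outranks one that has been, regardless of its theoretical severity.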
2. Continuous coverage is no longer optional.
According to zerodayclock.com, a live threat intelligence dashboard tracking over 3,500 CVE-exploit pairs, the median time from vulnerability discovery to exploitation has collapsed from 771 days in 2018 to under 4 hours in 2024. In 2025, the majority of exploits were weaponized before public disclosure, and the window is projected to fall under one hour in 2026. Point-in-time assessments leave organizations blind between engagements. Continuous offensive security testing, running persistently against your attack surface, is the only model that matches the speed of modern threats.
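The cadence gap is easy to quantify. A minimal back-of-envelope sketch, using the timeline figures cited above; the quarterly pen-test cadence is an illustrative assumption, not a recommendation:

```python
# Back-of-envelope exposure math using the cited timeline figures.
HOURS_PER_DAY = 24

median_exploit_2018_h = 771 * HOURS_PER_DAY   # 771 days (2018 figure)
median_exploit_2024_h = 4                     # under 4 hours (2024 figure)

# Assumed quarterly pen-test cadence: a vulnerability introduced the day
# after a test can sit untested for up to ~90 days.
quarterly_window_h = 90 * HOURS_PER_DAY       # 2,160 hours

# In 2018, a quarterly cadence fit comfortably inside the median
# exploitation window.
assert quarterly_window_h < median_exploit_2018_h

# By 2024, the same cadence leaves a blind spot hundreds of times longer
# than the median time-to-exploit.
ratio = quarterly_window_h / median_exploit_2024_h
print(f"Quarterly blind spot is {ratio:.0f}x the 2024 median time-to-exploit")
```

Under these assumptions the same quarterly cadence goes from comfortably adequate to roughly 540 times slower than the median attacker, which is the arithmetic behind "calculate the exposure window" in the recommendations.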
But continuous coverage only creates durable outcomes if the findings it generates are actionable. A high volume of unvalidated, unprioritized findings doesn't reduce risk. It creates noise. What matters is whether your security partner delivers findings your engineering team can act on: validated for exploitability, prioritized by business impact, and framed with enough context to drive decisions. That is what separates a security program that improves your posture over time from one that simply produces reports.
The organizations that will navigate this environment most effectively are those that treat offensive security as an ongoing discipline rather than a periodic exercise. They build a continuous feedback loop between what testing surfaces and what the business does about it.
3. Application security is where the business risk actually lives.
The vulnerabilities Mythos surfaced in operating systems, browsers, and shared libraries are foundational. But for most organizations, the more immediate and actionable risk sits on the external attack surface: in the applications they build, ship, and maintain themselves. That is where attackers find entry points into business logic, sensitive data, and privilege escalation paths.
Foundational software flaws matter because modern applications inherit them and oftentimes expose them publicly. A vulnerability in a library your application depends on is your vulnerability, whether or not your team wrote the code. The Mythos findings make clear that the dependency chain is a primary risk surface, and that AI-scale discovery is going to make that surface increasingly visible. Organizations that assess their application layer continuously, before vulnerabilities reach production, will be structurally more resilient than those that discover them after the fact.
Recommendations from the Bishop Fox Team
The Mythos Preview announcement is a useful forcing function for how security leaders evaluate their programs and partners. Based on what we are seeing across enterprise environments, there are a few areas worth focusing on:
- Evaluate how your providers are approaching AI-augmented security. AI-powered offensive security is still early and evolving rapidly. No vendor has fully solved it yet, and that's not a disqualifier; it's the reality of the moment. What matters is whether your providers are actively building the workflows, tooling, and frameworks required to meet this threat environment. Look for evidence of investment and a credible roadmap, not just a claim.
Also, factor in the cost implications of AI-powered security at this capability level. Frontier model pricing is significant. Mythos-class capability is about 5x the cost of Claude Opus 4.6 tokens, and the industry is still working out how those costs translate into sustainable service delivery models. This is an evolving area worth monitoring as the market matures.
- Assess your remediation capacity honestly. The patch gap Mythos exposed is not unique to open-source software. Most organizations lack the internal bandwidth to absorb a significant increase in high-fidelity findings. If your security program generates more findings than your teams can act on, the problem isn't discovery. It's triage and prioritization.
Not all AI-generated findings are created equal. Volume without accuracy creates its own kind of noise. Ask your providers how findings are validated before they reach your engineering team, and what their false positive rate looks like in practice.
- Revisit your testing cadence in the context of current threat dynamics. The median time from vulnerability discovery to exploitation is now measured in hours, not months. If your primary assurance mechanism runs annually or quarterly, calculate the exposure window that creates. Continuous coverage should be on your near-term roadmap, not deferred to the future.