Expert Analysis of Recent SaaS Attacks That Shocked Global Brands. Watch now

Ready to Hack an LLM? Our Top CTF Recommendations

Ready to hack an LLM? Blog banner promoting Bishop Fox’s top Capture the Flag (CTF) recommendations for large language model hacking challenges.

Share

Large Language Models (LLMs) are changing everything, from how organizations build products and services to how they interact with their customers. But as we've explored, all this power comes with a new wave of security headaches. Attackers are already hard at work, poking and prodding these systems with jailbreaks, prompt injections, and the like. Reading about these threats is useful, but to really get it, you have to get your hands dirty.

That’s where Capture the Flag (CTF) competitions and hands-on playgrounds come in. They’re like a high-tech sandbox, providing a safe, structured way to experiment with real-world attack techniques and see exactly how LLMs behave under pressure. CTFs are the perfect place to let you explore vulnerabilities without putting production systems at risk.

At Bishop Fox, we believe the fastest way to learn LLM security is by doing. That’s why we’ve put together this list of CTFs and playgrounds that will help you get a feel for the offensive and defensive sides of LLM security. Let’s dive in.

LLM CTF Recommendations 

  1. Local LLM CTF and Lab
    Creator: Derek Rush, Bishop Fox (@BishopFox)
    Bishop Fox’s Local LLM CTF Lab is a sandboxed environment designed to teach players how to attack and defend large language models. Built with Go and Ollama, it simulates a layered “gatekeeper” system where prompts must bypass quarantined LLMs before reaching a privileged model. The setup highlights common defense techniques like regex filters, semantic validation, and context isolation, while also showing how these can be bypassed. It’s a practical way to explore jailbreaks, guardrail weaknesses, and the trade-offs of securing LLM pipelines.
  2. OWASP FinBot CTF
    Creator: OWASP Gen AI – Agentic Security Initiative (@OWASP-ASI)

    OWASP’s FinBot CTF is an interactive challenge built around a simulated AI-powered financial chatbot. What sets this CTF apart is its meticulously constructed business environment where traditional application logic intersects with AI decision-making in ways that mirror how real organizations evolve their systems to handle complex operational needs. Players test their skills by exploiting weaknesses in natural language understanding, prompt handling, and guardrail design to manipulate the bot into revealing sensitive information. The scenario mirrors real-world risks of deploying LLMs in high-stakes domains like fintech, where the pressure to automate complex workflows often creates exploitable gaps between intended security policy and actual system behavior and where data leakage or model abuse could have serious consequences. It’s a hands-on way to practice prompt injection and adversarial testing in a realistic business context.
  3. Dreadnode Crucible CTF
    Creator: Dreadnode (@dreadnode)

    Crucible is a hosted AI-hacking sandbox where practitioners sharpen their red teaming skills via LLM/ML challenges. It includes tasks spanning prompt injection, evasion, fingerprinting, and model inversion, with new challenge releases and community write-ups. Designed to be accessible from beginner to expert levels, users interact via APIs, notebooks, or a chat interface to discover flags.
  4. Steve’s Chat Playground (and on GitHub)
    Creator: Steve Wilson (@virtualsteve-star)

    Steve’s Chat Playground is an open-source, browser-based sandbox for experimenting with LLM guardrails, vulnerable chat models, and filter bypassing. Users can test input/output constraints like prompt injection, moderation, rate limits, and content filtering, all without needing backend dependencies. Because it’s entirely client-side, it offers a low-barrier way to explore LLM defenses and failure modes in a hands-on environment.
  5. Web LLM Attacks
    Creator: PortSwigger (@portswigger)

    PortSwigger’s Web LLM Attacks is a hands-on learning path that teaches attackers and defenders tactics against LLM-enabled web apps through interactive labs. It covers practical issues such as prompt injection, excessive agency (LLMs calling APIs or executing commands), insecure output handling, and sensitive-data leakage with real lab scenarios that use a live model. The labs are designed to mirror realistic deployment risks and include step-by-step exercises, so practitioners can practice exploitation and mitigation techniques.
  6. Wild LLaMa
    Creator: Allen Institute for AI - Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, and Yuntian Deng

    Wild LLaMa is a prompt-engineering mini-game that walks players through progressively harder levels designed to expose LLM limitations and prompt-injection vulnerabilities. The challenges rely on clever GPT prompt manipulations (rather than custom validation code) and include levels that explore hidden/zero-width encodings, context tricks, and persistence/override techniques. It’s a compact, hands-on way to sharpen adversarial prompt skills and learn how small changes in input can subvert naive guards.
  7. Gandalf: Agent Breaker
    Creator: Lakera (@lakeraai)

    Gandalf is a gamified red teaming platform that challenges players to bypass progressively stricter LLM defenses across layered levels. It teaches practical attack techniques, like prompt injection, evasion, and hallucination exploitation, while letting defenders observe how protections change attack surfaces as they tighten. The project also produces large prompt-attack datasets and community research used to study defenses and trade-offs between security and model utility.
  8. Damn Vulnerable LLM Agent
    Creator: WithSecure Labs (adapted by ReversecLabs)

    Damn Vulnerable LLM Agent is a deliberately insecure ReAct agent chatbot that teaches Thought/Action/Observation injection techniques against agentic LLM systems. The lab simulates a banking chatbot where practitioners exploit the ReAct loop to force the agent into unauthorized actions like accessing other users' transactions or executing SQL injection payloads through the agent's tools. The challenges require manipulating the agent's reasoning chain by injecting fake observations and thoughts that convince the LLM to bypass access controls or execute malicious database queries. It's a focused environment for understanding how agentic architectures introduce unique prompt injection vectors that differ from traditional chatbot attacks.
  9. Damn Vulnerable MCP Server
    Creator: Harish Santhanalakshmi Ganesan (@harishsg993010)

    Damn Vulnerable MCP Server is a progression-based lab that exposes security weaknesses in Model Context Protocol implementations through 10 escalating challenges. The environment demonstrates how the MCP (model context protocol) tool-based architecture creates attack surfaces including: tool poisoning through malicious descriptions, tool shadowing where attackers override legitimate tools with malicious versions, rug pull attacks that exploit mutable tool definitions, and indirect prompt injection via compromised data sources. Challenges progress from basic prompt injection to advanced multi-vector attacks that chain several vulnerabilities together. More difficult levels require token theft from insecure storage and remote code execution achievement through vulnerable tool implementations.
  10. Neurogrid CTF: The Ultimate AI Security Showdown
    Creator: Hack The Box

    HTB's Neurogrid CTF offers a new kind of competitive scenario where AI agents handle technical execution while humans provide strategic direction. This limited-time four-day event, happening Nov. 20-24, is an MCP-only competition where participants deploy AI agents to analyze malware, dissect code, and exploit AI models while human teammates provide strategy and oversight. Designed exclusively for Model Context Protocol integration, the competition requires teams to orchestrate their AI agents through complex offensive security challenges spanning cryptography, reverse engineering, web exploitation, and forensics. The human role shifts from performing technical tasks to making high-level decisions about which challenges to tackle, how to allocate agent resources, and when to pivot strategies based on the competitive landscape. It's meant to be a proving ground for benchmarking how effectively teams can leverage AI capabilities in offensive security operations, testing not just the technical sophistication of the agents themselves but also the strategic acumen of the humans directing them in a competitive environment.

Final Thoughts

Wrapping up, we hope you enjoy diving into these CTFs and playgrounds. These are just a few of the tools out there. The bottom line is that mastering the security challenges of LLMs is essential in today’s world.

By leveraging these CTFs and playgrounds, you gain practical experience that goes far beyond just reading about vulnerabilities. You'll learn to think like an attacker, understand how LLMs behave under pressure, and discover the creative ways these systems can be manipulated. These hands-on labs are the fastest way to bridge the gap between theory and practice, ensuring you're prepared to defend against the next generation of threats.

To learn more about how Bishop Fox can help your team build and maintain secure AI systems, explore our AI/LLM Security Solution Brief.

Subscribe to our blog

Be first to learn about latest tools, advisories, and findings.


Banksy Fox exploder1

About the author, Luke Sheppard

Senior Security Consultant

Luke Sheppard is a Senior Security Consultant at Bishop Fox, specializing in web application security, API penetration testing, and AI/LLM-integrated application assessments. With extensive experience in infrastructure security, he has secured environments spanning CI/CD pipelines and AWS cloud ecosystems.

As an active security researcher and developer, Luke leads the Bishop Fox LLM Penetration Testing Service-level Advisory Board (SLAB), where he guides the strategic direction of AI/LLM testing practices. He authored the AI/LLM Penetration Testing Playbook, evaluated numerous open-source LLM security tools, and frequently mentors other consultants on advanced AI/LLM testing tools, techniques, and theory. Luke also develops automation tooling in Python and Golang to enhance and customize penetration testing workflows.

Beyond client engagements, Luke contributes to the broader security community as a volunteer developer for open-source security initiatives and is the creator of Instability.py, a specialized tool for security automation. He also mentors aspiring security professionals through programs such as Bishop Fox Mentorship, RaiseMe, HackTheBox, and TryHackMe.

More by Luke

This site uses cookies to provide you with a great user experience. By continuing to use our website, you consent to the use of cookies. To find out more about the cookies we use, please see our Privacy Policy.