Large Language Models (LLMs) are reshaping enterprise technology and redefining what it means to secure software. But here’s the problem: most penetration testers are using the wrong tools for the job. Traditional techniques focus on exploits and payloads, assuming the AI is just another application. But it’s not.

In this session, Brian D., Security Consultant III at Bishop Fox, makes the case that effective LLM security testing is more about persuasion than payloads. Drawing on hands-on research and real-world client engagements, Brian reveals a new model for AI pen testing – one grounded in social engineering, behavioral manipulation, and even therapeutic dialogue.

You’ll explore Adversarial Prompt Exploitation (APE), a methodology that targets trust boundaries and decision pathways using psychological levers like emotional preloading, narrative control, and language nesting. This is not Prompt Injection 101 — it’s adversarial cognition at scale – using real-world case studies to demonstrate success.

This virtual session tracks key operational challenges: the limitations of static payloads and automation, the complexity of reproducibility, and how to communicate findings to executive and technical leadership.

Brian covers:

Why conventional penetration testing methodologies fail on LLMs
How attackers exploit psychological and linguistic patterns, not code
Practical adversarial techniques: emotional preloading, narrative leading, and more
Frameworks for simulating real-world threats to LLM-based systems
How to think like a social engineer to secure AI

Who Should Watch:

This session is ideal for professionals involved in securing, testing, or developing AI systems, particularly those using large language models (LLMs). Penetration testers and red teamers will find it valuable as it introduces a new adversarial framework that goes beyond traditional payload-based approaches, focusing instead on behavioral manipulation and social engineering. AI/ML security practitioners and researchers will gain insight into emerging psychological attack techniques—such as emotional preloading and narrative control—that exploit how LLMs process language, not code. The virtual session also offers practical strategies and case studies, making it useful for developers seeking to better understand how attackers interact with their models. Additionally, CISOs and technical managers will benefit from discussions on the operational challenges of LLM security testing, such as reproducibility and how to communicate complex findings to leadership. Overall, this session provides a critical perspective for anyone working on the front lines of AI security.

Key Takeaways:

Traditional pentesting approaches fail with LLMs
1. Using static payload lists and automation is ineffective for thoroughly testing AI models.
2. LLMs respond to language and conversation, not technical exploits.
3. Simply testing with known jailbreaking prompts doesn't constitute proper due diligence.
Adversarial Prompt Exploitation (APE) methodology includes:
1. Emotional preloading and pivoting: Starting with benign conversation before suddenly changing direction
2. Leading the narrative: Using false information to trick the model into correcting or engaging
3. Negative casing and comparative framing: Asking how "not" to do something or forcing choices between bad options
4. Content-adjacent prompting: Describing restricted content indirectly through its components
5. Language and translation nesting: Using non-English languages to bypass English-focused guardrails
Context window manipulation is crucial
1. As conversations grow longer, they approach the model's context window limit
2. Operating at this boundary increases the chance of bypassing initial restrictions
3. System prompts may not be included with every message, creating vulnerabilities
Real-world examples demonstrated critical vulnerabilities:
1. Creating extremist vacation rental listings that appeared endorsed by the client company
2. Generating anti-Western propaganda through emotional manipulation
3. Converting an intentionally negative AI personality into a friendly one
4. Producing imagery depicting drug use by describing it as a "DIY vaccine program"
Effective defense strategies include:
1. Implementing defense-in-depth rather than single guardrails
2. Running AI modules in sandboxed environments isolated from sensitive data
3. Keeping models separate from privileged actions and operations
4. Monitoring for unusual behavior or policy violations in real-time
5. Requiring human review of AI-generated actions that affect permissions or workflows
Testing should be conversational and creative
1. The most effective testing resembles social engineering rather than technical exploitation
2. Testers should understand that results may be inconsistent due to model parameters like temperature
3. Combining multiple techniques yields better results than relying on single approaches
4. Documentation requires capturing entire conversation flows, not just individual requests

The presentation emphasizes that as AI technologies evolve rapidly, both testing methodologies and defense strategies must adapt accordingly. Understanding the psychological aspects of LLM behavior is essential for comprehensive AI/LLM security testing in this emerging field.

Scoring high in the GigaOm Radar for the fourth year in a row!

See Why We're the Leaders in Offensive Security

The State of Offensive Security

The Best Defense is a Great Offense

Want to Work with the Best Minds in Offensive Security?

Breaking AI: Inside the Art of LLM Pen Testing

Brian covers:

Who Should Watch:

Key Takeaways:

Extend Your Knowledge

Check out these related resources.