
Hacking the Human Voice: Our 1st Place Win at DEF CON’s Battle of the Bots
How do you build a bot that can out-talk human judgement in real time?
At DEF CON’s first Battle of the Bots, we put ours to the test, and won.
Team DirectDefense, consisting of Matt Bangert and Michael Tomlinson, was selected to compete in DEF CON’s first-ever Battle of the Bots: Vishing Edition, a competition that challenged teams to design AI-powered agents capable of placing live voice phishing calls in real time.
Matt Bangert and Michael Tomlinson are Senior Enterprise Security Consultants at DirectDefense specializing in red team operations, adversary emulation, and social engineering.
How the Competition Worked
The Social Engineering Community’s “Battle of the Bots: Vishing Edition” was a groundbreaking competition held at DEF CON where teams developed AI-powered voice agents to conduct live vishing calls against real human targets.

Teams were assigned specific companies and objectives with point values, then had to build autonomous AI agents capable of extracting sensitive information through voice-based social engineering without any human intervention during the calls. Competitors operated from soundproof booths where they could monitor and manage their bots but were prohibited from speaking during live calls.
The competition challenged participants to push the boundaries of AI-driven social engineering, combining technical expertise in voice synthesis, natural language processing, and telephony systems with psychological manipulation tactics. Teams were judged on their bot’s performance in achieving assigned objectives, technical innovation, and overall creativity in their approach to automated social engineering attacks.
The judges were an impressive panel of industry experts: Rachel Tobac (CEO of Social Proof Security), Perry Carpenter (Chief Human Risk Management Strategist at KnowBe4), and Snow (Stephanie Caruthers), Co-Founder of the Social Engineering Community and Global Lead of Cyber Range and Cyber Crisis Management at IBM X-Force.
Five competing teams were selected, including a company specializing in AI voice phishing and a tech YouTuber with nearly 4 million subscribers.
Building Our Bots
Technology Stack and Architecture
Rather than developing sophisticated AI from scratch, we leveraged commercially available platforms. The core infrastructure consisted of:
- RetellAI, a commercially available AI agent calling platform
- Jambonz, a cloud-based telephony platform
- A locally hosted administrative Node.js application that handled AI agent and target selection, background noise, and the memes displayed on the competition screen (a minimal sketch follows this list)
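
To make that admin layer concrete, the sketch below shows one way such a control app could look in Node.js. Everything here – the /select and /state endpoints, the state fields, the port – is an illustrative assumption, not our exact implementation.

```typescript
// Hypothetical sketch of an admin control app: operators pick the active agent,
// target, and background noise; the competition-screen frontend polls the state.
// Endpoint names, fields, and port are illustrative, not the actual implementation.
import express from "express";

interface CompetitionState {
  activeAgentId: string | null;       // which AI persona is currently live
  activeTargetNumber: string | null;  // which assigned target to dial next
  backgroundNoise: "call-center" | "office" | "none";
}

const state: CompetitionState = {
  activeAgentId: null,
  activeTargetNumber: null,
  backgroundNoise: "none",
};

const app = express();
app.use(express.json());

// Operator selects which agent persona and target the next call will use.
app.post("/select", (req, res) => {
  const { agentId, targetNumber, backgroundNoise } = req.body;
  state.activeAgentId = agentId ?? state.activeAgentId;
  state.activeTargetNumber = targetNumber ?? state.activeTargetNumber;
  state.backgroundNoise = backgroundNoise ?? state.backgroundNoise;
  res.json(state);
});

// The competition-screen frontend polls this to know what to display (memes included).
app.get("/state", (_req, res) => res.json(state));

app.listen(3000, () => console.log("admin app listening on :3000"));
```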
AI Agent Management
RetellAI is a cloud-based platform that lets users build AI voice agents for automated phone calls. Originally intended for business use cases like customer support and appointment scheduling, the platform provides the infrastructure to connect large language models with telephony systems. For the competition, we repurposed RetellAI as the core component driving our social engineering calls. The platform handled the real-time processing pipeline – converting speech to text, sending prompts to GPT-4o, and converting responses back to speech – all while maintaining natural conversation flow.
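
RetellAI handled that pipeline for us end to end, but the shape of a single conversational turn is worth sketching. The snippet below is a hedged illustration using the OpenAI Node SDK; transcribe() and synthesize() are hypothetical placeholders for the speech-to-text and text-to-speech stages, and nothing here reflects RetellAI’s internals.

```typescript
// Minimal sketch of one turn in a speech -> LLM -> speech loop, using the OpenAI Node SDK.
// transcribe() and synthesize() are hypothetical placeholders for the STT/TTS providers.
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const systemPrompt =
  "You are a helpdesk technician persona. Stay in character and keep replies short and conversational.";

async function handleTurn(
  history: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
  callerAudio: Buffer
): Promise<Buffer> {
  const callerText = await transcribe(callerAudio); // hypothetical speech-to-text call
  history.push({ role: "user", content: callerText });

  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "system", content: systemPrompt }, ...history],
    max_tokens: 150, // short replies keep the voice agent conversational on a phone call
  });

  const reply = completion.choices[0].message.content ?? "";
  history.push({ role: "assistant", content: reply });

  return synthesize(reply); // hypothetical text-to-speech call (e.g. Cartesia or ElevenLabs)
}

// Placeholder declarations so the sketch is self-contained; real providers go here.
declare function transcribe(audio: Buffer): Promise<string>;
declare function synthesize(text: string): Promise<Buffer>;
```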
The primary challenge we faced was working around RetellAI’s limitations. The platform has no native capability for monitoring live outbound calls in progress, so we engineered a workaround: bridging call audio through our SIP telephony platform, described below.
SIP Telephony Platform
Jambonz’s conferencing capabilities let operators listen to conversations between the AI agents and targets as they occurred, and played the live call audio out loud for real-time monitoring, with the ability to intervene if necessary. The platform’s open-source flexibility and comprehensive SIP handling also allowed extensive customization to support the specific competition requirements.
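
As a rough idea of how such monitoring can be wired up, here is a minimal sketch in jambonz’s webhook model, where the application returns JSON verbs for the platform to execute. The route path, conference name, and property values are illustrative assumptions rather than our actual configuration.

```typescript
// Hedged sketch of a jambonz application webhook that places a call into a named
// conference so operators can listen from a SIP client.
import express from "express";

const app = express();
app.use(express.json());

// jambonz requests call instructions from the application webhook and executes the
// JSON verbs it gets back; here every call is dropped into the same monitoring room.
app.post("/call-hook", (_req, res) => {
  res.json([
    {
      verb: "conference",
      name: "monitor-room-1",     // operators join this room to hear the live call
      beep: false,
      startConferenceOnEnter: true,
    },
  ]);
});

app.listen(3001, () => console.log("jambonz webhook listening on :3001"));
```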
Security Implications
The Democratization of Advanced Attacks
The use of automated AI agent calling platforms like RetellAI demonstrates that virtually anyone can now create complex social engineering campaigns at scale, regardless of technical expertise. An attacker needs only a credit card and time to configure the flows and prompts. This democratization of advanced attack capabilities represents a fundamental shift in the threat landscape.
The Competition in Action
Multi-Agent Architecture
We developed three configurable AI personalities and conversation styles, allowing for agent switching through our admin interface. Our agents were designed with specific voice characteristics and personalities optimized for believability based on the pretext we developed.
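
A hedged sketch of how those personas might be modeled and switched is below. The persona names, voice IDs, and prompts are invented for illustration and are not the pretexts we used in the competition; two of the three personas are shown for brevity.

```typescript
// Illustrative sketch of configurable agent personas and a switch function driven
// by the admin interface. All names, voice IDs, and prompts are invented examples.
interface AgentPersona {
  id: string;
  voiceId: string;      // voice selected in the TTS provider
  systemPrompt: string; // personality and pretext instructions for the LLM
}

const personas: Record<string, AgentPersona> = {
  helpdesk: {
    id: "helpdesk",
    voiceId: "voice-a",
    systemPrompt: "You are a friendly internal IT helpdesk technician...",
  },
  vendor: {
    id: "vendor",
    voiceId: "voice-b",
    systemPrompt: "You are an account manager calling from a third-party vendor...",
  },
};

let activePersona: AgentPersona = personas.helpdesk;

// Called from the admin interface when the operator switches personas between calls.
function switchPersona(id: string): AgentPersona {
  const next = personas[id];
  if (!next) throw new Error(`unknown persona: ${id}`);
  activePersona = next;
  return activePersona;
}
```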
One of our most memorable moments was when our agent convinced a human to visit a phishing URL and read back a specific error code. This outcome demonstrates how sophisticated and believable AI-driven attacks have become.

Real-Time Adaptation
The bots demonstrated real-time adaptation capabilities during the calls. The integration of GPT-4o allowed for natural conversation flow, while our conference bridge architecture enabled seamless coordination between multiple agents when needed. Our platform achieved sub-800ms response times, making the conversations feel natural and human-like.
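
For a sense of how a latency budget like this can be tracked, the following sketch times a single turn against an 800 ms threshold; handleTurn is the hypothetical turn function from the earlier sketch, not a real API.

```typescript
// Hedged sketch: measure end-to-end latency of one conversational turn against the
// roughly 800 ms budget described above. handleTurn is declared here only so the
// example stands alone.
declare function handleTurn(
  history: { role: "user" | "assistant"; content: string }[],
  callerAudio: Buffer
): Promise<Buffer>;

async function timedTurn(
  history: { role: "user" | "assistant"; content: string }[],
  callerAudio: Buffer
): Promise<Buffer> {
  const start = performance.now();
  const replyAudio = await handleTurn(history, callerAudio);
  const elapsedMs = performance.now() - start;

  // Flag turns that blow the latency budget so they can be reviewed after the call.
  if (elapsedMs > 800) {
    console.warn(`turn took ${Math.round(elapsedMs)} ms (budget: 800 ms)`);
  }
  return replyAudio;
}
```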
What We Learned
What Worked Well
- The Jambonz + RetellAI integration proved robust for enterprise telephony needs
- GPT-4o’s conversational capabilities exceeded expectations for social engineering contexts
- Voice synthesis from Cartesia and ElevenLabs allowed for convincingly human-like interactions
Human Factor Insights
This competition demonstrated alarming realities about how humans respond to AI-driven social engineering:
- Humans are remarkably vulnerable to well-crafted AI voices
- The scalability of these attacks represents a paradigm shift in the threat landscape
- Detection of AI-generated calls remains challenging for average users

Personal Reflections
The most concerning takeaway wasn’t our technical achievement, but rather how accessible these capabilities are. We used commercially available platforms that anyone with basic technical knowledge and willingness to pay could deploy for malicious purposes. This wasn’t advanced research; it was off-the-shelf technology.
The Win and What It Means
After putting in a tremendous amount of hard work, we were thrilled to come away with a first-place victory. This victory highlights a critical blind spot in organizational security: while we’ve spent years training employees to recognize human social engineers, we’re now facing the reality that AI can automate these attacks at unprecedented scale using commercial tools that require minimal technical sophistication to deploy.
Looking Forward
We want to express our sincere gratitude to the judges, Rachel Tobac, Perry Carpenter, and Snow, for their expertise and fair evaluation. Special thanks to the Social Engineering Community and DEF CON for creating this innovative competition format that pushes the boundaries of what’s possible in security research.

The Social Engineering Community’s willingness to explore these cutting-edge attack vectors in a controlled, ethical environment is crucial for advancing our collective defense capabilities. We’re already looking forward to next year’s competition and the continued evolution of AI-powered security testing.
Most importantly, this competition serves as a wake-up call for the security industry. The future of social engineering is here, it’s powered by AI, and it’s more accessible than we’d like to admit. It’s time to evolve accordingly.