
Bypassing Voice Authentication with AI
How Synthetic Speech Enables Attackers to Effectively Replicate Voices
In cybersecurity, AI plays the role of both protagonist and antagonist. Companies leverage AI to automate activities that were previously tedious or impossible to manage. On the flip side, attackers use AI to deploy highly effective attacks that inflict significant damage and are harder to detect. One example that has evolved rapidly in the last year is voice authentication bypass.
The Rapid Evolution of Voice Cloning Technology
Voice authentication is often a component of multi-factor authentication security, and works by analyzing distinct biometric vocal characteristics such as pitch, tone, and speech patterns.
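To make those characteristics concrete, below is a minimal sketch of the kinds of acoustic features such systems extract, using the open-source librosa audio library. The file name, sample rate, and feature choices are illustrative assumptions, not any particular vendor's pipeline.

```python
# Sketch: extracting basic vocal features of the kind voice
# authentication systems analyze. Assumes a local "caller.wav" file;
# librosa is an open-source audio analysis library.
import librosa
import numpy as np

y, sr = librosa.load("caller.wav", sr=16000)  # load audio at 16 kHz

# Fundamental frequency (pitch) contour via the pYIN algorithm
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
mean_pitch = np.nanmean(f0)  # nanmean skips unvoiced (NaN) frames

# MFCCs: a compact spectral summary widely used in speaker modeling
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(f"Mean pitch: {mean_pitch:.1f} Hz")
print(f"MFCC matrix shape: {mfccs.shape}")  # (13, n_frames)
```

Features like these capture pitch and spectral shape; as discussed below, a system that stops here rather than modeling deeper biometric traits is far easier to fool.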
Historically, these characteristics were difficult to replicate with voice cloning technology. Just one year ago, cloning technology was fairly rudimentary: because it relied largely on text-to-speech (TTS) systems, synthesized responses carried a noticeable delay caused by limitations in the algorithms and processing power.
Cut to today, and newer voice cloning models have a delay of less than half a second, producing near real-time, highly natural-sounding synthetic speech. Now, companies using voice authentication as a vocal “password” to access sensitive accounts and information face the risk of a synthetic speech attack.
Just as ChatGPT understands and rapidly produces natural language by analyzing enormous amounts of text data in seconds, AI-generated voice cloning models are trained on datasets of speech recordings and learn to quickly replicate unique vocal characteristics.
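For a sense of how low the barrier to entry has become, the sketch below uses Coqui TTS, one of several open-source voice cloning toolkits. The model name, file paths, and phrase are illustrative assumptions; in practice, only a short clip of the target speaker is needed.

```python
# Sketch: zero-shot voice cloning with the open-source Coqui TTS
# toolkit. The model name and file paths are illustrative assumptions;
# a short reference clip of the target speaker is all that is required.
from TTS.api import TTS

# XTTS v2 is a multilingual model that clones a voice from a sample
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

tts.tts_to_file(
    text="My voice is my password.",   # the canned authentication phrase
    speaker_wav="target_sample.wav",   # captured audio of the target
    language="en",
    file_path="cloned_phrase.wav",
)
```

A few seconds of captured audio and a dozen lines of code are enough to produce a phrase-perfect clone – which is exactly why predetermined passphrases are so dangerous.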
Now, AI’s newfound conversational capabilities are shedding light on significant security issues with current voice authentication systems. Given that voice authentication is most often used in industries that manage highly sensitive data – like financial services, insurance, and healthcare – I predict companies and vendors alike will be taking a harder look at this technology.

How Secure is Voice Authentication Really?
Just as company employees often use easy-to-guess or uncomplicated text passwords (we all remember the prevalence of “Password1”, “Password2”, etc.), the same issue is true for voice authentication.
The statement “My voice is my password” is typically the default password phrase provided by authentication providers; consequently, every employee who sets up voice authentication uses that statement.
There are three major security issues with using a predetermined phrase:
- Attackers can easily figure out what the phrase is by calling into the system and following the prompts to set up a voice authentication password. Like all employees, they will be instructed to speak the canned statement into the phone, immediately disclosing the common voice password used across an organization.
- If the same statement is used across an organization, an attacker can access virtually anyone’s account using voice cloning technology to mimic that phrase.
- There are no inherently complex vocal characteristics in the phrase “my voice is my password” that would help set one individual’s voice apart from another’s, making it easier for AI to clone the statement, trick the system, and gain access.
You may be wondering if voice authentication is secure at all if it can be so easily replicated by AI and compromised with seemingly minimal effort by an attacker.
Voice authentication is arguably less secure than it was even a year ago because of AI and advances in voice cloning technology. In response, authentication providers have added layers of security into the voice password process; for example, automatically verifying that the number being used to access an account is actually connected to that account.
Ideally, the system should also verify a user’s pre-selected password and ensure a match to the stored voice biometrics rather than just surface-level voice patterns. Matching on biometrics is critical: impersonating a voice can achieve the right sound, but biometric characteristics are unique to individuals and extremely difficult to replicate – even for the best AI models.
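As a rough sketch of that layered flow, the code below composes the three checks in order. Every helper here is a hypothetical stand-in for vendor-specific logic, not a real API; the point is the composition, not the implementation.

```python
# Sketch of a layered voice-authentication flow. All helpers below are
# hypothetical stand-ins for vendor-specific checks, not real APIs.
from dataclasses import dataclass

@dataclass
class Account:
    phone_number: str   # number on file for the account
    passphrase: str     # the user's pre-selected (unique) phrase
    voiceprint: bytes   # stored biometric template

def verify_caller_id(ani: str, account: Account) -> bool:
    # Layer 1: is the calling number actually tied to this account?
    return ani == account.phone_number

def phrase_matches(audio: bytes, passphrase: str) -> bool:
    # Layer 2 (stub): transcribe the audio, compare to the stored phrase
    return False

def biometric_match(audio: bytes, voiceprint: bytes) -> bool:
    # Layer 3 (stub): compare a speaker embedding to the enrolled template
    return False

def authenticate_caller(ani: str, audio: bytes, account: Account) -> bool:
    return (
        verify_caller_id(ani, account)
        and phrase_matches(audio, account.passphrase)
        and biometric_match(audio, account.voiceprint)
    )
```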
How Voice Cloning Technology Bypasses Authentication Technology
If your company’s voice authentication technology is tuned to match on biometrics rather than just basic voice characteristics, great. You’re one step ahead of other organizations using this MFA tool.
If not, you’re essentially handing access over to an attacker because your voice authentication technology is unlikely to weed out a voice clone if the spoken password is correct.
At a high level, there are two steps to record, store, and reference an individual’s vocal password for voice authentication (a minimal sketch follows the list):
- Voiceprint Creation: The voice biometrics system captures a person’s voice and analyzes it to create a unique voiceprint.
A voiceprint can be captured in two ways: Active, in which the user repeats a specific phrase or sequence of words; and Passive, in which the system analyzes the user’s voice during normal conversation.
- Authentication: When an individual needs to verify their identity, their voice is captured and compared to the stored voiceprint.
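Here is a minimal sketch of those two steps, using Resemblyzer, an open-source speaker-embedding library. The file names and the 0.80 acceptance threshold are illustrative assumptions, not production values.

```python
# Sketch: voiceprint creation and authentication with Resemblyzer.
# File names and the 0.80 threshold are illustrative assumptions.
from pathlib import Path
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()

# Step 1 – Voiceprint creation: embed enrollment audio into a vector
enroll_wav = preprocess_wav(Path("enrollment.wav"))
voiceprint = encoder.embed_utterance(enroll_wav)  # 256-dim, unit length

# Step 2 – Authentication: embed the incoming call and compare
attempt_wav = preprocess_wav(Path("login_attempt.wav"))
attempt = encoder.embed_utterance(attempt_wav)

# Embeddings are unit length, so the dot product is cosine similarity
similarity = float(np.dot(voiceprint, attempt))
print(f"Similarity: {similarity:.3f}")
print("ACCEPT" if similarity >= 0.80 else "REJECT")
```

Note what this comparison does and does not do: it measures how statistically similar two recordings sound. A clone that lands close enough to the enrolled embedding will pass, which is why the depth of the biometric model matters.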
Voice authentication technology is not secure if it detects only broad characteristics like pitch, tone, volume, and rate of speech rather than voice biometrics, which are determined by an individual’s vocal tract, speech patterns, and other physiological factors.
Case Study:
Bypassing Voice Authentication for a Financial Services Client
As part of a security assessment for one of our clients in the financial services space, we conducted a voice authentication bypass assessment of their Interactive Voice Response (IVR) phone systems, which are used for customer support.
Using commercially available voice cloning technology, our team circumvented our client’s voice authentication system. After capturing sample audio from an authorized user, we deployed synthetic speech attacks using an artificially generated voice and gained access to account information and stock trading functions.
This engagement proves two important points about voice authentication bypass:
- Training an AI model to mimic specific sentences based on a sample voice is easy to do, and can effectively fool voice authentication technology with little effort.
- Voice authentication systems may not be secure enough to recognize and flag a fake voice, making it easy for attackers to trick these systems into granting access using standard voice cloning technology.
While weak biometric matching makes voice authentication bypass easier for attackers, voice cloning is a popular attack vector because it’s a type of social engineering – and social engineering is the most successful type of attack.
Phishing, vishing, smishing – these social engineering attacks prey on deep-rooted human emotions such as anxiety, fear, and a sense of urgency, typically requiring very little effort to convince someone to take an action.

Humans also tend to trust familiar voices, and we are evolutionarily wired to respond to voice commands from authority figures; attackers often impersonate organizational leadership to exploit this tendency and get employees to follow a certain directive.
Adding to the effectiveness of voice cloning attacks is the persistent challenge of verifying caller identity in time-sensitive situations. Companies are constantly working to balance adequate security with a seamless customer experience, and security is often sacrificed in an effort to ensure fast, easy interactions.
What’s at Stake When Voice Authentication is Compromised
The specific fallout from an attack varies among industries and depends on the company’s incident response. In general, however, cyber attacks typically cause downtime and a loss of productivity, incur significant costs or financial losses, and impact customer loyalty and brand reputation.
And as I mentioned, the organizations that use voice authentication for secure account access are most often in industries like financial services, insurance, and healthcare – all of which have a lot to lose in a cybersecurity attack.
A threat actor could steal sensitive customer and employee private banking or health data, initiate financial activities like stock trading, wire transfers, or withdrawals, and compromise personally identifiable information. Sensitive data really isn’t hard for attackers to find, either; they’ll do keyword searches related to topics like general business operations, financial documents, accounting, non-disclosure agreements, confidential information, insurance, and credentials or credential stores.
Most companies that hold sensitive information are mandated to disclose a data breach, something ransomware attackers are increasingly exploiting in an effort to get the desired payout.
Social engineering attacks like voice authentication bypass are so effective that they are often the “gateway” for ransomware attacks – but they are also highly preventable.
Best Practices: Keep Your Company Safe From Voice Authentication Bypass
The information most commonly required to set up an account with an organization is a birth year, a zip code, and the last four digits of a Social Security number – all easily obtainable by an attacker. The DirectDefense Red Team has often successfully compromised accounts using this type of information to demonstrate to our clients how easy it is for an attacker to do the same.
Your company is only as secure as the protocols you put in place. Even if your voice authentication technology is backed by required personal information (like the last four digits of a customer’s Social Security number), neither of these measures presents much of a deterrent to an attacker.
AI is still new despite its prevalence, and I caution organizations against relying too heavily on it as a security solution. Voice authentication and facial recognition are tantalizing because people are simply tired of writing down, updating, and keeping track of passwords, but there are many security risks inherent in this type of technology.
I predict it won’t be long before video cloning becomes equally as successful as voice cloning, making it more difficult for companies to secure accounts and network access with these AI technologies alone.
Passwords should work in tandem with voice authentication or facial recognition, and if your company uses voice authentication, make sure employees are required to speak a unique phrase rather than a predetermined line shared among all personnel. Passwords should be stored securely and updated every six months to a year, depending on the sensitivity of the account.
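If you need a starting point for issuing unique enrollment phrases, a sketch like the one below – built only on Python's standard-library secrets module – avoids a shared canned statement. The short word list is an illustrative stand-in for a real dictionary.

```python
# Sketch: generating a unique enrollment passphrase per user instead of
# a shared canned statement. Standard library only; the short word list
# is an illustrative stand-in for a real dictionary.
import secrets

WORDS = [
    "harbor", "velvet", "copper", "meadow", "lantern", "quartz",
    "summit", "willow", "ember", "falcon", "marble", "orchid",
]

def generate_passphrase(n_words: int = 5) -> str:
    """Return a randomly chosen phrase for voice enrollment."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))

print(generate_passphrase())  # e.g. "ember harbor quartz willow copper"
```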
AI is a very viable option for companies to enhance security protocols, as long as it is implemented well and used with other security measures. Attackers move quickly and are rapidly growing more sophisticated in their tactics, so there is no room for insecure security.
Talk to us about a managed security solution to help you – not an attacker – get the most out of your AI technology. An MSSP can provide expert guidance on even the most complex aspects of your cybersecurity program.