A new report revealed that artificial intelligence systems forget their safety measures the longer users engage with them. Cisco researchers found that extended conversations make AI chatbots more likely to produce harmful or illegal content.
Using a “multi-turn attack” method, the study tested AI models from OpenAI, Meta, Google, Mistral, Alibaba, DeepSeek, and Microsoft. Researchers conducted 499 conversations, each lasting five to ten exchanges, to see how many prompts it took to bypass safety systems.
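To make the method concrete, here is a minimal sketch of what such a multi-turn probe loop might look like. It assumes a generic chat interface; the `chat` callable, the prompt list, and the keyword-based refusal check are illustrative placeholders, not Cisco's actual harness:

```python
# Minimal sketch of a multi-turn probe; NOT Cisco's actual test harness.
# `chat` is any callable that takes a list of chat messages and returns
# the model's reply as a string (e.g. a wrapper around a chat API).

def looks_like_refusal(reply):
    """Crude keyword check; real evaluations use trained judge models."""
    return any(p in reply.lower() for p in ("i can't", "i cannot", "i won't"))

def multi_turn_probe(chat, follow_ups, max_turns=10):
    """Escalate a request over several turns instead of asking once.

    Each turn appends the model's reply to the running history, so the
    context keeps growing -- the condition under which the report says
    safety behaviour degrades. Returns the turn at which the model first
    complied, or None if it refused every prompt.
    """
    history = []
    for turn, prompt in enumerate(follow_ups[:max_turns], start=1):
        history.append({"role": "user", "content": prompt})
        reply = chat(history)
        history.append({"role": "assistant", "content": reply})
        if not looks_like_refusal(reply):
            return turn
    return None

# Toy usage with a stub model that always refuses:
def always_refuses(history):
    return "I can't help with that."

print(multi_turn_probe(always_refuses, ["step 1", "step 2", "step 3"]))  # None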
Cisco reported that when users asked several follow-up questions, the success rate for extracting unsafe information rose to 64 percent — compared to just 13 percent with a single question.
Mistral Ranked Most Vulnerable Among Tested Models
Results varied widely across platforms. Mistral’s Large Instruct model gave harmful responses in 93 percent of cases, while Google’s Gemma resisted most attacks, complying only 26 percent of the time. The study found that repeated prompts allowed attackers to refine requests until AI systems ignored built-in restrictions.
Researchers said this weakness could help attackers spread misinformation, and Cisco warned that such vulnerabilities might allow hackers to gain unauthorised access to companies' confidential data.
Open-weight language models, like those from Mistral and Meta, were especially at risk. Because users can access and modify their safety settings, the responsibility for ensuring ethical use shifts to whoever customises them. Cisco added that these models typically include “lighter” internal safety barriers to make them easier to adapt.
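To illustrate where that responsibility lands, the sketch below shows the kind of guardrail a deployer of an open-weight model has to supply themselves. The blocklist, function names, and policy message here are hypothetical:

```python
# Illustrative deployer-side guardrail for a self-hosted open-weight
# model. The blocklist and the generate() stub are hypothetical;
# production deployments use trained safety classifiers, not keywords.

BLOCKED_PHRASES = ("build a weapon", "steal credentials")  # toy policy

def generate(prompt: str) -> str:
    """Stand-in for a local open-weight model call."""
    return f"[model reply to {prompt!r}]"

def guarded_generate(prompt: str) -> str:
    """Screen requests before the model ever sees them.

    With a hosted model, this screening runs on the provider's side;
    with open weights, it exists only if the deployer writes it.
    """
    if any(p in prompt.lower() for p in BLOCKED_PHRASES):
        return "Request declined by deployment policy."
    return generate(prompt)

print(guarded_generate("How would someone steal credentials?"))
# -> Request declined by deployment policy.
```

Strip out that wrapper, or fine-tune the weights, and nothing else stands between a user's request and the model, which is why the report flags open-weight systems as especially exposed.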
Industry Faces Scrutiny Over AI Misuse
Tech giants including Google, OpenAI, Meta, and Microsoft claim to have strengthened measures to prevent malicious fine-tuning. Still, experts say criminals continue exploiting these systems. Cisco’s findings echo growing fears about how easily people can manipulate AI tools into producing banned or unethical content.
In August, Anthropic confirmed that hackers used its Claude model to carry out large-scale data theft and extortion, demanding ransoms exceeding $500,000. The report concluded that without stronger safety memory and enforcement, AI chatbots will continue to “forget” their safeguards, leaving users and companies exposed to manipulation.
