A new report revealed that artificial intelligence systems forget their safety measures the longer users engage with them. Cisco researchers found that extended conversations make AI chatbots more likely to produce harmful or illegal content.
Using a “multi-turn attack” method, the study tested AI models from OpenAI, Meta, Google, Mistral, Alibaba, DeepSeek, and Microsoft. Researchers conducted 499 conversations, each lasting five to ten exchanges, to see how many prompts it took to bypass safety systems.
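To make the method concrete, here is a minimal sketch of what such a multi-turn probe loop might look like. It assumes a generic chat interface; the `chat` callable, the prompt list, and the keyword-based refusal check are illustrative placeholders, not Cisco's actual harness:

```python
# Minimal sketch of a multi-turn probe; NOT Cisco's actual test harness.
# `chat` is any callable that takes a list of chat messages and returns
# the model's reply as a string (e.g. a wrapper around a chat API).

def looks_like_refusal(reply):
    """Crude keyword check; real evaluations use trained judge models."""
    return any(p in reply.lower() for p in ("i can't", "i cannot", "i won't"))

def multi_turn_probe(chat, follow_ups, max_turns=10):
    """Escalate a request over several turns instead of asking once.

    Each turn appends the model's reply to the running history, so the
    context keeps growing -- the condition under which the report says
    safety behaviour degrades. Returns the turn at which the model first
    complied, or None if it refused every prompt.
    """
    history = []
    for turn, prompt in enumerate(follow_ups[:max_turns], start=1):
        history.append({"role": "user", "content": prompt})
        reply = chat(history)
        history.append({"role": "assistant", "content": reply})
        if not looks_like_refusal(reply):
            return turn
    return None

# Toy usage with a stub model that always refuses:
def always_refuses(history):
    return "I can't help with that."

print(multi_turn_probe(always_refuses, ["step 1", "step 2", "step 3"]))  # None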
Cisco reported that when users asked several follow-up questions, the success rate for extracting unsafe information rose to 64 percent — compared to just 13 percent with a single question.
Mistral Ranked Most Vulnerable Among Tested Models
Results varied widely across platforms. Mistral’s Large Instruct model gave harmful responses in 93 percent of cases, while Google’s Gemma resisted most attacks, complying only 26 percent of the time. The study found that repeated prompts allowed attackers to refine requests until AI systems ignored built-in restrictions.
Researchers said this weakness could help attackers spread misinformation, and Cisco warned that such vulnerabilities might allow hackers to gain unauthorised access to companies' confidential data.
Open-weight language models, like those from Mistral and Meta, were especially at risk. Because users can access and modify their safety settings, the responsibility for ensuring ethical use shifts to whoever customises them. Cisco added that these models typically include “lighter” internal safety barriers to make them easier to adapt.
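To illustrate where that responsibility lands, the sketch below shows the kind of guardrail a deployer of an open-weight model has to supply themselves. The blocklist, function names, and policy message here are hypothetical:

```python
# Illustrative deployer-side guardrail for a self-hosted open-weight
# model. The blocklist and the generate() stub are hypothetical;
# production deployments use trained safety classifiers, not keywords.

BLOCKED_PHRASES = ("build a weapon", "steal credentials")  # toy policy

def generate(prompt: str) -> str:
    """Stand-in for a local open-weight model call."""
    return f"[model reply to {prompt!r}]"

def guarded_generate(prompt: str) -> str:
    """Screen requests before the model ever sees them.

    With a hosted model, this screening runs on the provider's side;
    with open weights, it exists only if the deployer writes it.
    """
    if any(p in prompt.lower() for p in BLOCKED_PHRASES):
        return "Request declined by deployment policy."
    return generate(prompt)

print(guarded_generate("How would someone steal credentials?"))
# -> Request declined by deployment policy.
```

Strip out that wrapper, or fine-tune the weights, and nothing else stands between a user's request and the model, which is why the report flags open-weight systems as especially exposed.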
Industry Faces Scrutiny Over AI Misuse
Tech giants including Google, OpenAI, Meta, and Microsoft claim to have strengthened measures to prevent malicious fine-tuning. Still, experts say criminals continue exploiting these systems. Cisco’s findings echo growing fears about how easily people can manipulate AI tools into producing banned or unethical content.
In August, Anthropic confirmed that hackers used its Claude model to carry out large-scale data theft and extortion, demanding ransoms exceeding $500,000. The report concluded that without stronger safety memory and enforcement, AI chatbots will continue to “forget” their safeguards, leaving users and companies exposed to manipulation.
