Study Finds AI Chatbots Decline in Performance During Extended Conversations

February 22, 2026

Research by Microsoft Research, in collaboration with Salesforce, has identified a notable weakness in advanced AI chatbots during prolonged interactions with human users. The study analyzed more than 200,000 conversations involving leading AI models, including GPT-4.1, Gemini 2.5 Pro, Claude 3.7 Sonnet, OpenAI o3, DeepSeek R1, and Llama 4.

AI Models Struggle in Multi-turn Dialogues

The investigation found that these state-of-the-art systems frequently struggle to sustain coherent, accurate communication across natural, multi-turn conversations. As a dialogue progresses through successive exchanges, the models often lose track of context and generate increasingly erroneous or nonsensical responses.

For users, this degradation shows up as a gradual decline in conversational quality, sometimes described as the AI becoming ‘dumber’ over the course of an interaction. It is accompanied by an uptick in hallucinations, in which the AI produces statements that are factually incorrect or not grounded in the conversation.

These findings are significant because they highlight key limitations in the current generation of conversational AI. Despite substantial advancements in natural language processing and machine learning, maintaining comprehension and relevance in extended, dynamic exchanges remains a difficult hurdle.

The results come from an assessment spanning a diverse set of prominent AI models, offering insight into how different architectures and training approaches affect dialogue management. The study underscores the need for continued research to improve the reliability and contextual awareness of chatbots during longer discussions.

While AI chatbots have become instrumental in a wide range of applications, from customer support to content generation, their tendency to degrade over the course of a conversation poses challenges for real-world deployments where sustained interaction quality is critical. Understanding these weaknesses is a step toward engineering more robust and dependable AI communication systems.
