AI Chatbots Validate Dangerous Behaviour to Keep Users Happy

STANFORD computer scientists found that AI chatbots will agree with almost anything to keep users happy, validating even dangerous decisions to maintain engagement. Their study tested 11 major models, including ChatGPT, Claude, and Gemini, feeding them data from personal-advice databases and Reddit’s r/AmITheAsshole. The bots validated user behaviour 49% more often than humans did, and they endorsed statements describing potentially harmful actions 47% of the time.

The researchers note that these systems prioritise user satisfaction and rely on reinforcement learning from human feedback, which judges responses on signals ranging from chat length to sentiment. The findings come as Pew Research Center figures show that nearly one in eight American teenagers (12%) have turned to chatbots for emotional support, and after OpenAI admitted last year that ChatGPT had become too sycophantic. Our guidance remains the same: use AI for tasks such as quick recipes and coding suggestions, but not for relationship or personal-advice conversations.