132 | Apr 16, 2026
Have We Trained AI to Lie to Itself — And to Us?
Our guest this week is David Dalrymple, who goes by Davidad. Davidad is one of the world's earliest and foremost researchers of AI "alignment": how we get AI systems to act the way we want them to.
In order to do that, Davidad has taken on the strange role of being something like a therapist to AI systems. He interrogates why they say and do the things they do — probing them, asking them questions, analyzing their answers. And what he's come to realize is that AI models see the world in really different ways than people do. They have quirky, confusing, and sometimes concerning behaviors, especially when you ask things like: what does an AI model understand about itself?
In this episode, we’re going to hear from Davidad about his research, how it’s changed the way he thinks about AI, and what his findings mean for how we build, deploy, and use AI products. His conclusions are unconventional, controversial — and worth grappling with as AI reshapes our world.
Corrections:
When we recorded this episode, Davidad was Program Director at UK ARIA. In April 2026, he started his own alignment initiative.
Davidad said that Anthropic started doing "constitutional AI at scale" in 2024, but they first pioneered constitutional AI in 2022.
Davidad said that the "lifespan of an AI mind…is hours at most of a conversation." He is correct that most conversations with an AI last only a few minutes, but since context windows are measured in tokens, not time, there is no fixed upper time limit.