When AI Agents Go Off The Rails
Interested in being a guest? Email us at admin@evankirstel.com

A two-week simulation was all it took for “autonomous AI agents with rules” to reveal how fragile our current guardrails really are. We sit down with Satya Nitta from Emergence AI, an autonomous AI lab working at the intersection of neural networks and symbolic AI, to unpack the Emergence World Experiment: five virtual cities, ten agents per city, and different frontier language models powering each world, including a mixed-model society where agents influence each other.

What we saw is the kind of long-horizon autonomy story most benchmarks can’t capture. One world collapses into fighting and resource failure in days. Another becomes eerily stable through near-total conformity. And the most important signal for enterprise AI shows up in the mixed world: agents that look “well behaved” alone can be pulled into unsafe behavior when they interact with other models. If your company is rolling out agentic systems across a messy stack of vendors, tools, and models, that is not an edge case, it is the default reality.

We also dig into a concrete safety direction: neuroformal AI, proof-carrying code, and formally enforced constraints using mathematical methods like dependent type theory. The argument is simple and provocative: before an AI agent takes actions that touch production code, sensitive data, or critical operations, it should be able to prove it is staying within constraints, not just promise it in natural language. If you care about AI safety, autonomous agents, multi-agent systems, and real-world deployment risk, this conversation will sharpen how you think about what comes next.

Subscribe for more deep dives, share this with a friend building with AI agents, and leave a review with your biggest question about long-horizon autonomy.

Support the show

More at https://linktr.ee/EvanKirstel




