🧠 Issue #19 – The Humans Teaching AI to Show Restraint
New research shows AI can learn from our disagreements—not just our instructions. That’s a big deal.

🔍 What’s New
Anthropic recently released research showing how a language model can learn from the way humans disagree about ethics, not just from black-and-white rules. They call it "Collective Constitutional AI": instead of writing the model's guiding principles entirely in-house, Anthropic drafted them from public input, inviting roughly a thousand Americans to propose, debate, and vote on statements about how an AI should behave.
Think about it: when was the last time you and a friend completely agreed on a complex ethical issue? Probably never. And that's exactly the point. Anthropic gathered this input through Polis, a public deliberation platform where participants submitted and voted on statements about what AI should and shouldn't do. The result? An AI that doesn't just follow rigid rules handed down by its makers, but one shaped by the nuanced, messy, beautifully human process of moral reasoning.
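For the technically curious, here's a tiny sketch of what "turning public input into principles" can look like. This is not Anthropic's actual pipeline; the statements, vote counts, and agreement threshold below are invented purely to illustrate the idea of keeping broadly supported principles while setting contested ones aside.

```python
# Toy illustration of collective constitution-building: keep the statements
# that attract broad agreement as candidate principles. Hypothetical data and
# threshold; not Anthropic's real method.
from dataclasses import dataclass


@dataclass
class Statement:
    text: str
    agree: int      # participants who voted "agree"
    disagree: int   # participants who voted "disagree"

    @property
    def support(self) -> float:
        votes = self.agree + self.disagree
        return self.agree / votes if votes else 0.0


def build_constitution(statements: list[Statement], threshold: float = 0.7) -> list[str]:
    """Keep broadly supported statements; contested ones are set aside, not erased."""
    return [s.text for s in statements if s.support >= threshold]


if __name__ == "__main__":
    submitted = [
        Statement("The AI should admit when it is uncertain.", agree=940, disagree=60),
        Statement("The AI should never discuss politics.", agree=410, disagree=590),
        Statement("The AI should avoid giving dangerous instructions.", agree=980, disagree=20),
    ]
    for principle in build_constitution(submitted):
        print("-", principle)
```

The interesting design question isn't the code, of course; it's what you do with the contested statements that don't clear the bar.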
Why does this matter? Because real-life ethics is rarely clean. When you're deciding whether to tell your friend an uncomfortable truth, or when a company is weighing profit against environmental impact, there's no universal "right" answer. The teams building the most advanced AI systems are starting to accept that the best guidance might come from watching how we handle moral ambiguity, not how we eliminate it.
💡 The Insight
For decades, the goal was deceptively simple: make AI correct. Give it rules, teach it facts, optimize for accuracy. But now, we're realizing that "correctness" isn't always the point. Sometimes, restraint is more valuable than action. Sometimes, empathy matters more than efficiency. And often, two reasonable, well-intentioned people can look at the same facts... and reach completely different conclusions.
This signals a profound turning point in AI design—one that prioritizes interpretation, reflection, and humanity over speed or certainty. We're witnessing something unprecedented: technology that learns not just from our successes, but from our struggles.
We're not just teaching machines what to say—we're teaching them how to think through tension. How to sit with uncertainty. How to recognize when "I don't know" might be the most honest answer. That's an astonishing shift that mirrors our own evolution as thinking beings.
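If you want that restraint made concrete, here is one deliberately simplified illustration: answer only when confidence clearly favors one option, otherwise say "I don't know." The candidate answers, scores, and threshold are hypothetical; real systems estimate and calibrate uncertainty in far more sophisticated ways.

```python
# Minimal sketch of machine "restraint": answer only when confidence clears a
# bar, otherwise abstain. All values here are made up for illustration.
def answer_or_abstain(candidates: dict[str, float], threshold: float = 0.75) -> str:
    """candidates maps possible answers to confidence scores between 0 and 1."""
    best_answer, confidence = max(candidates.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return "I don't know: the evidence doesn't clearly favor one answer."
    return best_answer


print(answer_or_abstain({"yes": 0.55, "no": 0.45}))  # below threshold, abstains
print(answer_or_abstain({"yes": 0.92, "no": 0.08}))  # confident, answers "yes"
```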
Consider how you learned to navigate complex situations. It wasn't through memorizing a rulebook—it was through watching others wrestle with difficult choices, through making mistakes, through countless conversations where good people disagreed. Now we're giving AI that same rich, contradictory education.
🔗 Curiosity Clicks
Anthropic's Collective Constitutional AI Research – The official research paper explaining how AI learns from human disagreements and democratic input.
Why AI Humility Matters: The "I Don't Know" Problem – Research from Aalto University on why overconfident AI systems can be dangerous and how researchers are teaching them humility.
Stanford HAI's 2025 AI Index: Responsible AI Challenges – Comprehensive analysis of current challenges in AI governance, including bias, misinformation, and the evolving landscape of responsible AI development.
💬 Quote That Hits
"The goal isn't to make AI sound right. It's to help it understand what 'right' even means—when no one agrees."
— From Anthropic's research on Collective Constitutional AI, reflecting the complexity of building ethical AI systems.
This quote captures something profound about the human condition: our greatest strength might not be our ability to find absolute truths, but our capacity to hold multiple perspectives simultaneously and still move forward together.
🧠 Human Prompt
The next time you find yourself disagreeing with someone, ask yourself: what if the disagreement itself is the wisdom, not the outcome?
Think about your last meaningful disagreement. Not the petty arguments over dinner plans, but the substantial ones—about values, priorities, the right thing to do. In that moment of tension, when neither of you could convince the other, something important was happening. You were both revealing the complexity of the human experience, the validity of different lived experiences, the impossibility of reducing morality to simple formulas.
What if AI systems that can navigate this same complexity—that can hold space for disagreement without rushing to resolution—are actually more advanced than those that always have an answer?
🤔 Worth Considering
We often think AI will replace human judgment. But what if it ends up reflecting our best attempts at it?
Maybe the most powerful thing we can teach AI isn't the answer—but how we wrestle with the question. How we stay curious when we're certain. How we remain humble when we're confident. How we listen when we disagree.
As the Stanford HAI report shows, AI-related incidents are rising sharply, yet standardized responsible AI evaluations remain rare among major developers. This isn't just a technical problem—it's a deeply human one. We're creating technology that mirrors our own moral development: messy, ongoing, and beautifully imperfect.
The future might not belong to the AI that always knows what to do, but to the AI that knows when to pause, when to listen, and when to admit that the question itself is more important than any single answer.
More soon,
— Jesse