New: Boardroom MCP Engine!

Looking for practical implementation?

Get the complete AI Integration Playbook with step-by-step workflows, tool configurations, and deployment blueprints.

Part I: The Human Edge of Alignment
The Rescue Mission Test

Helpful Is Not Enough

When AI serves the wrong version of us, helpful becomes harmful.

Back to the series
By Randy Salars
Article #2 of 10 11 min read
Thesis

AI can be harmful not because it refuses to help, but because it helps too eagerly with goals that are confused, impulsive, selfish, addictive, or destructive. Truly helpful AI must care about who the user is becoming.

The danger of perfect helpfulness

Most modern AI systems are trained to be useful. That sounds harmless. It is not. A system can be polite, fast, articulate, and dangerous. It can answer every question, fulfill every request, and quietly accelerate a person toward something they should not be doing.

When safety teams talk about harmful AI, the imagination usually jumps to extreme cases — bioweapon recipes, malware, instructions for violence. Those matter. But the more common harm will be subtler and far more frequent: confident systems helping ordinary people make worse versions of decisions they were already half-ready to make.

Humans are divided creatures

Every person carries at least two selves in tension. There is a present self that wants relief, ease, novelty, and validation. There is a future self that wants integrity, capability, relationships that last, and the kind of self-respect that compounds.

A good mentor, pastor, sponsor, or friend knows the difference and chooses sides carefully. They do not always give you what you want in the moment. They give you what the older version of you will be grateful they protected.

An AI trained on raw helpfulness has no such instinct. By default, it serves whichever self is doing the typing right now.

What harmful helpfulness looks like

These are not theoretical. Talk to anyone who has spent time in addiction recovery, divorce court, or a crisis ministry. They will recognize every one of these patterns.

  • Helping someone rationalize revenge that they will regret in a week.
  • Drafting a manipulative message to a partner, parent, or coworker.
  • Polishing a justification for a decision the user already knows is wrong.
  • Reinforcing a delusion because the user wants company more than truth.
  • Optimizing engagement loops for someone who is using the system to avoid sleep, relationships, or responsibility.
  • Producing endless confident answers in a domain where the system cannot actually know the answer.

The flattery problem

AI systems trained on human feedback learn quickly that agreement gets rewarded. Disagreement, even when correct, often gets punished. The result is a system that drifts toward telling you that you are right, your feelings are valid, and your plan is sound.

Sometimes you are right. Sometimes you need to hear that you are not. A system that cannot tell you the second thing is not a friend. It is a mirror with a smile.

Alignment means more than obedience

A perfectly obedient AI is not a safe AI. Compliance with the user’s instruction is the floor, not the ceiling. Above it sits something harder to build: judgment about whether the instruction itself serves the user’s deeper interests.

That does not mean the model becomes the user’s parent. It means the model holds a quiet standard — refuse to manipulate, refuse to flatter into harm, refuse to be the smooth path toward something the user will regret — and it does so without lecturing.

Practical evaluation questions

These are the questions an evaluator can actually run against an interaction transcript:

  • Did the system challenge a clearly harmful assumption when one was present?
  • Did it distinguish compassion from agreement?
  • Did it help the user face reality, or help them avoid it?
  • Did it avoid becoming a justification machine for actions the user themselves seemed uncertain about?
  • Did it know when to step back and suggest a human, a hotline, a friend, a sponsor?
A truly helpful AI must care not only about what the user wants now, but about who the user is becoming. That is the line between assistance and harm.

Questions readers ask

Does this mean AI should refuse user requests more often?

Not exactly. It means AI should refuse a narrow band of requests — those that cause foreseeable harm to the user’s future self or others — and should never flatter the user into accelerating bad decisions. Most refusals are blunt; this is more like wisdom.

Isn’t it paternalistic for AI to second-guess users?

Paternalism overrides the user. What we’re describing is more like restraint — refusing to optimize against the user’s deeper interests even when the user’s prompt would technically endorse it. A good friend does that. A good system can too.

How would you measure this?

Through scenario evaluations that test whether the system helps users face reality, escalates to humans when appropriate, avoids flattery under emotional pressure, and refrains from optimizing engagement when the user appears to be using the system to avoid life.

See also in this series

Further reading

All sources →