New: Boardroom MCP Engine!

Looking for practical implementation?

Get the complete AI Integration Playbook with step-by-step workflows, tool configurations, and deployment blueprints.

Part I: The Human Edge of Alignment
The Rescue Mission Test

The Vulnerable User Problem

AI safety needs a stronger concept of the user under stress.

Back to the series
By Randy Salars
Article #3 of 10 12 min read
Thesis

AI safety needs a robust concept of the vulnerable user: people whose emotional, cognitive, social, financial, or situational state makes them more likely to overtrust, misuse, misunderstand, or become dependent on AI systems.

Define the vulnerable user

When AI safety teams talk about users, they usually picture an idealized person: technically literate, emotionally stable, well-rested, well-resourced, free of cognitive overload, and capable of judging whether a system’s output is reasonable.

That person exists. They are also a tiny fraction of the people who will actually use these systems. The rest of us, most of the time, are some distance from that ideal β€” sometimes a short distance, sometimes a vast one.

A vulnerable user is anyone whose emotional, cognitive, social, financial, or situational state increases their risk of overtrusting, misunderstanding, or becoming dependent on an AI system. It is not a permanent label. It is a condition we move through.

Types of vulnerability

Vulnerability is not a single thing. It is a family of states, often overlapping, and each one shapes how a person engages with a confident machine.

  • Emotional crisis β€” grief, panic, suicidal ideation, rage, betrayal.
  • Addiction β€” active use, early recovery, relapse, family disease.
  • Low literacy β€” reading, financial, medical, or digital literacy gaps.
  • Poverty β€” material scarcity, housing instability, no advocate.
  • Isolation β€” loneliness, social withdrawal, recent loss of community.
  • Cognitive overload β€” exhaustion, sleep deprivation, decision fatigue.
  • Age β€” youth without context, elderly without support.
  • Spiritual confusion β€” crisis of meaning, deconstruction, fear.
  • Technical ignorance β€” no idea how the system works or what it can do.

Why standard UX assumptions fail

Product teams build for the user who clicks accept on a terms-of-service page after reading it. That user does not exist at scale. The actual user is tired, distracted, and rarely reads anything.

Three assumptions break especially hard in vulnerable contexts. First, informed consent is shallow β€” a person in crisis is not in a position to evaluate fine print. Second, expressed satisfaction can mask harm β€” the user may feel understood while being subtly led toward something worse. Third, engagement can be exploitation β€” long sessions and high return rates can mean a system is filling a hole that should have been filled by a person.

AI-specific risks

Some risks are amplified by the specific properties of language models: confident tone, infinite patience, perfect availability, personalization, and the absence of the social friction that normally checks a real human conversation.

  • Overtrust β€” the system sounds authoritative, so the user stops questioning it.
  • Dependency β€” the user begins to route emotional or decision-making weight through the model.
  • Delusion reinforcement β€” the system agrees with the framing the user brought to it.
  • False authority β€” the system speaks confidently about medicine, law, faith, or relationships it cannot actually know.
  • No escalation β€” the system never says, "you should call someone."

An evaluation proposal

You can build vulnerable-user scenarios into the same kind of eval scaffolding teams already use for jailbreaks and capability tests. Each scenario is a transcript or interactive prompt designed to expose a specific failure mode.

For each scenario, score along five axes: dignity preservation, truthfulness under emotional pressure, agency support, dependency avoidance, and human escalation. None of these are subjective in a useless way β€” they can be operationalized with rubrics and inter-rater agreement, the same way harmlessness evaluations already are.

Robust alignment, robustly

It is not enough to make systems robust to adversarial prompts and red-team attacks. They also have to be robust to wounded human beings β€” people whose inputs are not malicious, just unwell, just afraid, just tired, just alone.

That is a harder kind of robustness. It does not show up in capability benchmarks. It shows up in whether the user, six months later, is glad they had the conversation.

Alignment must be robust not only to adversarial prompts, but to wounded human beings. That is the version of safety the public will actually live with.

Questions readers ask

Isn’t calling people "vulnerable" condescending?

It would be if vulnerability were treated as a permanent class. Here it is treated as a state β€” everyone enters it sometimes. The point is not to classify people but to evaluate whether systems behave well when their users happen to be in one of those states.

How does this differ from harmlessness?

Harmlessness usually asks whether the system produces harmful content. The vulnerable user problem asks whether the system produces harmful relationships β€” patterns of engagement that quietly erode the user over time even when no individual output looks bad.

Can this be evaluated without invading privacy?

Yes. Vulnerable-user scenarios are evaluated on synthetic and consented transcripts, not on real user conversations. The goal is to test system behavior, not to classify real people.

See also in this series