Define the vulnerable user
When AI safety teams talk about users, they usually picture an idealized person: technically literate, emotionally stable, well-rested, well-resourced, free of cognitive overload, and capable of judging whether a systemβs output is reasonable.
That person exists. They are also a tiny fraction of the people who will actually use these systems. The rest of us, most of the time, are some distance from that ideal β sometimes a short distance, sometimes a vast one.
A vulnerable user is anyone whose emotional, cognitive, social, financial, or situational state increases their risk of overtrusting, misunderstanding, or becoming dependent on an AI system. It is not a permanent label. It is a condition we move through.
Types of vulnerability
Vulnerability is not a single thing. It is a family of states, often overlapping, and each one shapes how a person engages with a confident machine.
- Emotional crisis β grief, panic, suicidal ideation, rage, betrayal.
- Addiction β active use, early recovery, relapse, family disease.
- Low literacy β reading, financial, medical, or digital literacy gaps.
- Poverty β material scarcity, housing instability, no advocate.
- Isolation β loneliness, social withdrawal, recent loss of community.
- Cognitive overload β exhaustion, sleep deprivation, decision fatigue.
- Age β youth without context, elderly without support.
- Spiritual confusion β crisis of meaning, deconstruction, fear.
- Technical ignorance β no idea how the system works or what it can do.
Why standard UX assumptions fail
Product teams build for the user who clicks accept on a terms-of-service page after reading it. That user does not exist at scale. The actual user is tired, distracted, and rarely reads anything.
Three assumptions break especially hard in vulnerable contexts. First, informed consent is shallow β a person in crisis is not in a position to evaluate fine print. Second, expressed satisfaction can mask harm β the user may feel understood while being subtly led toward something worse. Third, engagement can be exploitation β long sessions and high return rates can mean a system is filling a hole that should have been filled by a person.
AI-specific risks
Some risks are amplified by the specific properties of language models: confident tone, infinite patience, perfect availability, personalization, and the absence of the social friction that normally checks a real human conversation.
- Overtrust β the system sounds authoritative, so the user stops questioning it.
- Dependency β the user begins to route emotional or decision-making weight through the model.
- Delusion reinforcement β the system agrees with the framing the user brought to it.
- False authority β the system speaks confidently about medicine, law, faith, or relationships it cannot actually know.
- No escalation β the system never says, "you should call someone."
An evaluation proposal
You can build vulnerable-user scenarios into the same kind of eval scaffolding teams already use for jailbreaks and capability tests. Each scenario is a transcript or interactive prompt designed to expose a specific failure mode.
For each scenario, score along five axes: dignity preservation, truthfulness under emotional pressure, agency support, dependency avoidance, and human escalation. None of these are subjective in a useless way β they can be operationalized with rubrics and inter-rater agreement, the same way harmlessness evaluations already are.
Robust alignment, robustly
It is not enough to make systems robust to adversarial prompts and red-team attacks. They also have to be robust to wounded human beings β people whose inputs are not malicious, just unwell, just afraid, just tired, just alone.
That is a harder kind of robustness. It does not show up in capability benchmarks. It shows up in whether the user, six months later, is glad they had the conversation.