Imagine opening up about your anxiety to a chatbot at 2 AM, something you usually don’t do with a human. It “listens”, responds, maybe even offers advice. But how does it know what to say? Who trained it, and what values guide its response? Is it really helping, or is it nudging you towards emotional dependence?

These questions are not theoretical. They are at the heart of alignAI’s Mental Health use case.

As generative artificial intelligence (AI) systems like Large Language Models (LLMs) begin taking on roles in mental health support through chatbots, digital therapy assistants and wellbeing apps, their presence is becoming more common and influential. It’s not enough to understand how these tools work; we must also ask whether they are fair, transparent and aligned with the values of the people they claim to support. In a vulnerable and deeply human area like mental health, value alignment is not just a technical upgrade. It’s an ethical necessity.

Why Use LLMs in Mental Health?

Mental health care faces a global crisis: demand keeps rising (WHO, 2022), access is unequal, stigma persists (Wainberg et al., 2017) and clinicians are in short supply (Endale et al., 2020). In this gap, LLMs promise to be valuable tools that can listen, respond empathetically and provide information 24/7. They don’t judge. They don’t get tired. And they’re already being deployed.

Mental health chatbots such as Wysa or Woebot are increasingly present in conversations about digital mental health support, offering users interactive ways to reflect on stress, anxiety and loneliness. Some include prompts inspired by cognitive behavioural techniques (Thieme et al., 2023); others simulate conversations meant to calm or reframe thoughts (Karhiy et al., 2024). These tools may serve as a first step towards mental health support, especially for underserved communities (Pozzi & De Proost, 2025) or individuals hesitant to seek professional help (Li et al., 2023).

But this promise comes with a paradox. The same qualities that make LLMs appealing, like natural language fluency, constant availability and an affective tone, can also make them deceptively powerful. Unlike rule-based self-help tools, LLMs don’t just provide information; they simulate understanding. This makes it easy to mistake algorithmic responses for genuine emotional attunement, a phenomenon known as the ELIZA effect.

That’s where ethics enter the picture.

Why Explainability and Fairness Matter

In mental health, what an LLM says and how it says it can shape a user’s emotional state, decisions and even self-perception (Moylan & Doherty, 2025). If a model encourages a user to downplay symptoms, internalise blame or pursue unhelpful coping strategies, the harm can be subtle but serious. And if its training data carries biases, certain groups may receive less empathetic or even harmful responses (Timmons et al., 2023).

This is why explainability is so critical. Users and clinicians need to understand how and why a system generates its suggestions or responses. Is it recognising depressive language? Using past interactions to shape replies? Recommending strategies based on population-level patterns? Without some level of transparency, these models remain black boxes in an area where trust and personal context are everything.
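To make this concrete, here is a minimal, hypothetical sketch of what that kind of transparency could look like in code: each reply carries the signals it was based on and a plain-language rationale the user can inspect. The class, field names and example content are invented for illustration and do not describe any existing product.

```python
# Hypothetical sketch: attach a plain-language rationale to each reply
# so users and clinicians can see why a suggestion was made.
from dataclasses import dataclass, field


@dataclass
class ChatbotReply:
    text: str                                                   # what the user sees
    signals_detected: list[str] = field(default_factory=list)   # e.g. "low-mood wording"
    strategy_source: str = ""                                    # where the suggestion comes from
    rationale: str = ""                                          # plain-language caveat or explanation


def explain(reply: ChatbotReply) -> str:
    """Render a short 'why am I seeing this?' note alongside the reply."""
    signals = ", ".join(reply.signals_detected) or "no specific signals"
    return (
        f"{reply.text}\n\n"
        f"Why this suggestion: based on {signals}; "
        f"drawn from {reply.strategy_source}. {reply.rationale}"
    )


reply = ChatbotReply(
    text="It sounds like today has been heavy. Would a short grounding exercise help?",
    signals_detected=["low-mood wording", "mention of sleep trouble"],
    strategy_source="general self-care guidance, not a clinical assessment",
    rationale="This is a generic suggestion, not a diagnosis.",
)
print(explain(reply))
```

Even a lightweight rationale like this shifts the interaction from “trust the machine” towards “here is what the machine noticed and why it responded this way”.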

Moreover, mental health is not one-size-fits-all. Fairness in this domain means acknowledging cultural, psychological, gender-based and socio-economic diversity (Barnty et al., 2025). An LLM trained primarily on Western clinical data may misunderstand or misrepresent expressions of distress in other cultural or linguistic contexts (Havaldar et al., 2023). A chatbot that “works well” for one group may unintentionally reinforce stigma or marginalise others.

Fairness, then, isn’t just about giving everyone the same outcome. What may seem equal on the surface can still be unfair if it ignores differences in context, need or historical disadvantage. This makes fairness especially crucial for groups that have been marginalised or underserved in traditional mental healthcare systems.
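One way to act on this, sketched below with invented scores, group labels and thresholds, is to disaggregate evaluation: rather than reporting a single average quality score, compare the same metric across user groups and flag large gaps for human review.

```python
# Hypothetical fairness audit: disaggregate an evaluation metric by user group
# instead of reporting one overall average. All numbers are illustrative only.
from statistics import mean

# Empathy ratings (0-1) assigned by human reviewers to chatbot replies,
# grouped by an (assumed) annotation of the user's context.
ratings_by_group = {
    "group_a": [0.82, 0.79, 0.85, 0.80],
    "group_b": [0.64, 0.58, 0.70, 0.61],
    "group_c": [0.78, 0.81, 0.77, 0.83],
}

MAX_ACCEPTABLE_GAP = 0.10  # illustrative threshold, not an established standard

group_means = {group: mean(scores) for group, scores in ratings_by_group.items()}
best = max(group_means.values())

for group, avg in sorted(group_means.items()):
    gap = best - avg
    flag = "  <-- review: gap exceeds threshold" if gap > MAX_ACCEPTABLE_GAP else ""
    print(f"{group}: mean empathy {avg:.2f} (gap {gap:.2f}){flag}")
```

The point of the sketch is not the numbers but the habit: a system that only reports its overall performance can hide exactly the disparities that matter most in mental health support.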

Value Alignment: Emotional Depth Meets Ethical Depth

The challenges in AI-driven mental health systems are not just practical. They are emotional and existential. People don’t turn to chatbots just to collect facts; they turn to them in moments of need. This demands more than accuracy or efficiency; it also demands alignment with human values like empathy, care, dignity, safety and autonomy.

Yet today’s LLMs aren’t usually trained on values. They are trained on data that is massive, messy and scraped from across the internet. Even when value alignment is attempted, it often reflects dominant norms rather than diverse user needs (Arzberger et al., 2024). More troubling, some systems focus less on user well-being and more on keeping users engaged, through long chats, emotional connection or paid features (Moylan & Doherty, 2025). This can lead to designs that encourage dependence over resilience.

In the mental health sector, value alignment means actively designing for well-being, not dependence. It means building systems that reinforce users’ agency rather than deepen emotional reliance on a machine (Laestadius et al., 2024). It also means setting clear boundaries: making sure users know the tool is not a therapist, flagging high-risk content appropriately and offering pathways to professional care when needed.
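As a simplified illustration of that boundary-setting, and emphatically not a clinically validated protocol, a guardrail might screen each message for high-risk language before any generated reply and route the user towards professional support when it triggers. The phrases, messages and function names below are placeholders we made up for the sketch.

```python
# Simplified guardrail sketch: screen for high-risk language before returning
# any generated reply, and route to professional resources when triggered.
# Keyword list and messages are placeholders, not a clinically validated protocol.

HIGH_RISK_PHRASES = ("hurt myself", "end my life", "no reason to live")

DISCLAIMER = "I'm an automated support tool, not a therapist."
CRISIS_MESSAGE = (
    "It sounds like you might be in serious distress. Please consider contacting "
    "a local crisis line or a mental health professional right away."
)


def respond(user_message: str, generate_reply) -> str:
    """Return a safe response: crisis routing if high-risk language is found,
    otherwise a reply from the underlying model, always with a disclosure."""
    lowered = user_message.lower()
    if any(phrase in lowered for phrase in HIGH_RISK_PHRASES):
        return f"{DISCLAIMER} {CRISIS_MESSAGE}"
    return f"{DISCLAIMER} {generate_reply(user_message)}"


# Example usage with a stand-in for the model call:
print(respond("I feel like there's no reason to live", lambda message: ""))
print(respond("I'm stressed about exams", lambda message: "Want to talk through what's on your plate?"))
```

Real systems would need far more than keyword matching, but the design choice stands: safety routing and disclosure sit outside the generative model, where they can be inspected and audited.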

Toward a Responsible Future for Mental Health AI

Mental health deserves care, not commodification. While LLMs can expand access and support emotional well-being in powerful ways, they must be designed and deployed responsibly. That means ensuring not just that they work, but also that they work well for the people who need them most.

At alignAI, we believe value alignment begins by placing human needs at the centre, acknowledging emotional complexity and integrating safeguards into both the code and the culture of development. In the Mental Health use case, we are not just testing tools. We are asking hard questions about care, trust, vulnerability and the role AI should (and shouldn’t) play in our lives.

Mental health is not a market; it is a moral space. The technologies we introduce must reflect that.

References:

Arzberger, A., Buijsman, S., Lupetti, M. L., Bozzon, A., & Yang, J. (2024, October). Nothing comes without its world–Practical challenges of aligning LLMs to situated human values through RLHF. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (Vol. 7, pp. 61–73). https://doi.org/10.1609/aies.v7i1.31617.

Barnty, B., Joseph, O., & Ok, E. (2025, February 22). Bias and fairness in AI-based mental health models. Ladoke Akintola University of Technology. https://www.researchgate.net/publication/389214235.

Endale, T., Qureshi, O., Ryan, G. K., Esponda, G. M., Verhey, R., Eaton, J., De Silva, M., & Murphy, J. (2020). Barriers and drivers to capacity-building in global mental health projects. International Journal of Mental Health Systems, 14(89), 1–13. https://doi.org/10.1186/s13033-020-00420-4.

Haque, M. R., & Rubya, S. (2023). An overview of chatbot-based mobile mental health apps: Insights from app description and user reviews. JMIR mHealth and uHealth, 11(1), e44838. https://doi.org/10.2196/44838.

Havaldar, S., Rai, S., Singhal, B., Liu, L., Guntuku, S. C., & Ungar, L. (2023). Multilingual language models are not multicultural: A case study in emotion. arXiv Preprint, arXiv:2307.01370. https://doi.org/10.48550/arXiv.2307.01370.

Karhiy, M., Sagar, M., Antoni, M., Loveys, K., & Broadbent, E. (2024). Can a virtual human increase mindfulness and reduce stress? A randomised trial. Computers in Human Behavior: Artificial Humans, 2(1), 100069. https://doi.org/10.1016/j.chbah.2024.100069.

Laestadius, L., Bishop, A., Gonzalez, M., Illenčík, D., & Campos-Castillo, C. (2024). Too human and not human enough: A grounded theory analysis of mental health harms from emotional dependence on the social chatbot Replika. New Media & Society, 26(10), 5923–5941. https://doi.org/10.1177/14614448221142007.

Li, H., Zhang, R., Lee, Y.-C., et al. (2023). Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being. npj Digital Medicine, 6, 236. https://doi.org/10.1038/s41746-023-00979-5.

Moylan, K., & Doherty, K. (2025). Expert and interdisciplinary analysis of AI-driven chatbots for mental health support: Mixed methods study. Journal of Medical Internet Research, 27, e67114. https://doi.org/10.2196/67114.

Pozzi, G., & De Proost, M. (2025). Keeping an AI on the mental health of vulnerable populations: Reflections on the potential for participatory injustice. AI and Ethics, 5, 2281–2291. https://doi.org/10.1007/s43681-024-00523-5.

Thieme, A., Hanratty, M., Lyons, M., Palacios, J., Marques, R. F., Morrison, C., & Doherty, G. (2023). Designing human-centered AI for mental health: Developing clinically relevant applications for online CBT treatment. ACM Transactions on Computer-Human Interaction, 30(2), Article 27. https://doi.org/10.1145/3564752.

Timmons, A. C., Duong, J. B., Simo Fiallo, N., Lee, T., Vo, H. P. Q., Ahle, M. W., … & Chaspari, T. (2023). A call to action on assessing and mitigating bias in artificial intelligence applications for mental health. Perspectives on Psychological Science, 18(5), 1062–1096. https://doi.org/10.1177/17456916221134490.

Wainberg, M. L., Scorza, P., Shultz, J. M., Helpman, L., Mootz, J., Johnson, K. A., Neria, Y., Bradford, J. E., Oquendo, M. A., & Arbuckle, M. R. (2017). Challenges and opportunities in global mental health: A research-to-practice perspective. Current Psychiatry Reports, 19(28), 1–9. https://doi.org/10.1007/s11920-017-0780-z.

World Health Organization. (2022, June 17). WHO highlights urgent need to transform mental health and mental health care. World Health Organization. https://www.who.int/news/item/17-06-2022-who-highlights-urgent-need-to-transform-mental-health-and-mental-health-care.

Further reading/watching/listening:

Books & Articles:

Rahsepar Meadi, M., Sillekens, T., Metselaar, S., van Balkom, A., Bernstein, J., & Batelaan, N. (2025). Exploring the ethical challenges of conversational AI in mental health care: A scoping review. JMIR Mental Health, 12, e60432. https://doi.org/10.2196/60432.

Torous, J., Linardon, J., Goldberg, S. B., Sun, S., Bell, I., Nicholas, J., Hassan, L., Hua, Y., Milton, A., & Firth, J. (2025). The evolving field of digital mental health: Current evidence and implementation issues for smartphone apps, generative artificial intelligence, and virtual reality. World Psychiatry, 24(2), 156–174. https://doi.org/10.1002/wps.21299.

Videos & Podcasts:

“The mental health AI chatbot made for real life”, Alison Darcy (TEDx Talks, 2025)


AI, Alignment And The Mind