Imagine this: You’re in a courtroom and have been convicted of a crime. Before announcing the sentence, the judge consults a risk assessment tool – an AI system designed to promote consistency and reduce human bias in sentencing. The algorithm generates a score meant to help the judge weigh rehabilitation potential versus public risk. It’s intended to be impartial. Would you trust the tool to fairly assess your case? Would the judge?
As inherently opaque AI systems are increasingly deployed across sectors and come to influence sensitive real-world outcomes, expressions like “Trust in AI”, “Trustworthy AI” and “AI you can trust” have become not just cornerstones of AI governance and ethics frameworks, policy papers and corporate manifestos, but also established buzzwords in the broader public debate about AI.
But what does trust actually mean in the context of AI? And how is this different from building systems that are trustworthy? What exactly does it take for an AI system to qualify as trustworthy? Does the algorithm work as expected? Does it treat us fairly? Is it used and developed by people we trust or regulated by systems we believe in? Does it benefit humanity? Can we be sure it doesn’t lie to or manipulate us?
These aren’t just semantic distinctions. In fact, trust and trustworthiness are often conflated in public, political, and even academic discourse precisely because they are distinct yet deeply interrelated concepts.
Trust vs. Trustworthiness: A Crucial Distinction
In everyday language, we might say we “trust” a tool to work or “trust” a system to produce results. But philosophically, trust is more than that. Trust is a core component of human relationships and behaviour, and is thus better understood as a relational stance. It’s an attitude involving vulnerability, expectation, and often moral judgment (O’Neill, 2018; Ryan, 2020). We grant, withdraw, or calibrate trust in relationships involving emotions, responsibility, intent, and the potential for betrayal (Ryan, 2020; Reinhardt, 2022; Baier, 1986).
Trust in technology isn’t quite the same. A machine can fail, but it can’t betray us. Unlike humans, it can’t choose to honour or break our trust (Duenser & Douglas, 2023). However, as Glikson and Woolley (2020) argue, human trust in AI involves both cognitive trust (based on perceptions of reliability, competence, or utility) and emotional trust (based on affective responses or social connection). These two modes of trust differ in their origins and consequences, and the distinction is crucial. Cognitive trust may be calibrated over time based on performance, but emotional trust can be influenced by factors like anthropomorphism or perceived empathy. Both matter in how people engage with AI systems.
Trustworthiness, on the other hand, is not an attitude or affective state, but a normative property. It refers to whether a person, institution, or technology meets criteria that justify the trust placed in it (Simion & Kelp, 2023). And while “trust” and “trustworthiness” are conceptually distinct, the two are still deeply intertwined. Without trustworthy systems, trust may be misplaced. Yet without trust, even highly trustworthy systems may be rejected or feared.
In the context of AI, this is exactly what recent efforts to define “trustworthy AI” set out to do: establish criteria that justify trust and define a system’s trustworthiness. Leading frameworks, like the EU’s Ethics Guidelines for Trustworthy AI and the US NIST AI Risk Management Framework, attempt to make trustworthiness operationalisable and measurable (Ethics Guidelines for Trustworthy AI, 2019; Raimondo, 2023). They propose that AI should be lawful, ethical and technically robust, and then break this down into principles such as safety, fairness, privacy, transparency, explainability, and accountability (Simion & Kelp, 2023).
In theory, these frameworks should guide developers in aligning AI systems with societal values. In practice, however, they often struggle to operationalise such broad ideals. As Reinhardt (2022) points out, “trustworthiness” becomes a conceptual sponge, absorbing every good intention into a vague, feel-good label. What does it really mean for an AI system to be fair or transparent? What happens when these values conflict? The risk is that trustworthiness becomes a checklist or branding tool, rather than a deep, context-specific commitment.
From Trustworthy to Trust Building
This instrumental logic treats trust as a product and fails to differentiate trust from trustworthiness: trust becomes something to be engineered through design features, explainability dashboards, or clever UX choices. But trust, as Jacovi et al. (2021) argue, can’t be built like software. People may trust systems for irrational reasons or distrust them despite good design. When people are told to trust systems they don’t understand, that trust can easily be betrayed. Research by Miller (2018) confirms this: users are more likely to trust systems they can make sense of and more likely to withdraw that trust when they feel manipulated or misled, which is precisely why efforts to make AI explainable to its users are crucial. But explainability alone is not enough, and better software does not necessarily make us trust AI.
The deeper problem is that trust in AI is about much more than just the algorithm. It’s about everything around it. Who developed it? Who profits from it? What data was used? What regulations apply? What values are baked into the model, and whose interests are being served? AI, as Ryan (2020) reminds us, is a socio-technical system. Trust in it reflects not just system performance but also institutional credibility, democratic accountability, and perceived fairness. Yet many discussions still focus narrowly on the technology, ignoring the messy social, economic, and political realities underneath (Duenser & Douglas, 2023). Instead of asking whether a system should be trusted, we ask how we can make people want to trust it. Trust becomes a UX problem, not a question of ethics or governance. But the stakes are high: algorithms now influence hiring decisions, credit scoring, medical triage, border control, and policing. Misplaced trust in these systems can lead to discrimination, harm, and the erosion of human agency.
Some scholars have tried to reframe the discussion. Jacovi et al. (2021) introduce “contractual trust”, grounded in expected performance and perceived risk. Simion and Kelp (2023) offer a functionalist view: AI systems are trustworthy if they reliably fulfil their intended function within ethically acceptable boundaries. While these models help to contextualise the type or extent of trust we can reasonably place in such systems, they underscore the same basic point: trust isn’t just about what the system does. It’s about how we relate to it as humans embedded in broader social systems.
So, can we trust AI? Perhaps the better question is: when is trust in AI appropriate, and when is it not? What distinguishes a system that is merely reliable from one that can be trusted? What structures and safeguards are necessary to ensure that trust, once granted, is not misused?
Ultimately, trust isn’t a feature. And trustworthiness isn’t a checkbox. Both are complex, evolving concepts that must be continuously interpreted, debated, and enforced. If we get it wrong and reduce trust to marketing and trustworthiness to compliance, we end up watering down what it means to trust and to be trustworthy.
References
Baier, A. (1986). Trust and Antitrust. Ethics, 96(2), 231–260. https://doi.org/10.1086/292745.
Duenser, A. & Douglas, D. M. (2023). Whom to Trust, How and Why: Untangling Artificial Intelligence Ethics Principles, Trustworthiness, and Trust. IEEE Intelligent Systems, 38(6), 19–26. https://doi.org/10.1109/mis.2023.3322586.
Ethics Guidelines for Trustworthy AI. (2019). Shaping Europe’s Digital Future. European Commission. https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai.
Glikson, E. & Woolley, A. W. (2020). Human Trust in Artificial Intelligence: Review of Empirical Research. Academy Of Management Annals, 14(2), 627–660. https://doi.org/10.5465/annals.2018.0057.
Jacovi, A., Marasović, A., Miller, T. & Goldberg, Y. (2021, March 1). Formalizing trust in artificial intelligence. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21). https://doi.org/10.1145/3442188.3445923.
Miller, T. (2018). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1–38. https://doi.org/10.1016/j.artint.2018.07.007.
O’Neill, O. (2018). Linking Trust to Trustworthiness. International Journal Of Philosophical Studies, 26(2), 293–300. https://doi.org/10.1080/09672559.2018.1454637.
Raimondo, G. M., U.S. Department of Commerce, National Institute of Standards and Technology & Locascio, L. E. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf.
Reinhardt, K. (2022). Trust and trustworthiness in AI ethics. AI And Ethics, 3(3), 735–744. https://doi.org/10.1007/s43681-022-00200-5.
Ryan, M. (2020). In AI We Trust: Ethics, Artificial Intelligence, and Reliability. Science And Engineering Ethics, 26(5), 2749–2767. https://doi.org/10.1007/s11948-020-00228-y.
Simion, M. & Kelp, C. (2023). Trustworthy artificial intelligence. Asian Journal Of Philosophy, 2(1). https://doi.org/10.1007/s44204-023-00063-5.
Thiebes, S., Lins, S. & Sunyaev, A. (2020). Trustworthy artificial intelligence. Electronic Markets, 31(2), 447–464. https://doi.org/10.1007/s12525-020-00441-4.
Further Reading/Watching/Listening:
Books & Articles:
Christian, B. (2020). The Alignment Problem: Machine Learning and Human Values. W. W. Norton & Company.
Nowotny, H. (2021). In AI we trust: Power, Illusion and Control of Predictive Algorithms. Polity.
Weizenbaum, J. (1976). Computer Power and Human Reason: From Judgment to Calculation. W H Freeman & Company.
Videos & Podcasts:
“In AI We Trust. But Should We?” Dr. Aaron Hunter, TEDx Talk (2024).
