Minimum levels of interpretability for artificial moral agents
File(s)
s43681-024-00536-0.pdf (3.17 MB)
Published online version
Author(s)
Vijayaraghavan, Avish
Badea, Cosmin
Type
Journal Article
Abstract
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and more integrated into various forms of decision-making systems. For models involved in moral decision-making (MDM), also known as artificial moral agents (AMAs), interpretability provides a way to trust and understand the agent’s internal reasoning mechanisms for effective use and error correction. In this paper, we bridge the technical approaches to interpretability with the construction of AMAs to establish minimal safety requirements for deployed AMAs. We begin by providing an overview of AI interpretability in the context of MDM, thereby framing different levels of interpretability (or transparency) in relation to the different ways of constructing AMAs. Introducing the concept of the Minimum Level of Interpretability (MLI) and drawing on examples from the field, we explore two overarching questions: whether a lack of model transparency prevents trust, and whether model transparency helps us sufficiently understand AMAs. Finally, we conclude by recommending specific MLIs for various types of agent constructions, aiming to facilitate their safe deployment in real-world scenarios.
Date Issued
2024-07-31
Date Acceptance
2024-07-21
Citation
AI and Ethics, 2024
ISSN
2730-5953
Publisher
Springer Science and Business Media LLC
Journal / Book Title
AI and Ethics
Copyright Statement
© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
License URL
Identifier
http://dx.doi.org/10.1007/s43681-024-00536-0
Publication Status
Published online
Date Publish Online
2024-07-31