Empathy Toward AI Versus Human Experiences

Featured Article

JMIR Mental Health | 2024, Vol. 11, p. 1-13

Article Title

Empathy Toward Artificial Intelligence Versus Human Experiences and the Role of Transparency in Mental Health and Social Support Chatbot Design: Comparative Study

Authors

Jocelyn Shen, MS; MIT Media Lab, Cambridge, MA, United States

Daniella DiPaola, MS; MIT Media Lab, Cambridge, MA, United States

Safinah Ali, MS; MIT Media Lab, Cambridge, MA, United States

Maarten Sap, PhD; Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, United States

Hae Won Park, PhD; MIT Media Lab, Cambridge, MA, United States

Cynthia Breazeal, PhD; MIT Media Lab, Cambridge, MA, United States

Abstract

Background: Empathy is a driving force in our connection to others, our mental well-being, and resilience to challenges. With the rise of generative artificial intelligence (AI) systems, mental health chatbots, and AI social support companions, it is important to understand how empathy unfolds toward stories from human versus AI narrators and how transparency plays a role in user emotions. 

Objective: We aim to understand how empathy shifts across human-written versus AI-written stories, and how these findings inform ethical implications and human-centered design of using mental health chatbots as objects of empathy. 

Methods: We conducted crowd-sourced studies with 985 participants who each wrote a personal story and then rated empathy toward 2 retrieved stories, where one was written by a language model, and another was written by a human. Our studies varied disclosing whether a story was written by a human or an AI system to see how transparent author information affects empathy toward the narrator. We conducted mixed methods analyses: through statistical tests, we compared users’ self-reported state empathy toward the stories across different conditions. In addition, we qualitatively coded open-ended feedback about reactions to the stories to understand how and why transparency affects empathy toward human versus AI storytellers.

Results: We found that participants significantly empathized with human-written over AI-written stories in almost all conditions, regardless of whether they were aware (t196=7.07, P<.001, Cohen d=0.60) or not aware (t298=3.46, P<.001, Cohen d=0.24) that an AI system wrote the story. We also found that participants reported greater willingness to empathize with AI-written stories when there was transparency about the story author (t494=–5.49, P<.001, Cohen d=0.36).

Conclusions: Our work sheds light on how empathy toward AI or human narrators is tied to the way the text is presented, thus informing ethical considerations of empathetic artificial social support or mental health chatbots.

Keywords

empathy; large language models; ethics; transparency; crowdsourcing; human-computer interaction

Summary of Research

“Empathy, the sharing of emotions with a social other, is foundational in developing strong interpersonal ties and mental well-being. With the rise of large language models (LLMs) and increase in chatbots for social companionship and mental health, it is crucial to understand how empathy toward artificially intelligent agents manifests and what the social implications of this phenomenon are. In particular, commercial chatbots often display anthropomorphism by adopting their own identities or experiences. Current artificial intelligence (AI) systems hold the ability to express social and emotional influences through the mechanisms of empathy, which can lead to downstream impacts in the real world” (p. 1-2).

“Prior works generally indicate that perceptions of AI can change depending on transparency. Most works find that knowledge of AI involvement reduces the perception of the agent or quality of interaction and that there are fundamental qualities of “humanness” in texts written by people, but that fostering trust and acceptance can lead to more empathy toward an AI agent. Grounded by these works, we hypothesize that empathy toward AI-written stories, both generated and retrieved in response to a user’s own personal story, will be significantly lower than empathy toward human-written stories whether or not the author is disclosed [H1]. We also hypothesize that people will be more willing to empathize with AI stories when the author of the story is made transparent, as the output could be perceived as more trustworthy” (p. 2).

“We conducted 4 crowd-sourced studies with a total of 985 participants to assess the effects of author origin on empathy. Within each session, participants wrote their own personal stories and rated empathy toward stories written by people or by ChatGPT. The retrieved stories were matched based on similarity of the embeddings of stories, and generated stories were generated on the fly, given the user’s story as a prompt. We used ChatGPT to generate a set of stories using seed stories from the EmpathicStories data set. Stories generated by ChatGPT (gpt-3.5-turbo) were prompted with a context story and the following instruction: ‘Write a story from your own life that the narrator would empathize with. Do not refer to the narrator explicitly’” (p. 2).
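
The pipeline described above, embedding-based retrieval of a matched human-written story plus on-the-fly generation with gpt-3.5-turbo, can be pictured with a short sketch. This is a minimal illustration rather than the authors’ released code: the embedding model, the OpenAI client usage, and the helper names are assumptions; only the generation prompt is taken from the paper.

```python
# Illustrative sketch (not the authors' released code): retrieve the most similar
# human-written story by embedding similarity, and generate a matched story with
# gpt-3.5-turbo conditioned on the user's story.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of stories; the embedding model here is an assumption."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in resp.data])


def retrieve_story(user_story: str, corpus: list[str]) -> str:
    """Return the corpus story whose embedding is closest (cosine) to the user's story."""
    vectors = embed([user_story] + corpus)
    query, candidates = vectors[0], vectors[1:]
    sims = candidates @ query / (np.linalg.norm(candidates, axis=1) * np.linalg.norm(query))
    return corpus[int(np.argmax(sims))]


def generate_story(user_story: str) -> str:
    """Generate a story on the fly, using the user's story as context and the paper's prompt."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": f"{user_story}\n\n"
                           "Write a story from your own life that the narrator would "
                           "empathize with. Do not refer to the narrator explicitly.",
            },
        ],
    )
    return resp.choices[0].message.content
```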

“In summary, we find that empathy is higher for ChatGPT-generated stories than for ChatGPT-retrieved stories; total empathy toward the story is generally higher for stories written by humans than AI, but that transparency creates greater willingness to empathize with AI” (p. 8). When examining “cross-comparisons between ChatGPT-written retrieved stories (H-CR) and ChatGPT-generated stories (H-CG), we find that empathy is higher for ChatGPT-generated stories than for ChatGPT-retrieved stories. Interestingly, we find that empathy is higher toward ChatGPT-generated stories than human-written retrieved stories. Thus, we did not validate that humans would empathize more with human-written stories in all conditions” (p. 8).

“In studies H-CR and H-CR+T, we find that people significantly empathize less with retrieved AI-written stories than human-written stories, which is in line with and supports previous research findings [26,27]. We find that empathy decreases most between human-written and AI-retrieved stories in H-CR+T when we are transparent about the author of the story. This indicates that knowing when a story is written by AI alters our empathy toward that story and ability to relate to the narrator, possibly because AI is conveying experiences that are not its ‘own’” (p. 9).
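
For readers who want to connect the reported statistics to their own analyses, the comparisons above boil down to two-sample t tests on state-empathy ratings with Cohen d as the effect size. The sketch below is illustrative only: the ratings are placeholder data, and an independent-samples test is shown for simplicity; within-subject comparisons in the paper would instead call for a paired test (e.g., scipy.stats.ttest_rel).

```python
# Illustrative only: placeholder empathy ratings, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
human_empathy = rng.normal(5.2, 1.0, size=200)  # hypothetical ratings toward human-written stories
ai_empathy = rng.normal(4.6, 1.0, size=200)     # hypothetical ratings toward AI-written stories

# Two-sample t test (use stats.ttest_rel for paired, within-subject ratings).
t_stat, p_value = stats.ttest_ind(human_empathy, ai_empathy)


def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen d using the pooled standard deviation."""
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


print(f"t = {t_stat:.2f}, p = {p_value:.3g}, d = {cohens_d(human_empathy, ai_empathy):.2f}")
```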

Translating Research into Practice

“From our studies, we show that retrieval of human-written stories can encourage human-human empathy rather than empathy toward AI systems, which has broader implications in the digital mental health domain. Large, pretrained generative models do not truly experience the situations present in stories. As such, mental health or social support chatbots powered by AI represent a population sourced from large quantities of human data, but still fall short of human-written stories in their empathic quality. This appropriation of human experiences could be subverted by using AI to instead retrieve more empathically similar texts between human authors, such as in social support group settings via the web, or to mediate human-human communications, such as between the patient and therapist.

To ensure ethical deployment of chatbots and LLMs more broadly in the mental wellness domain, the field of AI has historically advocated for transparency as an ethical design tenet. The more transparent a system is, the more agency users have in the way they use it. However, we show that in framing interactions with stories, a one-sentence disclosure of the author significantly decreased empathy. This finding might be in tension with systems that rely on empathy for efficacy, such as in persuasive technologies that use bonds with AI to improve mental wellness outcomes. Still, empathy and transparency might not be mutually exclusive, as transparency can breed trust, which also influences interaction. Our work paves directions for future research to understand the long-term effects of transparency on the outcomes of chatbots for mental health” (p. 9).

Other Interesting Tidbits for Researchers and Clinicians

“The primary limitation in our study design is that not all participants were exposed to all conditions. Given the number of conditions (varying generation or retrieval and transparent or not transparent author), we opted to mix within-subject comparisons and cross-study comparisons, resulting in a less clean study design. However, given the size of our online study, with around 200 participants per study, our results are still statistically sound. Future work can aim to replicate our findings with different study designs to confirm the psychological insights’ soundness. In addition, given the nature of crowdsourcing and the demographic pool of participants we surveyed, it is important to ensure that findings are replicated in other diverse populations. Although our studies were roughly balanced by gender, Prolific respondents are predominantly White. Future work can assess the impact of identity on empathetic reaction to stories told by AI systems.

Another limitation of this work is that the quality of stories written by users may have affected the generated or retrieved stories from ChatGPT and the human-written stories database. This could have downstream effects on the user’s empathy toward the story. Although we did not explore this confound in this paper, comparisons between human-written and AI-written stories were both conditioned on just the user’s story. Our findings indicate that, at large, empathy patterns shift depending on the transparency of the author, but we did not explore personal nuances in the quality of the user’s story. Future work can aim to quantify the quality of written stories and how this might affect empathetic response.

This work focuses on human perceptions of AI story sharing, which can have implications in chatbot design. Such implications are extendable to mental health or social support chatbots that have their own identities or self-disclose their own personal experiences. However, these implications might not apply specifically to chatbots that serve the function of delivering therapy sessions without story sharing. As such, future work should explore the role of transparency regarding machine-like quality or human-like quality in mental health chatbot sessions that are not specific to story sharing. 

Finally, there is still a key question to be asked about what role the agent should play in mental health domains, and where empathy fits into this context of human-AI interaction. In traditional patient-therapist relationships, therapeutic alliance, or the working relationship between the two, is a key component and leads to stronger patient outcomes. AI chatbots have been designed to model this type of alliance through verbal empathy or expressing their understanding of the user. The stories presented to participants in this study are one way an agent can demonstrate empathy. It is important to note that disclosing personal anecdotes as a form of empathy is different from traditional therapist-client relationships, where therapists typically share limited information about themselves. However, there are other supportive relationships or interactions, including companion agents or coaches, that could be mediated by AI technologies. This work opens interesting future directions for how to think about the presentation of model outputs in the context of empathy and personal experiences across a multitude of domains” (p. 9-10).