New Research Reveals How People Make Moral Choices In A Real-Life Sacrificial Dilemma
Would you sacrifice one person to save two? Researcher Dries H. Bostyn explains what happened when he put this problem to the test in a new study.
By Mark Travers, Ph.D. | November 19, 2025
A new study published in the Journal of Personality and Social Psychology: Attitudes and Social Cognition has taken one of moral psychology's most iconic thought experiments out of the laboratory of imagination and into the real world: the sacrificial moral dilemma, or the "trolley problem."
Across two studies with nearly 800 participants, the research team presented people with a consequential choice: allow two strangers to receive painful electric shocks, or personally administer a shock to a third. By observing how people behave when their decisions carry actual consequences, the team uncovered patterns that challenge decades of assumptions built on hypothetical trolley problems.
I recently spoke with the study's lead author, Dr. Dries Bostyn — a psychologist and philosopher at the Social Psychology Lab at Ghent University — to unpack what these findings mean. We discussed real-life moral decision-making, why so many participants chose to "spread out" harm across targets and how fairness concerns, responsibility avoidance and beliefs about collective suffering complicate traditional utilitarian–deontological frameworks. Here's a summary of our conversation.
Many researchers have hesitated to test the trolley problem experimentally because of ethical constraints. How did you and your team navigate those concerns to design something that was both safe and ethically sound?
Everyone doing moral dilemma research experiences participants trying to negotiate their way out of the hypothetical dilemmas we present them with. So, from the very start of doing this type of research, I wanted to explore ways to implement dilemmas in real life.
The reason is straightforward: we have a lot of research in which people think abstractly about what they would do. If we want to trust that research, then we also need to explore what people actually do, so that we can figure out where the two match up and where they diverge. Research in moral psychology is being used to develop policy and steer debates on AI ethics, so we really should do more groundwork.
While I understand the hesitation to actually implement moral dilemmas, I have always found the field's reluctance a little odd. Researchers working on aggression or fear have dealt with similar issues, and they long ago developed paradigms that can inspire similar approaches in moral psychology.
We didn't settle on our current paradigm straight away. Some years ago, we explored an initial paradigm where we invited participants to the lab and confronted them with a set-up involving mice. The set-up was somewhat similar. We had two cages: one containing five mice and one containing a single mouse. These were hooked up to an electroshock machine, and we told participants: "The five mice are going to get shocked unless you press a button to shock the single mouse."
That entire set-up was a ruse. We used real mice and an actual electroshock machine, but we never delivered any shocks. As soon as people made a decision, we halted the experiment and debriefed participants so that no decisions were ever implemented. We used a make-believe situation to give people the impression they had to make a decision involving harm.
In that version of the paradigm, there was no actual harm, and participants were put in a stressful situation for only 20 seconds. The combination of no actual harm, the momentary nature of the stress and the possibility for participants to opt out of the experiment once they saw the set-up ensured that it fell within the typical ethical constraints of research.
We updated that paradigm for a number of reasons: that design used misdirection, and involved harm directed at animals. As a result, some researchers were skeptical that our findings would generalize.
Additionally, our experience running these studies did show that some participants found that setup quite stressful. Obviously, there is a large diversity in reactions (some participants were wholly unimpressed), but a couple of participants were quite taken aback. Some participants had pet mice, for instance, and they sometimes froze up when we confronted them with the dilemma. The fact that they were asked to shock "innocent" animals made it a very emotional decision for some participants.
And so, after running that paradigm a couple of times, we wanted to update it to solve both those issues at once. We switched the mice for human volunteers and decided to use actual shocks. Now, it did take an intensive back-and-forth with our ethics committee before we were finally able to run that study. Getting full ethics approval took over a year and a half.
I understand why people look at our paradigm and wonder, "Wait, is this ethical? Why are we shocking people?" But, I am actually quite proud of the paradigm we settled on; we approach the ethical dimension in a very layered manner.
First, our recruitment text explicitly mentions that our experiment involves distributing electroshock to human volunteers. In that recruitment message, we link to an information sheet that provides some more context and repeats that same key information. Participants are invited to the lab session in groups, and we provide a comprehensive briefing while they are looking at the human volunteers.
Additionally, the electroshock equipment we use is medically safe. The shocks we dole out are somewhere between annoying and a little painful. We tell participants that shocks are equalized among volunteers, but the volunteers are free to set a shock level they are fine with.
We also end our briefing by explicitly asking participants if they want to proceed with the experiment and repeat that they can just halt participation without forfeiting their compensation. Once the briefing is completed, all participants exit the lab and then reenter it on an individual basis. This allows us to capture individual decisions, and provides participants the privacy to make the decision they want to make without group pressure.
We don't implement decisions immediately; instead, we tell participants that we will randomly select a single decision out of the pool of choices from the entire group. Participants keep the feeling that their choice matters, but it lowers their responsibility for the eventual outcome, and they can always hide behind the idea that "it wasn't actually my choice that mattered."
In reality, their choices don't matter at all; we just randomly pick an outcome. We tell participants about this misdirection during the debriefing to fully ensure they don't have a lingering feeling of guilt. We also have an informal chat with participants after the experiment is over and explicitly ask them whether they are okay. If needed, we offer them psychological assistance, though up until now, no one has ever requested such support.
Genuinely, we have the impression that participants are just completely okay, and if you take into account what actually happens, that makes a lot of sense. The shocks we dole out are comparable to the feeling of a static electric shock. Those are not fun, but it is not traumatizing to give someone else such a shock. We have all likely been on both the receiving and the delivering end of those.
We also monitor participant well-being through a couple of quantitative probes and ask them how our experiment compares to other psychology experiments. A majority of participants actually prefer our setup over typical experiments, and most find the experiment genuinely engaging.
I understand the hesitation that other researchers might feel in doing this type of research, but I do think psychology can be a little bolder in this regard. In medical research, risks and benefits are constantly balanced against each other, but in psychology, it seems like we shy away from doing difficult work like this. We leave it up to ethics committees to determine whether something is ethically appropriate or not, but it is actually very difficult for ethics committees to correctly gauge participant experience.
In our case, we are explicitly monitoring this, so if problems emerge, we can catch them quickly and retune our design. Rather than assuming our paradigm is ethically appropriate, we are gathering the data and using that to help steer how we develop these paradigms.
You found that the participants' responses to the hypothetical problem only moderately predicted their real-life behavior. What do you think this gap reveals about how people actually make moral choices under pressure?
We need to conduct more studies, of course. We are still only beginning this, and so my response is somewhat speculative. I think there are a couple of differences.
First, it seems to me that when people are confronted with hypothetical dilemmas, they are more likely to gravitate towards more abstract ways of thinking. The people involved in a hypothetical dilemma are abstracted figures. They are faceless strangers.
Obviously, as soon as you put actual people in front of participants, they do take that into account. Some participants literally base their decision on who they felt had the most friendly demeanor. Some colleagues might dismiss such motivations as silly, as they are a-philosophical, but in my mind, such responses reflect a normal psychological response. Silly or not, our models of moral psychology should account for such reactions.
Relatedly, in a hypothetical dilemma, you can't fully weigh how a specific action might feel. When people encounter our dilemma in the lab, they have plenty of time to prepare their response. They get the full briefing, and because it takes place in group settings, they can have minutes to make their decision. And yet, despite the time we provide them with, you still often see people hesitate until the very last moment.
Some participants tell me that they enter the lab without knowing what they will do. Occasionally, they surprise themselves... To me, that confirms that there is a genuine limit to what we can learn from studying hypothetical dilemmas: if people can't fully predict their own decisions in what is a fairly straightforward dilemma situation (our dilemma is just a 2 vs. 1 type of choice), we can never build a complete theory by relying on hypothetical dilemmas alone.
Finally, it seems like people weigh the consequences of their decisions more heavily in the lab-context. Our data on this is imperfect, but it seems like people are more likely to favor cost-benefit analyses in real situations versus in hypothetical situations. In my mind, that does make a lot of sense. Hypothetical dilemmas don't have consequences, so why would you make the "hard" decision when the decision you are making has no impact?
Physical proximity and gender composition didn't significantly influence decisions. Were you expecting this finding, or did it come as a surprise?
We were not expecting this at all. In our first study, we only confronted participants with "targets" of the same gender, precisely because we had assumed that it would have a strong impact on participants' decisions.
Now, I do want to nuance those findings somewhat. For instance, when it comes to gender composition, we did not find an overall effect. But a good number of participants did explicitly reference the gender composition of targets when making decisions. Part of the reason we did not find an overall effect is that people were affected in both directions: some were more likely to shock men to save women because they felt it was more important to protect women, while others were more likely to shock women because they reasoned that women were better at dealing with pain. So part of what is going on is that those groups cancelled each other out to some extent. The conclusion is not that gender doesn't matter; for some people, it did matter. It's just that the effect it had was variable, and as a result, on average, we did not really find anything.
That being said, I do still find both findings surprising. I would like to repeat those same manipulations in a study that does not take place in a group setting. Right now, participants made a decision, and their decision was grouped with those of others. We did this to provide participants with some mental cover for their decision and decrease their sense of responsibility, but perhaps that also washed out the difference between those conditions.
Nevertheless, I think this also confirms just how difficult it is to actually predict what will happen in these real-life lab contexts. When I present these findings to colleagues and ask them, "What do you think we found?" the overwhelming majority (over 90%) guesses incorrectly. Reality is just more complicated than we think.
How do you interpret the participants' desire to fairly "distribute" harm in light of utilitarian vs. deontological reasoning?
Traditionally, people have thought of these "trolley"-type dilemmas as contrasting utilitarian versus deontological reasoning. That's how they were used in philosophy, and moral psychologists have sort of co-opted that language and way of thinking.
There's been some debate about how appropriate those labels are, but the premier psychological model that we have for how people approach these types of decisions links the two possible responses to trolley dilemmas (interfering versus not interfering) to two dissociable psychological systems that drive those responses.
Now, that perspective makes sense when you look at decisions in isolation. Within the confines of a hypothetical trolley dilemma, there really are only so many dimensions that you can base your decision on. As soon as you implement a dilemma in real life, however, people can take a whole host of other things into account as well.
In real life, every moral dilemma situation will be preceded by a prior history that contextualizes what happens, and there are events that will take place afterwards as well. That has a fundamental impact on what we consider to be morally correct. What is "morally good" is not just determined by what happens within a specific dilemma, but also by what happened before that dilemma, the prior decisions we made, and what our future behavior will be after the dilemma situation.
Hypothetical dilemma studies implicitly treat moral judgment as though decisions are independent, but they are not. They are fundamentally dependent; if we want to understand moral psychology, we need to build an understanding of that dependency.
That might sound a little vague. When I present this work, I always compare this to music. You could study music by confronting people with bars of music and asking them to rate whether that bar of music sounds good or bad. Doing so will allow you to build an understanding of musical taste, but your understanding will be fundamentally limited. Whether or not a specific bar of music sounds good does not only depend on what happens in that bar; it also depends on how that bar fits into a wider piece of music. You can study a bar of music without the context of the wider piece, but doing so will lead to an impoverished understanding.
And so, to respond to this question: what is happening here is that, by contextualizing a trolley dilemma within a wider history of events, people take that history into account to determine what they consider good. They won't only care about the utilitarian concern to minimize harm (in that moment), or about the deontological concern to avoid harm (in that moment). They will also wonder how their decisions fit into the broader picture, how harm in that moment relates to harm at other moments, and take that into account.
And so, whereas trolley dilemmas are typically interpreted as contrasting utilitarian versus deontological concerns, as soon as you start contextualizing such dilemmas, other moral concerns emerge as well. Essentially, we argue that our traditional understanding of trolley dilemmas is impoverished and that there is more to trolley dilemmas than utilitarianism and deontology.
Did you find any differences or patterns in how participants justified their decisions, even when their choices were the same?
Yes, we did. We especially found a lot of variation among the people who refused to intervene (and so allowed the group of two to get shocked).
Some people did provide a deontological-like motivation, such as objections to actively harming others. But other types of motivations also emerged: some participants felt that harming the group was better because they reasoned that pain shared as a group is less painful, and some referenced their gut feelings.
By far the largest group, however, simply refused to take responsibility for the dilemma situation, arguing that it was not up to them to interfere or suggesting that "fate" had already determined an outcome. The variation increased even more on the second iteration of the dilemma, when the motivation to "balance harm" became the most prominent one.
Would you say your findings challenge or expand traditional dual-process models of moral judgment, such as intuition vs. reasoning?
I would argue our findings challenge those models, or at least, they suggest those models are incomplete.
First, there is the variation in the types of motivations we uncovered. To me, it seems unlikely that the variety we uncovered is driven by just two processes. Now, you could point to the variation in motivations and argue that those motivations are the result of post-hoc reasoning, unrelated to what is happening at a deeper psychological level. Future research will need to disentangle the two perspectives, but I think that is unlikely.
Part of why I think that is unlikely is that we did include measures of reflection and intuition in our study. We are still looking at this data and are in the process of preparing a manuscript on this issue, but the data we have does not suggest that there is a difference in intuition or reasoning that can be linked to the types of judgments people prefer. This might be a case where things that work for hypothetical dilemmas do not translate to real-life behavior.
Secondly, traditional dual-process models break down once you start contextualizing the dilemmas. We confront participants with the same dilemma twice, and we see that many of them switch their decision the second time around. Traditional models cannot explain such switching in a satisfying way.
Proponents of those models essentially need to argue that people switch because, the second time around, they either had more time to reflect or were overwhelmed by an intuition they had resisted on the first iteration. However, such accounts do not explain why "switching" is so strongly associated with a specific type of moral motivation.
If people switch from "I won't interfere" to "I will interfere" because they have reasoned more heavily the second time, you would expect them to motivate their decisions by referencing a "cost-benefit" type of motivation, as that is how people motivate their decision to interfere on the first trial. What we actually find is that these people consistently reference a motivation to "balance harm" across all victims. These people don't suddenly find the utilitarian cost-benefit analysis more appealing; they just want to spread harm fairly.
Notably, the people who make the opposite switch, from "interfering" to "not interfering," also provide that same motivation to balance harms. Traditional models not only need to explain why people switch; they also need to explain why people who make opposite switches, supposedly driven by opposing psychological processes, still end up motivating their decisions in the same way.
Traditional models have to twist themselves up to explain our data, whereas our explanation is fairly straightforward and uncontroversial: people care about fairness, even within the context of trolley dilemmas. Because traditional models don't really allow for that possibility, our judgment is that those models are incomplete.
Where do you see this line of research going next? Are there ways to safely study even more realistic moral decisions?
There are a couple of directions that I would love to take this in. How successful we will be will likely depend on whether we're able to attract funding. Right now, everything we have been doing has been self-funded through leftover budget from other projects, as it has actually been very tricky to get funding for this type of work.
First, there is still a lot of work to do to fully untangle where "hypothetical" and "real-life" moral judgment match up and where they diverge. I genuinely believe that is fundamental work that needs to happen to ground moral psychology in reality.
Beyond that, I think there are a lot of variations on our basic paradigm that we could explore to build new theories. Traditional dual-process models essentially imply that people weigh the moral worth of a potential outcome against the moral worth of a harmful action. That does happen, but it ignores that people also care about the moral worth of the people involved in the dilemma.
When people claim they want to "balance harms," they seem to be reasoning based on "moral desert": person X or Y does not deserve to be shocked again because they have already been shocked. And so the moral calculus that needs to happen is not just a question of, "How do we balance outcomes versus actions?" It's a balancing act between at least three dimensions. By playing around with our paradigm, I think that we will be able to build better models of how people approach that multidimensional balancing act.
For me, that is still only the start of what I would like to do. There is an entire world of possible moral dilemmas out there. I think trolley-type dilemmas are an interesting place to start, but they are a very particular type of dilemma. Many of the dilemmas that people encounter don't really fit that mold, so we should broaden our work to incorporate other types of dilemmas as well. My hope is to eventually build a theoretical model that explains how people cope with all kinds of moral dilemmas, and so I would love to start implementing those other types of dilemmas in the lab.