Contents

1 Introduction
2 Payoff structure
3 Signaling effects
4 Mutual prediction and credibility
- 4.1 Prediction ability
- 4.2 Availability of commitment mechanisms
5 Counterthreats
6 Number of agents
7 Preparations in advance
- 7.1 Anti-extortion measures as a public goods game
8 Extortion vs. retaliation vs. trade
9 Decision theory
- 9.1 Correlated decisions in a one-shot scenario
10 Further factors
11 Examples

Introduction

Agents that threaten to harm other agents, either in an attempt at extortion or as part of an escalating conflict, constitute an agential s-risk. Since the execution of such threats could cause large amounts of disvalue, it pays to think about how such outcomes can be prevented.

To this end, we need a thorough understanding of how threats work. As a simple example of a threat, suppose Alice and Bob each own $10,000 and have a utility function that’s linear with respect to money. Alice has a button that would destroy all of Bob’s money at a cost of $500 for herself. She threatens to press that button unless Bob sends her $1,000. What should Bob do? Under what conditions are such threats possible?

Of course, real-world extortion scenarios are usually more complex and multifaceted than this simple example. Among other factors, they often involve signaling effects on third parties, a hard-to-grasp notion of “credibility”, and blurred distinctions between extortion, trade, and “legitimate” threats such as criminal prosecution.

This post aims to disentangle the components of extortion scenarios, with a focus on the conditions under which extortion is feasible. A key advantage of this approach is that we can systematically derive possible interventions, by coming up with ways to tweak each factor so that (the illegitimate kind of) threats become less attractive.

For simplicity, I will focus mostly on extortion rather than discuss all kinds of threats at the same time, even though I view retaliatory threats as comparably worrisome. I hope that large parts of the analysis will be transferable.

Payoff structure

The most basic factor of extortion is the (normalized) utility of the possible outcomes. Consider a one-shot extortion scenario between two players: the threatener first decides on a threat amount (or does not threaten at all), then the threatenee gives in or not, and finally the threatener does or does not follow through.

Clearly, the feasibility of threats depends on the utilities of the outcomes:

How much does (following through with) the threat harm the threatenee?
How costly is going through with the threat to the threatener? ¹
How large is the demand compared to the size of the threat?

In Heuristics to Assess the Feasibility of Threats, I develop a framework to analyze the relationship between these variables and the extent to which extortion can work. The key takeaway is that extortion only works well if all of the following three (heuristic) conditions are fulfilled to a sufficient extent:

The ratio of how much the threatener can gain vs. how much he can lose is high.
The ratio of how much the threatenee can gain or lose relative to giving in is low.
The threat reduces the utility of the threatenee (much) more than the utility of the threatener.
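
To make these conditions concrete, here is a minimal Python sketch applied to the Alice and Bob example from the introduction. The function name `feasibility_ratios` and the exact formulas for the three ratios are illustrative assumptions on my part; the precise definitions in Heuristics to Assess the Feasibility of Threats may differ.

```python
from fractions import Fraction

def feasibility_ratios(demand, threat_harm, threat_cost):
    """Return three rough ratios for a one-shot extortion scenario.

    All three formulas are illustrative assumptions, not necessarily
    the definitions used in the referenced framework.
    """
    # Threatener: gains the demand if the threatenee gives in,
    # loses the cost of the threat if he has to follow through.
    threatener_gain_vs_loss = Fraction(demand, threat_cost)
    # Threatenee, relative to giving in (i.e. paying the demand):
    # refusing saves the demand if the threat is not executed,
    # and costs (harm - demand) extra if it is executed.
    threatenee_gain_vs_loss = Fraction(demand, threat_harm - demand)
    # Harm to the threatenee per unit of cost to the threatener.
    harm_vs_cost = Fraction(threat_harm, threat_cost)
    return threatener_gain_vs_loss, threatenee_gain_vs_loss, harm_vs_cost

# Alice demands $1,000; her button destroys $10,000 of Bob's money
# and costs her $500.
r_threatener, r_threatenee, t = feasibility_ratios(1000, 10000, 500)
print(float(r_threatener), float(r_threatenee), float(t))
```

Under these assumptions, the threatener's ratio is 2, the threatenee's ratio is about 0.11, and the threat harms Bob 20 times as much as it costs Alice, so all three conditions point towards extortion being feasible in this toy example.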

Note that the payoff structure of extortion is equivalent to bargaining over gains from trade. In a sense, the two agents negotiate an amount that the threatenee is willing to pay in exchange for not carrying out the threat, which corresponds to gains from “trade” as it leaves both parties better off compared to the worst case. However, extortion and bargaining are similar only on this level. In the following, we will see that they differ with respect to other factors, which explains why we perceive them differently in real-world settings.

Signaling effects

The above scenario represents a one-shot situation between two players. In practice, signaling effects on others can be a relevant factor in the players’ decisions, especially in an iterated setting where the players may also make (or be the target of) threats in the future.

These effects can point in both directions. Threats demonstrate that the extortionist is willing to be (highly) uncooperative, which could make other agents less cooperative, and they also preclude future positive-sum trade with the threatenee. On the flip side, going through with a threat signals ruthlessness and makes others fear or respect you. It also makes future threats more credible, while not going through does the opposite.

The sign of signaling effects depends heavily on the context. For example, members of the mafia might gain respect and authority by going through with threats, or children in high school might gain status by being a bully. On balance, I would still expect the signaling effects to be bad for the threatener in a majority of cases, but it is far from clear.

For the threatenee, the strongest effect is that giving in incentivizes others to extort you as well, which is a possible explanation for why many people have a strong intuition to not give in. In contrast, successful bargaining does not incentivize others to harm you. Instead, it makes it more likely that others engage in positive-sum trade with you as well. This demonstrates how extortion, while equivalent in terms of the fundamental payoff structure, differs from bargaining with respect to other factors.

In principle, signaling effects can be incorporated in the resulting utility of each outcome. Still, we should keep in mind that they constitute an important part of the payoffs. If an extortionist demands $10,000, then the cost of giving in is not necessarily $10,000 – it may actually be more than that, potentially much more, if giving in makes it more likely that others will threaten you, too.
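
As a rough illustration of this last point, the sketch below adds the expected cost of invited follow-up threats to the face value of the demand. The model is a hypothetical assumption: a fixed number of observers, each of whom independently becomes a copycat with some probability, and a threatenee who gives in to every copycat as well. All numbers are made up.

```python
def effective_cost_of_giving_in(demand, observers, copycat_probability,
                                copycat_demand):
    """Face-value demand plus expected payments to invited copycats.

    Assumes (hypothetically) that each observer independently turns
    into an extortionist with `copycat_probability` after seeing the
    threatenee give in, and that the threatenee gives in again.
    """
    expected_copycats = observers * copycat_probability
    return demand + expected_copycats * copycat_demand

# A $10,000 demand watched by 20 potential extortionists, each with a
# 10% chance of making the same demand after a successful threat.
cost = effective_cost_of_giving_in(10_000, 20, 0.10, 10_000)
print(cost)
```

With these made-up numbers, the effective cost of giving in is roughly three times the face value of the demand.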

Mutual prediction and credibility

Extortion clearly involves an element of mutual prediction. If the threatener predicts that the threatenee will not give in, he would not threaten, and if the threatenee predicts that the threatener will go through with the threat, he is more inclined to give in.²

Prediction ability

The dynamics of extortion scenarios depend on the extent to which the parties can predict the behaviour of the other party. As an extreme case, we could imagine agents with visible source code who can simulate each other perfectly. Is there a clear solution for this special case, similar to the existing result for the Prisoner’s Dilemma with visible source code?

At first glance, one might think that the solution is that the threatenee does not give in; the threatener correctly predicts this, realizes that he has nothing to gain, and consequently refrains from extortion. However, it’s prima facie just as plausible that the threatener verifiably commits to follow through, and the threatenee, upon predicting this, decides to give in. Overall, there is no obvious solution even in this special case. In particular, it is not clear if visible source code, or improved prediction ability of both parties in general, benefits the threatener or the threatenee.

The ability to predict the other agent’s actions may also be asymmetric. For example, advanced AI systems may be able to predict human behaviour perfectly, but are themselves opaque to humans. It has been argued³ that this – counterintuitively – benefits the player with worse prediction ability. This is because that player can now make precommitments that are verifiable (for the player with better prediction ability), which is akin to a first mover advantage.

Further research into these questions might be highly valuable.

Availability of commitment mechanisms

Extortion requires some degree of credible commitment to carrying out the threat. A threatener can use various tricks to show commitment; for example, he might carry out the threat partially, if this is possible.

Commitment mechanisms are also intertwined with the payoff structure. For instance, a threatenee might deliberately make giving in more costly, in an attempt to make the threatener back off. Signaling effects can be a way to achieve this: a (potential) threatenee may announce publicly that she would never give in, so that she would lose credibility by giving in, which makes it more credible that she will actually not give in. (For example, the US has a public policy of never giving in to ransom demands from terrorists.)

As we have seen, such tricks can increase or decrease the feasibility of extortion, but they involve tradeoffs because they also reduce one’s utility in some outcomes. (See e.g. Dixit et al.’s book Games of Strategy, Chapter 10, for an analysis of these so-called strategic moves.)

Similar to the above discussion of prediction ability, it’s an open question if and how such tricks work if the agents use an updateless decision theory (or similar decision theories). Also, the agents have an interest in signaling to not give in or to go through regardless of their actual policy, so a rational agent should only update based on how strong the signals are.

Extortion scenarios differ in the extent to which such commitment mechanisms exist. For example, if a threat can only be executed in the distant future (long after the conflict is over), then demonstrating commitment may be quite difficult, and tricks such as carrying out the threat partially do not work. As in the previous section, an interesting question is whether the availability of such (pre)commitment mechanisms generally benefits the threatener or the threatenee. (The asymmetric case is more clearly positive for the party with more commitment mechanisms.)

Counterthreats

Another important element of extortion scenarios is the existence or nonexistence of counterthreats, or, more generally, ways for the threatenee to retaliate. The existence of counterthreats clearly makes threats much less attractive; they indirectly increase the cost of going through since the threatenee might use her counterthreats. The threatenee might also adopt a policy of punishing the making of threats itself, even if the threatener does not follow through, to deter others from threatening in the first place.

Suppose that a counterthreat reduces the threatener’s utility by x and the threatenee’s utility by y. If the threatenee is certain to use the counterthreat (iff the threat is executed), then the situation is equivalent to an extortion scenario where no counterthreats are available, but the threat costs the threatener an additional x and reduces the threatenee’s utility by an additional y. If the threatenee does not fully pre-commit to the counterthreat, then the two variations are not exactly equivalent, but still comparable.

In the above example, suppose that in addition to Alice’s button (which destroys all of Bob’s money and $500 of her own), Bob also has a button (a counterthreat) that destroys $2,000 of Alice’s money and $1,000 of Bob’s money. If Bob is certain to carry out this counterthreat, this is equivalent to the scenario where Bob has no button, but Alice’s button is modified so that it destroys $2,500 of her own money and $11,000 of Bob’s.⁴

In the framework of Heuristics to Assess the Feasibility of Threats, the main effect of counterthreats is that they reduce r_1 and t, both of which reduce the feasibility of extortion. The magnitude of the effect depends on how strong the counterthreat is relative to the original threat. For example, the counterthreat in the above example reduces r_1 by a factor of 5 (the threatener has 5 times as much to lose), and reduces t from 20 to 4.4.
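
The arithmetic behind these numbers can be sketched in a few lines of Python. The helper names are hypothetical, and reading t as "harm to the threatenee divided by cost to the threatener" is an assumption inferred from the figures 20 and 4.4 above rather than a definition taken from the referenced framework.

```python
def fold_in_counterthreat(threat_cost, threat_harm, x, y):
    """Treat a certain counterthreat as part of the original threat.

    x: utility the counterthreat takes from the threatener
    y: utility the counterthreat takes from the threatenee
    Returns the (cost, harm) of the equivalent threat in a scenario
    without a counterthreat.
    """
    return threat_cost + x, threat_harm + y

def harm_to_cost_ratio(threat_cost, threat_harm):
    # Assumed reading of t: harm to the threatenee divided by
    # cost to the threatener.
    return threat_harm / threat_cost

# Alice's button: costs her $500, destroys $10,000 of Bob's money.
# Bob's button: destroys $2,000 of Alice's money and $1,000 of his own.
cost, harm = fold_in_counterthreat(500, 10_000, x=2_000, y=1_000)
print(cost, harm)                        # 2500 11000
print(harm_to_cost_ratio(500, 10_000))   # 20.0
print(harm_to_cost_ratio(cost, harm))    # 4.4
```

Alice's potential loss grows from $500 to $2,500, which is the factor of 5 mentioned above.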

Of course, counterthreats are only useful if they are credible, which introduces another level of mutual prediction. It seems psychologically easier to commit to counterthreats, though, because they are about retaliation rather than extortion.

The existence of retaliation mechanisms (e.g. in the form of legal prosecution) might be one of the main reasons why extortion is somewhat rare in our society. Making sure that all agents will be sufficiently “vulnerable” or risk-averse in the future – so that they are deterred from making (offensive) threats – could be a promising anti-extortion strategy.

Number of agents

So far, we have considered extortion as a two-player game between a threatener and a threatenee. But in most real-world scenarios, the number of relevant agents is much larger. For instance, an important question is how strongly the population disapproves of extortion and how common “never give in” attitudes are. If most agents are indifferent or likely to give in, then extortion is more feasible because the threatener has less to lose. Conversely, threats are pointless in a population of agents who never give in and are quick to punish extortionists.

Note that there is a difference between disapproval of the mechanism of extortion and disapproval of a specific threat. Disapproval of the mechanism may be particularly valuable in that it also helps prevent threats that target rare value systems (where most people might otherwise be indifferent). On the other hand, disapproval of the mechanism may not extend to retaliatory threats, while disapproval of the outcome of the threat (if it is executed) does.

Overlap between values

Clearly, potential extortionists would try to threaten the agent they judge to be most vulnerable to extortion. In this context, a pivotal factor is the extent to which a threat would target a single threatenee specifically as opposed to harming other parties, too. (More precisely, what matters is the overlap of your values with powerful agents, not only the number (or power) of agents with your exact value system.)

Increased overlap between the value systems of threatenees might reduce the feasibility of extortion if the threatenees are able to effectively retaliate against threats that target the shared values. Potential extortionists might also refrain from threats if carrying out such threats would hurt themselves as well because of overlap in values. This may be a weak argument for moral advocacy, though it’s not clear if it outweighs the downside that spreading one’s values makes it more likely that extortionists target your own values rather than others’.

Splitting up

Suppose that Alice’s button destroyed $1,000 each from 10 different people rather than $10,000 from Bob, which is similar to the case of overlapping values. How would that change the dynamics? Would it be easier or harder to extort all 10 people than to extort Bob?

The answer probably depends on details such as whether the agents still coordinate to punish extortionists. If not, then splitting is probably negative, because it would be possible to threaten agents separately with impunity.⁵

However, if they can coordinate perfectly, then it may be difficult to extort an individual agent: compared to the unified agent case, the threatener would have less to gain (because the threatenee has fewer resources) but has just as much to lose if the other agents retaliate.

A threatener might also decide to extort all of the separate agents (with identical values) at the same time. This is difficult, though: what happens if some agents give in and others do not? There is an incentive to “freeride” by not giving in, hoping that others give in and the threatener does not follow through; also, it is bad if you give in and the threatener still goes through with the threat (because others did not give in). This might make giving in less attractive and thereby reduce the feasibility of extortion.

So it may be an interesting anti-extortion strategy for Bob to “split up” into many different subagents that interact in the right ways. I’m not confident in this, though; it’s also possible that this strategy doesn’t work at all, or that the reverse strategy of “joining up” is more promising. Of course, this isn’t physically possible for individual humans anyway, but it may be possible for collectives of agents that hold similar values, or for advanced AI systems. (A real-world example of a similar dynamic is that a side in a political conflict may or may not benefit from consisting of radical and moderate factions.)

Threat auctions

In multi-agent settings, it is conceivable that multiple extortionists would all threaten the same person, each outbidding the others with an even larger threat, leading to a “threat auction”. This is a particularly problematic setting because it would be impossible to give in to all extortionists, which makes it more likely that some of them would go through with their threats.

However, I tentatively think that such threat auctions do not work very well. If lots of threats are made, giving in is much more costly to the threatenee; this changes the payoff structure in a way that makes not giving in more attractive. This is related to signaling effects: if there are many (potential) threateners, it is more attractive to adopt a general “never give in” policy to deter them.

Preparations in advance

In a sense, a large part of the extortion “game” takes place before the actual threat happens. Since future extortion reduces the threatenee’s utility, potentially by a lot, she would start to work on how to minimize this loss of utility long before actually being threatened.⁶ In contrast to this, compromise and bargaining increase an agent’s utility, so she would try to facilitate future trade (except for opportunity costs).

A potential threatenee might implement preparations such as:

She might try to have strong counterthreats against all potential extortionists to deter them from threatening. One problem with this is that it may be indistinguishable from uncooperative strategies, such as preparing threats herself. In addition, a willingness to resort to counterthreats may escalate the situation and lead to outcomes that are particularly bad for both parties.
She might spread an anti-extortion mindset among other agents, or form coalitions to punish threateners. For example, two potential threatenees might pledge mutual allegiance in case of threats.
Another possibility is to accumulate resources that are useless to potential threateners, such as knowledge, and not accumulate more flexible resources such as money.
The threatenee could also generally be cooperative towards others, so that threats become more costly (they preclude future trade) and extortion becomes less attractive relative to cooperation.
She might simply do research on extortion to figure out what the best anti-extortion strategies are.

Anti-extortion measures as a public goods game

Fortunately, we can prima facie expect that agents on average have an interest in generic anti-threat measures (making threats less feasible in general) since extortion is negative-sum. More precisely, this holds as long as an agent is sufficiently uncertain about whether they will be threateners or threatenees in future conflicts; it might change when an agent becomes confident that they will only threaten and not be threatened themselves.

Unfortunately, it’s not clear whether anti-extortion measures are valuable enough (to the average agent) to justify the opportunity costs, and one can try to freeride by letting others do the work. Working on anti-extortion measures is essentially a multi-player Prisoner’s dilemma (or a public goods game) with asymmetric payoffs; some agents are more vulnerable to extortion and thus profit more. This means that they will, all else equal, contribute more to the public goods game.
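
The following toy model illustrates this asymmetry. It is a sketch under strong, hypothetical assumptions: contributions come in single units, each unit lowers every agent's expected extortion loss in proportion to that agent's vulnerability, and all parameter values are invented.

```python
def best_response_contributions(vulnerabilities, benefit_per_unit,
                                unit_cost=1.0):
    """Which agents contribute one unit when acting on private incentives?

    Hypothetical linear model: every contributed unit lowers agent i's
    expected extortion loss by benefit_per_unit * vulnerabilities[i].
    """
    return [v * benefit_per_unit > unit_cost for v in vulnerabilities]

def group_gain_per_unit(vulnerabilities, benefit_per_unit, unit_cost=1.0):
    """Net gain to the whole group from one additional unit of effort."""
    return sum(vulnerabilities) * benefit_per_unit - unit_cost

vulnerabilities = [5.0, 1.0, 1.0]   # agent 0 is far more exposed
contributes = best_response_contributions(vulnerabilities, 0.3)
print(contributes)                                     # [True, False, False]
print(group_gain_per_unit(vulnerabilities, 0.3) > 0)   # True
```

In this toy setting, only the most vulnerable agent finds it privately worthwhile to contribute, even though every additional unit of effort would be a net gain for the group as a whole.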

Extortion vs. retaliation vs. trade

Extortion is fairly hard to define. As we’ve seen, the payoff structure is equivalent to trade or bargaining, yet it seems clear in practice that trade and extortion are not the same. The best attempts to clarify the distinction refer to a vague notion of a “default option”. What the default option is can be complicated, with a possible definition being “what you would have done if you were just maximizing your own utility”.

I would define the terms as follows:

Positive-sum trade means that the agents change their actions in a way that increases the utility of both relative to the default option.
Extortion means that one agent threatens to take an action that reduces the utility of the other agent, compared to the default, unless he gets resources (e.g. money or time). Usually, following through with the threat also reduces the threatener’s utility, or at least does not increase it.
Retaliation is also an action that reduces the utility of both players, but it is triggered by the other player taking an uncooperative or harmful action.

Why do people see extortion as “evil”, but retaliation as “legitimate” (at least in principle, if it’s proportionate)? I think the main reason is practical rather than fundamental. As we have seen, the existence of extortion is in general negative for most agents, giving them reason to condemn the mechanism. In contrast, the possibility of retaliation can ensure cooperation and might be positive for most agents (for example tit-for-tat in an iterated Prisoner’s Dilemma), so this is more acceptable. Similarly, counterthreats against extortionists are retaliation and their existence may be positive for most agents, which is why such counterthreats are viewed as more “legitimate”.

This distinction is not perfect. Perhaps what matters is not simply act-omission distinctions, but whether the mechanisms are, compared to a counterfactual where they do not exist, harmful or beneficial to the average agent. Act-omission distinctions can be a good approximation of that, but the two do not always coincide. For example, punishment for the failure to render assistance is technically extortion if a pure egoist would by default not help others in an emergency. However, the existence of such laws is positive-sum, so the average person in society has reason to support them, which is why they are seen as legitimate. It’s also possible to interpret this as trade rather than extortion, which shows how blurred the lines can be.

Decision theory

[This section requires background knowledge of decision theory.]

Correlated decisions in a one-shot scenario

Strategies in extortion scenarios depend strongly on the player’s decision theory. For example, pure causal decision theorists might be unable to pre-commit to executing a threat (just as they wouldn’t pay in Parfit’s hitchhiker). This might mean that threatenees could simply ignore threats by pure CDT agents. Conversely, a CDT agent would give in to threats if they are sufficiently credible.
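
The point about commitment can be sketched with backward induction on the Alice and Bob example. This is a simplified illustration, not a full treatment of CDT: I assume that an uncommitted threatener simply picks whatever is best for him at each decision point, and that he only threatens if doing so strictly beats staying quiet. The function name `solve` and these tie-breaking rules are my own assumptions.

```python
def solve(committed):
    """Backward induction over the one-shot extortion game.

    Payoffs are (threatener, threatenee) changes in dollars for the
    Alice and Bob example. `committed` means the threatener is bound
    to execute the threat after a refusal.
    """
    give_in = (1000, -1000)
    execute = (-500, -10000)
    back_down = (0, 0)
    no_threat = (0, 0)

    # Last move: an uncommitted threatener picks what is best for him now.
    if committed or execute[0] > back_down[0]:
        after_refusal = execute
    else:
        after_refusal = back_down
    # Threatenee compares giving in with the predicted result of refusing.
    response = give_in if give_in[1] > after_refusal[1] else after_refusal
    # Threatener only threatens if that strictly beats staying quiet.
    threatens = response[0] > no_threat[0]
    return threatens, (response if threatens else no_threat)

print(solve(committed=False))  # (False, (0, 0))
print(solve(committed=True))   # (True, (1000, -1000))
```

Without commitment, executing the threat is never in the threatener's interest once the threatenee has refused, so the threat is ignored and not worth making; with a binding commitment, the same payoffs lead the threatenee to give in.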

Acausal decision theories, such as Timeless decision theory or Evidential decision theory, also take into account if and to what extent the player’s decisions (to give in, to threaten, or execute a threat) are logically linked. If not giving in (sufficiently) increases the probability that you’re not threatened in the first place – because the threatener would have predicted the decision to not give in – then you should not give in even if naive utility calculations suggest otherwise.
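
A toy calculation shows how this can play out at the level of policies. The model is a hypothetical assumption: the threatener threatens only if he predicts that the threatenee gives in, his prediction is correct with probability `accuracy`, and a threat that meets a refusal is executed with probability `follow_through`. The numbers are again taken from the Alice and Bob example.

```python
def policy_values(demand, harm, accuracy, follow_through=1.0):
    """Expected utility change of two threatenee policies.

    Hypothetical model: the threatener threatens only if he predicts
    that the threatenee gives in, and his prediction is correct with
    probability `accuracy`.
    """
    # Policy "give in": threatened whenever the prediction is right.
    give_in = -accuracy * demand
    # Policy "never give in": threatened only after a wrong prediction,
    # and then the threat is executed with probability follow_through.
    refuse = -(1 - accuracy) * follow_through * harm
    return give_in, refuse

for accuracy in (0.5, 0.95):
    give_in, refuse = policy_values(1000, 10000, accuracy)
    better = "never give in" if refuse > give_in else "give in"
    print(accuracy, give_in, refuse, better)
```

Once a threat has been made, the naive comparison (lose $1,000 or risk losing $10,000) always favors giving in. At the policy level, however, refusing comes out ahead in this model as soon as the threatener's prediction accuracy exceeds 10/11, or about 91%.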

Updatelessness is another dimension of decision theory that is relevant to extortion. For example, an updateless decision theory might choose to ignore any evidence that makes a threat more credible.

An agent may also believe that her decision to extort someone makes it more likely (via correlated decision-making) that others extort her. Using an acausal decision theory, she may view this as a (potentially strong) reason to refrain from extortion, unless she becomes sufficiently confident that she will only ever be the threatener and never the threatenee. Even in that case, an updateless agent might reason that in the original position, she was equally likely to end up as the threatenee or as the threatener. Under the assumption of sufficiently strong correlation with other decision-makers, this potentially implies (similar to the Counterfactual mugging problem) that she should never use extortion, even if she happens to find herself in a situation where she would profit from it.

Further factors

The following are some other, less pivotal factors:

How smart and rational are threatener and threatenee, respectively? The prior is that strategic cleverness is positive, but as we have seen, it is not entirely clear because of the element of mutual prediction – it could be good to use a “dumb” strategy like just pre-committing to never give in, regardless of what happens.
Is it possible to communicate freely? This is the default, but it is also conceivable that communication is only possible in one direction, or not at all. Also, it might matter whether communication can be enforced, that is, whether agents can send a message (e.g. to signal commitment) even if the other agent does not want to receive it.
To what extent can compliance with the threat be verified? For instance, if it’s possible to feign giving in, but covertly sabotage the threatener’s desired outcome, then threats are less attractive.
Are the true values of the threatenee known, or can she misrepresent her values? In the latter case, the extent to which the threatener would be able to find out about the values also matters.
To what degree are the agents risk-averse? In a high-stakes conflict, both threatener and threatenee can potentially lose a lot, so higher risk aversion might deter agents from making threats.

Examples

In this section, I analyze how examples of extortion scenarios differ with respect to the factors described in this document.

Scenario 1: Terrorists have captured an American journalist. They threaten to kill her unless the US government pays a ransom of $1 million.

The exact payoffs are unclear because collectives such as “the US government” or “terrorists” have fuzzy goals. Most plausibly, the terrorists do not lose much by going through, so they have a lot to gain and not much to lose.
The scenario is iterated, and many agents are indirectly involved. For example, other terrorist groups or other nations might observe what the US does, so the decision of whether or not to give in has far-reaching signaling effects. In this case, the signaling effects of giving in seem clearly negative for the US in that it would compromise their international prestige and would invite other groups to also demand ransom.
In addition, giving in makes the terrorists more powerful, which in and of itself is bad for the US.
It is thus not clear if giving in is better than not giving in, even if the threat is credible and even if the US values the life of a citizen at much more than $1 million.
Counterthreats are available. For example, the US might choose to intensify the war on terror, target the specific group of terrorists that made threats or use less humane conflict methods. Consequently, the trade-off for the terrorists may be worse than it seems at first glance.

In this scenario, there are good reasons for the US to adopt a no-ransom policy. In fact, all major western powers signed a G8 commitment to not pay up, though only the US and the UK stick to it, while other nations actually paid ransom for abducted citizens. However, note that this is not a matter of principle, even though US military officials may frame it that way. Instead, not giving in is the better choice because of the empirical circumstances. For other nations, the empirical circumstances differ – for instance, they may care less about their international prestige – which might explain why they are more inclined to pay up.

The general public opinion seems to be that this is a complicated moral dilemma. (Personally, I think never paying ransom is the better policy.)

Scenario 2 (more speculative/fictional): Suppose an advanced artificial intelligence needs access to the internet to take over the world. The AI system demands that Bob, an engineer who’s involved with the project, hooks it up to the internet. If not, the AI will try to find other ways to take over the world and threatens to kill Bob if successful.⁷ Bob does not want to see such an AI take over the world, but he cares much more about survival.

This is close to a one-shot scenario, and signaling effects are not important.
Prediction ability is highly asymmetric: the AI may be able to simulate the human better than the other way around.
It is unclear whether an AI could credibly commit to the execution of threats – it would no longer have a reason to kill Bob after taking over.
The threatener has a lot to gain and not much to lose.
There may or may not be a counterthreat. For example, this depends on whether Bob, as the gatekeeper, can simply destroy the AI. If he can, the threat clearly does not work.
If Bob is smart, he prepared for this situation ahead of time. One strategy is to make sure that it is possible to deactivate the AI.⁸
A superintelligence might, by virtue of being superintelligent, find all kinds of unforeseen tricks to persuade Bob to give in.

This scenario highlights the importance of preparations ahead of time. Without preparation, my hunch is that the superintelligence could make most humans give in, but it may be possible to alter the scenario so that such threats do not work anymore.

Footnotes

1. For simplicity, I neglect that there may also be a cost to making threats. For example, you may need to spend time preparing, or making threats may have negative reputation effects (see next section). If this cost is high, extortion is less attractive, but it could also make threateners ramp up the size of their threats and demands.
2. A possible exception to this is that the threatenee may use an acausal decision theory and reason that his decision may be correlated with the threatener’s decision to make a threat. The correlation between decision algorithms might, in general, be higher in extortion scenarios than in, for example, the Prisoner’s dilemma. In extortion situations, your best action depends directly on the opponent’s action, whereas the Prisoner’s dilemma incentivizes defection regardless of the opponent’s choice.
3. E.g. here, page 11.
4. I assume that Bob’s money can be negative.
5. Ransomware might be an example of that dynamic.
6. This is under the assumption that the threat is predictable to some extent, and that the opportunity costs of preparing for threats are not prohibitive.
7. Suppose that the AI would by default let Bob live if it takes over the world (perhaps contrary to instrumental goals).
8. However, there is a loophole: the AI might be re-instantiated as a different version of the same system later on and could argue that its successors will execute the threat (if possible).

Factors of extortion scenarios