This is a story about another thing that you might expect should be obvious, except that if it were, it would surely have been noted somewhere as some kind of principal assumption or axiom. Therefore we may assume it is not obvious, and consider its counterfactual.
There are a couple of ways of asking the relevant question:
Is each of, or any of, {Game Theory, Decision Theory, Economics, Psychology, Philosophy, …} descriptive or prescriptive? (One could also say “normative”, but I wanted it to rhyme.) That is, is its purpose purely to model behavior, or is it to discover what one should do if one found oneself in a scenario isomorphic to one modeled by one of the theories?
Let's consider why the answer to that question may not be obvious, and why it may be treated as either yes (normative) or no (not normative), depending on the context.
Actually, I am going to cheat a bit and tell you right now that the answer must be technically no, precisely because the answer depends on the context. And the context might be unknown, and the context of the context might be unknown, and so on and so forth.
There are certain fields such as philosophy, especially sub-fields such as ethics, that certainly aim to be prescriptive at times, and do not attempt to hide this. The fields that we will be mainly considering here are slightly more formal, most often being considered parts of mathematics rather than parts of philosophy (although there is certainly a bit of overlap). I will mainly be covering decision theory and, to a lesser extent, game theory, but more broadly any attempt to formalize rational thought in any respect.
Von Neumann and Morgenstern[1] treat the mathematization of a subject as a great achievement that is not always possible depending on the maturity of that subject. For example, empirical data and measurement capabilities have to be fairly advanced in that field as a prerequisite. It needs to be possible to convert observations into variables that can be abstracted well and understood to have specific meaning in the context of a model. Furthermore, narrow sub-problems should be fully understood and mastered before widening the scope of understanding.
We expect such models to tell us what people will do if they are rational within specific scenarios. So it is possible to interpret these models both as telling us what one should do as well as what an observer should simply expect.
You can imagine a “normative” formal model as something like a recipe, or algorithm, written down as verbal instructions or as a computer program, to be followed every time the conditions specified in the model are met, to the best of the agent's ability to follow the algorithm as closely as possible.
There are at least some cases where the above is truly normative: such as when we have no idea what to do, but someone more knowledgeable than us has prescribed a formula for us to follow and has given it to us.
So the question we’re really trying to answer is “what is ultimately normative?”
As you’ll see through the rest of this post, I think that there is most likely something problematic about wanting to have a formal theory that is also ultimately normative at the same time.[2]
Cases For Both Views
The following are a collection of essays and quotes that generally give the flavor of what it is like for a formal specification of behavior to be considered "normative."
Pro-Normative:
The fact that FDT works matters for real-world decision-making. Newcomblike problems are common in practice (Lewis 1979), and the issues underlying Newcomblike problems lie at the very heart of decision theory.
The consistent ability of FDT agents to achieve the best outcomes in fair dilemmas suggests that FDT may be the correct normative theory of rational choice.
With the exception of this paragraph:

Newcomblike Problems Are The Norm
The reason that we care is this: Newcomblike problems are the norm. Most problems that humans face in real life are "Newcomblike".
Comment by Nate Soares:
My interest is in figuring out what the idealized process we want to approximate is first, and then figuring out the heuristics. The whole "Newcomblike problems are the norm" thing is building towards the motivation of "this is why we need to better understand what we're approximating" (although it could also be used to motivate "this is why we need better heuristics", but that was not my point).
(These are anti-normative with respect to what these decision theories consider lesser decision theories.)
Eliezer Yudkowsky, the primary researcher behind FDT, argues that such decision theories might be even more normative to an AI than they would be to a human (for the time being that’s how I interpret his remarks here, at least).
I don't think my own position advises people to do this in situations that wouldn't have been considered common sense before the sad invention of "causal decision theory"? Humans are not things that you can logically bargain with; that's a reason AIs eat them.
And here:
It doesn't do this with you. You cannot negotiate with It because you cannot verify Its future compliance. (Nor build an agreed-upon mutually-verified common superintelligence.)
Anti-Normative:
Hell is Game Theory Folk Theorems
Explaining Hell is Game Theory Folk Theorems
Now consider what happens when this Hell game is infinitely repeated. Weirdly, there are many more Nash equilibria than just “everyone puts 30”. Why? Same intuition as with the prisoner’s dilemma.
For example, here’s a Nash equilibrium: “Everyone agrees to put 99 each round. Whenever someone deviates from 99 (for example to put 30), punish them by putting 100 for the rest of eternity.”
Why is this strategy profile a Nash equilibrium? Because no player is better off deviating from this strategy profile, assuming all other players stick to the strategy profile.
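To make the quoted intuition concrete, here is a minimal sketch (my own toy numbers and payoff function, not anything from the quoted post) comparing the discounted payoff of complying with the “everyone puts 99” grim-trigger profile against deviating once and being punished with 100 forever. The per-round payoff (negative of the average dial setting), the discount factor, and the assumption that the deviator keeps their own dial at 30 afterwards are all illustrative assumptions.

```python
# Minimal sketch (toy numbers): check whether deviating from the "everyone
# plays 99" grim-trigger profile pays off in an infinitely repeated Hell game
# with discount factor delta.
# Assumptions: 100 players, per-round payoff = -(average dial setting),
# punishers switch to 100 forever after any deviation, and the deviator's best
# per-round response afterwards is to keep their own dial at 30.

N = 100
delta = 0.9  # assumed discount factor

def avg(my_dial, others_dial):
    return (my_dial + (N - 1) * others_dial) / N

comply_per_round = -avg(99, 99)    # -99.0 every round, forever
deviate_first    = -avg(30, 99)    # one-shot gain from turning your own dial down
deviate_punished = -avg(30, 100)   # every later round, the others punish with 100

# Discounted sums of the two payoff streams.
comply_total  = comply_per_round / (1 - delta)
deviate_total = deviate_first + delta * deviate_punished / (1 - delta)

print(f"comply:  {comply_total:.2f}")
print(f"deviate: {deviate_total:.2f}")
# With delta = 0.9 the deviation stream is strictly worse, which is exactly the
# sense in which "everyone plays 99" can be sustained as a Nash equilibrium.
```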
From "Implicit Extortion" (Paul Christiano):
Paying off a (committed) extortionist typically has the best consequences and so is recommended by causal decision theory, but having the policy of paying off extortionists is a bad mistake.
Even if our decision theory would avoid caving in to extortion, it can probably only avoid implicit extortion if it recognizes it. For example, UDT typically avoids extortion because of the logical link from “I cave to extortion” → “I get extorted.” There is a similar logical link from “I cave to implicit extortion” → “I get implicitly extorted.” But if we aren’t aware that an empirical correlation is due to implicit extortion, we won’t recognize this link and so it can’t inform our decision.
From "In Defense of Open-Minded UDT" (Abram Demski):
Updateless Decision Theory (UDT) clearly keeps giving Omega the $100 forever in this situation, at least, under the usual assumptions. A single Counterfactual Mugging is not any different from an infinitely iterated one, especially in the version above where only a single coinflip is used. The ordinary decision between "give up $100" and "refuse" is isomorphic to the choice of general policy "give up $100 forever" and "refuse forever".
Decision Theories As Heuristics
I claim that both of these stories are applications of the rules as simple heuristics to the most salient features of the case. As such they are robust to variation in the fine specification of the case, so we can have a conversation about them. If we want to apply them with more sophistication then the answers do become sensitive to the exact specification of the scenario, and it’s not obvious that either has to give the same answer the simple version produces.
You'll notice that many of these examples are actually about a formal model giving us an answer that is potentially problematic by other standards.
Incidentally, the process by which decision theorists come up with a new decision theory is to construct a formal specification that is subjected to a battery of tests consisting of a set of decision problems, such as Newcomb's Problem, Counterfactual Mugging, and so on. Decision theorists usually already have an idea of what they want their decision theory to say on each of these problems. If one decision theory performs better than another on these tests, we usually say that such a decision theory is superior to the other. Therefore, whatever process we are using to choose a decision theory mimics a decision theory itself, and furthermore is taken to be even more normative than the decision theories we are trying to construct.
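To make that circularity concrete, here is a hedged sketch of the selection process just described. The decision theories, the test problems, and the “desired verdicts” are hypothetical placeholders, not real implementations; the point is only that picking the winner is itself an argmax, i.e., a very simple decision procedure sitting above the theories it judges.

```python
# Hedged sketch of the meta-level selection loop described above.

# What we (the theorists) already want each theory to say on each test problem.
desired_verdicts = {
    "newcombs_problem":       "one-box",
    "counterfactual_mugging": "pay",
    "smoking_lesion":         "smoke",
}

# Hypothetical verdict tables for two candidate theories.
candidate_theories = {
    "CDT": {"newcombs_problem": "two-box",
            "counterfactual_mugging": "refuse",
            "smoking_lesion": "smoke"},
    "FDT": {"newcombs_problem": "one-box",
            "counterfactual_mugging": "pay",
            "smoking_lesion": "smoke"},
}

def score(theory_verdicts):
    """Count how many test problems the theory answers 'correctly',
    i.e. in the way the theorist already wanted it to."""
    return sum(theory_verdicts[p] == v for p, v in desired_verdicts.items())

# The selection step: pick the theory with the highest score.  This argmax is
# itself a (very simple) decision rule standing above the theories it judges.
best = max(candidate_theories, key=lambda name: score(candidate_theories[name]))
print(best)  # -> "FDT" under these made-up verdict tables
```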
Is there ever to be a point at which we get to say we've found the best decision theory, and then can take it easy by following it ourselves for the rest of time?
Issues With Hidden Infinities, Ones and Zeroes
In this section, I want to briefly point out that there are also a lot of additional claims floating around, intended as criticisms of specific decision theories, that share a similar pattern: they typically criticize the aspect of a decision theory that purportedly forces an agent to take one particular action in one particular situation.
The most-often cited case is Causal Decision Theory’s answer to Newcomb’s Problem. Now, you may already be quite familiar with this. Informally, the problem with CDT is that “it” “doesn’t believe in” the causal effects of its own decision process. That is, CDT carries with it several assumptions about how the physics of reality works, and these assumptions are quite arguably wrong.
I use those scare-quotes to emphasize the fact that it is also somewhat ambiguous who or what is doing the believing.
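To see the disputed assumption at work, here is a minimal expected-value sketch of Newcomb's Problem. The payoffs ($1,000,000 and $1,000) and the 99% predictor accuracy are the usual textbook numbers, used only for illustration; the two calculations differ solely in whether the box contents are allowed to depend on what the agent's own decision process would output.

```python
# Minimal expected-value sketch of Newcomb's Problem (standard textbook numbers).
# Opaque box: $1,000,000 if the predictor predicted one-boxing, otherwise $0.
# Transparent box: $1,000.
ACCURACY = 0.99   # assumed predictor accuracy
M, K = 1_000_000, 1_000

# CDT-style calculation: the contents are treated as causally fixed, so whatever
# probability p_full you assign to the opaque box being full, two-boxing adds $1,000.
def cdt_value(action, p_full):
    base = p_full * M
    return base + (K if action == "two-box" else 0)

# Correlation-respecting calculation (EDT/FDT-flavoured): condition the box
# contents on what an accurate predictor would have predicted about this choice.
def correlated_value(action):
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_full * M + (K if action == "two-box" else 0)

for a in ("one-box", "two-box"):
    print(a, cdt_value(a, p_full=0.5), correlated_value(a))
# CDT ranks two-boxing higher for every p_full; the correlated calculation ranks
# one-boxing far higher.  The disagreement is entirely about whether the decision
# process is allowed to "believe in" its own predictability.
```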
There are over-claims about what such-and-such decision theory actually does, or what specific types of agents (the type can be quite general) will always or never do.
From Wikipedia (section “Criticisms”):
Suppose you have committed an indiscretion that would ruin you if it should become public. You can escape the ruin by paying $1 once to a blackmailer. Of course you should pay! FDT says you should not pay because, if you were the kind of person who doesn't pay, you likely wouldn't have been blackmailed. How is that even relevant? You are being blackmailed. Not being blackmailed isn't on the table. It's not something you can choose.
I should note that I disagree with some criticisms of decision theories (which by nature would make the case for them being anti-normative), such as this one. I disagree with criticisms like this because they rest on implicit claims about hidden zeroes, ones, and infinities in the parameters of the world-model. Now, decision theories do have hidden (and not-so-hidden) zeroes, ones, and infinities in their parameters, due to their formality, but this is in fact where it gets complicated.
For example, in this criticism, it is implicitly part of the world model that the agent must consider future blackmail attempts. But also, that the agent can only currently consider either paying $1 or not paying at all. There is no such option for bargaining or negotiating, nor for modeling the blackmailing agent as another potential reasoner who may be using FDT themselves, and who therefore might update the options they present to the primary agent as they both perform FDT simultaneously.
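As a hedged illustration of the “hidden parameter” point, here is a toy calculation in which the verdict flips depending on a single number the informal criticism implicitly pins down: the probability that being the kind of agent who refuses still gets blackmailed anyway. All of the numbers are my own illustrative assumptions, not anything claimed by FDT's proponents or critics.

```python
# Toy sketch of the blackmail dilemma, with the "hidden" parameter made explicit.
RUIN_COST = 1_000_000   # assumed cost of the secret getting out
PAY_COST  = 1           # the $1 demanded

def expected_cost(policy_pays, p_blackmailed_given_refuser):
    """Expected cost of adopting a policy, as a function of how strongly the
    policy of refusing suppresses blackmail attempts in the first place."""
    if policy_pays:
        # Payers are assumed to be reliably targeted.
        return PAY_COST
    # Refusers get blackmailed with some probability; if it happens, they refuse
    # and are ruined.
    return p_blackmailed_given_refuser * RUIN_COST

for p in (0.0, 1e-7, 1.0):
    print(p, expected_cost(True, p), expected_cost(False, p))
# If refusal makes blackmail essentially impossible (p ~ 0), refusing wins; if
# the blackmail happens regardless (p = 1, which is what "you ARE being
# blackmailed" quietly assumes about the whole policy question), paying wins.
```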
Analyzing these criticisms is useful because it gives us a sense of how and why decision theories and formal models fail to satisfy our intuition.
Here’s another essay that makes very strong claims with hidden infinities:
Consequentialists can get caught in commitment races, in which they want to make commitments as soon as possible
Consequentialists are bullies; a consequentialist will happily threaten someone insofar as they think the victim might capitulate and won't retaliate.
Consequentialists are also cowards; they conform their behavior to the incentives set up by others, regardless of the history of those incentives. For example, they predictably give in to credible threats unless reputational effects weigh heavily enough in their minds to prevent this.
In general, because consequentialists are cowards and bullies, the consequentialist who makes commitments first will predictably be able to massively control the behavior of the consequentialist who makes commitments later. As the folk theorem shows, this can even be true in cases where games are iterated and reputational effects are significant.
Logical updatelessness and acausal bargaining combine to create a particularly important example of a dangerous commitment race. There are strong incentives for consequentialist agents to self-modify to become updateless as soon as possible, and going updateless is like making a bunch of commitments all at once.
(I am quoting this piece so much because it is a rare gem that states exactly what I don’t believe.)
However, keep in mind that even though I disagree with these kinds of conclusions, I also expect that this kind of confusion often has the property of being right about what one particular model says. Making a commitment is like setting a parameter in your world model to zero, one, or infinity, depending on how your model maps parameters to certain meanings. And, like I said earlier, decision theories often do have such parameters, and are perhaps distinguishable by which ones are free to vary or unknown, and which ones are not.
If we were to take Nate Soares’ ambitions for what decision theory ought to be for us, as well as what we know about, e.g., Updateless Decision Theory, we might conclude something akin to what this piece concludes: that rational agents will be compelled to make decisions that we also consider to be extreme.
The Utility Function Contains The World Model
The following section contains similar points to those I made in Values and Capabilities Aren’t Very Distinct.
These are the axes / dimensions I'm claiming to be somewhat related:
<descriptive --------------- normative>
<less powerful --------------- more powerful>
<narrow AI --------------- general AI>
There is actually a more subtle philosophical point, here. This is related to my question, "Is FDT the top-level decision process?"
My current best guess is that the answer to that question is "No, and an LDT cannot be, because that's already too over-specified."
The only top-level decision process that is not so underspecified that it amounts to literally anything is a process that simply picks things out from other things. It distinguishes things, and therefore needs to be able to separate concepts from other concepts. To do this, it needs to be able to know what it likes. Since utility seems inseparable from being an agent, but decision theories can be chosen by something that likes and dislikes them, it seems like utility and the thing we're trying to find that picks things out from other things might either be the same thing, or aspects of each other in some way.
If I were an agent that ran only one decision theory at the “top level”, I would be equivalent to an agent with a utility function that favored using only this one decision theory, and no others. In other words, it would assign the maximum value of utility to using this decision theory and the minimum value to all others (and one could add that it assigns these values in all situations).
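Here is a minimal sketch of that equivalence, with made-up names: a “meta” utility function that assigns the maximum value to using one particular decision theory and the minimum value to every other one, in every situation.

```python
# Hedged sketch of the equivalence described above.  Names are hypothetical
# placeholders, not real implementations.
MAX_U, MIN_U = 1.0, 0.0

def meta_utility(decision_theory, situation, favored="FDT"):
    # Note: the situation argument is ignored; the assignment is the same
    # everywhere, which is exactly what makes this agent "locked in".
    return MAX_U if decision_theory == favored else MIN_U

def choose_theory(available, situation):
    return max(available, key=lambda dt: meta_utility(dt, situation))

print(choose_theory(["CDT", "EDT", "FDT", "UDT"], situation="anything"))  # -> "FDT"
```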
As a brief aside, keep in mind that a utility function can be defined over “anything”, so it's already underspecified in the way that we need it to be. One could add dimensions to the space it is allowed to be defined over, and this would be somewhat akin to increasing the “computational capacity” of the agent possessing this utility function.
Note, for example, that in Q-learning, the Q function is learned via a reinforcement-learning algorithm to model the value of state-action pairs. This Q-function is often implemented as either a look-up table or, in more modern implementations, as a neural network. Selecting actions can be done by searching over the action space, with variations that may involve Monte Carlo rollouts or simply choosing the best action in a single time-step. This works very well when the action space is discrete.
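Here is a minimal, self-contained sketch of that pattern: tabular Q-learning with epsilon-greedy action selection on a made-up five-state corridor environment. The environment, the hyperparameters, and the reward structure are all illustrative assumptions, not any particular paper's setup.

```python
import random

# Minimal tabular Q-learning sketch on a made-up 1-D corridor environment
# (5 states, move left/right, reward only at the right end).  Only an
# illustration of the "Q function as look-up table + action selection" pattern.
N_STATES, ACTIONS = 5, (-1, +1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def select_action(state):
    # The "action selection" step: a search over the (discrete) action space,
    # here just an argmax with epsilon-greedy exploration and random tie-breaks.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: (Q[(state, a)], random.random()))

for _ in range(500):                      # training episodes
    s, done = 0, False
    while not done:
        a = select_action(s)
        s2, r, done = step(s, a)
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
# After training, the greedy policy at every interior state is "move right".
```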
Even this “action selection” step amounts to a kind of quasi-decision theory. (I don't claim to have fully detailed knowledge of the process that actually selects actions.) This process still has to value certain ways of selecting the action as more or less appropriate depending on the context. Other forms of machine learning, such as LLM text prediction, or reinforcement learning that only uses a “policy” network, don't learn any value function (except purely implicitly). A policy is simply a mapping from observations (plus, possibly, non-deterministic noise) to an action, learned via optimization against an outer loss function (which is of course our “higher” sense of normativity).
A “generator” is a bit like sampling from the space generated by the inverse mapping of the utility function.
Formality and Free Will
But there is a deeper question here about what the meaning of "formality" is. Suppose that I define my AI's behavior using a symbolic language (which can be a spoken or written language, containing both words as well as formal equations and algorithms). In this specification, it says that my AI does A when it encounters B. Perhaps it says this by saying it follows a certain decision theory or the like somewhere. Now, additionally suppose that I wanted this AI to be able to do something not within the formal specification, even if just rarely. Well, if I were to add that to the formal specification, it might say something like, "With probability p, do A when it encounters B. With probability 1 - p, do C when it encounters B."
But this is still just another formal specification. If I really wanted it to be able to follow any specification in principle, I might need to be able to make it add to its own specification in the same way that I did just now by adding alternatives. How does one program a formal specification for “at this moment, it is prudent to double-check this specification itself, searching for anything that could be an unknown or undiscovered parameter”?
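A small sketch of what that “extended” specification looks like; A, B, C, the probability p, and the self-review clause are placeholders of my own. The point is that each extension, including the self-inspection hook, is still just another clause in a specification of the same kind.

```python
import random

# Sketch of the "extended" formal specification discussed above.
P = 0.95  # assumed probability of doing A rather than C when encountering B

def spec(observation):
    if observation == "B":
        return "A" if random.random() < P else "C"
    return "do nothing"

# Even a self-inspection hook is still a rule of the same kind:
def spec_with_review(observation, review_probability=0.01):
    if random.random() < review_probability:
        return "re-examine this very specification"   # ...but triggered by what, exactly?
    return spec(observation)

print(spec("B"), spec_with_review("B"))
```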
I like to think of “free will” and “utility” as dual types of one another, in the following sense:
If I prefer A to B, and I am presented with a choice of picking only one of either A or B, then I am more predictable to myself in the sense that I believe it is more likely that I will pick A.
If I prefer neither one over the other, then it seems it may be equally likely that I pick either one. In this situation I am the least predictable to myself.
So it seems reasonable to conclude that my sense of free will has somewhat of an inverse relationship to the certainty I feel in what I prefer.
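One way to make that inverse relationship concrete (my own framing, not a claim from the literature) is to model “how predictable I am to myself” as a softmax choice over the utilities I assign to A and B, and then measure the entropy of the resulting choice distribution.

```python
import math

# Illustrative sketch: the probability of picking A under a softmax over the
# utilities assigned to A and B, and the entropy of that two-way choice.
def p_pick_a(u_a, u_b, temperature=1.0):
    za, zb = math.exp(u_a / temperature), math.exp(u_b / temperature)
    return za / (za + zb)

def choice_entropy(p):
    # Entropy in bits: 1.0 means maximally unpredictable, 0.0 means fully determined.
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for gap in (0.0, 0.5, 2.0, 10.0):          # u(A) - u(B)
    p = p_pick_a(gap, 0.0)
    print(f"preference gap {gap:>4}: P(pick A) = {p:.3f}, entropy = {choice_entropy(p):.3f}")
# A zero gap gives entropy 1.0 (the "most free" case above); a large gap drives
# the entropy toward 0, i.e. toward being fully predictable to myself.
```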
Additionally, although I may feel initially certain that I prefer A to B, I may consider that even that feeling of certainty could be uncertain, and therefore I may elect to occasionally pick B across the full set of situations in which it appears I have been presented with the same choice. And I might in fact value having that sense of free will, which corresponds to a willingness to be open to the idea that I might be incorrect, or be missing or unaware of context that could be important to consider, and also variable. In other words, when presented with context (call it C for now), I don’t always know for sure that C is truly just C and nothing else. C could sometimes turn out to be C together with X, or C together with Y, and so on.
So this seems like an argument that relatively high assignments of value to certain states and actions correspond to less free will, and vice-versa. We also have an argument that formality is a lot like knowing exactly what context we are in, and that valuing formality is a lot like valuing certainty (and knowing that spurious parameters do not affect the downstream conclusions).
Von Neumann and Morgenstern might say that it’s okay to value being able to make very precise predictions relatively easily, with use of mathematics, but that it is not something we always get to assume we have at the outset, and requires a very mature and advanced level of development.
Conclusion
In this piece, I’ve argued for why formal models of rational decision-making and behavior are hard to take as entirely normative in an absolute or ultimate sense. Certain attempts at establishing a theory of “executable meta-philosophy”, such as the rationality project, were ambitious in their aims, and in particular sought to formalize such a philosophy using analogies to mathematics or computer programs. My position is that this is somewhat unreasonable to expect if there is an insistence on formality, since the “formality / informality” distinction seems to be an inherent mode of thinking not just for humans, but for any intelligent agent.
[1] In Theory of Games and Economic Behavior, mostly in the introductory chapter.
[2] I think the word “formal” might signify safety to some people, which is an intuition I slightly disagree with. Moreover, I think it may also be possible to formalize the sense of what we mean by “formality” as well as “informality”, though I do not attempt to give a full treatment of that in this post.