Is FDT, or any LDT, the "top-level" decision process?
The modified Newcomb's problem below is an attempt to pin down what this question actually means.
Logical decision theories (LDTs) were invented to handle tricky problems that have stumped philosophers, such as Newcomb's problem:
There is a reliable predictor, another player, and two boxes designated A and B. The player is given a choice between taking only box B or taking both boxes A and B. The player knows the following:
Box A is transparent and always contains a visible $1,000.
Box B is opaque, and its content has already been set by the predictor:
If the predictor has predicted that the player will take both boxes A and B, then box B contains nothing.
If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.
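To make the payoffs concrete, here's a minimal Python sketch of the standard setup, assuming a perfectly reliable predictor (the function name and string labels are just illustrative, not part of the problem statement):

```python
def newcomb_payoff(action: str, prediction: str) -> int:
    """Player's payoff given their action and the predictor's prediction."""
    box_a = 1_000                                    # transparent, always there
    box_b = 1_000_000 if prediction == "one-box" else 0
    if action == "one-box":
        return box_b
    return box_a + box_b                             # two-boxing takes both

# With a reliable predictor, prediction == action, so the live payoffs are:
print(newcomb_payoff("one-box", "one-box"))   # 1000000
print(newcomb_payoff("two-box", "two-box"))   # 1000
```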
Functional Decision Theory (FDT) is, as of today, still considered the most advanced of these. From the paper in which it was introduced:
A functional decision theorist holds that a rational agent is one who follows a decision procedure that asks “Which output of this decision procedure causes the best outcome?”, as opposed to “Which physical act of mine causes the best outcome?” (the question corresponding to CDT) or “Which physical act of mine would be the best news to hear I took?” (the question corresponding to EDT).
For the sake of time, I'll assume you already know how FDT beats Newcomb's problem and how handily it beats CDT and EDT, or that you can read the paper to find out.
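As a toy illustration of that framing (not the paper's formalism), here the predictor is reliable simply because it runs the agent's own decision function, so one function's output settles both the prediction and the action; the function names are purely illustrative:

```python
def fdt_decision() -> str:
    """FDT's question: which output of *this* procedure causes the best outcome?

    Because the predictor runs (a model of) this same procedure, the output
    fixes both the prediction and the contents of box B.
    """
    outcomes = {
        "one-box": 1_000_000,   # predictor foresees one-boxing -> box B filled
        "two-box": 1_000,       # predictor foresees two-boxing -> box B empty
    }
    return max(outcomes, key=outcomes.get)

def predictor(decision_procedure) -> str:
    # Reliable precisely because it runs the agent's own procedure.
    return decision_procedure()

print(predictor(fdt_decision), fdt_decision())   # both print "one-box"
```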
Now all I want to do is add one more configuration parameter to Newcomb's problem, leaving everything else unchanged:
If the predictor predicts that you made your decision by running decision procedure X, box B contains nothing.
This modification turns it into a different problem:
Is there a decision procedure X that wins standard Newcomb's problem, and that, on this modified Newcomb's problem, either still wins, or, if it is filtered out by the predictor, shows that the predictor has become a type-based gate?
Let me unpack this a bit. Suppose we were using FDT, but the predictor looks for and filters out FDT specifically. Is there another LDT, perhaps stronger and more general than FDT, that could win this modified Newcomb's problem and still do at least as well as FDT in general?
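Here's a toy version of that filter, under the simplifying assumption that the predictor can reliably tag which procedure produced the decision; the helper and the "FDT" / "not-FDT" labels are purely illustrative:

```python
def modified_newcomb(procedure_name: str, action: str, filtered: str = "FDT") -> int:
    """Payoff when the predictor also checks which procedure produced the decision."""
    box_a = 1_000
    if procedure_name == filtered:
        box_b = 0                            # the new condition: procedure X is filtered out
    else:
        box_b = 1_000_000 if action == "one-box" else 0
    return box_b if action == "one-box" else box_a + box_b

# FDT still one-boxes, but the filter empties box B anyway:
print(modified_newcomb("FDT", "one-box"))       # 0
# A hypothetical "higher" procedure that knows the filter and routes the same
# one-boxing choice through a differently-identified procedure:
print(modified_newcomb("not-FDT", "one-box"))   # 1000000
```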
If no, then the predictor in this Newcomb's problem filters out general intelligences, and only lets narrow AIs win, each running some fixed decision procedure other than X.
If yes, then there is some "higher" decision procedure that is capable of picking out FDT when it needs to, and otherwise, if it already knows what the predictor is filtering on, can choose some other decision procedure that ensures it one-boxes on this problem. If the problem is then modified to make sure that procedure also loses, we can recurse on this process until the predictor becomes a type-based gate.
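A rough sketch of that recursion: each round, the winning procedure's identity gets added to the filter, so the "predictor" degenerates into a membership test over procedure identities, i.e. a type-based gate. All names here are illustrative:

```python
filtered_procedures = {"FDT"}

def gate(procedure_name: str) -> bool:
    """Does the predictor still let this procedure win box B?"""
    return procedure_name not in filtered_procedures

def recurse(rounds: int) -> None:
    for i in range(rounds):
        # The "higher" procedure routes its choice through any identity that
        # is not yet filtered...
        winner = f"procedure-{i}"
        assert gate(winner)
        # ...and the problem is then modified so that this winner loses next time.
        filtered_procedures.add(winner)
    print(f"After {rounds} rounds the predictor is a membership test over "
          f"{len(filtered_procedures)} procedure identities.")

recurse(5)
```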
That defines my new question. I, naturally, have my own potential answer to what this procedure will probably have to look like, which I’ll put in the next section. Please think about it yourself too and propose a different solution if you want.
I already came up with a “task J” that I’ve mentioned before on this blog. I might also call this “black-box J”, as J is defined to operate in as much generality as possible, as well as recursively on itself.1
J is what it feels like to have free will and make practically any decision on your own. You are simply free to pick one-boxing on Newcomb's problem if that sounds like it's obviously the best option to you. Philosophers already know the "correct answers" to literally all of the problems that candidate decision theories are run through, as a test battery, to judge their quality. This process, the one performed by the philosophers and human decision theorists, is part of black-box J.
It is considered "black-box" because only the output of J is narrowly defined (a yes/no verdict, plus a magnitude); the process it uses is not, nor is the input restricted. For any specific agent, at any specific moment in time, all of the above may have a specific, narrowly defined implementation, but not in full generality.
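As a sketch of the interface only (the names are hypothetical, and nothing here constrains how J actually works inside):

```python
from typing import Any, Protocol, Tuple

class BlackBoxJ(Protocol):
    """Only the output is pinned down: a yes/no verdict plus a magnitude."""
    def __call__(self, *inputs: Any) -> Tuple[bool, float]:
        ...

# One specific agent, at one specific moment, might implement J very narrowly,
# e.g. a fixed stamp of approval on one-boxing:
def one_boxer_j(*inputs: Any) -> Tuple[bool, float]:
    return True, 1.0

j: BlackBoxJ = one_boxer_j   # structural typing: any callable with this shape fits
```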
If this is indeed the right answer or close to it, then this may also be a way of stating why the “Bitter Lesson” might be / have been inevitable, but that is my own speculative interpretation.
If you object to the self-referential aspect of J, I'm willing to believe that, because J still has to operate temporally, it can act somewhat like a stamp of approval / disapproval on stamps from previous time-steps.