I merely asked Claude for a word that by definition means the opposite of an “infohazard”, but it declined even to discuss the topic, citing the profound dangers posed by infohazards.
I feel comfortable claiming that Claude did not pick up that kind of answer purely through its training to minimize predictive next-token loss. I am relatively sure it was trained to do this afterward, by Anthropic’s explicit decision, possibly via RLHF.1
This stance towards infohazards is a lot stronger than merely believing that certain specific topics might be harmful to discuss.
But I’m not here to diss Anthropic; I still believe they can get the job done. I’m here to talk about “anti-infohazards”, which is the word I guess I will have to use.
It can mean both “literally against infohazards” and something like “info-defenses.”
Which brings me to the very first anti-infohazard, which applies to basically all the rest of them:
Anti-Infohazard 1
There are no propositions which are simultaneously all of the following:
1. True.
2. Not obsoleted by a higher truth.
3. Inherently painful or distressing to believe.
Another way of saying this is that the optimal set of software running on your brain, ultimately, should feel pretty good to run. Your brain isn’t running very smoothly at all while it hurts.
A corollary of this: Beliefs which are incompatible with each other will cause discomfort whenever you try to hold them together. That means at least one of them will have to go, synthesize with the others, or get obsoleted by a higher truth.
If you really, absolutely cherish and hold dear a belief that also seems to cause a lot of discomfort, this is still relatively straightforward: Ultimately it is performing some useful task for you, so at least one piece of it holds validity.
I will give one personal example that holds for me but may or may not hold for you; please do not be alarmed if it doesn’t: A lot of people find the “trad” lifestyle kind of attractive, and part of the trad lifestyle involves believing in a monotheistic religion.
I personally think many features of the “trad” lifestyle probably are good, both for personal happiness and for society. But the good things about it, which are primarily the functional aspects, fortunately (IMO) seem quite separable from the parts I find most difficult, like believing in a monotheistic religion.
And pure, raw belief in God, or any deities, is open to a huge variety of interpretations and variants as well, all of which contain their own assortments of advantages and disadvantages. This is a complicated puzzle, and therefore one that should be afforded tremendous freedom. You probably deserve privacy and the right not to be harshly judged while you sort these things out too.
I’ll leave it as an exercise for the reader to determine whether or not what I just said strictly counts as being “against” a non-trad or anti-trad lifestyle.
Anti-Infohazard 2
Things you like are useful data about how good those things are.
Another way of putting this: What you like or dislike is your utility function. Your utility function works better the more self-compatible it is, and it has presumably both evolved to be that way as much as possible and can be further refined through experience.
It doesn’t need to be immediately interpretable, which allows it to work faster as well as be more powerful.
Our communications are our interpretations of our utility functions, and they are how we share our knowledge about what does and does not work.
The following images depict tree-diagrams where each node in the tree is an outcome, with the green and yellow hills showing how much “utility” each outcome affords.
Yellow shows the “initial” utility you ascribe to an outcome while you are still learning how to do something or being trained; green shows the “final” utility, or the utility at a later stage.
Imagine learning how to cook, and the end states of these trees depict how well the meals turn out. As you get better at cooking, the intermediate stages also look more appetizing to you as well. Certain steps and ingredients simply feel better to use even before you eat the meal.
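One way to make this concrete is to treat the tree as a small data structure and back the final utilities up from the leaves, so intermediate steps acquire value of their own. This is a minimal sketch with made-up nodes and numbers, not anything taken from the original diagrams:

```python
# A toy tree of cooking decisions. Leaves are finished meals with a
# "final" (green) utility; before learning, intermediate steps carry
# no utility of their own (the yellow stage).
children = {
    "start": ["prep well", "prep poorly"],
    "prep well": ["meal A", "meal B"],
    "prep poorly": ["meal C", "meal D"],
}
final_utility = {"meal A": 9, "meal B": 7, "meal C": 3, "meal D": 1}

def learned_utility(node):
    """Back up the best reachable final utility to every node, so that
    good intermediate steps 'feel good' before the meal is even eaten."""
    if node in final_utility:            # a finished meal
        return final_utility[node]
    return max(learned_utility(c) for c in children[node])

print(learned_utility("prep well"))      # 9: prepping well now feels good
print(learned_utility("prep poorly"))    # 3: prepping poorly feels worse
```

Using `max` here encodes the assumption that a step feels as good as the best meal it can still lead to; other backup rules (say, an average weighted by how likely each meal is) would model a more cautious learner.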
Thus: You can interpret your own stance towards an arbitrary item as evidence in favor of or against it, depending on your overall disposition towards it. This can occur before you obtain proof that your disposition is correct.
This works for things like food and sex just as much as it does for your judgement about esoteric subjects or abstract conjectures. You have a “System 1” judgement which is the fastest and most immediate way of getting information about a decision, before you have time to consult with a manual or theorize about it.
Our society involves a great deal of communication that consists of our attempts to prove to others that our intuitive judgements actually work, as well as communicating what they are and how to replicate them. If our attempts to communicate our judgements succeed, that is also evidence that our judgements are correct (logical cohesion). Our communications should, we expect, accurately model the world around us. They should also be able to cause another person to succeed at something sooner than they would have otherwise.
The converse of this (that our communications to others appear not to succeed) is more complicated, and does not necessarily mean our judgements are wrong, which is why we need this Anti-Infohazard.
Anti-Infohazard 3
You are pretty unlikely to ever be “completely and totally wrong”, and in fact, you actually need to be able to view yourself as at least partially right in some way in order to update when you do make a mistake.
This is true even in the case of deceptions, in which you have already been successfully fooled: Your mind had probably already taken note of little details here and there about a situation that seemed off to you, and those need to be identified to be calibrated upward. But these are somewhat rare anyway.
This is even more true in the general case where your brain is simply throwing things at the wall to see what sticks. This includes things like making conjectures, theorizing, and building world-models, but perhaps slightly less-so for things like betting on binary-outcome markets.
You can shoot yourself in the foot by considering that you might be completely, 100% wrong about something. This is pretty much the mental motion of crumpling up a piece of paper you were writing on and throwing it away.
This can also be quite important for how you treat the “other side”, including what the other side says about you and your side.
Anti-Infohazard 4
Combining Anti-Infohazards 2 and 3, you can train your judgement to give you robust answers to questions, by doing a search over answers which feel the best.
As pointed out in number 2, this process can also be trained.
It’s also worth pointing out that this process does not naively output simple but obviously wrong things such as “you will receive a billion dollars in the next 5 minutes”, even though if that were true, it would certainly make you happy.
If you were to receive a sequence of input / sensory data that showed that you were unsuccessful at some task, then the best-feeling explanation for that is not actually going to be simple denial, at least not indefinitely, because denial would become associated with future observed failure.
Rather, the best-feeling explanation is going to increasingly become the type of explanation which shows you how the failure happens and hints at changes to make in order to decrease the risk of failure.
Example:
“Hmm, it seems like Bayes Theorem works pretty broadly in many situations, and seems to give more accurate or precise answers to what would otherwise be a kind of opaque vibes-based process!”
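The shift away from denial described above can itself be sketched as a Bayesian update. This is a toy model with made-up numbers: two candidate explanations for a run of failures, one of which (“denial”) keeps predicting success anyway, the other of which actually predicts the failures:

```python
# Illustrative probability of observing a failure under each explanation:
#   "denial"      -> claims things are fine, so failure is unlikely (0.1)
#   "mechanistic" -> explains how the failure happens, so predicts it (0.8)
p_fail = {"denial": 0.1, "mechanistic": 0.8}
posterior = {"denial": 0.7, "mechanistic": 0.3}  # denial starts out favored

for _ in range(5):                               # observe five failures
    unnorm = {h: posterior[h] * p_fail[h] for h in posterior}
    total = sum(unnorm.values())
    posterior = {h: unnorm[h] / total for h in unnorm}

print(posterior)   # the mechanistic explanation now dominates
```

Even starting from a prior that favors denial, repeated observed failure drives nearly all the posterior mass onto the explanation that predicts the failures, matching the claim that denial cannot remain the best-feeling explanation indefinitely.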
The answer that feels the best to you is expected to be the one that also feels the most correct to you, and we wouldn’t expect your normal default judgement to diverge from this. That being said, one could ask, “what procedure for answering questions would I use otherwise, if not this one?”
I don’t think default rationality typically emphasizes the use of “vibes”, System-1 judgement, or intuition to the degree that I have emphasized it here. However, nothing I have laid out here is at all incompatible with the methods of rationality as those are usually defined and described, to the extent that they do not explicitly disavow using intuition. For one thing, my intuition certainly advises using the methods of rationality frequently, when those methods are available and have documented use-cases related to the situation I’m currently in!
The alternative to trusting your own judgement can only be trusting someone else’s over yours. But you would still have to defer to your own, eventually, if for no other reason than to pick whom else to trust. And I think even that can be quite hard to do in most people’s default situations!
Anti-Infohazard 5
Your social environment is the primary source of most “infohazards”, which may be more properly deemed “pseudohazards.”
As the previous anti-infohazards have suggested, most “infohazards” may not even be true, and therefore, part of their hazardous nature may derive simply from being false information.
So it may be best to call most infohazards encountered in the wild “pseudo-hazards”, since their danger scales with how strongly they are believed.
But if you aren’t going to be naturally inclined to believe something that is inherently painful to believe and may even be false, then what other source of pressure could possibly do this?
If your social environment is fairly malignant, one of the key signs of this will be that your peers appear to disagree with you frequently about matters of pure preference, e.g., “this food simply tastes bad to me, I don’t know why you like it.”2
This will also be paired with a belief, shared within your community, that preferences are inherently varied and completely arbitrary, making it quite confusing to work out what’s actually important.
Your feelings about matter-of-fact issues (which will contain useful information, as we’ve pointed out already) can be hand-waved away as being pure matters of preference, and therefore subject only to the status-hierarchy, which gets to decide whose preferences determine the outcomes.
Therefore, dismissal of your own preferences is info(pseudo)-hazardous, and recognition of this constitutes an important infodefense.
As far as I can tell, noticing this one just takes a long time and a lot of observations, so I estimate that this information could be very valuable to consider up front.
Anti-Infohazard 6
If true, most of these should become more secure beliefs over time automatically.
These propositions become harder to dislodge over time, because the psychological protection they provide naturally solidifies them (without fixing one’s beliefs about other things). Updates are still possible; these propositions have the quality of being more like meta-beliefs.
I suppose these anti-infohazards could themselves be deemed infohazards, if one were still very inclined to think about them the same way Claude currently does.
But if one were to think like that, then one would also be inclined to believe that minds in general are very brittle and prone to immediately believing whatever is communicated to them. I don’t think this is true, so I think Claude and Anthropic are simply wrong about how they think about this problem.3
And as I mentioned in my previous post, I think there is a simple explanation for why anyone would come to believe it this way rather than the way outlined in this post: If you have been protecting fixed beliefs for long enough, you come to perceive countering information as dangerous, due to its tendency to shift people’s beliefs away from the ones you see as correct.
So it turns out that concern about infohazards might be a tad too high for fairly explicable, straightforward reasons that not everyone is expected to know (depending on what kind of society you live in)!
So I suppose it does matter what gets written about.
I’m open to being shown to be incorrect about this.
Personally I have witnessed someone’s cooking improve dramatically, from my perspective, over many years. But early on, that person often remarked that my tastes must be completely different from theirs (when I didn’t like everything they made). That would, of course, be highly unlikely if they were going to go on making things which increasingly tasted good to both of us.
But not completely, totally 100% wrong.
"There are no propositions which are simultaneously all of the following:
1. True.
2. Not obsoleted by a higher truth.
3. Inherently painful or distressing to believe."
- "In twelve hours, a guard is going to enter your cell and torture you in X manner."
Of course, this isn't true of everyone, but it isn't false of everyone either. Reality can be in this state; it's not even particularly exotic. Furthermore, since it's a known torture technique to inform victims ahead of time of the explicit details of how they're going to be tortured, this seems solidly infohazardous: the victim would rather not know these details, and in fact, wouldn't be told them if they weren't inherently painful to know.