Information Hazard

  • Concept defined by Nick Bostrom
  • Defined as: “a risk that arises from the dissemination or potential dissemination of (true) information that may cause harm or enable some agent to cause harm”
  • Pointed out as a contrast to the generally accepted principle of information freedom
  • Possibility of information hazard needs to be considered when making information policies
  • Typology of information hazards
    • By information transfer mode
      • Data hazard
      • Idea hazard
      • Attention hazard
      • Template hazard
      • Signaling hazard
      • Evocation hazard
    • By effect
      • Adversarial risks
        • Enemy hazard
        • Competitiveness hazard
        • Intellectual property hazard
        • Commitment hazard
        • Knowing-too-much hazard
      • Risks to social organizations and markets
        • Norm hazard
        • Information asymmetry hazard
        • Unveiling hazard
        • Recognition hazard
      • Risks of irrationality and error
        • Ideological hazard
        • Distraction and temptation hazard
        • Role model hazard
        • Biasing hazard
        • De-biasing hazard
        • Neuropsychological hazard
        • Information burying hazard
      • Risks to valuable states and activities
        • Psychological reaction hazard
          • Disappointment hazard
          • Spoiler hazard
          • Mindset hazard
        • Belief-constituted value hazard
          • Embarrassment hazard
      • Risks from information technology systems
        • Information system hazard
          • Information infrastructure failure hazard
          • Information infrastructure misuse hazard
          • Robot hazard
          • Artificial intelligence hazard
      • Risks from development
        • Development hazard

Information Hazards: A Typology of Potential Harms From Knowledge

Abstract

  • Information hazards are risks that arise from the potential dissemination of true information
  • May cause harm directly or may enable some agent to cause harm
  • This paper lays out a taxonomy of information hazards

Introduction

  • Society has a commonly held presumption in favor of knowledge, truth and the uncovering and dissemination of information
  • However, we do tolerate some special cases where ignorance is deliberately cultivated
    • Security
    • Innocence
    • Impartiality
  • Not concerned with the harms posed by false information
  • Therefore, an information hazard can be defined as “A risk that arises from the dissemination or potential dissemination of (true) information that may cause harm or enable some agent to cause harm”
  • Relative to their significance, some classes of information hazard are unduly neglected
  • Seek to create a vocabulary that allows us to talk about easily overlooked risks
  • Create a catalog of some of the ways information can be harmful

Six Information Transfer Modes

  • Data hazard: specific data, such as the blueprints for a nuclear weapon or the genome of a lethal pathogen, that, if disseminated, create risk
  • Idea hazard: a general idea that, if disseminated, creates a risk even without a data-rich specification
    • Example: the idea that nuclear fission might be used to create a weapon is an idea hazard, even though it’s not the blueprints for a nuclear bomb
    • Demonstrations can be idea hazards, insofar as they show that certain things are possible
  • Attention hazard: the mere drawing of attention to some particular ideas can increase risk, even when the ideas are already “public”
    • Adversary faces a search problem in finding ways to do harm
    • Anything that makes this search task easier can be an infohazard
    • Adversary may look at the specific things we defend against to discover what we’re most concerned about
  • Template hazard: the presentation of a template enables distinctive modes of information transfer, and thereby creates risk
    • Risk of a “bad role model”
    • Risk caused by implicit forms of information processing or organization structure
  • Signaling hazard: Verbal and non-verbal actions can indirectly transmit information about some hidden quality of the sender, creating risk
    • Academics might adopt excessive formalism when working in fields known to attract crackpots, slowing the pace of discovery
  • Evocation Hazard: Risk that a particular mode of presentation can cause undesirable mental states or processes

Adversarial Risks

  • Enemy hazard: By obtaining information, (potential) enemy becomes stronger and increases the threat they pose
    • Most salient in national security contexts
    • Research can increase enemy hazard if we generate information that would be disproportionately valuable to enemies
    • Rational strategy for military research should give weight to enemy hazard
    • Intentionally slowing progress in researching military technologies can be beneficial if the rate of information leakage is proportional to the gap between your country and its adversaries (a toy model of this assumption is sketched after this list)
  • Competitiveness Hazard: Risk that by obtaining information, a competitor becomes stronger, thereby weakening our competitive position
    • One person’s information can cause harm to others, even when no harm is intended
    • Example: another person knows more about the firm and gets the job you were applying for
  • Intellectual Property Hazard: Firm A faces the risk that some other firm B will obtain A’s intellectual property, thereby weakening A’s market position
    • Special case of competitiveness hazard
    • Firms go to great lengths to protect their intellectual property (e.g. patents, copyright, NDAs, etc)
  • Commitment hazard: Risk that obtaining some information will weaken one’s ability to commit to some course of action
    • Example: blackmail: knowing that someone has potentially incriminating evidence about you weakens your ability to commit to oppose that person
    • Example: Schelling’s probabilistic threats: it can be possible to raise the risk of conflict, even when you can’t credibly commit to conflict
  • Knowing-too-much hazard:
    • If you have information that can potentially be used against someone else, that person becomes your adversary even if you don’t mean them harm
    • The mere possession of knowledge can make you a target for those who wish to suppress the truth
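
A toy model of the leakage assumption above (an illustrative sketch, not Bostrom’s own formalism): if our capability grows by r per period and the adversary closes a fraction k of the current lead each period through leakage, the lead converges to r / k, at which point the adversary gains capability at exactly our research rate.

```python
def lead_dynamics(r, k, steps=200):
    """Toy model of the stated assumption: our capability grows by r per
    period, while the adversary closes a fraction k of the current gap
    each period through information leakage."""
    gap = 0.0
    for _ in range(steps):
        gap += r - k * gap
    return gap

# The lead settles near r / k, where the adversary's gain per period is
# k * (r / k) = r, i.e. leakage eventually matches our own research rate,
# so slowing research also slows the spread of the dangerous capability.
print(round(lead_dynamics(r=1.0, k=0.2), 3))  # about 5.0
print(round(lead_dynamics(r=0.5, k=0.2), 3))  # about 2.5
```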

Risks To Social Organizations And Markets

  • Norm Hazard: some social norms depend upon a coordination of beliefs or expectations among many subjects; new information may disrupt those expectations for the worse
    • Example: self-fulfilling prophecies - if sufficiently many people believe an event will happen, that can cause the event to happen
    • Example: information cascades: agents watch agents in front of them make decisions, and alter their decisions accordingly - if the first agent makes a bad decision, this can bias subsequent agents into also making suboptimal choices
  • Information asymmetry hazard: when one party to a transaction has the potential to gain information others lack, market failure may result
    • Example: “lemon markets” are dominated by suboptimal goods because of the information asymmetry between buyers and sellers
    • Example: insurance and genetic testing
      • Buyers of genetic testing gain an information advantage over insurers
      • Leads to an adverse selection spiral (a toy version of the spiral is sketched after this list)
  • Unveiling hazard: The function of some markets and support for some social policies depends on the existence of shared uncertainty, and the lifting of that uncertainty can undermine those markets and policies
    • Example: insurance markets only work because neither the buyer of insurance nor the seller is certain that there will be a loss
    • Example: Rawlsian political philosophy holds that social policies would be fairer if they were chosen while people were ignorant of their station in life
  • Recognition hazard: some social fictions depend on shared knowledge not becoming openly acknowledged common knowledge
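
A minimal sketch of the adverse-selection spiral mentioned above; the pricing rule and the numbers are illustrative assumptions, not taken from the paper. The insurer prices at the average expected loss of the current pool, the below-average risks drop out, and the premium ratchets upward until only the worst risks remain.

```python
def adverse_selection_spiral(expected_losses):
    """Toy adverse-selection spiral: the insurer prices at the average
    expected loss of the current pool; anyone whose own expected loss is
    below the premium finds the policy a bad deal and exits. Repeat until
    the pool stops shrinking."""
    pool = sorted(expected_losses)
    while pool:
        premium = sum(pool) / len(pool)
        stayers = [loss for loss in pool if loss >= premium]
        if stayers == pool:  # nobody else wants to leave
            return pool, premium
        pool = stayers
    return [], None

# Buyers who privately know their own expected annual loss:
pool, premium = adverse_selection_spiral([100, 200, 300, 400, 1000])
print(pool, premium)  # [1000] 1000.0 (only the highest-risk buyer remains)
```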

Risks of Irrationality and Error

  • Ideological hazard: An idea might, by entering into an ecology populated with other ideas, interact in ways which, in the context of extant institutional and social structures, produce a harmful outcome, even in the absence of any intention to harm
    • Example: if you hold that you are obligated to do everything that Scripture S commands, and someone (truthfully) informs you that S requires you to drink sea water, you have been harmed by the interaction of a true fact with your ideological background
  • Distraction and temptation hazard: Information may harm us by distracting us or presenting us with temptations
    • Humans are not perfectly rational, nor do they have perfect self-control
    • In the future, virtual environments and informational hyper-stimuli might be as addictive as drugs
  • Role model hazard: we can be corrupted by long-term exposure to bad role models
    • Even if we know a role model is bad, we can be influenced by it
    • Subjective well-being and body-mass are influenced by peers
  • Biasing hazard: when we are already biased, we can be led further astray by true information that triggers those biases
  • Debiasing hazard: when biases have individual or social benefit, information that erodes those biases may cause harm
  • Neuropsychological hazard: Information might have negative effects on our psyches because of the way our brains are structured, even if that same information would not have any effect on more “idealized” cognitive architectures
    • Example: images that trigger epileptic seizures
  • Information burying hazard: Irrelevant information can make relevant information harder to find, raising search costs for agents with limited computational resources
    • Example: steganography, where a message is buried inside innocuous-looking data (a minimal sketch follows this list)
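
A minimal least-significant-bit steganography sketch to make the burying example concrete (the helper names and the byte-string cover are illustrative assumptions): each bit of the hidden message overwrites the lowest bit of one cover byte, so anyone scanning the data sees only the innocuous cover.

```python
def hide(cover, message):
    """Write each bit of the message into the least significant bit of one
    cover byte, leaving the visible content essentially unchanged."""
    bits = [(byte >> i) & 1 for byte in message.encode() for i in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)

def reveal(stego, n_chars):
    """Reassemble n_chars hidden bytes from the low bits of the cover."""
    out = bytearray()
    for j in range(n_chars):
        out.append(sum((stego[j * 8 + i] & 1) << i for i in range(8)))
    return out.decode()

cover = bytes(range(256))             # stand-in for innocuous data, e.g. pixel values
print(reveal(hide(cover, "hi"), 2))   # -> hi
```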

Risks To Valuable States And Activities

  • Psychological reaction hazard: Information can cause harm by causing sadness, disappointment or some other reaction in the receiver
    • Disappointment hazard: Emotional well-being can be adversely affected by hearing bad news
    • Spoiler hazard: Fun that depends on ignorance and suspense is at risk of being destroyed by premature disclosure of truth
    • Mindset hazard: Our basic attitude or mindset might change in undesirable ways as a result of our exposure to information of certain kinds
  • Belief constituted value hazard: if some component of our well-being depends on epistemic or attentional states, then information that affects those states may directly impact our well-being
    • Distinct from psychological reactions
    • We may hold that it’s valuable for someone to hear bad news, even though it will cause a psychological reaction, because their well-being, broadly construed, depends on them holding a true view of the world
    • Alternatively, we might hold that there is some information that constitutes a negative contribution to well-being
      • Information that causes loss of innocence
      • Privacy
      • Desire to think of others in an appropriate manner (i.e. no TMI)
  • Embarrassment hazard: We may suffer psychological or reputational damage as a result of embarrassing facts about ourselves being disclosed
    • Combines elements of psychological reaction hazard, belief constituted value hazard, and competitiveness hazard
    • Self-esteem is not a wholly private matter, but is also a social signal that influences others’ opinions of us
    • Risk of embarrassment can suppress frank discussion
    • Can force individuals and organizations to remain committed to harmful courses of action, in order to avoid the embarrassment of admitting error

Risks From Information Technology Systems

  • Information system hazard: The behavior of some (non-human) information system can be adversely affected by some informational inputs or system interactions
    • Information infrastructure failure hazard: the risk that some information system will malfunction, either accidentally or as a result of a cyber attack, harming or inconveniencing the system’s owners or users, harming third parties whose welfare depends on the system, or propagating through some dependent network and causing a wider disturbance (a toy propagation model follows this list)
    • Information infrastructure misuse hazard: Risk that some information system, while functioning according to specifications, will serve some harmful purpose
      • Example: government or private databases that collect large amounts of data on individuals
    • Robot hazard: Risks that derive substantially from the physical capabilities of a robot system
      • Example: software on an armed Predator drone getting hacked or malfunctioning
    • Artificial intelligence hazard: Computer-related risks in which the threat derives primarily from the cognitive sophistication of the program, rather than from the specific properties of any actuators to which the system initially has access
      • Superintelligent AI may be able to talk or hack its way out of any restrictions placed on it
      • The threat posed by superintelligent AI may have more to do with its cognitive capabilities and goal structure than with the physical capabilities with which it is initially endowed
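
A toy sketch of how an information infrastructure failure can propagate through a dependent network, as described above; the service graph and names are illustrative assumptions. A breadth-first walk over “who depends on whom” marks everything downstream of the initial failure as down.

```python
from collections import deque

def cascade_failures(dependents, initial_failures):
    """dependents[x] lists the systems that stop working when x is down;
    walk the graph breadth-first from the initial failures."""
    failed = set(initial_failures)
    frontier = deque(initial_failures)
    while frontier:
        down = frontier.popleft()
        for dep in dependents.get(down, []):
            if dep not in failed:
                failed.add(dep)
                frontier.append(dep)
    return failed

# Hypothetical service graph: a database outage takes out everything built on it.
dependents = {
    "database": ["auth_service"],
    "auth_service": ["payment_gateway", "admin_portal"],
    "payment_gateway": ["online_store"],
}
print(cascade_failures(dependents, {"database"}))
# {'database', 'auth_service', 'payment_gateway', 'admin_portal', 'online_store'} (order may vary)
```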

Risks From Development

  • Development Hazard: Progress in some fields of knowledge can lead to enhanced technological, organizational, or economic capabilities, which can produce negative consequences
    • After the example of the Manhattan Project, it is no longer morally tenable to proceed with research without considering its potential consequences
    • The broad and interdisciplinary nature of modern science means that even innocuous-looking advances may carry development hazard

Discussion

  • The catalog of information hazards above can help inform our choices by highlighting the sometimes subtle ways in which even true information can be harmful
  • In many cases, the best response to an infohazard is no response
    • The benefits of information often so outweigh its hazards that we still underinvest in information gathering
    • Ignorance carries dangers that are as large as or larger than the dangers of knowledge
  • Mitigation of information hazards need not rely on suppression
    • Invest less in certain areas of research
  • Sometimes information hazards are caused by partial information, so the way to resolve them is to get more information, not less
  • Discussion of information hazards can itself be an information hazard (norm hazard) if it undermines norms of truth seeking and truth reporting

The Hazard of Concealing Risk

  • Man Made Catastrophes and Risk Information Concealment: Case Studies of Major Disasters and Human Fallibility
  • Hiding information prior to disasters contributed to making them possible and hindered rescue and recovery
  • In all cases, there was concealment going on at multiple levels
  • 5 major clusters of information concealment
    • External environment enticing concealment
    • Risk communication channels blocked
    • Internal ecology stimulating concealment or ignorance
    • Faulty risk assessment and knowledge management
    • People having incentives to conceal
  • Systemic problem - one or two of these factors can be counteracted by good risk management, but when more of them are present, the causes become much harder to deal with
  • Once risks are hidden, it becomes much more difficult to manage them
  • Risk concealment can be counteracted
  • Some technologies that show signs of risk concealment
    • Shale energy
    • GMOs
    • Debt and liabilities of US and Chinese economies
  • Patterns of concealment don’t predict imminent disaster, but would make a disaster worse if one should occur
  • No evidence that cited disasters were especially atypical from a concealment perspective - more likely explanation is that those are the instances in which people didn’t “get away with it”
  • Book is an important rejoinder to the concept of information hazards
  • Ignorance can be riskier than information
  • Institutional secrecy is designed to manage and contain information hazards, but it can compartmentalize and block information flows regarding risk
  • A proper information hazard strategy needs to account for concealment risk

The Wonderful Thing About Triggers

  • Trigger warnings are the opposite of censorship
    • Censorship tries to restrict what you read
    • Trigger warnings allow you to read what you want
  • We should give people relevant information and allow them to make their own decisions
  • Analogy with book titles
    • We print the titles of books on the outside so that people know approximately what the book is about
    • We care more about trusting people’s judgement than we do about denying people the ability to avoid things they don’t want to read
  • Trigger warnings allow us to fight censorship by allowing those who engage with our ideas to do so with the full knowledge that they might be offended
  • People can misuse trigger warnings to avoid uncomfortable ideas, but this is a problem that occurs any time you give people more information
  • Do we, as a civilization, force people to be virtuous without their consent?
  • The strongest counterargument to trigger warnings is that they increase politicization
    • Colleges put trigger warnings on everything that might offend liberals, but don’t put trigger warnings on materials that might offend conservatives
    • Solution: put trigger warnings in small print on the page with publisher and copyright information, which everyone skips over anyway
    • Trigger warnings can be helpful, if used in good faith
  • Scott strongly disagrees with the argument that trigger warnings should be avoided to force people with PTSD to confront arguments that might trigger them
    • You do not give psychotherapy to people without their consent
    • Even if you can argue that people consented, triggers should be confronted at a time and a place of the person’s choosing, not randomly

Roko’s Basilisk

  • Introduction
    • Roko’s basilisk is a thought experiment proposed by Roko on the Less Wrong forum
    • Used ideas in decision theory to argue that a sufficiently powerful AI would have an incentive to torture anyone who imagined the agent and then did not immediately set about bringing the agent into existence
    • Called a basilisk because hearing the argument would put you at risk of being tortured by this hypothetical agent
    • Argument was broadly rejected on Less Wrong - AI agent would have no incentive to follow through on threats
      • Once the agent is in existence, the probability of its existence is certain
      • Torturing people at that point would be a waste of resources
    • Discussion of Roko’s Basilisk was banned as part of a general site policy against spreading potential information hazards
    • Ban had the opposite of the intended effect
    • People assumed that the ban was because Less Wrong users accepted the argument
  • Background
    • Roko’s argument ties together Newcomblike problems in decision theory with normative uncertainty in moral philosophy
    • The canonical Newcomblike problem here is the Prisoner’s Dilemma
    • Causal Decision Theory (CDT) endorses defecting in the one-shot Prisoner’s Dilemma, even though mutual defection leaves both agents worse off than mutual cooperation
    • Eliezer Yudkowsky proposed an alternative to CDT, Timeless Decision Theory (TDT), which can achieve cooperation in Prisoner’s Dilemmas, provided both agents are running TDT (a payoff sketch appears at the end of this section)
    • Eliezer also created a concept called “Coherent Extrapolated Volition” - a hypothetical algorithm that could autonomously pursue human goals in a way compatible with moral progress
    • Roko’s post was an attempt to use TDT to argue against Coherent Extrapolated Volition
  • Roko’s Post
    • If two TDT agents are separated from each other in time, rather than space, then the later agent can “blackmail” the prior agent
    • Since both agents have perfect access to each other’s source code, the later agent can credibly promise to hurt the prior agent if the prior agent does not leave it a large sum of money
    • The later agent can do this, even though it doesn’t physically exist yet, just the source code is enough
    • Roko proposed that a highly moral AI agent would want to be created as soon as possible
    • Such an agent would use acausal blackmail to give humans a stronger incentive to create it
    • The agent would specifically target people who had read this argument, since they would have a better chance of simulating the future actions of this agent
    • Conclusion: any AI agent that reasons like a utilitarian optimizing for humanity’s CEV would be paradoxically detrimental to those values
    • Response from Eliezer
      • The AI agent would gain nothing from following through on its threat, because it would be wasting resources punishing humanity for a decision that had already taken place
      • So, given that, why should we believe the Basilisk’s threat?
  • Topic moderation and response
    • Yudkowsky deleted Roko’s post and the ensuing discussion
    • Rejected the idea that a basilisk could be considered a friendly AI in any way, noting that even threatened torture would be contrary to humanity’s CEV
    • Deletion and apparently strong response to the basilisk post caused others to assume that Less Wrong users took the threat of the basilisk seriously
    • Deletion prevented people from seeing the original argument, leading to a wealth of secondhand, distorted interpretations
    • Eliezer claims to have deleted the post not because he thought the Basilisk was an infohazard, but because he thought that some unknown variant might be
  • Big picture questions
    • Blackmail resistant decision theories
      • The general ability to cooperate in prisoners dilemmas appears to be useful
      • At the same time, this appears to create opportunities for blackmail
      • It appears that the best way to defeat this blackmail is to pre-commit to never give in to the blackmailer’s demands, even when there are short-term advantages to doing so
      • Updateless Decision Theory (UDT), a related decision theory developed by Wei Dai, may be more blackmail-resistant than TDT
    • Information hazards
      • Roko’s basilisk suggests that taboo information spreads more rapidly
      • Although Roko’s basilisk was not harmful, a real infohazard may spread in the same way
      • Nonspecialists spread the idea of Roko’s basilisk without first investigating the risks or the benefits in any serious way
      • Someone in possession of a real infohazard should exercise caution in visibly suppressing it
    • “Weirdness points”
      • Talking about too many nonstandard ideas makes it less likely that any one of those ideas will be taken seriously
      • On the other hand, promoting weird ideas can help form a community of like minded people
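
A minimal sketch of the CDT-versus-TDT contrast in the one-shot Prisoner’s Dilemma, referenced from the Background notes above. The payoff numbers and the “facing an exact copy of my own procedure” simplification are illustrative assumptions, not Yudkowsky’s formal presentation.

```python
# One-shot Prisoner's Dilemma payoffs for the row player (higher is better);
# the specific numbers are an illustrative assumption.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def cdt_move(their_move_distribution):
    """Causal decision theory: treat the opponent's move as causally
    independent of mine; defection then dominates whatever they do."""
    def expected(my_move):
        return sum(p * PAYOFF[(my_move, theirs)]
                   for theirs, p in their_move_distribution.items())
    return max(("C", "D"), key=expected)

def tdt_like_move():
    """TDT-flavoured reasoning (simplified): if the opponent runs this very
    procedure, my choice and theirs are the same logical output, so only the
    symmetric outcomes (C, C) and (D, D) are reachable."""
    return max(("C", "D"), key=lambda m: PAYOFF[(m, m)])

print(cdt_move({"C": 0.5, "D": 0.5}))  # D: defection dominates for CDT
print(tdt_like_move())                 # C: mutual cooperation beats mutual defection
```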