Artificial Intelligence as a Positive and Negative Factor in Global Risk

  • Introduction
    • AI is deceptively difficult to understand
    • The embarrassment and overpromises of AI research stem from people imagining that they understand AI when they really don’t
    • AI is the most difficult X-Risk to discuss
    • No settled science (as of January 2006)
    • The difficulty of thinking about AI X-Risk makes it all the more important to be aware of cognitive biases
  • Anthropomorphic Bias
    • Humans are far more psychologically similar than they are different
    • Humans are naturally good at modeling other humans
    • Even when people are told that AIs are not humanlike, they still anthropomorphize the AI
    • The width of mind design space
      • Generalizing from the above, we find that all life on Earth has certain similarities due to common ancestry
      • AI systems are not bound by the restrictions of natural selection or the common ancestry of life
  • Prediction and Design
    • If AIs aren’t bound by natural selection or other natural restrictions on life, how can we tell what an AI will be like?
    • AIs (at least the initial versions of such) will be designed by humans
    • One disaster scenario is that researchers launch a superintelligent AI without knowing that it’s a superintelligent AI
      • Accretion of “simple” optimization algorithms combining to form a system that is smarter than humans
    • People think that programs written without ill intent will automatically be friendly
    • How do we know that an AI will be friendly or even comprehensible when it was not designed in the same evolutionary environment that resulted in our minds?
  • Underestimating the power of intelligence
    • We exaggerate the differences in human intelligence and overlook the similarities
    • The difference between you and Einstein is minuscule compared to the difference between you and an amoeba
    • Not all intelligence boils down to Spearman’s g - charisma is a form of intelligence too
    • Intelligence is what allowed humans to dominate all other species
    • The peril of artificial intelligence comes from the fact that intelligence magnifies even small value misalignments
  • Capability and Motive
    • Just because an AI can do something does not mean that the AI will do that thing
    • The capabilities of an AI are orthogonal to its motives (a toy sketch follows at the end of this section)
    • Optimization processes
      • Separation of capability and motive is an anthropomorphic distinction [unclear - may be misinterpreting here]
    • Aiming at the target
      • It’s a fallacy to ask what AIs will want - wanting is a feature of the human brain
      • Instead we should treat AIs as blind optimization processes and make sure that the targets they’re optimizing towards are compatible with human values
      • Most optimization processes are unfriendly - nature is red in tooth and claw
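    • A toy sketch of this orthogonality (everything below is an invented illustration, not from the paper): the same hill-climbing search runs unchanged for any objective function, so nothing about an optimizer’s power determines what it steers toward

      ```python
      import random

      def hill_climb(objective, start, steps=2000, step_size=0.1):
          """A generic optimization process: it steers its input toward
          whatever objective it is handed. The search machinery (the
          "capability") is identical no matter the target (the "motive")."""
          x = start
          for _ in range(steps):
              candidate = x + random.uniform(-step_size, step_size)
              if objective(candidate) > objective(x):
                  x = candidate
          return x

      # The same optimizer, pointed at two unrelated targets:
      print(hill_climb(lambda x: -(x - 3.0) ** 2, start=0.0))  # converges near 3.0
      print(hill_climb(lambda x: -(x - 7.0) ** 2, start=0.0))  # converges near 7.0
      ```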
  • Friendly AI
    • Friendly AI (FAI) can be thought of as a powerful optimization process with a particular target
    • Some argue that FAI is impossible because any powerful AI will be able to self-modify to overcome friendliness restrictions
    • But what would drive an AI to break those limits?
    • It would be like taking a pill that would turn you into a murderer
  • Technical and Philosophical Failures
    • An existential catastrophe is one that destroys Earth-originating intelligent life or permanently curtails some portion of its long-term potential
    • 2 categories of FAI failure: technical and philosophical
    • Technical failure: building an AI that doesn’t work as it should
    • Philosophical failure: building an AI that has the wrong goals
    • Failure modes are not mutually exclusive
    • Example of philosophical failure: communism
      • Rather than implementing an AI to carry out a particular means, we should work on specifying a good end and let the AI figure out the best means to reach that end
      • Let the AI jump to a higher level of abstraction and know why decisions were programmed in, so that it can update itself if it finds a better path to the intended end
    • Example of technical failure: tank detection algorithm
      • The system picked up on a spurious feature of the training data (in the classic telling, every tank photo was taken on a cloudy day; see the sketch below)
      • System worked in one context, and unexpectedly failed in a different context
      • A superintelligent AI needs to figure out what we mean and do that, rather than doing what we say
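      • A minimal sketch of this failure mode (the data and the brightness feature are invented for illustration): a “detector” that scores perfectly on its training set by thresholding brightness, then fails as soon as the lighting context changes

        ```python
        # Hypothetical training data: every tank photo happens to be dark
        # (cloudy day), every tank-free photo happens to be bright (sunny day).
        train = [(0.20, True), (0.25, True), (0.30, True),     # tanks, cloudy
                 (0.70, False), (0.75, False), (0.80, False)]  # no tanks, sunny

        def fit_threshold(data):
            """'Training': find a brightness cutoff that separates the classes."""
            brightest_tank = max(b for b, tank in data if tank)
            darkest_clear = min(b for b, tank in data if not tank)
            return (brightest_tank + darkest_clear) / 2

        cutoff = fit_threshold(train)

        def detects_tank(brightness):
            return brightness < cutoff  # what it actually learned: "dark means tank"

        print(all(detects_tank(b) == tank for b, tank in train))  # True: perfect in training
        print(detects_tank(0.85))  # False: misses a real tank on a sunny day
        print(detects_tank(0.15))  # True: "finds" a tank in any dark, empty field
        ```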
  • Rates of Intelligence Increase
    • AI might increase in intelligence very quickly
    • Recursive self-improvement
    • Humans can self-improve to a limited degree, but we cannot fundamentally change our neural architecture
    • We should not confuse the speed of AI research with the speed of an AI once built
      • AI might be like a nuclear reactor - once it goes critical, it gains an enormous amount of power in a very short period of time (a numeric sketch follows at the end of this section)
    • We need to solve the friendly AI problem before we create superintelligent AI
    • We may not get any warning before a system crosses the threshold into superintelligence
    • We cannot assume that we will be able to monitor an AI against its will - the only sure protection is to make an AI whose goals specifically do not lead to it hurting you
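    • A toy numeric sketch of the reactor analogy (the growth model and the multiplication factor k are assumptions for illustration, not the paper’s): once each round of self-improvement multiplies capability by some k > 1, growth after criticality dwarfs the steady pace of outside research

      ```python
      def human_driven(capability=1.0, rate=1.0, steps=20):
          """Outside researchers add a fixed increment per step: linear growth."""
          for _ in range(steps):
              capability += rate
          return capability

      def self_improving(capability=1.0, k=1.5, steps=20):
          """The system improves its own improver: each step multiplies capability by k."""
          for _ in range(steps):
              capability *= k
          return capability

      print(human_driven())    # 21.0 after 20 steps
      print(self_improving())  # ~3325 after the same 20 steps - "critical" once k > 1
      ```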
  • Hardware
    • People like to track hardware as a proxy for AI because tracking hardware is easy
    • What we should be tracking is our understanding of intelligence
    • More powerful hardware means that less understanding of intelligence is required to make a superintelligent AI
    • Limit case: evolution, with zero understanding, produced intelligence in 4 billion years
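    • A minimal sketch of that limit case (all parameters invented for illustration): mutation plus selection climbs a fitness landscape with zero model of the problem, substituting brute iteration for understanding

      ```python
      import random

      def evolve(fitness, genome_len=20, pop_size=50, generations=200):
          """Blind optimization: no gradient, no model, no understanding -
          just variation and selection on a population of bit-string genomes."""
          population = [[random.randint(0, 1) for _ in range(genome_len)]
                        for _ in range(pop_size)]
          for _ in range(generations):
              population.sort(key=fitness, reverse=True)
              survivors = population[: pop_size // 2]           # selection
              population = survivors + [
                  [bit ^ (random.random() < 0.05) for bit in random.choice(survivors)]
                  for _ in range(pop_size - len(survivors))     # mutated offspring
              ]
          return max(population, key=fitness)

      best = evolve(fitness=sum)  # target: all ones; evolution never "knows" this
      print(sum(best))            # approaches genome_len with zero design insight
      ```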
  • Threats and Promises
    • 3 metaphors for AI
      • G-factor: AI is like a really smart human
      • History metaphors: AI is to us what we would be to a medieval civilization
      • Species metaphors: AI is to us what we are to dogs, or ants
    • G-factor metaphors are most common in popular culture, since they give humanity the best chance of fighting back and overcoming the AI
    • It is a failure of imagination to say that AI will be limited by the speed with which it can affect the world
    • The first goal of a runaway AI would be to obtain the ability to manipulate the world as fast as it can think
  • Local and Majoritarian Strategies
    • How do we classify AI risk mitigation strategies?
      • Unanimous cooperation: strategy that requires everyone to perform or abstain from a particular action - can be defeated by individuals or small groups
      • Majority action - requires most but not all actors to behave in a certain way
      • Local action - requires a small concentration of will, funding and talent
    • Unanimous strategies are obviously unworkable
    • Majoritarian strategies are workable with enough lead time and effort
      • Requires years or decades of lead time to change opinion and set policy
    • Local strategies are the most practical
    • Majoritarian strategy assumptions:
      • A majority of friendly AIs can protect against a minority of unfriendly AIs
      • The first AI cannot do catastrophic damage
    • Local strategy assumptions
      • A single friendly AI plus human institutions can fend off any number of unfriendly AIs
      • The first AI cannot do catastrophic damage
    • It may not be safe to assume that the first AI will be incapable of catastrophic damage
    • First mover effect with AI
      • First unfriendly AI can wipe out humanity
      • First friendly AI can protect humanity against unfriendly AIs
    • Therefore it is necessary and sufficient that the first superintelligent AI be friendly
  • AI vs Human Intelligence Enhancement
    • Something will end up becoming smarter than baseline humans
      • AIs
      • Uploaded humans
    • Is uploading a mind possible?
      • Simulating a brain faithfully requires massively more computing power than the brain itself uses
      • Need sophisticated molecular nanotechnology to scan a brain at the necessary resolution
    • Is upgrading an uploaded human mind possible?
      • Need to have a detailed understanding of the human brain at both high and low levels of abstraction
      • Human brains are not designed to be upgraded
      • Maybe any significant upgrade would cause the resulting psyche to be no longer recognizably human
    • Unlikely that we’ll figure out how to upload brains before developing superintelligent AI
    • Moreover, upgrading human intelligence to superintelligence poses many of the same safety risks as making a superintelligent AI
  • Interaction between AI and other technologies
    • AIs have the potential to speed up many technologies
    • Friendly AI should precede breakthroughs in “dangerous” technologies, like nanotechnology
  • Making progress on Friendly AI
    • The AI research community (as of 2006) does not seem concerned with safety
    • Because AI has repeatedly failed to deliver human-level intelligence, many researchers no longer believe that their techniques are powerful enough to be dangerous
    • As a result, current AI techniques, such as neural networks and evolutionary programming, are very difficult to inspect for safety
    • Another problem is that the knowledge to construct friendly AIs is scattered across disciplines and few are working to bring it all together
  • Conclusion
    • The current level of civilization is a dynamic equilibrium
    • If we want to survive for another million years, we need to drive all existential risks to a level indistinguishable from zero
    • While we are the first general intelligences, we will not be the last
    • We need to make sure that the intelligences that succeed us do us no harm