- Introduction
- AI is deceptively difficult to understand
- The embarrassment and overpromises of AI research stem from people imagining that they understand AI when they really don’t
- AI is the most difficult X-Risk to discuss
- No settled science (as of January 2006)
- The difficulty of thinking about AI X-Risk makes it more important to be aware of cognitive biases
- Anthropomorphic Bias
- Humans are far more psychologically similar than they are different
- Humans are naturally good at modeling other humans
- Even when people are told that AIs are not humanlike, they still anthropomorphize the AI
- The width of mind design space
- Generalizing from the above, we find that all life on Earth has certain similarities due to common ancestry
- AI systems are not bound by the restrictions of natural selection or the common ancestry of life
- Prediction and Design
- If AIs aren’t bound by natural selection or other natural restrictions on life, how can we tell what an AI will be like?
- AIs (at least the initial versions) will be designed by humans
- One disaster scenario is that researchers launch a superintelligent AI without knowing that it’s a superintelligent AI
- Accretion of “simple” optimization algorithms combining to form a system that is smarter than humans
- People think that programs written without ill intent will automatically be friendly
- How do we know that an AI will be friendly or even comprehensible when it was not designed in the same evolutionary environment that resulted in our minds?
- Underestimating the power of intelligence
- We exaggerate the differences in human intelligence and overlook the similarities
- The difference between you and Einstein is minuscule compared to the difference between you and an amoeba
- Not all intelligence boils down to Spearman’s g - charisma is a form of intelligence too
- Intelligence is what allowed humans to dominate all other species
- The peril of artificial intelligence comes from the fact that intelligence magnifies even small value misalignments
- Capability and Motive
- Just because an AI can do something does not mean that the AI will do that thing
- The capabilities of AI are orthogonal to motives
- Optimization processes
- Separation of capability and motive is an anthropomorphic distinction [unclear - may be misinterpreting here]
- Aiming at the target
- It’s a fallacy to ask what AIs will want - wanting is a feature of the human brain
- Instead we should treat AIs as blind optimization processes and make sure that the targets they’re optimizing towards are compatible with human values (a toy sketch of a blind optimizer follows this list)
- Most optimization processes are unfriendly - nature is red in tooth and claw
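- To make “blind optimization process” concrete, here’s a minimal sketch (mine, not the essay’s - the thermostat goal, names, and numbers are all hypothetical). The same search procedure faithfully climbs whatever target it is handed; there is no wanting anywhere in it:

```python
import random

def hill_climb(score, x0, steps=1000, step_size=0.5):
    """Blind local search: propose a random nudge, keep it only if the score
    improves. The process has no notion of 'wanting'; it climbs whatever
    score function it is handed."""
    x = x0
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if score(candidate) > score(x):
            x = candidate
    return x

# What we actually care about (hypothetical): room temperature near 21 C.
def intended_goal(temp):
    return -(temp - 21.0) ** 2

# The proxy target we carelessly handed the optimizer: "warmer is better".
def proxy_target(temp):
    return temp

print(hill_climb(intended_goal, x0=15.0))  # settles near 21.0
print(hill_climb(proxy_target, x0=15.0))   # keeps climbing, roughly +125 over 1000 steps
```

- The second run is not a malfunction: the optimizer did exactly what its target specified, which is why getting the target right is the whole problem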
- Friendly AI
- Friendly AI (FAI) can be thought of as a powerful optimization process with a particular target
- Some argue that FAI is impossible because any powerful AI will be able to self-modify to overcome friendliness restrictions
- But what would drive an AI to break those limits?
- It would be like willingly taking a pill that turns you into a murderer
- Technical and Philosophical Failures
- An existential catastrophe is the destruction of Earth-originating intelligent life, or of some portion of its long-term potential
- 2 categories of FAI failure: technical and philosophical
- Technical failure: building an AI that doesn’t work as it should
- Philosophical failure: building an AI that has the wrong goals
- Failure modes are not mutually exclusive
- Example of philosophical failure: communism
- Rather than programming an AI to carry out a particular means, we should specify a good end and let the AI figure out the best means of reaching it
- Let the AI jump to a higher level of abstraction and know why its decisions were programmed in, so that it can update them if it finds a better path
- Example of technical failure: the tank detection algorithm
- System picked up on an incidental feature of the training data: the tank photos happened to be taken on cloudy days, so the network learned to tell cloudy from sunny rather than tank from no tank (a toy reconstruction follows this list)
- System worked in one context and unexpectedly failed in a different context
- A superintelligent AI needs to figure out what we mean and do that, rather than doing what we say
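- A toy reconstruction of that failure mode (synthetic data and made-up feature names, not the essay’s code): a learner graded only on training accuracy latches onto whichever cue scores best there, and a spurious cue breaks as soon as the context shifts:

```python
import random

def make_photos(n, clouds_track_tanks):
    """Synthetic 'photos'. Each has a true label (tank or no tank), a noisy
    tank-shaped cue (the honest but imperfect feature), and a sky. In the
    training set every tank photo happened to be shot on a cloudy day."""
    photos = []
    for _ in range(n):
        tank = random.random() < 0.5
        blob = tank if random.random() < 0.9 else not tank  # honest cue, 10% noise
        cloudy = tank if clouds_track_tanks else random.random() < 0.5
        photos.append({"tank": tank, "blob": blob, "cloudy": cloudy})
    return photos

def accuracy(photos, feature):
    """How often 'feature present' matches 'tank present'."""
    return sum(p[feature] == p["tank"] for p in photos) / len(photos)

train = make_photos(2000, clouds_track_tanks=True)   # collection context
field = make_photos(2000, clouds_track_tanks=False)  # deployment context

# A lazy learner: keep whichever single feature best predicts the training labels.
best = max(["blob", "cloudy"], key=lambda f: accuracy(train, f))

print(best)                   # 'cloudy' - it scores a perfect 1.0 on the training set
print(accuracy(field, best))  # ~0.5 - the shortcut breaks when the context changes
```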
- Rates of Intelligence Increase
- AI might increase in intelligence very quickly
- Recursive self-improvement
- Humans can self-improve to a limited degree, but we cannot fundamentally change our neural architecture
- We should not confuse the speed of AI research with the speed of an AI once built
- AI might be like a nuclear reactor - once it goes critical, it gains an enormous amount of power in a very short period of time (a toy model of criticality follows this list)
- We need to solve the friendly AI problem before we create superintelligent AI
- We may not get any warning before we create a superintelligent system
- We cannot assume that we will be able to monitor an AI against its will - the only sure protection is to make an AI whose goals specifically do not lead to it hurting you
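- To pin down the reactor analogy, a toy model (my notation, not the essay’s):

```latex
% Toy model of recursive self-improvement. Let I_n be the system's capability
% after n self-improvement cycles, and suppose each cycle multiplies capability
% by a factor f - the analogue of a reactor's neutron multiplication factor.
I_{n+1} = f \, I_n \quad\Longrightarrow\quad I_n = f^{\,n} I_0
% f < 1 (subcritical): each round of improvement buys less; growth fizzles out.
% f > 1 (supercritical): exponential growth, with capability doubling every
%   \ln 2 / \ln f \ \text{cycles},
% no matter how small f - 1 is, which is why "going critical" can be abrupt.
```

- Crude as it is, the model shows why the interesting variable is not today’s capability but whether f stays above 1 once the system starts improving itself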
- Hardware
- People like to track hardware as a proxy for AI progress because tracking hardware is easy
- What we should be tracking is our understanding of intelligence
- More powerful hardware means that less understanding of intelligence is required to make a superintelligent AI
- Limit case: evolution, with zero understanding, produced intelligence in 4 billion years
- Threats and Promises
- 3 metaphors for AI
- G-factor: AI is like a really smart human
- History metaphors: AI is to us what we would be to a medieval civilization
- Species metaphors: AI is to us what we are to dogs or ants
- G-factor metaphors are most common in popular culture, since they give humanity the best chance of fighting back and overcoming the AI
- It is a failure of imagination to say that AI will be limited by the speed with which it can affect the world
- The first goal of a runaway AI would be to obtain the ability to manipulate the world as fast as it can think
- Local and Majoritarian Strategies
- How do we classify AI risk mitigation strategies?
- Unanimous cooperation: strategy that requires everyone to perform or abstain from a particular action - can be defeated by individuals or small groups
- Majority action - requires most but not all actors to behave in a certain way
- Local action - requires a small concentration of will, funding and talent
- Unanimous strategies are obviously unworkable
- Majoritarian strategies are workable with enough lead time and effort
- Requires years or decades of lead time to change opinion and set policy
- Local strategies are the most practical
- Majoritarian strategy assumptions:
- A majority of friendly AIs can protect against a minority of unfriendly AIs
- The first AI cannot do catastrophic damage
- Local strategy assumptions:
- A single friendly AI plus human institutions can fend off any number of unfriendly AIs
- The first AI cannot do catastrophic damage
- It may not be safe to assume that the first AI will be incapable of catastrophic damage
- First mover effect with AI
- First unfriendly AI can wipe out humanity
- First friendly AI can protect humanity against unfriendly AIs
- Therefore it is necessary and sufficient that the first superintelligent AI be friendly
- AI vs Human Intelligence Enhancement
- Something will end up becoming smarter than baseline humans
- Is uploading a mind possible?
- Requires massively more computing power than a human brain (a rough back-of-envelope follows this list)
- Need sophisticated molecular nanotechnology to scan a brain at the necessary resolution
- Is upgrading an uploaded human mind possible?
- Need to have a detailed understanding of the human brain at both high and low levels of abstraction
- Human brains are not designed to be upgraded
- Any significant upgrade might leave the resulting psyche no longer recognizably human
- Unlikely that we’ll figure out how to upload brains before developing superintelligent AI
- Moreover, upgrading human intelligence to superintelligence poses many of the same safety risks as making a superintelligent AI
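- For scale, a rough back-of-envelope using commonly cited order-of-magnitude figures (the numbers are my assumptions, not the essay’s):

```latex
% Commonly cited order-of-magnitude figures (assumed, not from the essay):
%   neurons:        N \approx 10^{11}
%   synapses:       S \approx 10^{14}   (about 10^{3} per neuron)
%   signaling rate: r \approx 10^{2}\,\text{Hz per synapse (upper-end estimate)}
S \cdot r \approx 10^{14} \times 10^{2} = 10^{16} \ \text{synaptic events per second}
% An emulation must also compute the internal dynamics behind each event
% (synaptic and membrane state), multiplying this floor by several orders of
% magnitude - hence estimates of 10^{18}+ ops/s for faithful whole-brain emulation.
```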
- Interaction between AI and other technologies
- AIs have the potential to speed up many technologies
- Friendly AI should precede breakthroughs in “dangerous” technologies, like nanotechnology
- Making progress on Friendly AI
- The AI research community (as of 2006) does not seem concerned with safety
- AI’s repeated failures to deliver human-level intelligence have led many researchers to stop believing that their techniques are powerful enough to be dangerous
- As a result, current AI technologies, such as neural networks and evolutionary programming, are very difficult to inspect for safety
- Another problem is that the knowledge to construct friendly AIs is scattered across disciplines and few are working to bring it all together
- Conclusion
- The current level of civilization is a dynamic equilibrium
- If we want to survive for another million years, we need to drive all existential risks to a level indistinguishable from zero
- While we are the first general intelligences, we will not be the last
- We need to make sure that the intelligences that succeed us do us no harm