Prospecting For Gold

  • Gold, in this metaphor, is a proxy for whatever we truly value
  • We can notice that some people accomplish more of what we altruistically value than others
  • How can we go and find opportunities like those systematically?
  • Techniques for finding gold
    • Why are we using “gold” as a metaphor?
    • Put the focus on means, rather than ends
    • Replace big, complex values with something simple that we can maximize
  • Gold is unevenly spread
    • Value, like gold, is unevenly spread
    • We should work to find the areas where there’s more value
  • Heavy-tailed distributions (see the simulation sketch after this list)
    • If I want to find the average height, I can sample a few people and get a good idea of what the average is
    • However, if I want an estimate of how much gold there is in the world, sampling a few places isn’t going to give me a good idea
      • Likely that none of the places will have gold, so my estimate will be low
      • On the other hand, I might get lucky and sample a gold mine, making my estimate unreasonably high
    • Gold follows a heavy-tailed distribution
      • Most places have no gold
      • A few places have a lot of gold
      • There’s a long tail, where the probabilities aren’t dying off very fast – even massive amounts of gold have non-negligible probability
    • In the case of heavy-tailed distributions, most of the value is concentrated in a few places
    • More important to get to the right places than to get to all the places
    • We know that actual gold follows a heavy-tailed distribution – is it the same for opportunities to do good?
      • Heavy-tailed distributions are pervasive in the world
      • Seem to arise naturally in complex systems with lots of interactions
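
A minimal simulation of the sampling contrast above: heights modelled as normal, “gold per site” modelled as Pareto (a standard heavy-tailed choice; the parameters are invented for illustration).

```python
import random

random.seed(0)

def sample_mean(draw, n):
    """Average of n independent draws from the callable `draw`."""
    return sum(draw() for _ in range(n)) / n

height = lambda: random.gauss(170, 10)        # thin-tailed: heights in cm
gold = lambda: random.paretovariate(1.1) - 1  # heavy-tailed: alpha close to 1

# Thin tail: samples of 30 already pin down the mean quite well.
print([round(sample_mean(height, 30), 1) for _ in range(5)])

# Heavy tail: sample means of 30 swing wildly. Most runs miss the rare huge
# deposits and come in far below the true mean (about 10 with these
# parameters); an occasional run hits a "gold mine" and comes in absurdly high.
print([round(sample_mean(gold, 30), 1) for _ in range(5)])
```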
  • Heavy-tailed property in opportunities for good
    • We can look at data on developing world health interventions
    • We see a distribution where the most effective intervention is roughly 10,000 times more effective than the least effective intervention
    • Knowing that the distribution of effectiveness of interventions is heavy-tailed gives us some counterintuitive notions about the value of interventions
    • For example, a 90th percentile intervention no longer looks as good – in a heavy-tailed distribution, most of the value is in the 99th percentile
    • Naive empiricism doesn’t work
    • You don’t have enough time or resources to randomly try things and hope to find the thing that’s 10,000 times better than the lowest performing intervention
  • To maximize gold, we want…
    • If we want to extract the most gold, we need:
      • A place where there is lots of gold
      • The right tools for extracting the gold
      • The right people for using those tools
    • This analogizes directly to altruism
      • Measure effectiveness of cause area (find the place where we can have an outsized impact)
      • Measure effectiveness of intervention (find the intervention that will realize that impact)
      • Measure the ability of the team or organization implementing the intervention (find the people who can implement the intervention with maximum efficiency)
  • Value is roughly multiplicative
    • Value is the product of (effectiveness of cause area) x (effectiveness of intervention) x (ability of the team) – see the toy computation below
    • If we find a good team working in an area where the maximum impact is limited, it might make sense not to support that team
    • Similarly, if we find an inefficient team working in a highly impactful area, it might make sense to not support that team and instead encourage other, more efficient teams to start working in that area
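
A toy numerical instance of the multiplicative model (scores invented, on an arbitrary 0–10 scale): a strong team in a low-ceiling cause area can produce less total value than a weaker team in a high-ceiling one.

```python
def value(cause_area, intervention, team):
    # Value as the product of the three factors in the notes above.
    return cause_area * intervention * team

print(value(cause_area=2, intervention=6, team=9))  # strong team, capped area: 108
print(value(cause_area=9, intervention=6, team=4))  # weaker team, big area: 216
```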
  • Recognizing gold
    • A nice property of real gold is that when you dig it up, you can pretty easily determine whether it’s real
    • Altruistic value isn’t the same – often have to infer the presence of value by using other tools
  • Running out of easy gold
    • Real gold mining runs into the problem of diminishing returns
    • EA interventions have the same issue in a lot of cases
      • Now that the Gates Foundation is funding mass vaccinations, additional funding for vaccinations isn’t going to be as cost-effective
      • The 101st book on AI X-Risk isn’t going to be as impactful as the first book
  • How do we find the right cause areas?
    • Scale: All else being equal, we want to go to places where there is a lot of good that can be done
    • Tractability: we want to go to places where we can make more progress per unit of work
    • Uncrowdedness (neglectedness): We want to go to areas few others are already working in, where the easy value hasn’t yet been taken and is still reasonably easy to find
    • Unfortunately, we won’t usually find a place that combines high scale, high tractability, and high neglectedness
    • So how do we trade off among the three?
    • Look at the marginal value of the next unit of effort
    • Multiply the value of the next little bit of the solution, the amount of progress that additional resources bring, and the fraction of the total effort that your contribution represents
    • Thinking about marginal values quantitatively is a more precise version of the scale/tractability/neglectedness framework that people have been using for years
    • Applying this framework:
      • Helping a bee: fails the scale test – ultimately an individual bee isn’t that important, and no matter how much you help it, you’ve only helped one bee
      • Perpetual motion machine research: fails the tractability test – perpetual motion would be fantastic to have, but barring a breakthrough that upends physics as we know it, there are no avenues to make it happen
      • Climate change: massive scale, very tractable, but it’s a huge cause area that has received massive amounts of attention for decades and has millions of people working on it – it’s not clear that adding another dollar or year of effort to the pile will accomplish anything at the margin
  • Absolute and marginal priority
    • Given two areas which both satisfy the scale/tractability/neglectedness framework, we have to decide where the next dollar of spending or the next hour of labor should go
    • As individuals and small groups, we should think in terms of marginal impact – how much good will our additional dollar or hour do if spent on this cause area
    • As societies, we should think of absolute impact – how much spending should there be in order to completely extract the value from a cause area?
  • Long-term gold
    • Oftentimes there are technologies that unlock value in the short run at the cost of destroying some value in the process
    • There are other technologies which operate more slowly but which are more efficient and allow you to extract more value in the long run
    • Many philosophers, like Nick Bostrom, argue that we should improve our decision-making skills as a society before developing technologies that threaten the long-term viability of civilization
  • Working together
    • EA is fortunate in that most people who are EAs have pretty similar values
    • Widespread agreement on criteria for evaluating goals
    • We need to make sure that we’re getting people to go where they can do the most good
  • Comparative advantage
    • Don’t just focus on where you’re absolutely the best, focus on where you’re the best relative to other people
    • Maybe the most effective thing for you to do is the second-best thing you’re good at, because someone else can also do the thing you’re best at, but no one else can do your second-best thing (see the sketch below)
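
A toy two-person, two-task illustration of this point (productivity numbers invented): Alice beats Bob at both tasks, yet total output is highest when she takes her second-best task.

```python
from itertools import permutations

# Output each person produces on each task (invented numbers).
output = {
    ("alice", "research"): 10,
    ("alice", "operations"): 8,
    ("bob", "research"): 9,
    ("bob", "operations"): 2,
}

people = ["alice", "bob"]

best = max(
    permutations(["research", "operations"]),
    key=lambda tasks: sum(output[pair] for pair in zip(people, tasks)),
)
# best == ("operations", "research"): Alice does operations and Bob does
# research, for 8 + 9 = 17 units -- beating the naive assignment where each
# person does what they are absolutely best at (10 + 2 = 12).
print(best)
```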
  • Comparative advantage at multiple levels
    • Comparative advantage applies at the group level as well as the individual level
    • Another thing that we need to consider is temporal comparative advantage – what problems are we best suited to solve in 2019
  • Good local norms
    • Need good norms to ensure the spread of good ideas
    • Pay attention to why we believe things
      • Do you believe things because you’ve been told them, or because you’ve worked them out for yourself?
      • Not that working things out for yourself is necessarily better – you can make mistakes
      • Knowing why you believe something is important because it lets you communicate that knowledge to others
    • Shortening the chain
      • Go back to original sources
      • When someone tells you that they’ve read a claim somewhere, go back to the original source and check it out
      • Having verification that the original source supports a claim makes you more confident in the claim
    • Disagreement is an opportunity to learn
      • When you find yourself talking to someone whose point of view seems unlikely to be correct, try to figure out how they came to that point of view
      • Not only is it polite, it also helps you build a clearer picture of the evidence that you do have
  • Retrospective: evidence for claims
    • Heavy tailed distributions:
      • The fact that many distributions are heavy-tailed is a well-established property
      • “Heavy-tailedness” isn’t a binary property – whole continuum of distributions from standard Gaussian to extremely heavy-tailed
    • Digression: altruistic market efficiency
      • One thing that often comes up in financial markets is that people come up with a wide variety of strategies, find out that most of them don’t work, and then rapidly converge on the strategies that are effective
      • Efficient allocation of resources makes the distribution of returns less heavy-tailed
      • EA, by building better metrics and feedback mechanisms, is attempting to bring this property from financial markets over to altruism
    • Factoring cost effectiveness
      • This is a simple point – there isn’t really space for it to be wrong
      • Might be more variation on some axes than others
    • Diminishing returns
      • Some areas have diminishing returns but other places exhibit increasing returns to scale
      • Increasing returns to scale probably apply more at the organization level than at the domain level
    • Scale, tractability, neglectedness
      • Obvious that all three of these matter
      • Factorization is obviously correct
      • Does the factorization break up our notion of effectiveness into things that are easier to measure?
      • Matches up with people’s informal heuristics, so it’s probably good
    • Absolute and marginal priorities
      • Also a fairly trivial point
      • Easy to understand that some things that need more spending overall won’t benefit that much from my additional dollar
    • Differential progress (i.e. tradeoff between “fast” technologies that destroy value and “slow” technologies that have less progress, but are more efficient)
      • Argument has appeared in some academic papers
      • However, it is counterintuitive, so we should give it more scrutiny
    • Comparative advantage
      • Standard idea from economics
      • The new thing is that we’re adding a time component to comparative advantage which isn’t there in the original formulation
    • Aggregating knowledge
      • We all want better ways of aggregating knowledge and agree that our current ways of aggregating knowledge suffer from shortcomings
      • The question is whether EA can actually invent better ways of aggregating knowledge
    • Stating reasons for beliefs
      • Another common-sense thing
      • There are costs to giving reasons for beliefs in addition to the beliefs themselves
        • Slows down communication
        • Makes the community more off-putting for newcomers
      • However, these disadvantages are small compared to the advantage of having better epistemology
  • Conclusion
    • We need to be careful about aiming at the right goals
    • We need to spread our knowledge about finding the right goals broadly
    • It’s important that we think about these things now, when the community is still young and norms can still be easily changed

How to compare different global problems in terms of impact

  • How do you figure out which area is most effective to work on?
  • Framework:
    • Scale
    • Neglectedness
    • Solvability
    • Personal fit
  • How we define the factors
    • Estimating the amount of good we can do is difficult, so we break it down into pieces that are hopefully easier to estimate individually
    • Scale: (good done / % of problem solved)
    • Solvability: (% of problem solved / % increase in resources)
    • Neglectedness: (% increase in resources / additional dollar or person working)
    • The nice thing about breaking it down this way is that if you multiply scale x solvability x neglectedness, the intermediate units cancel and you get (good done) / (extra person or $) – see the sketch below
    • Finally, add a factor for personal fit when deciding which problems you want to personally work on
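
A minimal sketch of how the units cancel in this factorization; all numbers are invented placeholders, not estimates for any real problem.

```python
# Units chain: (good / % solved) x (% solved / % resource increase)
#            x (% resource increase / extra dollar) = good per extra dollar.
scale = 5000.0        # good done (say, QALYs) per 1% of the problem solved
solvability = 0.3     # % of the problem solved per 1% increase in resources
neglectedness = 1e-7  # % increase in resources bought by one extra dollar

good_per_dollar = scale * solvability * neglectedness
print(good_per_dollar)  # ~1.5e-04 QALYs per marginal dollar
```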
  • Define the problem carefully
    • Make sure you have a clear definition of the scope of the problems you’re investigating
    • Note that narrowly described problems tend to look better than broad problems
    • Be careful because problems can be made to look better or worse by altering their definitions
  • Creating a (logarithmic) scale
    • There are often huge differences between cause areas on the metrics listed above
    • Using a logarithm allows us to efficiently compare causes that have huge differences
    • Allows us to add logarithms rather than multiplying large numbers
    • When comparing cost-effectiveness of various interventions, look at the differences of their log scores (see the sketch below)
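
A sketch of the log-score bookkeeping, assuming each factor has already been scored as the log10 of its underlying ratio (the scores below are invented for illustration):

```python
# Hypothetical log10 scores for two problems (not real estimates).
scores = {
    "problem_A": {"scale": 12, "neglectedness": 6, "solvability": 4},
    "problem_B": {"scale": 14, "neglectedness": 2, "solvability": 2},
}

# Because the underlying quantities multiply, their log scores add.
totals = {name: sum(parts.values()) for name, parts in scores.items()}
print(totals)  # {'problem_A': 22, 'problem_B': 18}

# A gap of 4 log-points corresponds to a ~10,000x difference per marginal
# unit of resources -- large enough, by the interpretation heuristic later
# in these notes, to count as a clear difference rather than a judgment call.
```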
  • How to assess scale
    • Definition: if we solved this problem, how much would the world improve?
    • Can be measured by its effect on well-being
    • Can be increased by:
      • Affecting more people
      • Improving the life of each affected person more
    • If you have different values, you can plug them into your definition of “scale”
    • Measuring scale
      • Difficult, especially when considering long-term and indirect effects
      • Need to come up with “yardsticks”, such as GDP or QALYs, when making comparisons between problems
    • Most difficult when comparing across yardsticks – e.g. what is the comparison between reducing existential risk and curing malaria?
  • How to assess neglectedness
    • How many people or dollars are being currently allocated to the problem?
    • Why is it important?
      • Often, when a lot of resources have been allocated to a problem, you’ll hit diminishing returns
      • Example: mass vaccination is an effective intervention, but governments have already invested a lot of money into mass vaccination
      • Neglectedness also allows us to find new “low-hanging fruit” – if a problem hasn’t been extensively investigated, it might turn out to be easier than anticipated to solve
    • How to assess it
      • Challenge: measuring direct vs. indirect effort
        • Even if there isn’t a lot of money being spent directly on the problem, there’s often a lot of money being spent on related fields which have spillover effects
        • Example: not a lot of money being spent on anti-aging research, but there’s a lot of money being spent on biomedical research which indirectly benefits anti-aging research
        • Indirect efforts are often difficult to measure and score, which is why we don’t consider them when thinking about neglectedness – they’re handled in the solvability calculation
      • Rather than trying to assess neglectedness directly, you can think about questions like
        • Why hasn’t this already been addressed by markets and/or governments?
        • Is this a new field, or a field that lies between two existing disciplines (for research)?
        • If you don’t work on this problem, how likely is it that someone else will step in to work on it?
        • If you work on this problem, will you learn more about how pressing it is in comparison to other problems?
    • It’s important to assess scale and neglectedness together
    • We care about the ratio of scale to neglectedness
    • If several kinds of resources are required to solve a problem, assess neglectedness by the lowest value among the different kinds of input
  • How to assess solvability
    • Definition: if we doubled the amount of direct effort on this problem, how much more of the problem would we expect to solve?
    • Why is it important?
      • Even if a problem is highly important and very neglected, it might be neglected because there’s not much we can do about it
      • Aging is huge in scale and highly neglected, but direct research on it is neglected partly because the problem is believed to be very hard to solve
    • How to assess it
      • Are there cost effective interventions for making progress with rigorous evidence behind them?
      • Are there promising but unproven interventions which can be cheaply tested?
      • Are there theoretical arguments that progress should be possible?
      • Are there interventions that could have a huge impact, even if they’re unlikely to work?
    • Find the best interventions and evaluate them based on:
      • Size of the potential upside
      • Likelihood of achieving the upside
    • Use a Bayesian approach with the prior that any given intervention isn’t likely to be effective
    • Challenges in assessment
      • Solvability is the most difficult to assess because it involves predicting the future
      • In some cases, we can use the cost-effectiveness of existing techniques
      • In other cases, we have to use judgment calls
      • Use an “expected value” approach to scoring – this allows us to judge incremental approaches and radical approaches with the same yardstick (sketched below)
      • Problems where most of the work is happening indirectly will likely benefit less from additional direct work – many of the promising approaches will already have been tried by groups outside the field
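
A sketch of putting an incremental approach and a radical long shot on the same expected-value yardstick (probabilities and payoffs invented):

```python
options = {
    "incremental": {"p_success": 0.9, "impact": 100},
    "radical": {"p_success": 0.01, "impact": 20000},
}

for name, o in options.items():
    # Expected value = probability of success x impact if successful.
    print(name, o["p_success"] * o["impact"])
# incremental 90.0, radical 200.0 -- the long shot scores higher in
# expectation, before any Bayesian discount for how rough these numbers are.
```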
  • What do the summed scores mean:
    • We can sanity-check our scores by adding them up and then converting them back into a measure of actual impact from one additional person working on the problem
    • Don’t put weight on the figures specifically, instead use the scores to make relative comparisons
  • How to assess personal fit
    • Within a field, top performers have 10 to 100x the impact of median performers
    • It’s important to choose a field that you’ll like and be good at
    • Personal fit definition: Given your skills, resources, knowledge, connections and passions, how likely are you to excel in this area?
    • How can it be assessed?
      • What’s your most valuable career capital? Is it particularly relevant to some problems and not others?
      • How motivated will you be if you work on this problem?
      • What specific roles could you take on solving this problem, and would you expect to excel at those roles?
    • Personal fit matters more for some forms of altruism than others
      • If you’re planning on participating or contributing directly, it matters a lot
      • It matters less if you’re earning to give
  • Other factors for assessing career opportunities
    • Need to think about factors other than impact when assessing career opportunities
    • Influence
    • Career capital
    • Value of information from working on this option
  • How should we interpret the results
    • Add the results for scale, neglectedness and solvability to get a rough idea of the most important problems
    • If the difference is 4 or higher, one problem is clearly more important than the other
    • If the difference is 3 or less, it’s a judgment call
  • How does this compare with ordinary cost-effectiveness criteria
    • Alternative approach: compare cost-effectiveness of past interventions against a problem
    • When comparing different problem domains, use a conversion factor (which adds uncertainty)
    • Difficult to conduct in many circumstances
      • Political advocacy
      • Original research
      • Any field in which interventions are unknown or poorly studied
  • Advantages and disadvantages of quantitative problem prioritization
    • Benefits:
      • Can help you notice large robust differences between problem areas or interventions
      • Helps avoid scope neglect
      • Going through the process helps test your understanding of the problem
      • Can help others critique your reasoning
    • Disadvantages:
      • High levels of uncertainty
      • Different assumptions greatly affect the outcome of the analysis
      • Danger of being misled by an incomplete model, when it would have been better to go with qualitative analysis or common sense
    • Don’t use this model alone, use it with other forms of evidence
  • Conclusion: it’s difficult to measure effectiveness precisely, but the large differences between problems and interventions means that even inaccurate measurements can be a useful guide

Four Focus Areas of Effective Altruism

  • EAs tend to:
    1. Be globally altruistic
    2. Value consequences
    3. Try to do as much good as possible
    4. Try to think quantitatively
    5. Be willing to make significant life changes in order to be more altruistic
      • Change which charities they support
      • Change careers
      • Spend significant amounts of time investigating which areas are most cost-effective
      • Make other significant life changes
  • Despite this, EAs tend to be diverse and focus on a variety of issues
  • These issues tend to cluster in 4 groups:
    1. Poverty reduction
      • Economic benefits, better health, better education
      • Major organizations:
        • GiveWell – most rigorous research on charitable causes, especially with regards to poverty reduction and global health
        • Good Ventures – works closely with GiveWell
        • The Life You Can Save – encourages people to pledge a fraction of their income to effective charities
        • Giving What We Can – does some charity evaluation and encourages people to pledge a fraction of their income to effective charities
      • In addition, some major foundations, such as the Bill and Melinda Gates Foundation fund cost-effective interventions in the developing world
      • In the future, EAs might focus on economic, political or research infrastructure changes that might achieve poverty reduction more directly
      • GiveWell Labs and The Vannevar Group are already starting to evaluate the cost-effectiveness of these measures
    2. Meta-effective altruism
      • Raising awareness of EA
      • Helping EAs reach their potential
      • Doing research to decide what areas EAs should focus on
      • Major organizations:
        • 80,000 hours – highlights importance of helping the world through one’s career
        • Center for Applied Rationality (CFAR) – trains people in rationality skills, but are especially focused on the application of those skills to effective altruism
        • Leverage Research – focused on growing and empowering the EA movement
          • Hosts the EA Summit
          • Organizes the THINK student group network
          • Searches for mind-hacks which can make EAs more effective
    3. The Far Future
      • Many EAs value future people as highly as currently existing people
      • Therefore, the vast majority of value is found in the astronomical numbers of people who could contribute in the far future
      • Focus on efforts to capture some of these benefits by reducing existential risk
      • Major organizations:
        • Future of Humanity Institute at Oxford University – main hub for research on existential risk mitigation
        • Machine Intelligence Research Institute – focuses on doing the research necessary to build Friendly AI, which could make the far future far better off
      • Other groups also study existential risks
        • NASA searches for asteroids which could be an existential threat
        • Many organizations, such as GCRI, study worst-case scenarios for nuclear war, climate change, or other disasters
    4. Animal suffering
      • Reducing animal suffering in cost-effective ways
      • Animals vastly outnumber humans
      • Increasing evidence that animals consciously experience pleasure and suffering
      • The primary organization in this field is Effective Animal Activism
      • Major thinkers in this area include Peter Singer, David Pearce and Brian Tomasik
  • Other focus areas
    • Effective environmental altruism
    • Environmental movement is large and well-known
    • However, not many EAs take environmentalism as the most important thing for them to be working on
  • EAs should go out of their way to cooperate and learn from each other, even when they’re working in different focus areas

Why We Can’t Take Expected Value Estimates Literally Even When They’re Unbiased

  • Expected value calculations require a Bayesian adjustment in order to account for measurement and estimation uncertainty
  • Generally, GiveWell prefers to endorse areas where there is strong evidence that donations can do some good as opposed to weak evidence that donations can do a lot of good
  • The approach they oppose: Estimated Expected Value (EEV) decision-making:
    • The EEV approach generally involves an argument of the form:
      • Each dollar spent on program P has estimated value V
      • This estimate is extremely rough and unreliable, but it is unbiased,
      • Therefore we can use V as the per-dollar expected value of P
      • I don’t know how good charity C is at implementing P, but even if it wastes 75% of its money, the expected value of each dollar donated to the charity is still 0.25*V, which is pretty good (if V is very large)
    • Pascal’s mugging is the reductio ad absurdum of this approach
    • The problem with the EEV approach is that it doesn’t incorporate a preference for better-grounded evidence over rougher estimates
    • Ranks charities/actions based solely on their expected value, ignoring differences in the robustness of evidence for that value
  • Informal objections to EEV decision-making
    • Nothing in EEV penalizes ignorance or poorly grounded estimates
    • Because of this, a world in which people acted solely on EEV would be problematic in a number of ways
      • Nearly all altruists would put their money towards people or causes they know little or nothing about rather than helping themselves, their families or their communities
      • In such a world, once an action is determined to have high EEV, there is little or no incentive to engage in costly skeptical inquiry into the actual value of the action
    • Giving based on EEV seems to create bad incentives
      • Doesn’t value transparency of charities
      • Charities would have every incentive to announce that they’re working on high-expected value problems without disclosing details about their interventions
    • Basing your decision-making on EEV leaves you open to Pascal’s Mugging
  • Applying Bayesian adjustments to the cost-effectiveness estimates of donations, actions, etc:
    • Proposed model: normal or log-normal distribution of error around the cost-effectiveness estimate, with a mean of zero (i.e., the estimate is unbiased)
    • Prior probability distribution for effectiveness of intervention is also normally or log-normally distributed
    • The more confident one is in the estimate, the smaller the variance of the estimate error
    • Effects:
      • A reliable estimate causes the Bayesian-adjusted conclusion to jump very close to the estimated value
      • When the estimate is relatively unreliable (large confidence intervals) the Bayesian adjustment causes the estimate to have virtually no effect on the conclusion
    • The takeaway is that knowing the midpoint of a cost-effectiveness probability distribution is not enough; you also need to understand the estimate error, and in particular how large it is relative to the variation in estimated cost-effectiveness across interventions
  • Pascal’s Mugging
    • Non-Bayesian approaches to Pascal’s Mugging ask, even if your analysis is wrong, are you sure that it’s 99.99…% wrong?
    • However, in many of these cases, the lion’s share of variance in estimated expected value is coming from estimate error
    • A Bayesian adjustment would divide the expected value of the action by the estimate of the error in the expected value
    • The greater the uncertainty in the expected value, the greater the estimated error, so large-EEV actions with high levels of uncertainty should affect your choices the least (see the sketch below)
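
A minimal sketch of the normal-model adjustment described above, with the posterior mean computed as the precision-weighted average of the prior and the estimate (all numbers invented):

```python
def bayesian_adjust(prior_mean, prior_var, estimate, estimate_var):
    """Posterior mean under a normal prior and a normally distributed,
    unbiased estimate error."""
    prior_precision = 1.0 / prior_var
    estimate_precision = 1.0 / estimate_var
    return (
        prior_precision * prior_mean + estimate_precision * estimate
    ) / (prior_precision + estimate_precision)

# Reliable estimate (small error variance): the conclusion moves nearly all
# the way from the prior to the estimate.
print(bayesian_adjust(prior_mean=1, prior_var=4, estimate=10, estimate_var=0.1))
# ~9.78

# Pascal's-mugging-style estimate (enormous claimed value, enormous error):
# the conclusion barely moves off the prior.
print(bayesian_adjust(prior_mean=1, prior_var=4, estimate=1e6, estimate_var=1e12))
# ~1.0
```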
  • Generalizing the Bayesian approach
    • One needs to quantify both the appropriate prior for cost-effectiveness and the confidence of an effectiveness estimate in order to quantify estimated cost-effectiveness
    • However, when it comes to giving, reasonable quantifications of these things usually aren’t possible
    • To have a prior, you need to have a reference class, and reference classes are debatable
    • Our brains process a huge amount of information to come up with our intuitions
    • Attempting to formalize our intuitions can result in a reduced amount of information being considered
    • When formulas are too rough, the loss of information outweighs the gains in transparency
    • Incorrect approaches to Bayesian estimates
      • “I have a weak or uninformative prior, so I can take rough estimates literally”
        • You have more information than you think you do
        • Even a sense of the consequences to actions in your own life gives you an “outside view” and a starting probability distribution for estimating the consequences to altruistic actions
      • Making “downward adjustments” to an EEV estimate
        • How do you tell whether the downward adjustment went far enough?
        • As an extreme example, in the Pascal’s Mugging case, even a 99.99% downward adjustment wasn’t nearly enough
    • Heuristics used to judge whether prior-based adjustments are correct
      • The more is asked of me, the more evidence I require: significant actions require more evidence than trivial actions
      • Pay attention to how much of the variation in estimates is likely to be driven by true variation vs. estimation error – when an estimate is so rough that estimate error accounts for the majority of the variation, apply a massive discount
      • Put more weight on conclusions that appear to be supported by multiple independent lines of analysis
      • Be hesitant to embrace arguments which have anti-common-sense implications without really strong evidence
        • Priors that are too weak can lead to absurd beliefs
        • Priors that are too weak also remove the incentive to investigate strong claims
      • The prior for charity should be skepticism
        • Giving well is difficult
        • The more we dig into cost-effectiveness estimates, the more unwarranted optimism we discover
        • An optimistic prior rewards opaque charities
  • Conclusion
    • Any giving approach that relies on estimated expected value alone is flawed
    • Thus, when aiming to maximize positive impact, it’s not advisable to make decisions based solely on explicit formulas
    • Proper Bayesian adjustments are important and difficult to formalize