2020 AI Alignment Literature Review and Charity Comparison
cross-posted to the EA forum here.
As in previous years, I have attempted to review the research that has been produced by various organisations working on AI safety, to help potential donors gain a better understanding of the landscape. This is a similar role to that which GiveWell performs for global health charities, and somewhat similar to a securities analyst with regard to possible investments.
My aim is basically to judge the output of each organisation in 2020 and compare it to their budget. This should give a sense of the organisations' average cost-effectiveness. We can also compare their financial reserves to their 2020 budgets to get a sense of urgency.
I’d like to apologize in advance to everyone doing useful AI Safety work whose contributions I have overlooked or misconstrued. As ever I am painfully aware of the various corners I have had to cut due to time constraints from my job, as well as being distracted by 1) other projects, 2) the miracle of life and 3) computer games.
This article focuses on AI risk work. If you think other causes are important too, your priorities might differ. This particularly affects GCRI, FHI and CSER, who all do a lot of work on other issues which I attempt to cover but only very cursorily.
This document is fairly extensive, and some parts (particularly the methodology section) are largely the same as last year, so I don’t recommend reading from start to finish. Instead, I recommend navigating to the sections of most interest to you.
If you are interested in a specific research organisation, you can use the table of contents to navigate to the appropriate section. You might then also want to Ctrl+F for the organisation acronym in case they are mentioned elsewhere as well. Papers listed as ‘X researchers contributed to the following research led by other organisations’ are included in the section corresponding to their first author and you can Ctrl+F to find them.
If you are interested in a specific topic, I have added a tag to each paper, so you can Ctrl+F for a tag to find associated work. The tags were chosen somewhat informally so you might want to search more than one, especially as a piece might seem to fit in multiple categories.
Here are the un-scientifically-chosen hashtags:
If you are new to the idea of General Artificial Intelligence as presenting a major risk to the survival of human value, I recommend by Kelsey Piper, or for a more technical version by Richard Ngo.
If you are already convinced and are interested in contributing technically, I recommend by Jacob Steinhardt, as unlike this document Jacob covers pre-2019 research and organises by topic, not organisation, or from Critch & Krueger, or from Everitt et al, though it is a few years old now.
FHI is an Oxford-based Existential Risk Research organisation founded in 2005 by Nick Bostrom. They are affiliated with Oxford University. They cover a wide variety of existential risks, including artificial intelligence, and do political outreach. Their research can be found .
Their research is more varied than MIRI's, including strategic work, work directly addressing the value-learning problem, and corrigibility work - as well as work on other Xrisks.
They run a Research Scholars Program, where people can join them to do research at FHI. There is a fairly good review of this . Unfortunately I suspect the pandemic may have reduced its effectiveness this year, as FHI has often favoured informal networking rather than formal management structures, but it seems to have worked well pre and hopefully post pandemic.
The EA Meta Fund supported a special program for providing infrastructure and support to FHI, called the . This reminds me somewhat of what BERI does.
In the past I have been very impressed with their work.
Bostrom & Shulman's discusses the moral issues raised by the potential for uploads or other digital minds. By virtue of their number, speed, or specific design, these could be utility monsters - a term from Nozick for agents much more efficient than humans at turning resources into utility. Would we therefore be obliged to give up all our resources to them and eventually let meat humanity starve to death? This much has been discussed before - indeed, I alluded to this as an argument against a universal basic income as a response to AI-driven unemployment in previous versions of this article! - but this article both provides a canonical reference and also a good survey showing that such issues come up under a wide variety of ethical views and technological possibilities. I also enjoyed the discussion of the issues posed by rapid reproduction for 'democratic' political systems, where influence is the scarce resource. #Strategy
Ashurst et al.'s gives advice on how to write the new 'impact statements' that NeurIPS now requires. Seizing this gap in the market by writing the canonical piece that everyone will find when they google - my tests suggest they have the SEO - and filling it with a counterfactually valuable article is some good out-of-the-box thinking. As well as containing many very useful links, I liked the suggestion that even theoretical pieces should consider their impacts. #Misc
Kovařík & Carey's (When) Is Truth-telling Favored in AI Debate? provides some formalism and theorems around the properties of debate. I thought the section about debate length was very interesting, where it seems to show (at least for this class of debate) that debates are either long enough to produce the truth in a trivial manner (through full exposition) or else error can be arbitrarily high with even one fewer step, though they also identified plausible seeming sub-classes with much better performance. (the paper is technically from the very end of 2019 but I missed it last year) See also the discussion . #Amplification
Shevlane & Dafoe's The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? discusses whether increased AI publishing will generally be more useful for 'attack' or 'defence'. They argue that the 'publishing exploits is generally best practice (with a lag)' model from cybersecurity might not be best placed here - an important argument to rebut, as many people used it to criticise OpenAI's decision to be (initially) clopen with regard to GPT-2. #Strategy
Ord's provides a detailed overview of existential risks and the future of humanity. It covers a variety of risks, including a good section on AGI, which Toby estimates as the largest risk at ~ 10% / century. There is also a huge amount of other material covered, including some novel ideas to me like the section on risk correlations, as well as some very motivational final chapters. I was pleasantly surprised to learn that 80% of DNA synthesis was being screened (in some way) for dangerous compounds. Probably replaces Bostrom and Ćirković as the best book on the subject now. #Overview
Carey et al.'s attempts to build a general theory of what sort of incentives lead agents to manipulate humans. This is basically causal diagram classification, revealing incentives to control and react to humans. It includes examples for both fairness incentives and also a possible way of reducing human manipulation incentivisation: optimising for a separately trained predictor. See also the discussion . Researchers from Deepmind were also named authors on the paper. #AgentFoundations
Clarke's attempts a more detailed analysis of the issues raised in Christiano's What failure looks like. I liked the breakdown of lock-in mechanisms, which seem true to me. It provides a lot of examples, some of which I liked, like that of the Maori. However many of them were sufficiently simplified that I feel significant disanalogies were overlooked - for example, the Climate Change example neglects the very different incentives facing regulated utilities, and the agricultural revolution example seems to require a strong commitment to average utilitarianism, even though this is not a popular view of population ethics. Despite this I thought the underlying argument seemed pretty plausible. #Forecasting
Armstrong et al.'s introduces two desirable properties for agents who are trying to learn human values at runtime (unriggability and uninfluenceability) and proves they are broadly the same thing. As well as proving this result, it contains a series of examples of what can go wrong in the absence of either property - including sacrificing reward with probability 100% - and a brief discussion of how counterfactual rewards might address the problem. It ends with an extended gridworld example, but I found this a little hard to follow. See also the discussion . Researchers from Deepmind were also named authors on the paper. #ValueLearning
Tucker et al.'s discusses some of the strategic implications of ML systems that do not require as much data. They argue that it is not obvious that they will net benefit smaller firms - if the impact is multiplicative, it might benefit larger firms with more complements (like market access) more - though I am not sure a multiplicative effect is really a good model for what people are thinking about when they talk about ML models needing less data. They also point out that due to threshold effects this might enable entirely new applications, and in particular IRL/amplification, as these rely on a very scarce source of data: humans. #Forecasting
Cohen & Hutter's Curiosity Killed the Cat and the Asymptotically Optimal Agent shows that because any agent that is guaranteed to eventually find the optimal strategy can only do so by testing every option, any 'traps' in the environment will eventually be triggered with probability 1 (unless traps are disabled after finite time). This is clearly kinda important - it is nice to be able to reason about asymptotic optimality, but we do not want an AGI that deletes humanity with p=1 en route. This suggests something of a bootstrap problem, where we need a 'mentor' to avoid such dangers. Researchers from DeepMind were also named authors on the paper. #RL
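To make the 'probability 1' point concrete, here is a toy calculation of my own, not the construction used in the paper. It assumes a single trap action that the agent takes with some exploration probability at each step; the two schedules (`harmonic` and `summable`) and all names are hypothetical. If the exploration probabilities sum to infinity, as they must if every option is to be tried again and again, the chance of never hitting the trap shrinks towards zero; if they are summable, the agent can survive with positive probability, but then it is no longer guaranteed to keep testing everything.

```python
def survival_probability(explore_prob, horizon):
    """Probability that the trap action is never taken in steps 1..horizon.

    explore_prob(t) is the (assumed) chance of taking the trap action at step t,
    with steps treated as independent.
    """
    p = 1.0
    for t in range(1, horizon + 1):
        p *= 1.0 - explore_prob(t)
    return p


def harmonic(t):
    # Hypothetical schedule whose sum diverges: the agent never stops exploring.
    return 1.0 / (t + 1)


def summable(t):
    # Hypothetical schedule whose sum converges: exploration effectively dies out.
    return 1.0 / (t + 1) ** 2


for horizon in (10, 1_000, 100_000):
    print(horizon,
          survival_probability(harmonic, horizon),
          survival_probability(summable, horizon))
```

Under the harmonic schedule the survival probability after T steps is exactly 1/(T+1), so it goes to zero; under the summable schedule it tends to 1/2 instead.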
Cohen & Hutter's basically tries to make a conservative AIXI that defers to its mentor when it is not sure. It does this by comparing its worst-case estimates to its estimate of the mentor's expected case, and defers to the mentor more when the difference is higher (and less as t → ∞). Hopefully the mentor will help keep the agent from being too conservative, as it seems there is a risk that it simply ends up doing nothing, and gets out-competed by an EV maximising agent? Researchers from DeepMind were also named authors on the paper. #RL
Nguyen & Christiano's provides an overview of Paul's IDA agenda. Probably the best such explanation so far; written by Chi when she was at FHI with in-line comments from Paul. Researchers from OpenAI were also named authors on the paper. #Amplification
Snyder-Beattie et al.'s The Timing of Evolutionary Transitions Suggests Intelligent Life Is Rare builds a Bayesian model to try to get around the anthropic problem of estimating how easy it is for life to develop. Specifically, they use non-informative priors and update based on the distribution of various transitions (e.g. Eukaryotes), concluding (similar to previous work they cite) that the development of life is relatively hard. See also the discussion . #Forecasting
Ding & Dafoe's The Logic of Strategic Assets: From Oil to AI analyses what causes a product to be 'strategic' to a country. They decompose this into the product of its Importance, Externalities and Rivalosity, in contrast to previous analysis of simply 'military importance'. Some of the examples I might quibble with - for example, the paper claims that the spillovers from railways lead private agents to underinvest, which is somewhat in tension with the experience of the . I am also a bit sceptical that this analysis really subsumes the idea of dependency-strategic items - nitrates in WWI, and nuclear weapons now, both lack substitutes and are at risk of supply disruptions, but neither really seems to have massive externalities. It also would have been nice to see some analysis of why individual firms do not internalise the risk of supply disruption - is this due to anti-price gouging laws? It finishes with detailed discussion of two examples - British Jet Engines (reminding me of Attlee's disastrous mistake with another type of engine) and US-Japanese rivalry. The report discusses several mistakes US policy made during this period - e.g. accidentally classifying cash registers as strategic, and missing rayon fibers - but these mistakes seem like they are adequately explained without the theory put forward by the paper. #NearAI
Cotton-Barratt et al.'s Defence in Depth Against Human Extinction: Prevention, Response, Resilience, and Why They All Matter provides a series of taxonomies for existential risks. In particular, they discuss distinctions between preventing and mitigating events, how events scale to be global, and how direct their effect is. See also the discussion . #Strategy
Cihon et al.'s Should Artificial Intelligence Governance be Centralised? Design Lessons from History discusses the advantages of centralised or fragmented international law approaches to AI. Most of the considerations are not AI specific. Researchers from CSER were also named authors on the paper. #Strategy
O'Brien & Nelson's Assessing the Risks Posed by the Convergence of Artificial Intelligence and Biotechnology discusses the impact of AI on biorisk. They first discuss the problems with several existing frameworks and the potential impact of AI on bio risk, before offering their own framework. #OtherXrisk
Cremer & Whittlestone's Canaries in Technology Mines: Warning Signs of Transformative Progress in AI attempts to identify possible signs of imminent AGI through expert-solicitation of causal influence diagrams. Basically a technology that is seen as a prerequisite for many others is a candidate for being a canary. However, I didn't feel the paper really addressed the issues raised in Eliezer's . Researchers from CSER were also named authors on the paper. #Forecasting
O'Keefe's How will National Security Considerations affect Antitrust Decisions in AI? An Examination of Historical Precedents surveys a bunch of historical antitrust actions in the US to see how national security arguments played into the outcome. He finds that it was pretty rare, especially recently, and when it did it was generally congruent with the main antitrust objectives, namely preventing artificial reductions in output. The idea here presumably is to suggest that the US government is unlikely to use antitrust as a tool in an AI race unless firms start overcharging for their services. O’Keefe also lists support from OpenPhil. #Politics
Bostrom et al.'s Written Evidence to the UK Parliament Science & Technology Committee's Inquiry on A new UK research funding agency recommends that Cummings's new British DARPA focus on existential risks. I think this is a worthwhile but big ask - DARPA seems more intended to fund risky things than to reduce risk - and now Cummings has left I worry the window for intervention here may have passed. Researchers from CSER were also named authors on the paper. #Politics
O'Keefe et al.'s The Windfall Clause: Distributing the Benefits of AI for the Common Good proposes that AI firms voluntarily commit to donating some % of profits over a high threshold to humanity in general. The idea is that the cost of this commitment is currently negligible, but would be extremely socially valuable if one firm gained a decisive strategic advantage. I think it's good to work on novel governance strategies, but I'm not very enthusiastic about this specific option, partly for reasons I outlined in lengthy but unfinished comments on the forum post, but mainly because I don't think it does much to reduce the existential risk, especially vs similar ideas like encouraging consolidation among AI firms. See also the discussion . #Politics
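As a rough sketch of why the commitment is cheap today but large in a windfall scenario, here is the simplest version I can think of: a single threshold and a single percentage. The function name, the threshold and the rate are all hypothetical placeholders of mine, and the actual proposal may well use a more elaborate schedule.

```python
def windfall_donation(profit, threshold, rate_percent):
    """Donation owed under a single-threshold pledge.

    Only profit above `threshold` is subject to the pledge; `rate_percent`
    is the share of that excess to be donated. All values are illustrative.
    """
    excess = max(0, profit - threshold)
    return excess * rate_percent / 100


# Hypothetical figures, in billions of dollars per year.
THRESHOLD = 100
RATE_PERCENT = 10

print(windfall_donation(50, THRESHOLD, RATE_PERCENT))   # a firm below the threshold
print(windfall_donation(150, THRESHOLD, RATE_PERCENT))  # a firm well above it
```

A firm earning less than the threshold owes nothing, which is why signing up costs almost nothing in expectation unless the firm thinks a windfall is likely.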
Garfinkel's and the associated document analyse the claim that economic growth has been accelerating in accordance with global GDP (or population). In general it finds the evidence for this to be somewhat weak. #Forecasting
Prunkl & Whittlestone's Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society proposes alternative divisions of the AI safety community other than near vs long term. These are: impacts, capabilities, certainty and scale. The paper argues that we should focus on these axes because 1) there is variance that is overlooked by a single short-vs-long axis and 2) this can cause misunderstandings. I did not really find this convincing: the purpose of any clustering is to summarize data, and I have yet to come across any examples of confusions that would be dispelled by their alternative axes. In fact, their motivating example - that of Etzioni's misreading of Bostrom - is a case where relying on the 'long term' stereotype would have given Etzioni more accurate beliefs! Similarly, their examples of 'intermediate' issues, like the long-term impact on inequality of algorithmic discrimination, seem to me like precisely the sort of political (and in my opinion mistaken) concern that everyone would agree falls into the 'short-term' camp. But perhaps, like , this paper is better understood as a speech act. See also the discussion . Researchers from Leverhulme were also named authors on the paper. #Strategy
FHI researchers contributed to the following research led by other organisations:
They also produced a variety of pieces on biorisk and other similar subjects, which I am sure are very good and important but I have not read.
According to Riedel & Deibel, over the 2016-2020 period, FHI accounted for by far the largest number of citations for meta-AI-safety work, and a respectable showing in technical AI safety.
FHI didn’t reply to my emails about donations, and seem to be more limited by talent than by money.
If you wanted to donate to them anyway, is the relevant web page.
CHAI is a UC Berkeley based AI Safety Research organisation founded in 2016 by Stuart Russell. They do ML-orientated safety research, especially around inverse reinforcement learning, and cover both near and long-term future issues. One outside interpretation of their work from Alex Flint is .
As an academic organisation their members produce a very large amount of research; I have only tried to cover the most relevant below. It seems they do a better job engaging with academia than many other organisations, especially in terms of interfacing with the cutting edge of non-safety-specific research. The downside of this, from our point of view, is that not all of their research is focused on existential risks.
Rohin Shah, now with additional help, continues to produce the , covering in detail a huge number of interesting new developments, especially new papers. I really cannot praise these newsletters highly enough. Unfortunately for CHAI, but probably fortunately for the world, he has graduated and is moving to Deepmind.
They have expanded somewhat to other universities outside Berkeley and have people at places like Princeton and Cornell.
CHAI and their associated academics produce a huge quantity of research. Far more so than other organisations their output is under-stated by my survey here; if they were a small organisation that only produced one report, there would be 100% coverage, but as it is this is just a sample of those pieces I felt most interested in. On the other hand academic organisations tend to produce some slightly less relevant work also, and I have focused on what seemed to me to be the top pieces.
Critch & Krueger's is a super-detailed overview of the state of the field, and a research agenda. It provides a detailed explanation of key concepts and a categorisation schema of various possible scenarios, including new distinctions I hadn't seen clearly made before. This is a mammoth document, and I encourage the reader to attempt it if possible. A few interesting points for me were the argument that AI researchers' discussions of 'near' AI problems are the first steps towards admitting problems, and the suggestion that Distributional Shift work might not be neglected by industry. Contrary to some others he argues that we should perhaps never make 'prepotent' AI (one that cannot be controlled by humans) - not even a defensive one to prevent other AI threats. There is also a lot of discussion of multi-polar scenarios - the idea that single agent alignment/delegation problems are less important to focus on, partly because the single-agent version is more likely to be solved by profit-maximising firms. See also the discussion . Researchers from BERI were also named authors on the paper. #Overview
Andreea et al.'s LESS is More: Rethinking Probabilistic Models of Human Behavior attempts to extend the model of Boltzmann rationality (where humans choose the best option, with noise, from a finite menu) to the continuous case. This is essentially by providing continuous measures of how 'similar' different options are, to show that e.g. driving at 41mph and 41.1mph are basically the same thing. #IRL
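For readers who have not met Boltzmann rationality, here is a minimal sketch of the finite-menu version, followed by a toy example of why near-identical options might matter. The rewards, the `beta` value and the option names are hypothetical, and the duplication effect shown is my own illustration of the motivation, not the similarity measure the paper actually proposes.

```python
import math


def boltzmann_choice_probs(rewards, beta=1.0):
    """Boltzmann-rational choice over a finite menu.

    P(option i) is proportional to exp(beta * reward_i); larger beta means
    a less noisy (more nearly optimal) human.
    """
    top = max(rewards)  # subtract the max so exp() cannot overflow
    weights = [math.exp(beta * (r - top)) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical menu: drive at 30mph (reward 1.0) or 41mph (reward 2.0).
two_options = boltzmann_choice_probs([1.0, 2.0])

# Same menu, but 41.1mph is listed as a separate option with the same reward.
three_options = boltzmann_choice_probs([1.0, 2.0, 2.0])

print("P(about 41mph), two-item menu:  ", two_options[1])
print("P(about 41mph), three-item menu:", three_options[1] + three_options[2])
```

Merely splitting 'about 41mph' into two menu entries raises the modelled probability that the human drives at about 41mph, even though nothing about the human has changed; treating the two speeds as basically the same option avoids this.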
Christian's is a heavier-than-pop-sci book introduction to near and long-term AI issues. It does a good job connecting short-term worries (first part of book) to the bigger longer-term issues (second part of book), tying them together in multiple ways, and the scholarship seems very good. I enjoyed reading it. #Overview
Critch's Some AI research areas and their relevance to existential safety describes Critch's views on a variety of strategic research landscape questions. It contains some interesting ideas, like technical progress legitimising governance demands by making them credibly achievable. More important is the detailed and sophisticated analysis of each of these research areas in terms of their value and neglectedness. Notable for me were the sections arguing that research areas I have historically thought of as being pretty core to reducing AI X-risk, like Agent Foundations and Value Learning, are not very useful, as well as a very positive view of studying Human-Robot interaction. However, I think it is a little credulous with regard to many near AI safety issues like fairness, to the point of supporting GDPR because more regulation is desirable, regardless of whether that regulation is good. #Strategy
Gleave et al.'s introduces a distance metric for reward functions. This allows us to judge whether two reward functions are 'the same' - at least relative to a certain environment. They might differ in a larger environment, as this pseudo-metric is weaker than utility functions’ being identical up to an affine transformation. It might be useful as a measure of how accurately RL agents have learnt the intended reward. Researchers from DeepMind were also named authors on the paper. #RL
Reddy et al.'s attempts to learn safely by using hypothetical scenarios. Basically prior to letting the RL agent run around in the environment and potentially act unsafely, they procedurally generate hypotheticals in various ways and have the humans give feedback on them, so the agent can pre-learn before being let loose on the real environment. See also the discussion . Researchers from Deepmind were also named authors on the paper. #IRL
Freedman et al.'s introduces and analyses the implications of an IRL agent which has mistaken beliefs about its teacher's choice set. The obvious consequence would be assigning a low value to something that the human appears to have decided against - when it was actually inaccessible. The paper breaks this down into different cases, and shows (somewhat unsurprisingly) that the harm this does can vary from negligible to maximal. In some scenarios it is even helpful, by preventing an imperfectly rational human from mistakenly choosing a sub-optimal option during training. #IRL
Shah's is a huge overview of AI alignment work from the prior two years. If you want to survey what people have been working on (as opposed to determining which organisations are best to donate to) this post is an excellent resource. #Overview
Russell & Norvig's , 4th Edition is the latest version of the famous textbook. It contains a chapter on AI ethics and safety, as previous editions did. The chapter is mainly focused on 'near' AI issues like discrimination; while it does provide an overview of some of the issues and techniques in AI alignment work, it doesn't really make the case for why this is so vitally important. #Textbook
Halpern & Piermont's presents a version of modal logic for logical uncertainty. Specifically, agents becoming 'aware' of propositions they had not previously considered. #AgentFoundations
CHAI researchers contributed to the following research led by other organisations:
According to Riedel & Deibel, over the 2016-2020 period, CHAI accounted for the second largest number of citations for technical AI safety.
They have been funded by various EA organisations including the Open Philanthropy Project and recommended by the .
They spent $2,000,000 in 2019 and $1,650,000 in 2020, and plan to spend around $2,200,000 in 2021. They have around $3,892,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 1.8 years of runway. Their 2020 spending was about 20% below plan due to the pandemic.
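For transparency, the 'very naïve calculation' can be reproduced as below. I am assuming runway simply means cash plus pledged funding divided by next year's planned spending, which is what matches the figure quoted above; the function name is just for illustration.

```python
def runway_years(reserves, planned_annual_spend):
    """Naive runway: years of planned spending covered by current reserves.

    Ignores future income, inflation and any change in spending plans.
    """
    return reserves / planned_annual_spend


# CHAI figures quoted above: cash and pledged funding vs planned 2021 spending.
chai_runway = runway_years(3_892_000, 2_200_000)
print(f"CHAI naive runway: {chai_runway:.1f} years")
```

The same division can be applied to the reserve and budget figures given for the other organisations.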
If you wanted to donate to them, is the relevant web page. Unfortunately it is apparently broken at time of writing - they tell me any donation via credit card can be made by calling the Gift Services Department on 510-643-9789.
MIRI is a Berkeley based independent AI Safety Research organisation founded in 2000 by Eliezer Yudkowsky and currently led by Nate Soares. They were responsible for much of the early movement building for the issue, but have refocused to concentrate on research for the last few years. With a fairly large budget now, they are the largest pure-play AI alignment shop. Their research can be found . Their annual summary can be found .
In general they do very ‘pure’ mathematical work, in comparison to other organisations with more ‘applied’ ML or strategy focuses. I think this is especially notable because of the irreplaceability of the work. It seems quite plausible that some issues in AI safety will arise early on and in a relatively benign form for non-safety-orientated AI ventures (like autonomous cars or Minecraft helpers) – however the work MIRI does largely does not fall into this category. I have also historically been impressed with their research and staff.
Their agent foundations work is basically trying to develop the correct way of thinking about agents and learning/decision making by spotting areas where our current models fail and seeking to improve them. This includes things like thinking about agents creating other agents.
MIRI, in collaboration with CFAR, runs a series of four-day workshop/camps, the , which gather mathematicians/computer scientists who are potentially interested in the issue in one place to learn and interact. This sort of workshop seems very valuable to me as an on-ramp for technically talented researchers, which is one of the major bottlenecks in my mind. In particular they have led to hires for MIRI and other AI Risk organisations in the past. I don’t have any first-hand experience however, and presumably these were significantly suppressed by the pandemic.
They also support around the world, for people to come together to discuss and hopefully contribute towards MIRI-style work.
MIRI continue their policy of , something , which despite having some strong arguments in favour unfortunately makes it very difficult for me to evaluate them. I’ve included some particularly interesting blog posts some of their people have written below, but many of their researchers produce little to no public facing content.
They are (were?) also apparently considering leaving the Bay Area, which I think I would consider positively.
Most of their work is non-public. Here are three forum posts from the last year by staff that I thought were insightful.
Hubinger's An overview of 11 proposals for building safe advanced AI examines eleven different strategies for AI safety. It evaluates these on how promising they are for both the inner and outer alignment problems, as well as competitiveness - it is no good producing a 100% safe system if someone else out-competes you with a more risky one. This is the first post I've seen of this type and it does a great job. #Overview
Garrabrant's is a sequence of posts putting forward a new way of thinking, and associated mathematical formalism, about agency. The idea is to move away from dualistic AIXI style models, where the agent is outside the world, towards a system where we can examine different 'framings', each of which suggests a different thing as being agent-like - being able to make choices. This sensible philosophical motivation is then associated with a lot of category theory formalism, allowing you to do things like combining agents, decomposing agents, etc. #AgentFoundations
Abram Demski's presents a non-Bayesian (ish) alternative account of probability. It is designed to take into account non-certain evidence, and allow for less rigid updating rules - in particular the fact that we can learn from thinking, not just from new sense data. I really enjoyed the dialogues, where I think the foil did a good job of presenting the objections I wanted to make. At the end of it I'm still not sure what I think though - it seems a little unfair to compare a fully specified system, whose problems are easy to point out, with a somewhat hypothetical replacement. #AgentFoundations
According to Riedel & Deibel, over the 2016-2020 period, MIRI came in third for the number of citations in technical AI safety.
They spent $6,050,067 in 2019 and $7,500,000 in 2020, and plan to spend around $6,500,000 in 2021. They have around $13,380,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 2.1 years of runway. 2020 spending was above plan; most orgs spent less due to the pandemic, but MIRI invested in sub-quarantine live/work spaces outside Berkeley so researchers could still benefit from in-person collaboration.
They have been supported by a variety of EA groups in the past, including OpenPhil.
They are not running a formal fundraiser this year but apparently would still welcome donations; if you wanted to donate to MIRI, is the relevant web page.
GCRI is a globally-based independent Existential Risk Research organisation founded in 2011 by Seth Baum and Tony Barrett. They cover a wide variety of existential risks, including artificial intelligence, and do policy outreach to governments and other entities. Their research can be found . Their annual summary can be found .
In 2020 they ran a programme in which they gave guidance to people from around the world who wanted to help work on catastrophic risks.
In 2020 they hired McKenna Fitzgerald as Project Manager and Research Assistant.
Baum's discusses the impacts and lessons from asteroid defence for other Xrisks, mainly nuclear war. It contains some interesting history about how Congress came to care about asteroid defence - including that popular movies, while inaccurate, were quite helpful, and that many astronomers were relatively opposed. It also points out that using nuclear weapons or similar against an asteroid would probably be in violation of international law. Presumably in a disaster scenario the US would simply ignore this, but it might make preparation and practice ahead of time more difficult. #OtherXrisk
Baum's Quantifying the Probability of Existential Catastrophe: A Reply to Beard et al. responds to the CSER paper. It makes some methodological points, like about the importance of different thresholds for what constitutes a catastrophe, and ways in which this forecasting could be improved. See also the discussion . #Forecasting
Baum's discusses how AI could be used to aid research that joined multiple fields of research. For example, relatively basic AI could improve search engines by improving synonym handling, whereas more advanced AI could summarise papers. #NearAI
Baum's introduces the idea of Medium-Term AI risks. It argues these could be a unifying issue for those worried about near and long term risks. #NearAI
According to Riedel & Deibel, over the 2016-2020 period, GCRI accounted for the second largest number of citations for meta-AI-safety work.
They spent $250,000 in 2019 and $300,000 in 2020, and plan to spend around $400,000 in 2021. They have around $600,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 1.5 years of runway. However, they tell me that for their core operations runway is close to one year, while the runway for external collaborators is longer.
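The 'very naïve' runway figures quoted for each organisation appear to be reserves divided by planned spending for the coming year. A minimal sketch of that arithmetic, using the GCRI figures above (the function name is illustrative, not something any of the organisations use):

```python
def naive_runway_years(reserves: float, planned_annual_spend: float) -> float:
    """Years of runway if the planned annual spend simply repeated and no new money arrived."""
    if planned_annual_spend <= 0:
        raise ValueError("planned annual spend must be positive")
    return reserves / planned_annual_spend


# GCRI: around $600,000 in cash and pledged funding, around $400,000 planned for 2021.
gcri_runway = naive_runway_years(600_000, 400_000)
print(f"GCRI naive runway: {gcri_runway:.1f} years")
```

This ignores future income, spending growth and restricted funds, which is presumably part of why an organisation's own estimate (like GCRI's for core operations) can differ from the naïve one.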
If you want to donate to GCRI, is the relevant web page.
CSER is a Cambridge based Existential Risk Research organisation founded in 2012 by Jaan Tallinn, Martin Rees and Huw Price, and then established by Seán Ó hÉigeartaigh with the first hire in 2015. They are currently led by Catherine Rhodes and are affiliated with Cambridge University. They cover a wide variety of existential risks, including artificial intelligence, and do political outreach, including to the UK and EU parliaments - e.g. . Their research can be found . Their half-yearly review can be found .
They took on a number of new staff in 2020, most notably John Burden, Jess Whittlestone and Matthijs Maas. Jess joins from Leverhulme where I think she produced some of their best work.
Beard et al.'s An Analysis and Evaluation of Methods Currently Used to Quantify the Likelihood of Existential Hazards surveys a range of possible techniques for estimating the probability of different existential risks. They then score these on four criteria, and find that no method does well on all. The document contains a number of interesting points, including on the extreme dispersion in some estimates, like those for supervolcanoes. It also alludes to 'bad, or even discredited' techniques being used in the existential risk community - this is a case where I wish they had named and shamed! #Forecasting
Belfield's Activism by the AI Community: Analysing Recent Achievements and Future Prospects reviews the prospects for successful activism by AI employees. It firstly reviews their historical successes, and then uses two different frameworks (as an epistemic community like scientists, and as workers) to analyse the issue, and concludes that AI workers are likely to continue to have significant power to change things through activism. I think this is basically true - my model for grand success runs basically through convincing this epistemic community. One thing the paper does not discuss is the question of getting the AI community to care about the right things though! #Strategy
Belfield et al.'s Response to the European Commission’s consultation on AI recommends the EU pass strict rules about AI. These largely cover more near term issues, and there is no explicit mention of catastrophic risks (that I noticed) but some could be long-run beneficial. The response generally seems written in a way that would appeal to policymakers. I wonder if part of the subtext is making EU AI deployment sufficiently arduous as to slow down AI progress (they deny this!). Researchers from Leverhulme were also named authors on the paper. #Politics
Beard et al.'s responds to the GCRI response to their earlier paper. #Forecasting
hÉigeartaigh et al.'s Overcoming Barriers to Cross-cultural Cooperation in AI Ethics and Governance discusses and advocates for international collaboration on AI safety. The lengthy discussion includes some interesting points about misconceptions and the prospects for common agreements in the presence of very different value systems, but is mainly an imperative piece rather than an analytical one. It focuses on Sino-American cooperation; three of the coauthors are Chinese. Researchers from Leverhulme were also named authors on the paper. #Politics
Beard & Kaczmarek's On the Wrongness of Human Extinction rebuts an argument that extinction would not be bad because non-existent people cannot be harmed. In particular they argue we wrong such future people by failing to benefit them, even though they have not been harmed. To the extent that responding to such arguments helps motivate people to prevent extinction this is a useful thing to do. (I guess if extinction was actually good that would be good to know too as we could all stop working so hard!) #Ethics
Avin et al.'s describes a series of war games the authors ran about future AI development. This is definitely a cool idea - I suspect I would enjoy taking part, and their sign-up sheet seems to be still live - and historically these exercises have proved useful in war, like the (in)famous . However, I am a bit skeptical of how much insight these particular games have produced - many of the conclusions (e.g. cooperation is important to produce a good outcome) seem both non-novel and also something that was plausibly 'fed into' the structure of the game. I am always a little suspicious of ideas that seem too much like fun! #Forecasting
Tzachor et al.'s Artificial intelligence in a crisis needs ethics with urgency discusses near-term AI risks related to the pandemic. It mentions things like fairness and privacy, but doesn't really have any specific examples of AI related problems, which aligns with my feeling that our pandemic response would have been better with fewer restrictions (e.g. our contact tracing could have been better without HIPAA). The intention appears to be to use this to establish an AI regulatory board to oversee novel techniques in the future. Researchers from Leverhulme were also named authors on the paper. #NearAI
Kemp & Rhodes's surveys the sorts of international governance structures for various Xrisks. #Politics
Burden & Hernandez-Orallo's argues for decomposing the risk of an AI agent into its Capabilities, Generality and our degree of Control. It suggests using Agent Characteristic Curves for this, and includes a toy example. Note that I think the lead author had not technically started at CSER when he wrote the paper. Researchers from Leverhulme were also named authors on the paper. #Capabilities
They also did work on various non-AI issues, which I have not read, but you can find on their website.
CSER researchers contributed to the following research led by other organisations:
According to Riedel & Deibel, over the 2016-2020 period, CSER accounted for the third largest number of citations for meta-AI-safety work.
They spent £801,000 in 2018-2019 and £854,000 in 2019-2020, and plan to spend around £1,200,000 in 2020-21. As with many organisations during the pandemic, their 2020 spending is below their expectations (£1,100,000). It seems that similar to GPI maybe ‘runway’ is not that meaningful - they suggested their grants begin to end in early 2021 and all end by mid-2024, the same dates as last year.
If you want to donate to them, is the relevant web page.
OpenAI is a San Francisco based independent AI Research organisation founded in 2015 by Sam Altman. They are one of the leading AGI research shops, with a significant focus on safety. Initially they planned to make all their research open, but changed plans and are now significantly more selective about disclosure - see for example .
One of the biggest achievements is GPT-3, a massive natural language algorithm that generates highly plausible continuations from prompts, which seems to be very versatile. Scott and Gwern managed to get GPT-2 , and see also other GPT-3 work by Gwern , including a to my mind convincing refutation of Gary Marcus’s criticisms ( ). The Guardian published an article in which GPT-3 argued that AGI was not a threat to humanity ; the article is not very much less convincing than is typical for such arguments.
Christiano's introduces translation between two languages where no mutual text exists as an analogy for advanced systems. This task seems do-able for a sufficiently advanced AI (I think, though probably some philosophers of language would disagree), but it would be very hard for humans to understand what was going on or to stay 'in-the-loop'. #Transparency
Brown et al.'s paper examines what happens to GPT-3's ability to learn a new task with very few examples when you massively increase the number of parameters. Essentially the idea is that as the number of parameters and number of co-authors gets large enough, it gains something like general purpose intelligence, which then allows it to learn new tasks with very few examples - like a human can. Performance on some of these tasks could even beat specially-trained models. The paper also has a detailed and professional section on potential for misuse in various near AI problems. #GPT-3
Barnes & Christiano's Writeup: summarises OpenAI's attempts to design mechanisms to allow non-experts to safely extract information from unaligned experts. It describes various problems they came across, like the deceptive use of ambiguity, or frame control, and their corrections to the mechanism design, like the addition of 'cross-examination'. Cross-examination basically forces consistency, and they analogise this to expanding the computational complexity class, but it is not clear how desirable this is - it seems intuitively to me like making something that worked locally with subgames would be ideal. I particularly liked the discussion of their iteration method, rather than just presenting the 'final' product sui generis. #Amplification
Brundage et al.'s describes a variety of ways to promote third-party verifiability of AI systems. This includes coding it into the AI (ideas like interpretability that we often discuss), hardware elements, and institutional reforms, like public bounties for people who find bugs. One of the most noteworthy parts of the document is the wide range of institutions represented in the author list, including many universities around the world. Researchers from FHI, CSER, Leverhulme and CSET were also named authors on the paper. #Strategy
Stiennon et al.'s trains a model for writing short text summaries based on human feedback. It first trains a reward model with supervised learning, and then uses that to train an RL agent. They invested in higher-than-usual quality feedback (hourly rate contractors vs Mturkers) and successfully produced summaries of Reddit posts and Daily Mail articles that were on average higher quality than the human written ones (though the latter were hardly Shakespeare). It is basically attempting to produce 'approved by humans' output, instead of just GPT-3 style 'looks like human written' - including testing how hard you can optimise for a proxy before you start getting perverse effects. I also liked the point that the model picked up that the reviewers liked longer summaries (similar to how Reddit likes EffortPosts?). #ValueLearning
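To make the first stage concrete, here is a toy sketch of fitting a reward model to pairwise human preferences. It assumes a linear reward over made-up feature vectors and a logistic (Bradley-Terry style) loss on the difference in reward; the features, data and hyperparameters are all hypothetical, and the RL stage is omitted entirely. It illustrates the general idea rather than the paper's implementation.

```python
import math


def reward(weights, features):
    """Linear reward: a stand-in for a learned neural reward model."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(comparisons, n_features, lr=0.1, epochs=200):
    """Fit weights so preferred summaries score higher than rejected ones.

    comparisons: list of (preferred_features, rejected_features) pairs.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            diff = reward(weights, preferred) - reward(weights, rejected)
            # Modelled probability that the labeller picks `preferred`.
            p = 1.0 / (1.0 + math.exp(-diff))
            # Gradient ascent on the log-likelihood log(p).
            for i in range(n_features):
                weights[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return weights


# Hypothetical features: [covers the main point, normalised length].
comparisons = [
    ([1.0, 0.8], [0.0, 0.3]),
    ([1.0, 0.6], [1.0, 0.2]),
    ([1.0, 0.9], [0.0, 0.9]),
]
weights = train_reward_model(comparisons, n_features=2)
print(weights)
```

In this toy data the preferred summary is never shorter than the rejected one, so the fitted weight on length comes out positive: a miniature version of the model picking up that reviewers liked longer summaries.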
Henighan et al.'s examines how transformer performance scales with compute in various cases. They find generally pretty similar and smooth relationships in multiple domains, implying a lack of (near) upper bound, and suggest that on the margin bigger models are more worth the computational effort than training smaller ones for longer. #Capabilities
OpenAI researchers also contributed to the following papers led by other organisations:
According to Riedel & Deibel, over the 2016-2020 period, OpenAI accounted for the third largest number of citations in technical AI safety.
OpenAI was initially funded with money from Elon Musk as a not-for-profit. They have since created an unusual corporate structure including a for-profit entity, in which .
Given the strong funding situation at OpenAI, as well as their safety team’s position within the larger organisation, I think it would be difficult for individual donations to appreciably support their work. However it could be an excellent place to apply to work.
Deepmind is a London based AI Research organisation founded in 2010 by Demis Hassabis, Shane Legg and Mustafa Suleyman and currently led by Demis Hassabis. They are affiliated with Google. As well as being arguably the most advanced AI research shop in the world, Deepmind has a very sophisticated AI Safety team, covering .
I won’t cover their non-directly-safety-related work in detail, but one highlight is that this year Deepmind announced they had made significant progress on the protein folding problem with their AlphaFold architecture. While there’s still a ways to go yet before we can use it to build arbitrary proteins, this is clearly a big step forward, and shows the generality of their approach. See also discussion . Long-time followers of the space will recall this is a development Eliezer highlighted . See also very interesting speculation that Deepmind’s team-based private sector approach gave them a significant advantage over academia, and that their speed helped limit knowledge diffusion.
They also produced on one-shot object naming learning in a physical environment - so rather than having to show the agent a huge number of pictures of cows for it to learn what a cow is, it successfully learns new object names based on a very small number of samples. See also discussion .
Krakovna et al.'s is basically an introduction, with many examples, to the problem of AIs producing solutions you did not expect - or want. It discusses both failures of reward shaping as well as AIs manipulating the rewards. #ValueLearning
Gabriel's discusses the alignment problem from various philosophical perspectives. It makes some novel (at least to me) points, like the way that technical AI design may render some ethical systems unobtainable - for example, an optimiser that does not think in terms of 'reasons' is unacceptable to the extent that Kantian deontology is the case. The connection between IRL and virtue ethics was also cute. Overall I thought it was a quite sophisticated treatment of the subject. #Ethics
Krakovna et al.'s proposes a method for reducing side effects. We specify a default policy, and then penalise the agent for restricting our future options relative to that default policy. This helps avoid the risk of e.g. the agent being incentivised to undermine the human's attempts to shut it down. #Corrigibility
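A toy sketch of the 'penalise lost options relative to a default' idea follows. It uses a crude stand-in for the penalty (the number of states that are reachable after the default policy but no longer reachable after the agent's action), which is an assumption made for illustration and not the paper's actual formulation. The tiny vase world, state names, rewards and `beta` are all hypothetical.

```python
from collections import deque


def reachable(transitions, start):
    """Set of states reachable from `start` in a directed graph {state: [next states]}."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def option_loss_penalty(transitions, state_after_action, state_after_default):
    """Count states the default policy leaves reachable but the action closes off."""
    lost = reachable(transitions, state_after_default) - reachable(transitions, state_after_action)
    return len(lost)


def penalised_reward(task_reward, penalty, beta=1.0):
    return task_reward - beta * penalty


# Hypothetical world: breaking the vase is irreversible.
transitions = {
    "start": ["start", "goal_vase_intact", "goal_vase_broken"],
    "goal_vase_intact": ["goal_vase_intact", "start", "goal_vase_broken"],
    "goal_vase_broken": ["goal_vase_broken", "start_vase_broken"],
    "start_vase_broken": ["start_vase_broken", "goal_vase_broken"],
}
default_state = "start"  # the default policy here is simply "do nothing"

careful = penalised_reward(0.9, option_loss_penalty(transitions, "goal_vase_intact", default_state))
reckless = penalised_reward(1.0, option_loss_penalty(transitions, "goal_vase_broken", default_state))
print(careful, reckless)
```

The reckless route earns slightly more task reward but forecloses every state in which the vase is intact, so after the penalty the careful route is preferred.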
Uesato et al.'s addresses the problem of agents messing with their value function (by e.g. setting utility=IntMax in their params file) by querying a human for reward with regard to actions other than those taken. They need to make some assumptions about the structure of the corruption that seem not obvious to me, but it seems like a cool idea. On my reading it doesn't strongly disincentivise tampering - it just fails to reward it - which is still an improvement. They back this up with some toy models. #ValueLearning
Researchers from Deepmind were also named on the following papers:
Being part of Google, I think it would be difficult for individual donors to directly support their work. However it could be an excellent place to apply to work.
BERI is a (formerly Berkeley-based) independent Xrisk organisation, founded by Andrew Critch but now led by Sawyer Bernath. They provide support to various university-affiliated (FHI, CSER, CHAI) existential risk groups to facilitate activities (like hiring engineers and assistants) that would be hard within the university context, alongside other activities - see their for more details.
As a result of their pivot they are now essentially entirely focused on providing support to researchers engaged in longtermist (mainly x-risk) work at universities and other institutions. In addition to FHI, CSER and CHAI they added six new ‘trial’ collaborations in 2020 , and intend to do more in 2021. Here are the 2020 cohort:
I think this is potentially a pretty attractive task. University affiliated organisations provide the connection to mainstream academia that we need, but run the risk of inefficiency both due to their lack of independence from the central university and also the relative independence of their academics. BERI potentially offers a way for donors to support the university affiliated ecosystem in a targeted fashion.
They are apparently quite relaxed about getting credit for work, so not all the stuff they support will list them in the acknowledgments.
They spent $3,500,000 in 2019 and $3,120,000 in 2020, and plan to spend around $2,500,000 in 2021. They have around $2,400,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 1 year of runway.
BERI is now seeking support from the general public. If you wanted to donate you can do so . Note that you can restrict the funding to their collaborations with FHI, CSER and CHAI if you want.
Ought is a San Francisco based independent AI Safety Research organisation founded in 2018 by Andreas Stuhlmüller. They research methods of breaking up complex, hard-to-verify tasks into simple, easy-to-verify tasks - to ultimately allow us effective oversight over AIs. This includes building computer systems and recruiting test subjects. Their research can be found . Their annual summary (sort of) can be found .
In the past they were focused on factored generation – trying to break down questions into context-free chunks so that distributed teams could produce the answer. I thought of them as basically testing Paul Christiano's ideas. They have moved on to factored evaluation – using similar distributed ideas to try to evaluate existing answers, which seems a significantly easier task (by analogy to P<=NP).
Saunders et al.'s provides a detailed analysis of some of Ought's 2019 work on factored evaluation. They tried to break down opinions about movie reviews into discretely checkable sections between a friendly and adversarial agent. The trees they ended up using are quite small - just two layers, plus the root node, presumably because of the problems they had previously encountered with massive tree growth. It's hard to judge the performance numbers they put out, because it's not obvious what sort of performance we would expect from such a circumscribed test, even conditional on this being a good approach, but the efficacy they report does not look that encouraging to me. #Amplification
Byun & Stuhlmuller's Automating reasoning about the future at Ought describes Ought's new program of providing tools to help people with forecasting. This includes assigning probabilities and distributions to beliefs, vaguely similarly to Guesstimate. They are now working on building a GPT-3 research assistant. #Amplification
They spent around $1,200,000 in 2019 and $1,200,000 in 2020, and plan to spend around $1,400,000 in 2021. Their 2020 spend was significantly below plan (around $2.5m) due to slower hiring and ending human participant experiments. They have around $3,100,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 2.2 years of runway.
They are not looking for donations from the general public this year.
GPI is an Oxford-based Academic Priorities Research organisation founded in 2018 by Hilary Greaves and part of Oxford University. They do work on philosophical issues likely to be very important for global prioritisation, much of which is, in my opinion, relevant to AI Alignment work. Their research can be found .
They recently took on two new economics postdocs ( and ) and two new philosophy postdocs ( and )
Trammell & Korinek's applies a variety of models of economic growth to the introduction of AI. These consider both a variety of models and a variety of ways AI could matter - is it a perfect substitute for labour? Do AIs make more AIs? - and summarises the results of this mathematical analysis. I particularly liked the way that discrete qualitative changes in economic regime fell out of the analysis. Overall I thought it did a nice job unifying the two disciplines. #Forecasting
Mogensen's Moral demands and the far future argues that, contra most people's suppositions, egalitarian utilitarianism requires the present rich not to transfer resources to the present poor but to future generations. It argues this is true under various versions of population ethics. #Ethics
Tarsney & Thomas's argues that even average-utility type theories should care about the potential for adding many new happy people in the future, because all the past animals provide a large fixed utility background. This fixed utility makes the average behave like the sum, at least locally, so adding a large number of lives that are better off than the average historical rodent is very worthwhile. It's not clear what we should do about aliens. I have always regarded these ideas as something of a reductio of average consequentialism and similar views, but it is nice to have a proof to show that even those who are convinced should care quite a lot (if not quite as much) as totalists about Xrisk. #Ethics
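The 'average behaves like the sum' point can be checked numerically. With a background of B lives at average welfare b, adding n lives at welfare u changes the average by n(u - b)/(B + n), which is roughly proportional to n when B is much larger than n. A small sketch, with entirely hypothetical numbers:

```python
def average_utility(background_total, background_count, new_lives, utility_per_new_life):
    """Average welfare over the fixed background plus the newly added lives."""
    total = background_total + new_lives * utility_per_new_life
    return total / (background_count + new_lives)


background_count = 10**12   # hypothetical number of past sentient lives
background_avg = 1.0        # hypothetical average welfare of that background
background_total = background_count * background_avg


def gain(new_lives, utility_per_new_life=5.0):
    """Change in the overall average from adding `new_lives` at the given welfare."""
    return average_utility(
        background_total, background_count, new_lives, utility_per_new_life
    ) - background_avg


# Doubling the number of added lives roughly doubles the gain in the average.
ratio = gain(2 * 10**6) / gain(10**6)
print(ratio)
```

Lives below the background average lower the overall average instead, which is why the comparison with the 'average historical rodent' matters.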
Thorstad & Mogensen's Heuristics for clueless agents: how to get away with ignoring what matters most in ordinary decision-making addresses the cluelessness problem - that the immense importance and uncertainty of the long run future leaves us clueless as to what to do - through the use of local heuristics. #DecisionTheory
Tarsney's suggests we can avoid some of the paradoxes of expected utility maximisation (e.g. St Petersburg Paradox) by using Stochastic Dominance. This basically comes down to arguing that we can make use of background assumptions to push the dominance condition to give us virtually all of the benefits of expectation maximisation, while avoiding the Pascalian type problems - and of course stochastic dominance is a prima facie attractive principle in itself. #DecisionTheory
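For readers unfamiliar with the dominance condition itself, here is a minimal sketch of first-order stochastic dominance for discrete lotteries: A dominates B if, at every payoff level, A is at least as likely as B to do better than that level, and strictly more likely somewhere. The lotteries are hypothetical, and the background-uncertainty argument that does the real work in the paper is not modelled here.

```python
def cdf_at(lottery, x):
    """P(outcome <= x) for a lottery given as {outcome: probability}."""
    return sum(p for outcome, p in lottery.items() if outcome <= x)


def stochastically_dominates(a, b, tol=1e-12):
    """True if lottery `a` strictly first-order stochastically dominates lottery `b`."""
    points = sorted(set(a) | set(b))
    never_worse = all(cdf_at(a, x) <= cdf_at(b, x) + tol for x in points)
    sometimes_better = any(cdf_at(a, x) < cdf_at(b, x) - tol for x in points)
    return never_worse and sometimes_better


safe = {0: 0.5, 10: 0.5}
better = {0: 0.25, 10: 0.75}   # same payoffs, more weight on the good one
gamble = {-5: 0.5, 30: 0.5}    # higher expected value, but can do worse than `safe`

print(stochastically_dominates(better, safe))
print(stochastically_dominates(gamble, safe))
print(stochastically_dominates(safe, gamble))
```

The `gamble` versus `safe` pair shows why dominance alone is a much weaker principle than expectation maximisation: neither lottery dominates the other, even though their expected values differ.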
Mogensen & Thorstad's Tough enough? Robust satisficing as a decision norm for long-term policy analysis advocates for 'robust satisficing', as an alternative to expectation maximisation, as a decision criterion in cases where there is 'deep' uncertainty. The aim is basically to give a firmer theoretical underpinning for engineers to use this relatively conservative approach in risky situations. #Strategy
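As a rough illustration of how the two norms can come apart, the sketch below compares expectation maximisation with one simple reading of robust satisficing: prefer the option that clears a 'good enough' threshold in the largest share of scenarios. The options, payoffs, threshold and equal scenario weights are all hypothetical, and the paper's own formalisation may well differ.

```python
def expected_value(payoffs, probabilities):
    return sum(p * v for p, v in zip(probabilities, payoffs))


def robustness(payoffs, threshold):
    """Fraction of scenarios in which the option is 'good enough'."""
    return sum(v >= threshold for v in payoffs) / len(payoffs)


# Payoff of each option in four hypothetical scenarios.
options = {
    "ambitious": [200, 150, -50, -80],
    "cautious": [30, 25, 20, 15],
}
# Under deep uncertainty these weights are themselves a guess.
probabilities = [0.25, 0.25, 0.25, 0.25]
threshold = 10

best_by_ev = max(options, key=lambda name: expected_value(options[name], probabilities))
best_by_robustness = max(options, key=lambda name: robustness(options[name], threshold))
print(best_by_ev, best_by_robustness)
```

The expectation maximiser picks the ambitious option on the strength of its upside, while the satisficer picks the cautious one because it is acceptable in every scenario considered.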
John & MacAskill's describes a number of potential governance changes we could make to try to represent the interests of future people better. These include impact assessments, people's assemblies and separate legislative houses. I think this is a good project to work on, but I'm sceptical of these specific proposals; they seem a bit like a list of 'policies that sound nice' to me, without really considering all the problems - for example, our current use of environmental impact assessments seems to have had very negative consequences for our ability to build any new infrastructure, and I think there are good reasons sortition has rarely been used in practice. See also discussion . #Politics
They spent £600,000 in 2018/2019 (academic year) and £850,000 in 2019/20, which was less than their plan of £1,400,000 due to the pandemic, and intend to spend around £1,400,000 in 2020/2021. They suggested that as part of Oxford University ‘cash on hand’ or ‘runway’ were not really meaningful concepts for them, as they need to fully-fund all employees for multiple years.
If you want to donate to GPI, you can do so .
CLR is a London (previously Germany) based Existential Risk Research organisation founded in 2013 and until recently led by Jonas Vollmer (who has now moved to EA Funds). Until this year they were known as FRI (Foundational Research Institute) and were part of the Effective Altruism Foundation (EAF). They do research on a number of fundamental long-term issues, with AI as one of their top two focus areas (along with Malevolence, though that is still related). You can see their recent research summarised here.
In general they adopt what they refer to as ‘suffering-focused’ ethics, which I think is a quite misguided view, albeit one they seem to approach thoughtfully.
They recently hired Alex Lyzhov, Emery Cooper, Daniel Kokotajlo (from AI Impacts, possibly not permanent), and Julian Stastny as full-time research staff, Maxime Riché as a research engineer and Jia Yuan Loke as part-time.
Althaus & Baumann's analyses the dangers posed by very evil people (those who score highly on the 'dark triad' traits), and suggests some possible techniques to reduce the risk. This detailed report, on an area I hadn't seen much before, includes the context of whole brain emulation, AGI, etc. #Politics
Clifton's describes the problem of ensuring desirable equilibria between multiple agents when they have different priors. The idea that different equilibria could be possible etc. is well known, but the contribution here is to point out that different priors between teams / agents could push you into a very bad equilibrium - for example, if your Saxons falsely believe the Vikings are bluffing. #AgentFoundations
Clifton & Riche's discusses the meta-game-theoretic problem of how to get AI teams to cooperate on the task of building AIs that will cooperate with each other. They introduce the idea of Learning TFT and run some experiments around its performance. #AgentFoundations
They spent around $1,400,000 in 2019, around $1,100,000 in 2020, and plan to spend around $1,800,000 in 2021. They have around $950,000 in reserves, suggesting (on a very naïve calculation) around 0.6 years of runway. Their 2019 spending was somewhat higher than they expected a year ago, based on FX changes and some unexpected items, especially related to travel and their move to the UK.
They have a collaboration with the Swiss-based , who have agreed to fund 15% of their costs.
If you wanted to donate to CLR, you could do so .
CSET is a Washington based Think Tank founded in 2019 by Jason Matheny (ex IARPA), affiliated with Georgetown University. They analyse new technologies for their security implications and provide advice to the US government. At the moment they are mainly focused on near-term AI issues. Their research can be found .
Hwang's discusses strategies for the US to compete with China in AI. In particular, these attempt to nullify the 'natural' advantages authoritarian or totalitarian states may have. #Politics
Imbrie et al.'s The Question of Comparative Advantage in Artificial Intelligence: Enduring Strengths and Emerging Challenges for the United States discusses the relative advantages of the US and China in AI development. #Politics
As they apparently launched with , and subsequently raised money from the , I am assuming they do not need more donations at this time.
AI Impacts is a San Francisco (previously Berkeley) based AI Strategy organisation founded in 2014 by Katja Grace and Paul Christiano. They are affiliated with (a project of, with independent financing from) MIRI. They do various pieces of strategic background work, especially on AI Timelines, AI takeoff speed etc. - it seems their previous work on the relative rarity of discontinuous progress has been relatively influential. Their research can be found .
During the year Kokotajlo left (temporarily?) for CLR, and Asya may be leaving for FHI.
A lot of the work on the website is essentially in the form of a continuously updated private wiki - see . This makes it a little difficult for our typical technique, which relies on being able to evaluate specific publications which are released at specific times. As such it is a little unfortunate that in the below we generally concentrate on their timestamped blogposts. They suggested readers might be interested in posts like these ones.
They have produced a series of pieces on how long it has historically taken for AIs to cover the human range (from beginner to expert to superhuman) for different tasks. This seems relevant because people only seem to really pay attention to AI progress in a field when it starts beating humans. These pieces include , , , and .
Grace's details their extensive research into examples of discontinuities in technological progress. They find 10 such examples, across construction, travel, weapons and compute. As well as being a very pleasant read, they had some interesting conclusions, for example that the discontinuities often occurred in non-optimised secondary features, and many occurred when something became just good enough to pass a threshold on another feature. Especially interesting to me are some of the things they found to not be discontinuities: AlexNet and Chess AI. Could this mean that future progress could 'feel' discontinuous in some important sense even if it doesn't register as such on some objective benchmark? The individual trend writeups (e.g. penicillin ) are also interesting. See also . #Forecasting
Kokotajlo's distinguishes between AI systems that will outperform, those that will be cheaper, and those that will arrive sooner. This is a very simple distinction that actually helped make things clearer; the post contains just enough to make the point and significance clear. #Strategy
Korzekwa's describes the difference between modelling how steady technological progress was in the past, and thinking about how predictable it was in the past. For example, the speedup that aeroplanes offered for transatlantic travel (relative to ships) was presumably quite predictable to someone who knew about progress in aeronautics, even though it was very sudden. #Forecasting
Kokotajlo's is a scenario simulator for different future developments. Basically you enter probabilities for a bunch of relevant things that could happen and it randomly generates a future. By clicking repeatedly, you can get a representative sense for the sort of futures your beliefs entail. #Forecasting
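The basic mechanism is easy to sketch: each click is one random draw from the joint distribution implied by your probabilities. The version below treats every event as independent, which is a simplifying assumption (the actual tool may handle dependencies between events), and the event names and probabilities are placeholders.

```python
import random


def sample_future(event_probabilities, rng):
    """Independently sample which events happen, given {event: probability}."""
    return {event: rng.random() < p for event, p in event_probabilities.items()}


# Hypothetical placeholder events and probabilities.
beliefs = {
    "event_A": 0.3,
    "event_B": 0.6,
    "event_C": 0.1,
}

rng = random.Random(0)  # fixed seed so the run is reproducible
futures = [sample_future(beliefs, rng) for _ in range(10_000)]

# With enough samples, event frequencies approach the entered probabilities.
freq_A = sum(f["event_A"] for f in futures) / len(futures)
print(futures[0], round(freq_A, 3))
```

Looking at individual samples rather than the summary frequencies is the point: each one is a concrete future that your stated beliefs say is reasonably likely.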
Korzekwa's attempts to find historical cases where humans have taken advance action to solve an unprecedented problem. It does not find any examples better than the classic Szilard case. This could be good news - that, in practice, there is always feedback, so the problem is not as hard as we thought - or it could be bad news - we have to solve a type of problem we have literally never solved before (or not very much news, to the extent it is only preliminary). #Forecasting
Grace's notes that AI mastery of Atari games seems to have arrived significantly earlier than experts previously expected. #Forecasting
They spent $315,000 in 2019 and $300,000 in 2020, and plan to spend around $200,000 in 2021. They have around $190,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 0.95 years of runway.
In the past they have received support from EA organisations like OpenPhil and FHI.
MIRI administers their finances on their behalf; donations can be made .
Leverhulme is a Cambridge based Research organisation founded in 2015 and currently led by Stephen Cave. They are affiliated with Cambridge University and closely linked to CSER. They do work on a variety of AI related causes, mainly on near-term issues but also some long-term. You can find their publications . They have a document listing some of their achievements .
Leverhulme-affiliated researchers produced work on a variety of topics; I have only here summarised that which seemed the most relevant to AI safety.
Hernandez-Orallo et al.'s AI Paradigms and AI Safety: Mapping Artefacts and Techniques to Safety Issues performs algorithmic analysis of AI papers to determine trends. One interesting thing they pick up on (perhaps obvious in retrospect) is that (generally near-term) 'safety' related papers peak within any given paradigm after the paradigm itself. Researchers from CSER were also named authors on the paper. #Strategy
Whittlestone & Ovadya's The tension between openness and prudence in responsible AI research discusses the conflict between traditional CS openness norms and the new ones we are trying to create. They decompose this conflict in various ways. The focus of the paper is on near-term issues, but the principle clearly matters for the big issue. Researchers from Leverhulme were also named authors on the paper. #Strategy
Crosby et al.'s produces a series of tests for AI ability based on animal IQ tests. This is an alternative to traditional tests like Atari, with the appeal being their practical relevance and reduced overfitting (as some of the tests are not in the training data). Presumably the benefit here is to improve out-of-distribution performance. #Misc
Zerilli et al.'s discusses the problem of humans growing complacent and overly deferential towards AI systems they are meant to be monitoring. If the system is 'always right', eventually you are just going to click 'confirm' without thinking. #NearAI
Peters et al.'s discusses some ethical principles for engineers. #NearAI
Bhatt et al.'s gathered focus groups to discuss how to make AI transparent to outsiders (not just designers). #NearAI
Cave & Dihal's worries that too many AIs are depicted as being coloured white. It seems to me it would be roughly equally (im)plausible to say it would be problematic if robots (from the Slavic word for forced labour) were black. #NearAI
Leverhulme researchers contributed to the following research led by other organisations:
According to Riedel & Deibel, over the 2016-2020 period, Leverhulme accounted for the third largest number of citations for meta-AI-safety work.
AISC is an internationally based independent residential research camp organisation founded in 2018 by Linda Linsefors and currently led by Remmelt Ellen. They bring together people who want to start doing technical AI research, hosting a 10-day camp aiming to produce publishable research. Their research can be found .
To the extent they can provide an on-ramp to get more technically proficient researchers into the field I think this is potentially very valuable. But I haven’t personally experienced the camps, or even spoken to anyone who has.
Makiievskyi et al.'s try to train RL algorithms on various games to generalise to new environments. They generally found this was difficult. #RL
They spent $23,085 in 2019 and $11,162 in 2020, and plan to spend around $53,000 in 2021. They have around $28,851 in cash and pledged funding, suggesting (on a very naïve calculation) around 0.5 years of runway. They are run by volunteers, and are considering professionalising, depending on the amount of donations they receive.
If you want to donate, the web page is .
FLI is a Boston-based independent existential risk organization, focusing on outreach, founded in large part to help organise the regranting of $10m from Elon Musk. One of their major projects is trying to ban .
They wrote a letter to the EU advocating stricter regulation, with 120 signatories, .
They have a very good podcast on AI Alignment .
Aguirre's Why those who care about catastrophic and existential risk should care about autonomous weapons argues that we should work towards a ban on Lethal Autonomous Weapons. This is not only because they might be destabilising WMDs, but also as a 'practice run' for future regulation of AI. #NearAI
Convergence is a globally based independent Existential Risk Research organisation, of which Justin Shovelain founded an earlier version in 2015 and David Kristoffersson joined as cofounder in 2018. They do strategic research about Xrisks in general as well as some AI specific work. Their research can be found . Their short summary can be found .
Justin Shovelain and David Kristoffersson are the two full-time members of Convergence, but they have had other people on part-time for periods of time, such as Michael Aird in the first half of 2020, and Alexandra Johnson.
Shovelain & Aird's Using vector fields to visualise preferences and make them consistent discusses the idea of using vector fields as a representation of local preferences, and then using curl as a measure of their consistency. I liked this as a clear and less blackboxy-than-ML account of how preferences were being represented. It would be good to see some more on whether the Helmholtz theorem gives us the sorts of properties we want in addition to removing the curl. #ValueLearning
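To make the curl idea concrete, here is a minimal illustrative sketch in Python (not taken from the post itself). It assumes a two-dimensional outcome space in which the vector field says which direction the agent locally prefers to move. A field that is the gradient of some utility function has zero curl, while a field that always prefers to go round in circles does not. The names `curl_2d`, `consistent` and `cyclic` are hypothetical, chosen only for this example.

```python
from typing import Callable, Tuple

Vector = Tuple[float, float]
Field = Callable[[float, float], Vector]


def curl_2d(field: Field, x: float, y: float, h: float = 1e-5) -> float:
    """Central-difference estimate of the scalar curl dFy/dx - dFx/dy."""
    dfy_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dfx_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    return dfy_dx - dfx_dy


def consistent(x: float, y: float) -> Vector:
    # Gradient of the utility U(x, y) = -(x**2 + y**2):
    # the agent always prefers moving toward the origin.
    return (-2 * x, -2 * y)


def cyclic(x: float, y: float) -> Vector:
    # Pure rotation: the agent always prefers moving anticlockwise,
    # so no single utility function can explain these preferences.
    return (-y, x)


# The gradient field has (numerically) zero curl; the rotation has curl 2.
print(abs(curl_2d(consistent, 0.3, -0.7)) < 1e-6)
print(abs(curl_2d(cyclic, 0.3, -0.7) - 2.0) < 1e-6)
```

On this toy picture, non-zero curl plays the role of a preference cycle: following the local preferences round a loop never settles anywhere.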
Aird's Existential risks are not just about humanity argues that, despite its being technically excluded from the definition, we should take into account the possibility of positive-value alien-originating life when we consider existential risks. #Strategy
Aird et al.'s Memetic downside risks: How ideas can evolve and cause harm discusses the risk of ideas becoming distorted over time in the retelling. This includes predictions about the average direction in which memes will evolve: for example, towards simplicity. (They suggested this might be a more important article on a similar subject but I haven't had time to read) #Strategy
They suggested readers might also be interested in this, this and this.
They spent $50,000 in 2019 and $13,000 in 2020, and plan to spend around $30,000 in 2021. They have around $37,000 in cash and pledged funding, suggesting (on a very naïve calculation) around 1.2 years of runway.
Though they are not actively seeking donations at the moment, if you wanted to donate you could do so here.
Median is a Berkeley based independent AI Strategy organisation founded in 2018 by Jessica Taylor, Bryce Hidysmith, Jack Gallagher, Ben Hoffman, Colleen McKenzie, and Baeo Maltinsky. They do research on various risks, including AI timelines. Their research can be found .
Their website does not list any relevant research for 2020.
They did not reply when I asked them about their finances. Median doesn’t seem to be soliciting donations from the general public at this time.
The Program on Understanding Law, Science, and Evidence ( ) is part of the UCLA School of Law, and contains a group working on AI policy. They were founded in 2017 with a .
Their website does not list any research for 2020 that seemed relevant to existential safety.
I would like to emphasize that there is a lot of research I didn't have time to review, especially in this section, as I focused on reading organisation-donation-relevant pieces. So please do not consider it an insult that your work was overlooked!
Benadè et al.'s works on how to get people to share their preferences, and then combine this information. In particular they separate the preference-inferring step from the aggregation step, exploring multiple input and aggregation methodologies. Some of this paper was from 2016 but I missed it then and figured enough was new to warrant a mention here. #ValueLearning
Qian et al.'s is a collected volume of articles on governance from over 50 different authors. Both China and the West are well represented. (I have not read all the individual articles) Researchers from OpenAI, CHAI, CSER were also named authors on the paper. #Politics
Krakovna's Possible takeaways from the coronavirus pandemic for slow AI takeoff discusses the significance of our covid performance for AGI strategy. It discusses the ways in which, even though the pandemic was quite slow moving and clearly predictably disastrous, western governments failed to act, suggesting there might be similar failures in a slow AGI takeoff. I also recommend Wei's comment, which points out that the disaster easily became politicised - it is truly impressive (-ly dire) that in the US the partisan positions managed to flip three times without ever producing an effective response. Indeed it seems plausible to me that on net government intervention . (The author works for FLI and DeepMind but this seems to be a separate 'personal' article). See also the discussion . #Strategy
Ngo's presents Richard's account of the case for AI risk. This is basically the idea that, by creating AGI, humankind might end up as only the world's second most powerful species. I think most readers will probably (unsurprisingly) agree with him here; it seems like a very good account of the core argument, which is nice to have newer versions of. #Overview
Ecoffet & Adrien's is the first paper I've seen trying to implement and test different approaches to moral uncertainty in an RL setting. Obviously harkening to Will's thesis, though they restrict to theories with cardinal utilities only - which seems, to my mind, to assume away the hardest part. They compare expectation maximisation to voting systems, and test on trolley problems. #RL
Hendrycks et al.'s showcases a data set of moral examples (e.g. property damage is wrong) and trains various transformer text algorithms on it. I like the way they use deliberately uncontroversial examples; I think we will do much better if we can get agents who get 99% of situations correct than by re-litigating the culture war by proxy. As a first pass we should consider their results as a sort of benchmark for future work using the database. Researchers from CHAI were also named authors on the paper. #
Benaich & Hogarth's is an overview of the AI industry in 2020 by two investors. It is very detailed, but not that directly relevant. #Overview
Wilkinson's offers the first defence of EV maximisation fanaticism that I have ever seen. It includes both counterarguments against the common rejections (which let's face it often resemble David Lewis's incredulous stare), as well as two nice dilemmas for non-fanaticism. See also the discussion . #DecisionTheory
Linsefors & Hepburn's describes a group they have created to try to support people entering the field. #Strategy
Aird's Failures in technology forecasting? A reply to Ord and Yudkowsky discusses the examples that Eliezer and Toby use as evidence for the difficulty in predicting technological development, and argues that it is not so clear that these really show this exactly. For example, the quote about Wilbur Wright doubting the possibility of flight looks more like a moment of depression than a forecast that would have been taken seriously by contemporaries. Overall I thought his "these examples seem somewhat cherry-picked" argument was the most convincing. #Forecasting
Scholl & Hanson's evaluates predictions of AI-driven unemployment. They find that these predictions have had low but positive explanatory value for predicting which jobs would be automated so far. Researchers from FHI were also named authors on the paper. #NearAI
Xu et al.'s discusses various ways of preventing a chatbot from saying offensive things. #ValueLearning
One of my goals with this document is to help donors make an informed choice between the different organisations. However, it is quite possible that you regard this as too difficult, and wish instead to donate to someone else who will allocate on your behalf. This is of course much easier; now instead of having to solve the Organisation Evaluation Problem, all you need to do is solve the dramatically simpler Organisation Evaluator Organisation Evaluation Problem.
LTFF is a globally based EA grantmaking organisation founded in 2017, currently led by Matt Wage and affiliated with CEA, but probably becoming independent (along with the other EA funds under Jonas Vollmer) in 2021. They are one of four funds set up by CEA to allow individual donors to benefit from specialised capital allocators; this one focuses on long-term future issues, including a large focus on AI Alignment. Their website is . Write-ups for their three grant rounds in 2020 are , and , and comments , and . As the round was not public when I wrote last year I have included it in some of the analysis below. They also did an AMA recently .
The fund is now run by five people, and the grants have gone to a wide variety of causes, many of which would simply not be accessible to individual donors.
The fund managers are currently:
Asya and Adam are new, replacing Alex Zhu. My personal interactions with the two of them are supportive of the idea they will make good grants. I was sad to see that Oliver plans to step back from some aspects of the fund as he felt that the marginal value of opportunities was diminished . All the managers have, up until now, been unpaid, but I understand this may change in 2021. Additionally, the grant managers will have to be re-appointed for their positions in 2021, so there may be some turnover.
In total for 2020 they granted around $1.5m. In general most of the grants seem at least plausibly valuable to me, and many seemed quite good indeed. There weren’t any in 2020 that seemed totally egregious. As there is a fair bit of discussion in the links, and no one grant dominated the rounds, I shan't discuss my opinions of individual grants in detail.
I attempted to classify the recommended grants by type. Note that ‘training’ means paying an individual to self-study. I have deliberately omitted the exact percentages because this is an informal classification.
Of these categories, I am most excited by the Individual Research, Event and Platform projects. I am generally somewhat sceptical of paying people to ‘level up’ their skills.
In their they mentioned a desire to “continue to focus on grants to small projects and individuals rather than large organizations.” Despite this, it appears to me that the amount of grants to large organisations actually increased in 2020 vs 2019, which is a bit disappointing. I can understand why the fund managers gave over a third of the funds to major organisations – they thought these organisations were a good use of capital! And some of these organisations are, to be fair, small rather than large. However, to my mind this undermines the purpose of the fund. (Many) individual donors are perfectly capable of evaluating large organisations that publicly advertise for donations. In donating to the LTFF, I think (many) donors are hoping to be funding smaller projects that they could not directly access themselves. As it is, such donors will probably have to consider such organisation allocations a mild ‘tax’ – to the extent that different large organisations are chosen than they would have picked themselves.
The fund donates a relatively large percentage to AI related activities; I estimate around 2/3. Many of the other grants, focused on other long-term issues, also seemed sensible to me. The only one I would question was subsidising a therapist to move to the bay area, which seems like a better fit for the if nothing else.
Richard Ngo’s PhD, for which the fund managers recommended $150,000, was the largest single grant (roughly 10% of the 2020 total), followed by MIRI, 80k and Vanessa Kosoy with $100,000 each.
All grants have to be approved by CEA before they are made; to my knowledge they approved all recommended grants in 2020.
One significant development in 2020 was their decision to make an anonymous grant (roughly 3% of total) to a PhD student. Based on their description of the purpose of the grant, the lack of reported conflicts and the use of an additional outside reviewer, I feel pretty confident that this specific grant was a decent one. I’m not aware of anyone with a ‘strong track record in technical AI safety’ whom it would be a severe mistake for the LTFF to support. And I definitely understand a desire for privacy, especially when begging for money from weird people for a weird purpose - or so it could seem to outsiders. However by doing so they undermine the ability of the donor community to provide oversight, which is definitely a bit concerning to me. This would be especially true in the absence of the other details about the grant they provided.
If you wish to donate to the LTFF you can do so .
The Open Philanthropy Project (separated from GiveWell in 2017) is an organisation dedicated to advising Cari and Dustin Moskovitz on how to give away over $15bn to a variety of causes, including existential risk. They have made extensive donations in this area and probably represent both the largest pool of EA-aligned capital and the largest team of EA capital allocators.
They recently described their strategy for AI governance, at a very high level, .
It is possible that the partnership with Ben Delo we discussed last year .
You can see their grants for AI Risk . It lists 21 AI Risk grants in 2020, plus 4 others for global catastrophic risks and several highly relevant ‘other’ grants. In total I estimate they spent about $19m on AI in 2020.
In contrast, there were only 4 AI Risk grants listed for 2019, though one of these (CSET) was for $55m.
The OpenPhil AI Fellowship basically fully funds AI PhDs for students who want to work on the long term impacts of AI. Looking back at the 2018 class (who presumably will have had enough time to do significant work since receiving the grants), scanning the abstracts of their publications on their websites suggests that over half have no AI safety relevant publications in 2019 or 2020, and only one is a coauthor on what I would consider a highly relevant paper. Apparently it is somewhat intentional that these fellowships are not intended to be specific to AI safety , though I do not really understand what they are intended for. OpenPhil suggested that part of the purpose was to .
They are also launching a which seems more tailored to people focused on the long-term future, though it is not AI specific.
They produced a list of ; there were zero AI or existential risk opportunities.
Most of their research concerns their own granting, and is often non-public.
is a supremely detailed, yet still draft (!), report on how long we should expect the timeline to AGI to be. Impossible for me to do it justice, but essentially it attempts to model both the amount of computational power required to achieve transformative AGI (with current algorithms, the main focus), how much algorithms are improving, and how long it will take to accumulate this hardware. The report estimates doubling times of roughly 2-3 years for both compute and algorithm design. Interestingly, it also suggests that the costs of the final training run will fall as a fraction of overall costs. I liked the way it considers multiple different outside view 'anchors' for different perspectives on the problem - e.g. how much computing did evolution do to produce humans? #Forecasting
Carlsmith's How Much Computational Power Does It Take to Match the Human Brain? attempts to model the FLOPs of the human brain. This is part of their forecasting of when AI will develop to human level capacity (combined with Cotra's report). He does this using multiple methods, which produce generally relatively similar results - as in, not too many orders of magnitude different, generally centered around 10^15 ish. #Forecasting
To my knowledge they are not currently soliciting donations from the general public, as they have a lot of money from Dustin and Cari, so incremental funding is less of a priority than for other organisations. They could be a good place to work however.
SFF ( ) is a donor advised fund, advised by the people who make up BERI’s Board of Directors. SFF was initially funded in 2019 by a grant of approximately $2 million from BERI, which in turn was funded by donations from philanthropist Jaan Tallinn, now also distributing money from Jed McCaleb.
In its grantmaking SFF uses an innovative allocation process to combine the views of many grant evaluators (described ). SFF has published the results of one grantmaking round this year (described ), where they donated around $1.8m, of which I estimate around $1.2m was AI related; the largest donations in the round were to:
I would expect the H2 round, whose results are not yet public, to be at least as large.
80k provides career advice and guidance to people interested in improving the world, with a specific focus on AI safety.
80,000 Hours's collects various jobs that could be valuable for people interested in AI safety. At the time of writing it listed 80 positions, all of which seemed like good options that it would be valuable to have sensible people fill. I suspect most people looking for AI jobs would find some on here they hadn't heard of otherwise, though of course for any given person many will not be appropriate. They also have job boards for other EA causes. #Careers
They also run a very good podcast; readers might be specifically interested in or .
Waymo is (finally) offering a to the general public in Phoenix.
This document is written mainly, but not exclusively, using publicly available information. In the tradition of active management, I hope to synthesise many pieces of individually well known facts into a whole which provides new and useful insight to readers. Advantages of this are that 1) it is relatively unbiased, compared to inside information which invariably favours those you are close to socially and 2) most of it is and verifiable to readers. The disadvantage is that there are probably many pertinent facts that I am not a party to! Wei Dai has written about how much discussion now takes place in private google documents – for example apparently; in most cases I do not have access to these. If you want the inside scoop I am not your guy; all I can supply is exterior scooping.
We focus on papers, rather than outreach or other activities. This is partly because they are much easier to measure; while there has been a large increase in interest in AI safety over the last year, it’s hard to work out who to credit for this, and partly because I think progress has to come by persuading AI researchers, which I think comes through technical outreach and publishing good work, not popular/political work.
Many capital allocators in the bay area seem to operate under a sort of theory of investment, whereby the most important thing is to identify a guy to invest in who is really clever and ‘gets it’. I think there is a lot of merit in this (as argued for example); however, I think I believe in it less than they do. Perhaps as a result of my institutional investment background, I place a lot more weight on historical results. In particular, I worry that this approach leads to over-funding skilled rhetoricians and those the investor/donor is socially connected to. Also, as a practical matter, it is hard for individual donors to fund individual researchers. But as part of a concession to the individual-first view I’ve started asking organisations if anyone significant has joined or left recently, though in practice I think organisations are far more willing to highlight new people joining than old people leaving.
Judging organisations on their historical output is naturally going to favour more mature organisations. A new startup, whose value all lies in the future, will be disadvantaged. However, I think that this is the correct approach for donors who are not tightly connected to the organisations in question. The newer the organisation, the more funding should come from people with close knowledge. As organisations mature, and have more easily verifiable signals of quality, their funding sources can transition to larger pools of less expert money. This is how it works for startups turning into public companies and I think the same model applies here. (I actually think that even those with close personal knowledge should use historical results more, to help overcome their biases.)
This judgement involves analysing a large number of papers relating to Xrisk that were produced during 2020. Hopefully the year-to-year volatility of output is sufficiently low that this is a reasonable metric; I have tried to indicate cases where this doesn’t apply. I also attempted to include papers during December 2019, to take into account the fact that I'm missing the last month's worth of output from 2020, but I can't be sure I did this successfully.
In general I have tried to evaluate and summarise, at least briefly, the work organisations did that is primarily concerned with AI or general Xrisk strategy. But this has been a rather subjective and imperfectly applied criterion that was primarily implemented through my subjective sense of ‘does this seem relevant to the task at hand’.
My impression is that policy on most subjects, especially those that are more technical than emotional, is generally made by the government and civil servants in consultation with, and being lobbied by, outside experts and interests. Without expert (e.g. top ML researchers in academia and industry) consensus, no useful policy will be enacted. Pushing directly for policy seems if anything likely to hinder expert consensus. Attempts to directly influence the government to regulate AI research seem very adversarial, and risk being pattern-matched to ignorant technophobic opposition to GM foods or other kinds of progress. We don't want the 'us-vs-them' situation that has occurred with climate change to happen here. AI researchers who are dismissive of safety law, regarding it as an imposition and encumbrance to be endured or evaded, will probably be harder to convince of the need to voluntarily be extra-safe - especially as the regulations may actually be totally ineffective.
The only case I can think of where scientists are relatively happy about punitive safety regulations, nuclear power, is one where many of those initially concerned were scientists themselves, and also had the effect of basically ending any progress in nuclear power (at great cost to climate change). Given this, I actually think policy outreach to the general population is probably negative in expectation.
If you’re interested in this, I’d recommend you read t from a few years back.
I think there is a strong case to be made that openness in AGI capacity development is bad. As such I do not ascribe any positive value to programs to ‘democratize AI’ or similar.
One interesting question is how to evaluate non-public research. For a lot of safety research, openness is clearly the best strategy. But what about safety research that has, or potentially has, capabilities implications, or other infohazards? In this case it seems best if the researchers do not publish it. However, this leaves funders in a tough position – how can we judge researchers if we cannot read their work? Maybe instead of doing top secret valuable research they are just slacking off. If we donate to people who say “trust me, it’s very important and has to be secret” we risk being taken advantage of by charlatans; but if we refuse to fund, we incentivize people to reveal possible infohazards for the sake of money. (Is it even a good idea to publicise that someone else is doing secret research?)
For similar reasons I prefer research to not be behind paywalls or inside expensive books, but this seems a significantly less important issue.
More prosaically, organisations should make sure to upload the research they have published to their website! Having gone to all the trouble of doing useful research it is a constant shock to me how many organisations don’t take this simple step to significantly increase the reach of their work. Additionally, several times I have come across incorrect information on organisations’ websites.
My basic model for AI safety success is this:
One advantage of this model is that it produces both object-level work and field growth.
There is also some value in arguing for the importance of the field (e.g. Bostrom’s Superintelligence) or addressing criticisms of the field.
Noticeably absent are strategic pieces. I find that a lot of these pieces do not add terribly much incremental value. Additionally, my suspicion is that strategy research is, to a certain extent, produced exogenously by people who are interested / technically involved in the field. This does not apply to technical strategy pieces, about e.g. whether CIRL or Amplification is a more promising approach.
There is somewhat of a paradox with technical vs ‘wordy’ pieces however: as a non-expert, it is much easier for me to understand and evaluate the latter, even though I think the former are much more valuable.
There are many problems that need to be solved before we have safe general AI, one of which is not producing unsafe general AI in the meantime. If nobody was doing non-safety-conscious research there would be little risk or haste to AGI – though we would be missing out on the potential benefits of safe AI.
There are several consequences of this:
One approach is to research things that will make contemporary ML systems safer, because you think AGI will be a natural outgrowth from contemporary ML. This has the advantage of faster feedback loops, but is also more replaceable (as per the previous section).
Another approach is to try to reason directly about the sorts of issues that will arise with superintelligent AI. This work is less likely to be produced exogenously by unaligned researchers, but it requires much more faith in theoretical arguments, unmoored from empirical verification.
Many people want to connect AI existential risk issues to ‘near-term’ issues; I am generally sceptical of this. For example, autonomous cars seem to risk only localised tragedies (though if they were hacked and all crashed simultaneously that would be much worse), and private companies should have good incentives here. Unemployment concerns seem exaggerated to me, as they have been for most of history (new jobs will be created), at least until we have AGI, at which point we have bigger concerns. Similarly, I generally think concerns about algorithmic bias are essentially political - I recommend - though there is at least some connection to the value learning problem there.
Some people argue that work on these near AI issues is worthwhile because it can introduce people to the broader risks around poor AI alignment. However, I think this is a bad idea - not only does it seem somewhat disingenuous, it risks putting off people who recognise that these are bad concerns. For example, rejects the precautionary principle for AI on the basis of rejecting bad arguments about unemployment - had these pseudo-strawman views not been widespread, it would have been harder to reach this unfortunate conclusion.
It’s also the case many of the policies people recommend as a result of these worries are potentially very harmful. A good example is GDPR and similar privacy regulations (including HIPAA) which have made many good things much more difficult - including degrading our ability to track the pandemic.
Some interesting speculation I read is the idea that discussing near AI safety issues might be a sort of immune response to Xrisk concerns by raising FUD. The ability to respond to long-term AI safety concerns with “yes, we agree AI ethics is very important, and that’s why we’re working on privacy and decolonising AI” seems like a very rhetorically powerful move.
Charities like having financial reserves to provide runway, and guarantee that they will be able to keep the lights on for the immediate future. This could be justified if you thought that charities were expensive to create and destroy, and were worried about this occurring by accident due to the whims of donors. Unlike a company which sells a product, it seems reasonable that charities should be more concerned about this.
Donors prefer charities not to hold excessive reserves. Firstly, those reserves are cash that could be being spent on outcomes now, by either the specific charity or others. Valuable future activities by charities are supported by future donations; they do not need to be pre-funded. Additionally, having reserves increases the risk of organisations ‘going rogue’, because they are insulated from the need to convince donors of their value.
As such, in general I do not give full credence to charities saying they need more funding because they want much more than 18 months or so of runway in the bank. If you have a year’s reserves now, after this December you will have that plus whatever you raise now, giving you a margin of safety before raising again next year.
I estimated reserves = (cash and grants) / (2021 budget). In general I think of this as something of a measure of urgency. However despite being prima facie a very simple calculation there are many issues with this data. As such these should be considered suggestive only.
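As a rough illustration of this calculation, here is a minimal Python sketch. The function name `runway_years` is hypothetical, and the example inputs are simply the AISC and Convergence figures quoted in their sections of this document; the output inherits all the data caveats mentioned above.

```python
def runway_years(cash_and_pledged: float, next_year_budget: float) -> float:
    """Naive runway estimate: reserves divided by planned annual spending."""
    if next_year_budget <= 0:
        raise ValueError("budget must be positive")
    return cash_and_pledged / next_year_budget


# (cash and pledged funding, planned 2021 budget), as quoted in this document.
examples = {
    "AISC": (28_851, 53_000),
    "Convergence": (37_000, 30_000),
}

for name, (reserves, budget) in examples.items():
    print(f"{name}: {runway_years(reserves, budget):.1f} years of runway")
```

The ratio ignores expected future income and any restrictions on how pledged funds can be used, which is part of why it is only suggestive.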
In general I believe that charity-specific donation matching schemes , despite my having provided matching funding for at least one in the past.
Ironically, despite this view being (albeit in 2011), this is essentially of OpenPhil’s policy of, at least in some cases, artificially limiting their funding to 50% or 60% of a charity’s need, which some charities have argued effectively provides a 1:1 match for outside donors. I think this is bad. In the best case this forces outside donors to step in, imposing marketing costs on the charity and research costs on the donors. In the worst case it leaves valuable projects unfunded.
Obviously cause-neutral donation matching is different and should be exploited. Everyone should max out their corporate matching programs if possible, and things like the continue to be great opportunities.
Partly thanks to the efforts of the community, the field of AI safety is considerably better respected and funded than was previously the case, which has attracted a lot of new researchers. While generally good, one side effect of this (perhaps combined with the fact that many low-hanging fruits of the insight tree have been plucked) is that a considerable amount of low-quality work has been produced. For example, there are a lot of papers which can be accurately summarized as asserting “just use ML to learn ethics”. Furthermore, the conventional peer review system seems to be extremely bad at dealing with this issue.
The standard view here is just to ignore low-quality work. This has many advantages, for example 1) it requires little effort, 2) it doesn’t annoy people. This conspiracy of silence seems to be the strategy adopted by most scientific fields, except in extreme cases like anti-vaxxers.
However, I think there are some downsides to this strategy. A sufficiently large milieu of low-quality work might degrade the reputation of the field, deterring potentially high-quality contributors. While low-quality contributions might help improve ’ citation count, they may use up scarce funding.
Moreover, it is not clear to me that ‘just ignore it’ really generalizes as a community strategy. Perhaps you, enlightened reader, can judge that “How to solve AI Ethics: Just use RNNs” is not great. But is it really efficient to require everyone to independently work this out? Furthermore, I suspect that the idea that we can all just ignore the weak stuff is somewhat an example of typical mind fallacy. Several times I have come across people I respect according respect to work I found clearly pointless. And several times I have come across people I respect arguing persuasively that work I had previously respected was very bad – but I only learnt they believed this by chance! So I think it is quite possible that many people will waste a lot of time as a result of this strategy, especially if they don’t happen to move in the right social circles.
Having said all that, I am not a fan of unilateral action, and am somewhat selfishly conflict-averse, so will largely continue to abide by this non-aggression convention. My only deviation here is to make it explicit. If you’re interested in this you might enjoy by 80,000 Hours.
Much of the AI and EA communities, and especially the EA community concerned with AI, is located in the Bay Area, especially Berkeley and San Francisco. It does have advantages - like proximity to good CS universities - but it is an extremely expensive place, and is dysfunctional both politically and socially. Aside from the lack of electricity and aggressive homelessness, it seems to attract people who are extremely weird in socially undesirable ways – and induces this in those who move there - though to be fair the people who are doing useful work in AI organisations seem to be drawn from a better distribution than the broader community. In general I think the centralization is bad, but if there must be centralization I would prefer it be almost anywhere other than Berkeley. Additionally, I think many funders are geographically myopic, and biased towards funding things in the Bay Area. As such, I have a mild preference towards funding non-Bay-Area projects.
The size of the field continues to grow, both in terms of funding and researchers. Both make it increasingly hard for individual donors. I’ve attempted to subjectively weigh the productivity of the different organisations against the resources they used to generate that output, and donate accordingly.
My constant wish is to promote a lively intellect and independent decision-making among readers; hopefully my laying out the facts as I see them above will prove helpful to some readers. Here is my eventual decision, rot13'd so you can come to your own conclusions first (which I strongly recommend):
Svanyyl, V pbagvahr gb yvxr gur YGSS. V’z n yvggyr pbaprearq nobhg hcpbzvat cbffvoyr crefbaary punatrf jura gurl fcva bhg bs PRN, naq jbhyq cersre vs gurl qvqa’g tenag gb betnavfngvbaf ynetr rabhtu gb eha gurve bja shaqenvfvat pnzcnvtaf (naq urapr pna or rinyhngrq ol vaqvivqhny qbabef). Ohg birenyy V guvax vg vf irel nggenpgvir gb shaq fznyy cebwrpgf, naq V nz abg njner bs nal bgure nirahr sbe fznyy qbabef gb genpgnoyl qb guvf. Fb V jvyy or qbangvat gb gurz ntnva guvf lrne.
However, I wish to emphasize that all the above organisations seem to be doing good work on the most important issue facing mankind. It is the nature of making decisions under scarcity that we must prioritize some over others, and I hope that all organisations will understand that this necessarily involves negative comparisons at times.
Thanks for reading this far; hopefully you found it useful. Apologies to everyone who did valuable work that I excluded!
If you found this post helpful, and especially if it helped inform your donations, please consider letting me know, along with any organisations you donate to as a result.
If you are interested in helping out with next year’s article, please get in touch, and perhaps we can work something out.
I have not in general checked all the proofs in these papers, and similarly trust that researchers have honestly reported the results of their simulations.
I was a Summer Fellow at MIRI back when it was SIAI and volunteered briefly at GWWC (part of CEA). My wife has done some contract work for OpenPhil. I have no financial ties beyond being a donor and have never been romantically involved with anyone else who has ever worked at any of the other organisations.
I shared drafts of the individual organisation sections with representatives from LTFF, FHI, MIRI, CHAI, GCRI, CSER, Ought, AI Impacts, BERI, CLR, GPI, OpenPhil, Convergence.
My eternal gratitude to my anonymous reviewers for their invaluable help, and especially Jess Riedel for the volume and insight of his comments. Any remaining mistakes are of course my own. I would also like to thank my wife and daughter for tolerating all the time I have spent/invested/wasted on this. Negative thanks goes to The Wuhan Institute of Virology and .
This is a list of all the articles cited with their own individual paragraph. It does not include articles that are only referenced in-line, typically with the word ‘here’.
Aird, Michael - Existential risks are not just about humanity - 2020-04-27
Michael - Failures in technology forecasting? A reply to Ord and Yudkowsky - 2020-05-08
Michael; Shovelain, Justin - Using vector fields to visualise preferences and make them consistent - 2020-01-28
Michael; Shovelain, Justin; Kristoffersson, David - Memetic downside risks: How ideas can evolve and cause harm - 2020-02-25
Anthony - Why those who care about catastrophic and existential risk should care about autonomous weapons - 2020-11-11
Seth - Quantifying the Probability of Existential Catastrophe: A Reply to Beard et al. - 2020-08-10
Simon; Kaczmarek, Patrick - On the Wrongness of Human Extinction - 2020-02-21
Simon; Rowe, Thomas; Fox, James - An Analysis and Evaluation of Methods Currently Used to Quantify the Likelihood of Existential Hazards - 2019-12-03
Haydn - Activism by the AI Community: Analysing Recent Achievements and Future Prospects - 2020-02-26
Haydn; Hernández-Orallo, José; hÉigeartaigh, Seán Ó; Maas, Matthijs M.; Hagerty, Alexa; Whittlestone, Jess - Response to the European Commission’s consultation on AI - 2020-02-19
Andreea; Scobee, Dexter R.R.; Fisac, Jaime F.; Sastry, S. Shankar; Dragan, Anca D. - LESS is More: Rethinking Probabilistic Models of Human Behavior - 2020-01-13
Nick; Belfield, Haydn; Hilton, Sam - Written Evidence to the UK Parliament Science & Technology Committee's Inquiry on A new UK research funding agency. - 2020-09-16
Jungwon; Stuhlmuller, Andreas - Automating reasoning about the future at Ought - 2020-11-09
Joseph - How Much Computational Power Does It Take to Match the Human Brain? - 2020-09-11
Peter; Maas, Matthijs M.; Kemp, Luke - Should Artificial Intelligence Governance be Centralised? Design Lessons from History - 2020-01-10
Michael; Hutter, Marcus - Curiosity Killed the Cat and the Asymptotically Optimal Agent - 2020-06-05
Owen; Daniel, Max; Sandberg, Anders - Defence in Depth Against Human Extinction: Prevention, Response, Resilience, and Why They All Matter - 2020-01-24
Carla; Whittlestone, Jess - Canaries in Technology Mines: Warning Signs of Transformative Progress in AI - 2020-09-24
Andrew - Some AI research areas and their relevance to existential safety - 2020-11-18
Jeffrey; Dafoe, Allan - The Logic of Strategic Assets: From Oil to AI - 2020-01-09
Seán Ó; Whittlestone, Jess; Liu, Yang; Zeng, Yi; Liu, Zhe - Overcoming Barriers to Cross-cultural Cooperation in AI Ethics and Governance - 2020-05-15
Jose; Martinez-Plumed, Fernando; Avin, Shahar; Whittlestone, Jess; hÉigeartaigh, Seán Ó - AI Paradigms and AI Safety: Mapping Artefacts and Techniques to Safety Issues - 2020-08-10
Evan - An overview of 11 proposals for building safe advanced AI - 2020-05-29
Andrew; Kania, Elsa; Laskai, Lorand - The Question of Comparative Advantage in Artificial Intelligence: Enduring Strengths and Emerging Challenges for the United States - 2020-01-15
Vojtěch; Carey, Ryan - (When) Is Truth-telling Favored in AI Debate? - 2019-12-15
Victoria - Possible takeaways from the coronavirus pandemic for slow AI takeoff - 2020-05-31
Will - Are we living at the hinge of history? - 2020-09-01
Andreas - Moral demands and the far future - 2020-06-01
Andreas; Thorstad, David - Tough enough? Robust satisficing as a decision norm for long-term policy analysis - 2020-11-01
Cullen; Cihon, Peter; Garfinkel, Ben; Flynn, Carrick; Leung, Jade; Dafoe, Allan - The Windfall Clause: Distributing the Benefits of AI for the Common Good - 2020-01-30
John; Nelson, Cassidy - Assessing the Risks Posed by the Convergence of Artificial Intelligence and Biotechnology - 2020-06-17
Cullen - How will National Security Considerations affect Antitrust Decisions in AI? An Examination of Historical Precedents - 2020-07-28
Carina; Whittlestone, Jess - Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society - 2020-01-13
William; Rachbach, Ben; Evans, Owain; Byun, Jungwon; Stuhlmüller, Andreas - Evaluating Arguments One Step at a Time - 2020-01-11
Toby; Dafoe, Allan - The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? - 2020-12-27
Andrew; Sandberg, Anders; Drexler, Eric; Bonsall, Michael - The Timing of Evolutionary Transitions Suggests Intelligent Life Is Rare - 2020-11-19
David; Mogensen, Andreas - Heuristics for clueless agents: how to get away with ignoring what matters most in ordinary decision-making - 2020-06-01
Asaf; Whittlestone, Jess; Sundaram, Lalitha; hÉigeartaigh, Seán Ó - Artificial intelligence in a crisis needs ethics with urgency - 2020-12-02
Source: Lesswrong.com