Summary of my views on AI risk

Many effective altruists believe that efforts to shape artificial general intelligence (AGI) – in particular, solving the alignment problem – may be a top priority. Part of the reasoning behind this is that sudden progress in AI capabilities could happen soon, and might lead to a decisive strategic advantage for a single AI system. This could mean that the evolution of values reaches a steady state in the near future – the universe would be shaped according to the values of that AI. This, in turn, offers exceptional leverage to shape the far future by influencing how that AI is built.

I’m sceptical about this narrative, for the following reasons: 

  • I think that the probability that advanced AI will be developed very soon, i.e. in the next 10 or 20 years, is very small. (See Thoughts on short timelines and What does the stock market tell us about AI timelines?)
  • The case for focusing on shaping AI is, or at least was,[1] often based on the (implicit or explicit) assumption that advanced AI will come in the form of one or several superintelligent AI agents that are built at a crisp point in time. I don’t think this is what the transition to advanced AI will most likely look like. The following are plausible alternative scenarios:
    • Eric Drexler argues that comprehensive AI services will collectively form artificial general intelligence. This is closer to present-day trends in computing – using software as a tool or service – than the idea of a single, superintelligent AI. (See also Brian Tomasik’s Artificial general intelligence as a continuation of existing software and societal trends.)
    • The transition to advanced AI may be a distributed and gradual process of accumulating many small innovations, comparable to e.g. the internet or industrialisation. (See here for more on this.)
    • Whole brain emulations might become feasible before de novo AGI. (See e.g. Robin Hanson’s The Age of Em.)
    • AI could be used to augment human intelligence rather than deployed as standalone systems – “augmented intelligence” instead of artificial intelligence.
    • Humans and AI systems could merge over time as more advanced brain-computer interfaces are developed.
  • My default future scenario is a gradual increase in the collective capabilities of our civilisation – which can be seen as a form of “intelligence explosion” – rather than a sudden jump in the capabilities of a single actor. (Cf. Magnus Vinding’s Reflections on intelligence.) 
    • I don’t think that AI will attain superhuman performance across all or most (relevant) domains at the same time – instead, it will become superhuman in one domain after another. That is, the distribution of capabilities over different domains will differ significantly between biological and artificial intelligence, at least during the transition period.[2]
    • In this view, it is less clear whether it makes sense to talk about AGI as something that’s built at a certain point in time. (See A framework for thinking about AI timescales.)

AI risk in different scenarios

Different views on what the future of AI will look like give rise to different safety concerns. Given my own views, it seems misguided to worry that a single uncontrolled AI system will take over the world. That doesn’t mean that there will be no serious issues – for instance, escalating conflicts between “human-AI systems” may lead to large amounts of disvalue.

Even if AGI does come in the form of one or more unified, autonomous agents, I think society will already have changed dramatically due to advances in narrow AI before we get to AGI. I would agree that if you magically dropped an AI system with superhuman capabilities into our world, it would likely find a way to obtain a decisive strategic advantage – but I don’t think the transition to advanced AI will look like this. It’s a mistake to imagine an AGI against the backdrop of our contemporary society.

It is also not clear at all whether the development of artificial general intelligence will quickly result in a steady state (the “end of history”, as the leading AI achieves a decisive strategic advantage). While that is a possible scenario, it seems more likely to me that no single actor will be able to seize all power, which means that there will still be many actors with differing goals for at least some time after the transition to advanced AI. Even if civilisation eventually reaches a steady state with a single, unified AI, it would be shaped by actors in the earlier multipolar world, which may persist for a long time.

If advanced AI does not quickly result in a steady state, then it’s not clear whether influencing its development is a good lever to affect the far future. This could be true even if the technology has a large impact on society – similar to how e.g. electricity had a large impact, but “shaping the development of electricity” arguably was not a top priority for past altruists.

Given all that, I think it’s not clear whether altruists should focus on (directly) shaping AI. It’s a bit of a reach to think one can accurately predict technological developments many decades (or even centuries) down the line; it’s even more of a reach to think one can anticipate and make progress on problems arising from these future technologies. On the other hand, influencing the future is difficult in general, and shaping the development of advanced AI could be high-leverage even if it is gradual and distributed. (For more on this, see Should altruists focus on artificial intelligence?.) 

If (directly) shaping advanced AI is a priority, then it’s still unclear whether alignment with human values is the most important aspect to focus on. The main reason for this is that there will be strong pressures to solve alignment (since everybody wants AI systems that are aligned to their values), so I expect that AI systems will, by and large, be aligned with the values of their operators, at least in a narrow sense. (See Why I expect successful (narrow) alignment.) 

Also, alignment may not be sufficient, and possibly not even robustly positive, in terms of reducing future suffering, which is why I’m mostly interested in implementing worst-case safety measures that are aimed at preventing s-risks from AI. (See An introduction to worst-case AI safety and Focus areas of worst-case AI safety.)

A broader perspective

Advanced AI is just one possible path to a new growth mode, i.e. a significant acceleration of economic and technological progress, comparable to industrialisation. A plausible mechanism for how this could come about is a far greater degree of editability of the minds that drive technological innovation. Human minds can also adapt to some degree (see e.g. the Flynn effect), but the available mechanisms are limited in scope. If future minds can be edited more liberally, they can be optimised for productivity, which could lead to much higher growth rates (and arguably also faster change in general).

That could happen if powerful AI systems play a larger role in technological progress, as software seems far more malleable than human brains. But this is only one example – greater editability could also become possible through strong forms of biological enhancement (e.g. iterated embryo selection) or through whole brain emulations, assuming that emulations are easier to edit than biological brains. (However, I don’t think these technologies are around the corner either.)

From this perspective, the key question is how to design such minds to ensure good outcomes (in terms of preventing both incidental and agential s-risks). In a certain sense, this is a generalisation of AI safety, which considers the question of how to design safe AI systems.


Footnotes

  1. My impression is that many people’s views on this have shifted over time towards higher credence in a gradual, distributed transition to advanced AI.
  2. To be clear, I’m not saying that humans will remain better at certain domains forever, just that AI will take longer to master some domains than others – e.g. because they are hard to represent formally, or because little cleaned-up data is available.

One comment

  1. Hey Tobias, great post.

    I am sympathetic to your views on how AI, as a general-purpose technology, is more likely to develop over time. Since there is no free lunch in our complex society, we should more likely expect a development in which different hardware and software architectures outperform humans in a wide range of specific tasks (with maybe a few architectures achieving some level of flexibility, such as some deep neural architectures we see today). This still makes narrow alignment potentially relevant, because narrow systems could have sufficient agency and/or eventually still pose x-risks (for example, even though it is not an x-risk, a dumb self-replicating virus can already cause vast damage). Or some architectures might achieve a level of generality significantly superior to any actor in the economy, seizing more and more of it. You seem to disagree with that point, but what probability would you give to the latter scenario not happening and narrow alignment not posing x-risks?

    Yes, there are good reasons to believe that the incentives are there to solve the narrow alignment problem, but I still find it hard to believe we could be 95% certain that there is a less than 1% chance of x-risk from misaligned AI, which given the stakes would be terribly bad under a moral uncertainty framework (containing not only NU but also sympathy for other value systems).

    But in the end I guess it might come down more to whether you think x-risks are more important than s-risks (if there is such a tradeoff, which seems plausible).

    What do you think?

    Cheers =)
