Thoughts on short timelines

Some rationalists and effective altruists have argued (1, 2, 3) that there is a non-negligible chance that artificial intelligence will attain human or super-human levels of general intelligence very soon.

In this post, I’d like to outline why I’m not convinced that this scenario has non-negligible probability. To clarify, I’m arguing against the hypothesis that “artificial general intelligence (AGI) is 10% likely to be built in the next 10 years”, where AGI is defined as the ability to successfully perform any intellectual task that a human is capable of. (My favoured definition of “AGI” is that autonomous intelligent machines contribute at least 50% to the global economy, as outlined here, but I don’t think the precise definition matters much for purposes of this post.)

The simplest counterargument is to look at the rate of progress we’re seeing so far and extrapolate from that. Have there been any ground-breaking results over the last few years? I’m not talking about “normal” results of machine learning papers; I’m talking about milestones that constitute serious progress towards general intelligence. We are surely seeing progress in the former sense – I don’t mean to belittle the efforts of machine learning researchers. (An example of what I’d consider “ground-breaking” is advanced transfer between different domains, e.g. playing many board or video games well after training on a single game.)

Some people considered AlphaGo (and later AlphaZero) ground-breaking in this sense. But that (the match against Lee Sedol) was in March 2016, already more than two years ago at the time of this writing (late 2018) – and it seems that there haven’t been comparable breakthroughs since then. (In my opinion, AlphaGo wasn’t that exceptional anyway – but that’s a topic for another post.)

Conditional on short timelines, I’d expect to observe ground-breaking progress all the time. So that seems to be evidence that this scenario is not materializing. In other words, it seems clear to me that the current rate of progress is not sufficient for AGI in 10 years. (See also Robin Hanson’s AI progress estimate.)

That said, we should distinguish between a) the belief that the current rate of progress will lead to AGI within 10 years, and b) the belief that there will be significant acceleration at some point, which will enable AGI within 10 years. One could reject a) and still expect a scenario where AGI arrives within 10 years, but for some reason we won’t see impressive results until very near ‘the end’. In that case the lack of ground-breaking progress we see now isn’t (strong) evidence against short timelines.

But why expect that? There’s an argument that progress will become discontinuous as soon as recursive self-improvement becomes possible. But we are talking about progress from the status quo to AGI, so that doesn’t apply: it seems implausible that artificial intelligences would vastly accelerate progress before they are highly intelligent themselves. (I’m not fully sold on that argument either, but that’s another story for another time.)

Given that significant resources have been invested in AI / ML for quite a while, it seems that discontinuous progress – on the path to AGI, not during or after the transition – would be at odds with the usual patterns of technological progress. The reference class I’m thinking of is “improvement of a gradual attribute (like intelligence) of a technology over time, if significant resources are invested”. Examples that come to mind are the maximum speed of cars, which increased steadily over time, or perhaps computing power and memory capacity, which have also progressed very smoothly.

(See also AI Impacts’ discontinuous progress investigation. They actually consider new land speed records set by jet-propelled vehicles one of the few cases of (moderate) discontinuities that they’ve found so far. To me that doesn’t feel analogous in terms of the necessary magnitude of the discontinuity, though.)

The point is even stronger if “intelligence” is actually a collection of many distinct skills and abilities rather than a meaningful, unified property (in the context of machine intelligence). In that case, reaching AGI requires progress on many fronts, comparable to improving the “overall quality” of cars or computer hardware.

It’s possible that progress will accelerate simply due to increased interest – and therefore increased funding and other resources – as more people recognise the potential of the field. Indeed, while historical progress in AI was fairly smooth, there may have been some acceleration over the last decade, plausibly due to increased interest. So perhaps that could happen to an even larger degree in the future?

There is, however, already significant excitement (perhaps hype) around AI, so it seems unlikely to me that this could increase the rate of progress by orders of magnitude. In particular, if highly talented researchers are the main bottleneck, you can’t scale up the field by simply pouring more money into it. Plus, it has been argued that the next AI winter is well on its way, i.e. we are actually starting to see a decline, not a further increase, in interest in AI.

One of the most common reasons to nevertheless assign a non-negligible probability – say, 10% – is simply that we’re so clueless about what will happen in the future that we shouldn’t be confident either way, and should thus favor a broad distribution over timelines.

But are we actually that ignorant? It is indeed extremely hard, if not impossible, to predict the specific results of complex processes over long timespans – like, which memes and hashtags will be trending on Twitter in May 2038. However, the plausibility or implausibility of short timelines is not a question of this type since the development of AGI would be the result of a broad trend, not a specific result. We have reasonably strong forms of evidence at our disposal: we can look at historical and current rates of progress in AI, we can consider general patterns of innovation and technological progress, and we can estimate how hard general intelligence is (e.g. whether it’s an aggregation of many smart heuristics vs. a single insight).

Also, what kind of probability should an ignorant prior assign to AGI in 10 years? 10%? But then wouldn’t you assign 10% to advanced nanotechnology in 10 years because of ignorance? What about nuclear risk – we’re clueless about that too, so maybe 10% chance of a major nuclear catastrophe in the next 10 years? 10% on a complete breakdown of the global financial system? But if you keep doing that with more and more things, you’ll end up with near certainty of something crazy happening in the next 10 years, which seems wrong given historical base rates. So perhaps an ignorant prior should actually place much lower probability on each individual event.
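
As a rough illustration of this aggregation effect (the 10% figure and the independence assumption are just for the sake of the example, not claims about the actual events):

```python
# Chance that at least one of n independent events occurs,
# if each is assigned probability p "out of ignorance".
def at_least_one(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

for n in (5, 10, 20):
    print(n, round(at_least_one(0.10, n), 2))
# 5  -> 0.41
# 10 -> 0.65
# 20 -> 0.88
```

So with just ten such “ignorant 10%” events, you’d already be predicting roughly a two-thirds chance of something crazy happening within the decade.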

But perhaps one’s own opinion shouldn’t count for much anyway, and we should instead defer to some set of experts? Unfortunately, interpreting expert opinion is tricky. On the one hand, in some surveys machine learning researchers put non-negligible probability on “human-level intelligence” (whatever that means) in 10 years. On the other hand, my impression from interacting with the community is that the predominant opinion is still to confidently dismiss a short timeline scenario, to the point of not even seriously engaging with it.

Alternatively, one could look at the opinions of smart people in the effective altruism community (“EA experts”), who tend to assign a non-negligible probability to short timelines. But this (vaguely defined) set of people is subject to a self-selection bias – if you think AGI is likely to happen soon, you’re much more likely to spend years thinking and talking about it – and has little external validation of its “expert” status.

A less obvious source of “expert opinion” is the financial markets – because market participants have a strong incentive to get things right – and their implicit opinion is to confidently dismiss the possibility of short timelines.

In any case, it’s not surprising if some people have wrong beliefs about this kind of question. Lots of people are wrong about lots of things. It’s not unusual that communities (like EA or the machine learning community) have idiosyncratic biases or suffer from groupthink. The question is whether more people buy into short timelines than you’d expect conditional on short timelines being wrong (in which case some people would still buy into them, comparable to past AI hypes).

Similarly, do we see fewer or more people buying into short timelines than you’d expect if short timelines are right (in which case there would surely be a few stubborn professors who won’t believe it until the very end)?

I think the answer to the second question is “fewer”. Perhaps the answer to the first question is “somewhat more” but I think that’s less clear.

All things considered, I think the probability of a short timeline scenario (i.e. AGI within 10 years) is not more than 1-2%. What am I missing?

6 comments

  1. I think you are missing that AI-related performance and hardware growth had a discontinuity around 2012 and is now growing on many important metrics at roughly 10 times a year (a doubling time of about 3.5 months; see e.g. OpenAI’s “AI and Compute”). I collected other evidence for a 10 per cent probability in 10 years here: https://philpapers.org/rec/TURPOT-5
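
    A quick check that the two figures quoted above are consistent (the 3.5-month doubling time is the comment’s number; the rest is just unit conversion):

    ```python
    # Convert a 3.5-month doubling time into an annual growth factor.
    doubling_time_months = 3.5
    annual_growth_factor = 2 ** (12 / doubling_time_months)
    print(round(annual_growth_factor, 1))  # ~10.8, i.e. roughly "10 times a year"
    ```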

  2. I am someone who probably has shorter timelines than most. I consider it somewhat likely that artificial general intelligence will come into existence within 10 years. To be fair, though, my estimates change frequently based on my current thought patterns and the evidence that I happen to be considering at the moment.

    To be more specific, I define artificial general intelligence as an artificial system that can perform at roughly human level or better across nearly all precisely specified cognitive domains. I believe that the chances of a single system with these properties arising within 10 years (by 2030) are at least 20%, and I put the chances of such a system arising within 20 years (by 2040) at 40% or more.

    My estimates are high due to a multitude of factors. Here are some theses which help inform them.

    1. I don’t think that creating intelligence requires a lot of special insights. I think that generally intelligent systems are generally intelligent because they are good at a diverse range of cognitive tasks and can coordinate and structure the performance of these tasks in a way that is efficient.

    2. Generally, I think that intelligence is something that scales with computation. It isn’t necessarily a linear relationship: putting in twice as much compute isn’t going to make the system twice as intelligent. However, roughly speaking, I think the path toward artificial general intelligence will be driven primarily by things like the creation of faster TPUs, increases in funding for training ML systems, and, in turn, larger neural networks.

    3. In my opinion, almost all performance gains in artificial intelligence have occurred within the last 10 years.

    I believe that thesis 1 is true because it seems intuitively obvious that natural selection could not have created generally intelligent agents unless intelligence was something that could be built incrementally. If the human brain was constructed incrementally, then it seems clear that there is no special sauce that suddenly made everything work together. Rather, it was the gradual accumulation of modules, and specific parts which completed various tasks that aided survival and reproduction, that eventually culminated in an intelligent system.

    Thesis 2 is supported by current evidence from ML systems. If you think that intelligence is just the culmination of being good at a diverse set of tasks, then you must also agree that ML systems as a whole have been getting more intelligent over time. If you look at any individual task, we know that throwing more compute at the problem has helped the field immensely. In a sense, computation has been more useful than algorithmic insight: most of the key insights into ML were discovered decades ago, yet the performance gains on tasks have only been realized as fast parallelized training has become commonplace.

    Thesis 3 is true because although computers were good at a lot of tasks before the year 2010, the ones that seem relevant to intelligence — cognition, motor control, and general reasoning — have all been realized in the most recent decade. If we combine this thesis with 1 and 2, I think there’s a clear case to be made that as the field progresses, more and more tasks will reach and exceed human baselines, until all of them are exceeded. Since almost all the progress has been recent, it seems to me like a reasonable extrapolation to say that AGI is coming soon.

    To put this all together, I think that human-level performance on any single dimension of measurement will probably be exceeded by computers relatively soon, perhaps even within 10 years. If the power of intelligence comes from efficiently coordinating specialized cognitive modules, then we should expect any system good at coordinating a wide variety of cognitive applications to satisfy my conditions for an artificial general intelligence.

    Since this debate is over something still far away, I find it useful to bet on nearer-term things which test our models of how we expect the future to pan out. If you disagree with my analysis, then I urge you to state your disagreement with one of the following near-term predictions. If so, we could bet on a single narrowly defined prediction to put our ideas to the test. I think both of the following statements can be made into a fairly precise bet.

    * ML systems will gradually pass every narrowly defined task benchmark that we can precisely define, and many of the most impressive ones will fall within a decade. By “many of the most impressive” I include benchmarks on language processing tasks, like predicting the next most plausible sentence in a text, summarizing text, and generating new text which cannot be reliably distinguished from the work of a human journalist. I also include mathematical reasoning, for example being able to do high school mathematics (see for instance https://arxiv.org/abs/1904.01557), and the ability to control a robotic hand at least as well as a small animal (see https://openai.com/blog/learning-dexterity/).

    * If hardware costs stop going down, our ability to speed up parallelization stalls, and the amount of money going into ML research remains constant, then I expect progress on ML benchmarks to grind to a halt. This prediction is a direct consequence of thesis 2. However, I expect at least one of those three conditions to fail to hold, which is why I *don’t* expect our current progress to slow down dramatically.

    1. Interesting comments, Matthew. 🙂

      My biggest disagreement might be with this part of your case: “If the power of intelligence comes from efficiently coordinating specialized cognitive modules, then we should expect any system good at coordinating a wide variety of cognitive applications to satisfy my conditions for an artificial general intelligence.” I actually agree with that statement, but I think having a system that can effectively coordinate domain-specific applications is plausibly the most difficult part of AGI. That’s the thing that makes AGI hard. 🙂 Neither of your proposed bets addresses this source of disagreement. (I don’t necessarily disagree with your predictions about those near-term bets.)

      As a secondary point, I disagree with thesis 3. If you calculate progress in terms of % improvement, my sense is that progress has been pretty good for many decades. Many of the graphs in https://srconstantin.wordpress.com/2017/01/28/performance-trends-in-ai/ only show the last decade, but those showing a longer time horizon tend to reveal a pretty linear improvement trend even before 2009.

      > the ones that seem relevant to intelligence — cognition, motor control, and general reasoning — have all been realized in the most recent decade

      I’m not sure what you mean. 🙂 Various degrees of cognition, motor control, and general reasoning have existed since the 1960s or earlier, and I’d conjecture that the performance trends were fairly linear over time, or at least not strongly discontinuous recently, though we’d have to discuss specific examples and metrics to debate that.

      1. Hi Brian.

        >I actually agree with that statement, but I think having a system that can effectively coordinate domain-specific applications is plausibly the most difficult part of AGI. That’s the thing that makes AGI hard.

        That makes sense. I had a narrower definition of AGI in mind than the one I think people (probably) assume and perhaps the one I wrote above. I think implicit anthropomorphization of AGI tends to create an image of a point in time where a single system suddenly passes a universality threshold, but I don’t predict that this will happen.

        That’s why my definition was mainly about a system that performs at human level or better on every precisely specified domain. That way, we can point to a specific domain and say “Computers still can’t do X, so it’s not an AGI.” This definition has the advantage of being relatively precise, since there are plenty of AI benchmarks to choose from.

        However, to the extent that cohesive coordination of tasks *is* a precisely specified domain, then this weakens my point. So, point taken.

        I could imagine devising a test for coherent coordination of tasks. For example, there’s the proposed test of whether an AI system could sit in on college classes and obtain a degree. In practice, I assume tests of that nature have a large time cost and produce little actionable feedback, which makes their use fairly limited. So I don’t think we’ll get AI systems like this soon, even if it were possible in principle at our current level of technology.

        A more recent and relevant example of something which could qualify (as a very weak test) would be the ongoing MineRL competition. I’d be interested if you have any pessimistic predictions of whether people will be able to produce RL systems which can play the game. Info here: http://minerl.io/competition/

        I also think that the Turing test is a pretty good example of an AGI test, since natural conversation can be quite complex. When properly specified, it allows the experimenter to test the AI on a variety of domains. For instance, one could ask the AI to write pseudocode specifying a particular subroutine, to prove that it’s really a human (who can code). On that front, I do expect there to be significant progress within the next decade. I am not saying that my median prediction is that the Turing test will be passed by 2030, but it does seem somewhat likely that Kurzweil will win his bet (http://longbets.org/1/).

        I suppose my main reply to this point is just that I would prefer that this debate be as specific as possible. I don’t know what would prove me right, but I have a good idea of what would prove me wrong. Point to a specific measurable task (which has an economic incentive to automate) that AI would not be able to do any time soon (and preferably not something like “take over the universe”). Then we can bet on it and wait it out.

        >Many of the graphs in https://srconstantin.wordpress.com/2017/01/28/performance-trends-in-ai/ only show the last decade, but those showing a longer time horizon tend to reveal a pretty linear improvement trend even before 2009. […] Various degrees of cognition, motor control, and general reasoning have existed since the 1960s or earlier, and I’d conjecture that the performance trends were fairly linear over time

        I can see why my thesis might appear dubious. I wasn’t very specific about what I would count as general reasoning, cognition, or motor control. Those terms can be interpreted in a very broad sense, which may cast doubt on my thesis.

        When we look at the types of things the human brain is good at, we tend to think of heuristics and pattern recognition, predictive world models, and resilient motor control. These are the things I point to when I am referring to intelligence.

        Things like Chess, Go, and Shogi really aren’t good examples of general reasoning, because the way that computers play them really is different from the things our brains are good at. Maybe, you could say, this is just a motte-and-bailey where I say that computers weren’t intelligent before 2010, but whenever you point to an example of intelligence before 2010 I just say, “yeah, but that’s not fuzzy/deep intelligence.” But I have a strong intuition that there really is a difference.

        Take this quote from the article on performance trends,

        “Deep learning provides discontinuous jumps relative to previous machine learning or AI performance trendlines in image recognition and speech recognition; it doesn’t in strategy games or natural language processing, and machine translation and arcade games are ambiguous (machine translation because metrics differ; arcade games because there is no pre-deep-learning comparison.)”

        I would argue that this is only true if the strategy games you look at are Chess, Go, Shogi etc. My understanding was that if you tried to build a Starcraft bot pre-deep learning, it wouldn’t really work at all. Sure, it could beat a beginner and technically follow the rules, but as far as beating someone who knows the basic strategy — not a chance. I will happily say that I am wrong if you provide examples.

        The same, I understand, is true of natural language understanding. You could have some Markov model that produced kinda sort-of superficially coherent text, but GPT-2 and the transformer architecture really does seem to be a discontinuous jump in capabilities.

        My thoughts on why some of the other stuff seems linear: I can’t respond to everything, but a lot of times these statistics are misleading because a linear trend in a chart could correspond to a discontinuous effective increase in capabilities, due to the fact that an almost working system is not nearly as good as a working system. This seems to be at least roughly the case in machine translation, and is probably the case in robotics too.

        I should say that I’m not very confident about any of the things I said, so take them with a grain of salt. Thanks for taking the time to respond. 🙂

        1. Thanks for the clarifications. 🙂

          > Point to a specific measurable task (which has an economic incentive to automate) that AI would not be able to do any time soon (and preferably not something like “take over the universe”).

          One of the classic examples, which I think is a good example, is being a corporate executive doing management, strategic planning, meetings with various stakeholders, etc. Many specific tasks within this job can be automated (and many already have been), but I think having an AI do the entire job as well as a human is AGI-complete (to the extent AGI is a meaningful concept; I agree with your view that there’s no specific cutoff point).

          > MineRL competition

          Interesting. 🙂 I only read a super brief description and therefore can’t really comment, but I do think sample efficiency is a key metric. A few months ago I was asked what would be evidence for shorter timelines, and I said that having, say, an Atari-playing AI that can learn to play in roughly the same number of tries as it takes a human would be extremely impressive. I could be wrong, but my impression is that DeepMind’s RL agents usually brute-force their way to victory by trying tons of stuff until they rather blindly find what works.

          > heuristics and pattern recognition

          In some sense, all supervised machine learning could be said to be heuristics and pattern recognition, such as image classification or speech recognition, which have existed for many decades…

          > My understanding was that if you tried to build a Starcraft bot pre-deep learning, it wouldn’t really work at all.

          I don’t know enough about Starcraft or pre-deep-learning bots to comment.

          > GPT-2 and the transformer architecture really does seem to be a discontinuous jump in capabilities.

          This is again not something I know much about, but I would bet that if someone created a performance graph for language models, it would be reasonably continuous over time. In “[Dissecting the Controversy around OpenAI’s New Language Model](https://www.youtube.com/watch?v=LWDbAoPyQAk&t=18m14s)” Anima Anandkumar says that in comparison to GPT-2, “the earlier models[…] are nearly as good if not as good.” Of course, maybe she’s referring to earlier Transformer models, and maybe other people disagree with that assessment.

          > This seems to be at least roughly the case in machine translation

          How so? People have been using automated translations for decades when they’ve been particularly needed. As the quality increases, people use them more and more often. Human translators are still used for particularly high-value translations, but as machines improve, the number of cases where humans are used should decrease in a continuous fashion. For example, if machine translation were almost comparable to human translation, I would be less inclined to bother putting translations of articles on my website. Different people will make this decision to abandon human translators at different points depending on costs and benefits.

          > and is probably the case in robotics too.

          That’s again out of my expertise, but I tend to assume that there are many different thresholds for performance, and each step adds value. For example, I’ve heard that in Amazon fulfillment centers, robots already transport items around but aren’t yet capable enough to manipulate items the way humans can with their hands. Even with manipulating objects, there are many degrees of dexterity required (unpacking a box, tying a shoe, doing brain surgery, etc).

          1. Coming back to this thread 7 months later, I think I said a lot of things that were right, but I also said a lot that was wrong or at least misleading.

            Compared to 7 months ago, I’m much more willing to take long-term trends at face value, and to try not to put too much emphasis on specific achievements or impressive demonstrations. Part of the reason for my mistake was that I am still new to the field, and so I wasn’t aware of the multitude of AI benchmarks that all show fairly continuous progress over the years. I also think I was biased toward assigning undue weight to material I had just heard about, rather than taking a more holistic perspective.

            A primary way my thoughts have shifted is that I prefer the economic framing of AI takeoff a lot more now. If AI becomes meaningful, this will probably show up in economic growth rates. By contrast, if we used the definition of AGI I gave above, which didn’t mention anything about economic growth rates, we could be left with systems that can perform well on a wide variety of narrow benchmarks, but are still not general enough to start automating labor. Thus we would be in a “post-AGI” world, but this isn’t really important for altruistic purposes unless the AGIs are causing a lot of material changes to our world.

            Part of what motivated my shift to longer timelines was reading Robin Hanson’s writings. I have a lot of little critiques of the methods he uses to determine timelines, and I still think he gets a lot wrong. But on some level, the argument just “makes sense.” As argued by Tobias above, conditional on short timelines, you’d expect DeepMind and OpenAI to be making ground-breaking discoveries. In other words, if we were in a short-timelines world, it would be obvious. It’s not obvious, and there’s still a lot of debate, which to me indicates that we should probably be highly skeptical.

            I do however still take short timelines very seriously, and I think a decent case can still be made. Specifically, I think there is a moderately strong argument from machine learning model size and complexity. Historically most machine learning models tended to be small, and it’s only been quite recently that models with billions of parameters — which are still so tiny that they can fit on a single GPU — have been created. Now we are reaching the point at which machine learning models are experiencing an explosion of complexity. Microsoft recently released a blog post about a new library for model parallelism, and they suggested that with a large GPU cluster, we could train models with trillions of parameters. My own personal calculations indicate that training a model with hundreds of trillions of parameters is within the reach of a project with several billions of dollars in funding (though my calculations are very rough).
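
            To illustrate the shape of such a back-of-envelope estimate (every number below is an illustrative assumption rather than a measured figure, and the “6 FLOPs per parameter per token” figure is only a common rough rule of thumb for dense training):

            ```python
            # Toy training-cost estimate. All inputs are illustrative assumptions.
            params = 1e14                      # assumed: ~100 trillion parameters
            tokens = 1e12                      # assumed: ~1 trillion training tokens
            flops_per_param_per_token = 6      # rough rule of thumb for dense training
            total_flops = flops_per_param_per_token * params * tokens  # ~6e26 FLOPs

            # Assumed hardware economics: 1e14 effective FLOP/s per accelerator at $2/hour.
            flops_per_dollar = 1e14 * 3600 / 2.0   # ~1.8e17 FLOPs per dollar

            cost_usd = total_flops / flops_per_dollar
            print(f"~${cost_usd / 1e9:.1f} billion")  # ~$3.3 billion under these assumptions
            ```

            Under these (very rough) assumptions the total comes out in the low billions of dollars; changing any of the inputs by a factor of a few moves the answer correspondingly.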

            If a model were trained with two hundred trillion parameters, this would rival the complexity of a human brain by synapse count. Obviously, the comparison between synapses and parameters is extremely weak, but that weakness doesn’t by itself lend support to long timelines: the human brain could well be less efficient, per parameter, than a machine learning model. In practice, evolution didn’t get many chances to rapidly prototype and redesign entire brains with efficiency in mind, which makes me think we could potentially do better.

            Many people have said that we’ve already had supercomputers as complex as human brains and this has not yielded AGI because we don’t have the right algorithms. This is true, but deep learning seems a lot closer to what our brains do than traditional supercomputer use-cases, such as simulating phenomena. Deep learning has a sort of generality, which if combined with vast complexity, could be very powerful.

            Many people — such as Gary Marcus — have pointed out that our deep learning models still lack basic functionality that the brain uses, including symbolic reasoning, causal reasoning, and sample efficiency. I think this is a fair point, but I think the argument is not as strong as it may first appear. A common theme in AI is that old insights generally sit around waiting for massive amounts of compute and large datasets before becoming useful. Neural networks were once thought hard to train, but are now part of a novice’s toolkit. LSTMs were invented in 1997 but really started making a difference when they could be used to process large datasets of sequences. I think we might see a similar pattern in the future when we start attempting to incorporate symbolic and causal reasoning into AI.

            Various schemes for incorporating logic into neural networks have already been proposed: there are neural Turing machines and various end-to-end differentiable logical layers. Similarly, sample efficiency in the brain seems to come from a form of native semi-supervised learning, and the machine learning community has known about schemes like that for decades. The challenges are indeed tough, but I don’t see any insurmountable barriers.

            Part of why AI could achieve rapid progress in the next few decades is that once the utility of large models is demonstrated on large supercomputers, government-sized investments could follow, and this could reduce prices quickly. As you have pointed out before, governments can really speed up the development of AI. A few trillion dollars of AI R&D could radically reshape the field as we know it.

            That said, I do think it’s important not to put much weight on this sort of speculation. Overall I think that AGI is probably at least 30 years away, and timelines of 75 years or more do seem plausible. But I think it’s worth engaging with the arguments for short timelines, which I think are quite interesting and not obviously false.
