Moore's Law For Intelligence
Will the amount of intelligence start to double every two years? The first in a series of essays on AI.
The Manhattan Project, a turning point for the world. Image source: National Park Service.
In 2009, the computer scientist Shane Legg – who went on to co-found DeepMind, an artificial intelligence (AI) research lab – predicted that artificial general intelligence (AGI) matching human capability would be reached between 2025 and 2028.
We are now in 2024, and indeed, OpenAI’s GPT-4 seems not far off human level on many tasks and exceeds humans at some. If AI continues to move as fast as this trendline suggests, it’s safe to predict that the world will look radically different by 2030. COVID-19’s rapid initial growth caught many world governments flat-footed; AI’s phenomenal growth threatens to do the same.
How were scientists like Legg able to predict AI capabilities so far ahead?
Legg understood that intelligence is a function of compute (the capacity to perform computational work), and that a crucial driver of compute is Moore’s Law: the observation that the number of transistors in an integrated circuit doubles about every two years. This observation has had astonishing longevity, and it gave Legg a trendline from which to extrapolate.
The growth of AI is occurring at a similarly rapid rate. It is likely that we are at the beginning of a new Moore’s Law: one in which the amount of intelligence doubles every two years.
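To make the arithmetic of that kind of extrapolation concrete, here is a minimal sketch. The 2024 baseline and the two-year doubling time are illustrative assumptions, not measurements or forecasts:

```python
# A minimal sketch of exponential extrapolation under a "doubling every
# two years" assumption. The 2024 baseline of 1.0 is an arbitrary,
# illustrative unit of capability, not a measured quantity.
def doublings(years_elapsed: float, doubling_time: float = 2.0) -> float:
    return 2 ** (years_elapsed / doubling_time)

BASELINE_YEAR = 2024
for year in (2026, 2030, 2040):
    growth = doublings(year - BASELINE_YEAR)
    print(f"{year}: ~{growth:.0f}x the {BASELINE_YEAR} level")
```

The specific numbers matter less than the shape of the curve: under a fixed doubling time, a decade and a half of steady progress compounds into a roughly 250-fold change.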
A New Moore’s Law for Intelligence
One under-appreciated aspect of Moore’s Law is that the improvements Moore predicted didn’t come from just one thing. Instead, there seemed to be a constant stream of new improvements.
Between 1965 and 2005, Dennard scaling (roughly: as transistors shrink, their power density stays constant, so chips can get both denser and faster) was the primary driver of the technology curve described by Moore’s Law. Eventually, though, Dennard scaling reached its physical limits and broke down, leading many to conclude that Moore’s Law was ending.
Moore’s Law continued, but from 2005 to 2020, growth was primarily driven by other innovations, such as larger dies and multi-core designs. Newer innovations, such as the use of nanomaterials, are expected to drive the next decade of growth.
Thus, Moore’s Law was not just a simple prediction or extrapolation. Rather, it functioned as an aspiration for the industry, guiding it to produce a constant stream of innovation in order to keep up with the curve.
The same is true – perhaps surprisingly – for modern AI: we can expect improvements to continue, and to compound over time.
To understand why, we need to understand the factors that go into improving AI models. We can talk about four major factors:
Data (quality + quantity). Large language models (LLMs) are currently trained on a large corpus of data taken from the internet and digitized books, among other sources. Higher-quality data, and more of it, results in bigger gains in model performance.
Compute (and scale of model). The scale of an LLM is measured by its number of parameters: the larger the model, the more parameters it has. The number of parameters is in turn correlated with the amount of compute (computational work) needed to train the model. The key variable driving compute is compute capacity: the hardware resources available – the number and speed of CPUs and GPUs, and the amount of RAM and storage. Compute capacity is doubling roughly every 9 to 10 months. (A rough back-of-the-envelope for the relationship between model scale and training compute is sketched just after this list.)
Algorithmic / sample efficiency. There’s compelling reason to believe that the way models learn has been inefficient. After all, human babies do not need to ingest billions of words from the internet to learn how to speak! Indeed, multiple new techniques are enabling language models to learn far more efficiently from the data they are given, which in turn reduces the amount of data required to achieve a given level of capability.
Other innovations. Several other innovations have played a crucial role in improving AI performance. For example, the invention of the transformer took LLMs across a critical threshold, and reinforcement learning from human feedback (RLHF) is part of what makes GPT-3.5 feel more lifelike, and perhaps closer to passing the Turing test, than GPT-3. Other examples might include true synthetic data generation for LLMs.
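On the compute factor: a common rule of thumb in the scaling literature approximates training compute as roughly 6 × parameters × training tokens. The sketch below applies that rule to made-up model sizes and token counts; none of the figures correspond to any particular named model.

```python
# Rough rule of thumb from the scaling-law literature:
# training FLOPs ~= 6 * N (parameters) * D (training tokens).
# The model sizes and token counts below are illustrative examples,
# not the (undisclosed) figures for any real model.
def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

examples = {
    "small model":  (1e9,  20e9),    # 1B params, 20B tokens
    "medium model": (70e9, 1.4e12),  # 70B params, 1.4T tokens
    "large model":  (1e12, 20e12),   # 1T params, 20T tokens (hypothetical)
}
for name, (n, d) in examples.items():
    print(f"{name}: ~{training_flops(n, d):.1e} FLOPs")
```

The point is not the specific numbers but the shape of the relationship: each order-of-magnitude jump in model and data size implies a corresponding jump in the compute required to train it.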
These factors, especially the first two, are quantifiable, and they grow at predictable rates. We will do a deep-dive into each factor in the essays that follow this one. For now, we note that the growth of LLM capability can be accurately extrapolated to a surprising degree.
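One way to see what such an extrapolation looks like in practice is a scaling-law-style loss curve. The sketch below borrows the functional form used in published scaling-law work (such as the Chinchilla paper); the coefficients are illustrative placeholders, not fitted values.

```python
# Scaling-law-style loss curve: L(N, D) = E + A / N**alpha + B / D**beta,
# where N is parameter count and D is training tokens. The functional form
# follows published scaling-law work; the coefficients below are
# illustrative placeholders, not fitted values.
E, A, B, ALPHA, BETA = 1.7, 400.0, 400.0, 0.34, 0.28

def predicted_loss(params: float, tokens: float) -> float:
    return E + A / params**ALPHA + B / tokens**BETA

for n, d in [(1e9, 2e10), (7e10, 1.4e12), (1e12, 2e13)]:
    print(f"N={n:.0e} params, D={d:.0e} tokens -> predicted loss ~{predicted_loss(n, d):.2f}")
```

Because the inputs (data, compute) grow at measurable rates and the curve relating them to performance is smooth, forecasting future capability becomes an exercise in extrapolation rather than guesswork.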
Implications
If this analysis of AI improvement in terms of its “factors of production” is correct, it has some important, and counter-intuitive, implications.
First, human-level AI will eventually be trainable by someone no more experienced than a graduate student, using the resources of a standard university lab cluster. Although today’s models are called “large” language models and can currently be trained only after billions of dollars in capital have been raised, the cost and efficiency curves at play here suggest that powerful future AI models will not need to be especially large.
As a simple example, when ChatGPT launched, OpenAI CEO Sam Altman estimated that the product cost “several cents” per chat to run. As of publication, the open-source model Mixtral 8x7B is being served for $0.50 per million tokens, which represents a roughly 100-fold decrease in cost from a year earlier. We can expect the same pattern with more powerful models such as GPT-4: their costs will decrease rapidly after launch.
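A quick back-of-the-envelope check on those figures; the tokens-per-chat length and the dollar value behind “several cents” are assumptions for illustration only:

```python
# Back-of-the-envelope comparison of per-chat cost.
# tokens_per_chat and the "several cents" figure are assumptions
# for illustration; the $0.50/M-token price is the one cited above.
tokens_per_chat = 1_000             # assumed average chat length
launch_cost_per_chat = 0.05         # "several cents", assumed to mean ~5 cents
mixtral_price_per_million = 0.50    # dollars per million tokens

mixtral_cost_per_chat = mixtral_price_per_million * tokens_per_chat / 1_000_000
print(f"Launch-era cost per chat:  ${launch_cost_per_chat:.4f}")
print(f"Mixtral-era cost per chat: ${mixtral_cost_per_chat:.4f}")
print(f"Implied reduction: ~{launch_cost_per_chat / mixtral_cost_per_chat:.0f}x")
```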
Second, open-source models will improve extremely fast. Again, this is driven by the dynamics mentioned earlier: better chips mean models of a given quality can be produced more cheaply, all else equal, and algorithmic improvements will ensure that model quality keeps increasing. There is intense competition in the open-source space to produce better, faster, cheaper language models, driven by big labs (Microsoft, Meta) as well as independent companies such as Mistral AI, maker of the Mixtral models.
Third, taken as a whole, these facts imply that so-called safety measures such as “pauses” or “stops” would be self-defeating. Even if AI development were regulated or slowed down, chips would continue to get faster and open-source models would continue to improve. As a result, AGI would remain within arm’s reach of any capable academic department with access to a lab cluster. In this scenario, we would ultimately be less prepared for a world with AGI – because we would have done less empirical work in the interim – but we would get it anyway. Thus, the best way forward is empirical and pragmatic: continue to develop and test models and learn as we go. We have no other way to go but forward.
Self-improving?
The surprising predictability of the data and compute scaling curves puts us in the unusual situation where we can forecast, to reasonable resolution, when human-level AI might be achieved. Though it is unclear how to exactly define the threshold for when “true AGI” appears, it is already evident that the threshold is a matter of degree rather than of sharp distinction.
The biggest unknown, with the largest implications if true, is whether, or when, AI will become self-improving: able to contribute to research on itself. This would create a critical inflection point, after which improvements would compound even further: AI conducting research on AI improves AI, which enables still better AI research, and so on.
Note that this is not pure science fiction: it is already true in at least two ways. First, the programmers who write the code that makes up AI models already use AI assistance to write that code faster. Second, techniques such as RLAIF (reinforcement learning from AI feedback) use AIs to rate AI responses, thereby enabling further training. This rating step was typically done by people (as in RLHF) but is increasingly being scaled using automated techniques. Thus, AI is already helping us develop AI faster.
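To make the RLAIF idea concrete, here is a minimal sketch of the preference-labeling step. The `generate` and `judge` callables are hypothetical stand-ins for whatever model-calling code is actually used; this is not any particular library’s API.

```python
# Minimal sketch of RLAIF-style preference labeling: an AI "judge" model
# rates pairs of candidate responses, producing preference data that can
# then be used for further training. `generate` and `judge` are
# hypothetical stand-ins for real model calls.
from typing import Callable, List, Tuple

def collect_preferences(
    prompts: List[str],
    generate: Callable[[str], str],          # produces one candidate response
    judge: Callable[[str, str, str], int],   # returns 0 or 1 for the preferred response
) -> List[Tuple[str, str, str]]:
    """Return (prompt, preferred, rejected) triples labeled by the judge model."""
    data = []
    for prompt in prompts:
        a, b = generate(prompt), generate(prompt)
        preferred, rejected = (a, b) if judge(prompt, a, b) == 0 else (b, a)
        data.append((prompt, preferred, rejected))
    return data
```

The human bottleneck in RLHF – people rating responses one by one – is exactly the step the judge model replaces, which is why this kind of loop scales so readily.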
If the logic of this argument is correct, nobody knows where this process ends. Thus, a key question for governments and policymakers is when we can expect to get to this critical inflection point.
The shocking implication of what we have seen in this piece so far is that there may be no great, transformative breakthroughs needed to get to the critical inflection point. We already have the ingredients. As Ilya Sutskever likes to say, “the machine just wants to learn” – data, compute, and the right algorithms result in intelligence of a particular kind, and more of those inputs results in more intelligence as an output!
Skeptics would do well to heed the words of Rich Sutton’s famous “bitter lesson” essay, which argues that greater scale plus the application of general methods we already know is all that’s needed:
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation.
This series forms an introduction to some empirical facts about AI in the 2020s. It is intended to give a general audience grounding in some of the key trends and facts, as well as draw out some of the implications of AI scaling. The next post will look at data.
Nabeel S. Qureshi is a Visiting Scholar at Mercatus. His research focuses on the impacts of AI in the 21st century.
Hi,
I've done some research on the number of "human brain equivalents" ("HBEs") added per year, from an historical perspective, and extrapolated into the future. Here are the results:
https://markbahner.typepad.com/random_thoughts/2016/02/recalculating-worldwide-computing-power.html
There are some interesting results from that graph. I'll focus on the line based on a human brain being equal to 20 petaflops (20 quadrillion flops, or 20,000 teraflops). That's based on Ray Kurzweil's 2008 estimate that the human brain can perform about 20 quadrillion operations per second.
If we look at that line on the graph, the number of HBEs added was only *one* in 1993(!). That is, in 1993, all the computing power added in the world amounted to just one human brain equivalent. And the number was still only about 1 million HBEs added in 2015 – tiny compared to global population growth of approximately 87 million that year.
However, by 2026, the number is more like 1 billion (with a "b") human brain equivalents added. And by 2037, it's more like 1 trillion (with a "t"!) human brain equivalents added.
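For concreteness, the arithmetic behind those figures is simply the worldwide computing power added in a year divided by an assumed 20 petaflops per brain; the sketch below just re-derives the implied worldwide compute from the HBE counts cited above, rather than from independently sourced numbers.

```python
# HBEs added per year = (worldwide computing power added) / (flops per brain).
# The 20-petaflop brain estimate follows the figure above; the compute
# values printed are simply those implied by the HBE counts in this comment.
BRAIN_FLOPS = 20e15  # 20 petaflops per human brain

hbes_added = {1993: 1, 2015: 1e6, 2026: 1e9, 2037: 1e12}
for year, hbes in hbes_added.items():
    print(f"{year}: ~{hbes:.0e} HBEs added implies ~{hbes * BRAIN_FLOPS:.0e} FLOPS of new compute")
```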
Bottom line from that analysis: expect changes in the next 5 to 15 years to be spectacular. Nothing in human history will compare.
Is it a coherent position to expect AGI in a couple of years and to say that we can and should do nothing to delay it? The implications of AGI aren't well understood, and it will be an extremely powerful technology. If we think we only have a year or two, then the case for stretching out the timeline seems stronger rather than weaker: an extra year to figure out the consequences and prepare for them seems very valuable. Even if chips and open source continue to improve, the overall rate of improvement will be lower if we restrict the largest training runs, since the size of the model would presumably be a multiplier on the rate of improvement from other sources, everything else being equal. I'm not necessarily a pause proponent, but the more we think AGI is imminent, the more a pause makes sense, even if it's only partially effective.