In 1993, Vernor Vinge, a computer scientist, professor at San Diego State University, and science fiction author, wrote on the coming technological singularity, which he defined as “the imminent creation by technology of entities with greater than human intelligence.” The dramatic changes that would follow the Singularity could come quickly, perhaps within hours, as machines rapidly improve their own algorithms at billions of times the speed of human thought or action. The consequences of the Singularity
are unknowable, “a point where our models must be discarded and a new reality rules.” Although that sounds uncomfortably vague and dark to most of us, companies like OpenAI were created to explicitly pursue the goal of artificial general intelligence (AGI), albeit very carefully, and every large technology company in existence has raced to keep up.
What exactly is “greater than human intelligence” and why would we want it? Why would we invest 184 billion dollars in 2023 alone to achieve any level of machine intelligence? The short answer is probably mundane. Whatever may come from this enormous effort, AI is clearly in a tech bubble, which implies that much of the venture capital being pumped into advancing the field will eventually evaporate. That is not to say that no value is being created. On the contrary, there is every indication that these innovations will be with us for the foreseeable future. But their true value can be measured only if we know what we mean by the intelligence of these tools.
There is still much disagreement about the meaning of terms like “machine learning,” “artificial intelligence,” “artificial general intelligence,” and “consciousness,” despite the impressive precision of many definitions. Since we will need some common understanding to explore the current interest in these tools, let us define machine learning as an approach to problem solving (a set of “algorithms”) that produces inferences from a data set without a human having to understand and explicitly program the specifics of that data set beforehand. This is not novel. We have long used mathematics to derive solutions to problems we cannot intuitively guess, and we simply accept that an equation describing the value of one unknown variable in terms of other known variables will arrive at an arbitrarily precise answer.
For example, suppose we want to know the relationship between hours spent studying for a test and the grade typically received. How many hours of study are needed “per letter grade”? The equation for linear regression can calculate a very precise best-fit line showing this relationship for any data set of student study hours and grades received, and can therefore predict a grade for any new study time. One can also program a machine learning version of linear regression. We feed the exact same data set of study times and grades received to the program. This time, the algorithm “learns” to predict the grade outcome for a given study time without ever being given the equation for linear regression. The “learning” is captured by the values calculated for a collection of variables (“parameters”) in the code, and it can be notoriously challenging to get this right for certain problems.
What makes the machine learning version of linear regression so interesting is that it never produces the equation for linear regression. Instead, what one gets is the collection of parameters that describe the best-fit line. When these parameters are applied to a new data point, they predict a grade in the same way the equation form does, and we call this output of training a model. How this happens over the course of training is quite well understood for linear regression, but again, the programmer does not instruct the computer to solve the equation for linear regression. Instead, the program infers the best-fit line by adjusting its guesses according to the data set. This general approach to inference is transferable to other problem domains that are not as well understood and have no easily definable solution. The inference algorithm “learns,” which means that it calculates the parameter values that best describe the training examples, and then it can apply those values to a new observation and predict the answer.
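To make the contrast concrete, here is a minimal sketch in Python of both approaches. The study-hours data, learning rate, and iteration count are invented for illustration; the point is only that the second version arrives at roughly the same two parameters without ever being handed the closed-form equation.

```python
import numpy as np

# Hypothetical training data: hours studied -> grade received (0-100 scale).
hours  = np.array([1.0, 2.0, 3.0, 4.5, 6.0, 7.5, 9.0])
grades = np.array([55.0, 62.0, 68.0, 74.0, 81.0, 88.0, 93.0])

# 1) The "equation" form: the closed-form least-squares solution.
slope, intercept = np.polyfit(hours, grades, deg=1)
print(f"closed form:      grade = {slope:.2f} * hours + {intercept:.2f}")

# 2) The "machine learning" form: start with arbitrary guesses for the two
#    parameters and repeatedly nudge them to reduce the prediction error
#    (gradient descent). The regression equation itself is never used.
w, b = 0.0, 0.0                 # the model's parameters
learning_rate = 0.01
for _ in range(20_000):
    error = (w * hours + b) - grades
    w -= learning_rate * 2 * np.mean(error * hours)   # gradient w.r.t. w
    b -= learning_rate * 2 * np.mean(error)           # gradient w.r.t. b
print(f"learned by trial: grade = {w:.2f} * hours + {b:.2f}")

# Either version can now predict a grade for a study time it has never seen.
print(f"predicted grade for 5 hours of study: {w * 5 + b:.1f}")
```

The pair `(w, b)` left at the end of the loop is the “model”: nothing but parameter values, yet it makes the same predictions as the explicit formula.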
Jumping to a much more complex machine learning algorithm, let us consider how this technique might be used to recognize a handwritten digit. First, we will need a very large set of handwritten digits with a correct answer associated with each. Perhaps we will get thousands of 7’s written by thousands of different people, all labeled as 7, and so forth. We will then digitize all these images, breaking them up into a grid of, say, 28 x 28 pixels. Each of those pixels will have a value between 0 and 1 representing its brightness: background pixels will be 0, or black, and pixels in the center of the downstroke of each 7 will be 1, or white. Our algorithm does not require any layout, so we can make each image one long list of values by putting all the rows of 28 one after another in a line, giving 784 values per image. Now, using a similar approach to that taken to infer a linear regression, we will pass batches of these pixel values and their labels to the program. Every list will be different, but we guess that lists representing 7’s will be categorically different from lists representing 5’s. We suspect that some will be more similar than others. It will be harder to distinguish some 7’s from some 1’s, but if the quality of the prediction is no worse than any human’s prediction, we assume that the algorithm has learned how to distinguish handwritten digits.
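A sketch of that pipeline, using scikit-learn’s small built-in 8 x 8 digit images as a stand-in for the 28 x 28 scans described above; the choice of classifier and its settings are illustrative, not a recommendation.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()          # ~1,800 labeled handwritten digits, 8x8 pixels

# Flatten each 8x8 grid into one long list of 64 brightness values,
# exactly as described above for the 28x28 case (which would give 784 values).
X = digits.images.reshape(len(digits.images), -1) / 16.0  # scale pixels to 0..1
y = digits.target                                         # the correct label for each image

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The "parameters" the text keeps referring to live inside this small network.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
model.fit(X_train, y_train)     # "learning": fitting the parameters to the examples

# The trained model now predicts labels for digits it has never seen.
print("accuracy on unseen digits:", model.score(X_test, y_test))
```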
Humans, one might argue, have an advantage over this algorithm. If we are unable to distinguish the 1 from the 7, we might examine the context for clues. Are there other 1’s or 7’s in this handwriting sample that are more obvious? Does the context of this number make one reading more likely? Of course, computer scientists have recognized this advantage and have taken steps to give their algorithms similar power. They supply context, add more parameters, tweak the way those parameters are structured and trained, and make many other mathematical and algorithmic improvements. A few of these attempts produce better results than others, and researchers pursue those. The more complex the reality, the more complexity the model must be able to capture and the more examples the algorithm needs to develop a good model. This in turn demands more computing power to sift through those examples and train the parameters so that they produce better responses.
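To get a feel for why this spirals into computation, consider a rough count of the parameters in a few fully connected networks of the kind sketched above; the layer widths here are arbitrary, chosen only to show the trend.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters in a fully connected network: one weight per
    connection between adjacent layers, plus one bias per output unit."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# A 28x28 digit image flattened to 784 inputs, classified into 10 digits.
print(dense_param_count([784, 10]))            #   7,850 parameters
print(dense_param_count([784, 128, 10]))       # 101,770 parameters
print(dense_param_count([784, 512, 512, 10]))  # 669,706 parameters
```

Every one of those parameters must be nudged, again and again, for every batch of training examples, which is why richer models and richer data translate so directly into more computing power.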
From this explanation, it should be obvious that the values in the final model capture, to the extent possible, whatever rules were inherent in the data on which it was trained. Let’s explore that for a moment. If there are very complex rules encoded in the data, the model must have enough capacity to describe that information. And even when the model is sufficient and the data is clean, the best the model can do is to perfectly capture the rules in the data. It is possible that the algorithm recognizes rules in the data that a human would not—this is the promise of research-oriented tools like Google DeepMind’s AlphaFold. This AI tool, trained on a database of thousands of known protein structures, learns to predict the structure of a protein from the amino acids it contains, without the need to enumerate a space of possibilities so large that it would take centuries to calculate. While this kind of acceleration is powerful and based directly on the patterns acquired through machine learning—patterns beyond the ability of humans to infer—the results are predictions of protein structures that humans can quickly understand.
It is hard to decide whether to be stunned or merely pleased with such an achievement. IBM’s Deep Blue defeated reigning world chess champion Garry Kasparov in a six-game match in 1997 based on its superhuman ability to evaluate some 200 million positions per second through computational brute force. In 2017, Google DeepMind released AlphaZero, which was able to defeat the best AI chess program of the time after training for just 24 hours using only the rules of chess, without ever being fed any data on existing chess games. It simply inferred, by playing millions of games far faster than any human could, a set of highly complex rules about how to win from positions that humans have never encountered, and baked those rules into its parameters. It is quite an achievement, but how different is this really from Deep Blue’s brute force? True, Google found a much more efficient way to pre-calculate and store the results of all that high-speed position evaluation without needing to prune absurd moves, but whether that should be considered superhuman intelligence depends largely on whether one considers speed of calculation and memory to be definitive of intelligence.
Rapid calculation, perfect and expansive memory, the ability to process problems in parallel, and the ability to coordinate thousands of variables at once are all “superhuman” in the way that moving at sixty miles per hour, remaining submerged under water for days, and resisting scratches are superhuman. Yet none of these abilities, taken individually or collectively, exceeds what a human can understand. In fact, all artificial intelligence, even that of an autonomous drone, comes from encoding the human training that it has undergone. One might argue that the drone can learn from repeated trials to perform maneuvers that humans cannot perform, but it is only able to do so because humans have designed the training and the model to acquire this ability at reaction times that humans cannot match, not because a human cannot understand how the maneuver is done. A truly superhuman intelligence would not be just human intelligence on steroids. It would be a difference in kind as distinct as that between dogs and octopuses and would be, by definition, something greater than what humans could conceive. In principle, a genuinely “superhuman” intelligence would not even be recognizable to us.
Suppose we discover a megastructure around a distant star and undertake a voyage of centuries to explore it. When we arrive, we investigate it and discover that whoever made it has long abandoned it. At first, we are overwhelmed by the engineering. We are daunted, perhaps, by new physics. Slowly, over decades, we unravel the mystery of how the structure was made and how it is sustained, learning new secrets of the universe as we go. But this science fiction story is one of merely human intelligence, intelligence with a head start of decades or centuries beyond our own, to be sure, but one that is nonetheless recognizable, intelligible, and decipherable. If the structure had been designed by a superhuman intelligence, it is entirely possible that we would never detect it, and not simply because it depends on new physics that we might one day be expected to discover ourselves, but because we might never understand it as an engineered structure at all. It would not be what it is made of or its size or its longevity that would make it superhuman, but our very inability to understand it.
What we are hoping for in our current pursuit of artificial intelligence are algorithms and models that can give us a head start but that remain within the boundaries of the intelligible. These are tools that act like levers and pulleys for human intelligence, from which we expunge “hallucinations” and from which we only ever get summaries that humans agree are satisfactory for their intended purpose. They speed up calculations that humans propose for the solution of problems like protein folding and the discovery of new materials, but they do not generate “ideas of their own” that further hidden goals. They learn how to be flawless drivers and warehouse retrieval robots that do not cheat in accomplishing their goals by trying to fool the camera. They are, in short, pieces of human intelligence baked into machines that can perform repeatedly in predictable ways.
What we fear is that instead we will get products that continue to fail in novel ways, that are trained on poor data and replicate the errors they were trained on, that require ever greater investments of time and resources to eliminate errors, and that are riddled with noise requiring ever more expert knowledge to sift through. We wonder what is left of human individuality when our communications are reduced to machines producing average output for other machines to consume, even if that average is high. Worst of all, we fear those who, pressured by the desire to bring novel products to market, badly miscalculate how an AI could go off the rails and how much havoc it could wreak before we could stop it.
In “The Coming Technological Singularity,” Vinge speculates that if we could have everything we wanted, what we would want is to be our own successors. Most of us want to improve, and some of us dream of being able to do things that no one has yet been able to do, but at the core of such desires is the hope of becoming more authentically who we already are, of realizing the potential depths within ourselves. We do not typically think of this as a transformation after which we will no longer have any interests or curiosity recognizable to our current selves, after which we will have a vastly different identity, or perhaps no identity at all. When we seek to go beyond our limitations, what we imagine is more of the same. In fact, we cannot imagine something superhuman at all, except by way of augmenting what we can already understand. If that is what we hope for ourselves, it is hardly conceivable that we would actually want it in an entity outside ourselves.
So why the rush to encode all this human intelligence into machines? Fear of missing out. Fear of missing out on the bubble, fear of not publishing first, fear of not having the latest, best tools, and fear of becoming less competitive. AI has the potential to transform every sector of the economy, the optimists tell us. One should hope so, after spending the kind of money we have. AI has the potential to solve the climate crisis, cure cancer, and bring about world peace, we are told. We have already been leveraging AI on those problems for quite some time, but we are taking an expensive detour so that we can generate amusing videos from text descriptions.
Humans are not currently in need of superintelligence. We are in need of focus and a healthy dose of skepticism in the face of yet another Utopian shortcut to solving our problems. We could use more targeted analysis of specific problems and increased funding to use machine learning to accelerate human research in those areas. This takes fiscal discipline, all too rare in the tech space, where the promise of fast fortunes rather than human-scale problems drives the investment cycle. No one else is going to solve our problems. There is no way for us to make use of intelligence other than with our own, human-scale minds. Tools can be powerful, but it is up to us, at our own methodical and disciplined pace, to apply what we learn with care and wisdom.