Sunday, April 7, 2019

The sum of three cubes problem and why it is interesting for artificial intelligence


The holy grail of artificial intelligence is the quest to develop artificial general intelligence.  A considerable amount of progress has been made on specific forms of intelligence, and computers are now able to perform many tasks at superhuman levels.  But extending these specific capabilities to more general intelligence has so far proven elusive.

One of the most important barriers to extending computational intelligence from specific to general capabilities is an inadequate understanding of the different kinds of problems that a general intelligence system would have to solve.  Research has focused on a narrow range of problems with the apparent expectation that solving those problems will eventually lead to general problem solving.  These are problems that can be solved by parameter optimization.  Optimization means that the learning algorithm adjusts the values of its model parameters to gradually approximate the desired output of the system.  But there are problems that cannot be solved using these methods.

The sum of three cubes problem is one of these problems.  Conceptually, it is not very complicated.  It could straightforwardly be solved by brute force, that is, by trying numbers until a solution is found.  Still, it has resisted solution for some numbers despite more than half a century of effort.
 
In general form, the three cubes problem is this: for any integer k, express that number as the sum of three integer cubes.  For example, the integer 29 can be expressed as 29 = 3³ + 1³ + 1³ (29 = 27 + 1 + 1).  It is easy to determine that some numbers cannot be represented as the sum of three cubes: every cube is congruent to 0, 1, or 8 modulo 9, so no number congruent to 4 or 5 modulo 9 can be such a sum.  The number 32 is one of these impossible cases, but until just recently, no one knew whether the integer 33 could be expressed this way.  Is there some set of three integers that satisfies the equation 33 = x³ + y³ + z³?  In fact, until recently, there were only two numbers below 100 for which the answer was unknown, 33 and 42.  All of the others were either known to be impossible or the three integers were known.
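To make the brute-force character of the problem concrete, here is a minimal sketch of a naive search.  The function name and the search bound are my own illustrative choices, not anyone's published algorithm; the code simply tries triples until one works, skipping targets that the modulo-9 test rules out:

```python
def sum_of_three_cubes(k, bound):
    """Naive search for integers x, y, z with x**3 + y**3 + z**3 == k."""
    if k % 9 in (4, 5):
        return None            # impossible: a sum of three cubes is never 4 or 5 mod 9
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            for z in range(-bound, bound + 1):
                if x**3 + y**3 + z**3 == k:
                    return (x, y, z)
    return None                # nothing within the bound; says nothing about larger integers

print(sum_of_three_cubes(29, 10))   # finds a small triple, e.g. (-3)**3 + (-2)**3 + 4**3 == 29
print(sum_of_three_cubes(32, 10))   # None: ruled out by the modulo-9 test
print(sum_of_three_cubes(33, 10))   # None: a solution exists, but not with integers this small
```

The search works, but only for targets whose solutions happen to involve small integers; the bound must grow enormously before a number like 33 yields.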

There is no known optimization method for finding three numbers that, when cubed, sum to a target such as 33 or 42.  There are no known methods to gradually approximate a solution.  Once the correct three integers have been found, it is easy to verify that they are, in fact, correct, but there is no such thing as a partially correct solution, only solutions that are correct or incorrect.  The best that one can do is to guess at likely numbers.  Andrew Booker, at the University of Bristol, was recently able to solve the problem for k = 33 by somewhat improving the methods used to guess potential solutions.  His method reduced the number of integers that needed to be searched by an estimated 20%, but even after this improvement, his solution consumed 23 core-years of processing time.  That is a substantial amount of effort for a conceptually simple problem.  According to Booker, “I don’t think [finding solutions to the sum of three cubes problems] are sufficiently interesting research goals in their own right to justify large amounts of money to arbitrarily hog a supercomputer.”

Why this problem is interesting for artificial intelligence

The sum of three cubes problem has resisted solution for over half a century.  This problem is very easy to describe, but difficult, or at least tedious, to solve.  Understanding the difficulty posed by this kind of problem, and how that difficulty was eventually addressed, is, I think, important for understanding why general intelligence is hard and what can be done about it.

Current versions of machine learning can all be described in terms of three sets of numbers.  One set of numbers maps properties of the physical world to numbers that can be used by a computer.  One maps the output of the computer to properties of the physical world.  The third set of numbers represents the model that maps inputs to outputs.  Machine learning consists of adjusting this set of model numbers (using some optimization algorithm) to better approximate the desired relation between inputs and outputs.  This kind of framework can learn to recognize speech, to compose novel music, and to play chess, go, or Jeopardy.  In fact, some version of this approach can solve any problem that can be represented in this way.
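As a minimal sketch of that framework (the toy task, the numbers, and the variable names are all my own illustrative assumptions, not anyone's production system), the three sets of numbers and the optimization loop might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Set 1: numbers standing in for the physical world (here, fabricated sensor readings).
inputs = rng.normal(size=(200, 3))
targets = inputs @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Set 3: the model parameters, the only numbers the learning algorithm may adjust.
weights = np.zeros(3)

# Optimization: nudge the parameters so the outputs gradually approximate the targets.
learning_rate = 0.05
for _ in range(500):
    predictions = inputs @ weights              # Set 2: outputs mapped back toward the world
    error = predictions - targets
    gradient = inputs.T @ error / len(targets)  # direction that reduces the squared error
    weights -= learning_rate * gradient         # a small step toward a better model

print(weights)   # approaches [1.5, -2.0, 0.5]; each step was measurably closer than the last
```

Every pass through that loop depends on being able to say which small change to the parameters moves the output closer to the target.  The sum of three cubes problem offers no such signal.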

But it is still the case that the success of these systems relies heavily on the ability of human designers to construct these three sets of numbers and to select optimization algorithms to adjust the model.  The sum of three cubes problem is not amenable to an optimization approach because there is no way to determine which changes to x, y, and z will bring a candidate closer to the desired solution.  There is no way to define “closer.”
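A tiny illustration of that point (the residual function is my own, purely for exposition):

```python
def residual(x, y, z, k=33):
    """How far a candidate triple's cubes fall from the target k."""
    return x**3 + y**3 + z**3 - k

print(residual(3, 2, -1))   # 27 + 8 - 1 = 34, so this candidate misses 33 by only 1
# "Off by one" sounds close, yet no small adjustment of (3, 2, -1) reaches 33, and the
# integers in the actual solution are astronomically larger.  The size of the residual
# provides no gradient to follow: a triple is either exactly right or simply wrong.
```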

In 1965, I. J. Good raised the possibility of an ultraintelligent computer system that would surpass human intelligence:

“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an 'intelligence explosion,' and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.”

Presumably, solving the sum of three cubes problem would also be among the intellectual activities that such a machine would take on, since it continues to occupy human mathematicians.  This problem is conceptually much simpler than designing intelligent programs, but it may be even less tractable.

Booker’s improved algorithm was not discovered automatically.  No algorithm that we know of can produce new algorithms like the one he devised.  It took humans over 64 years to come up with one even this good, despite fairly widespread interest in the problem.  We do not know how Booker came up with the insight leading to this new algorithm, nor do we know how to design a method that could do so predictably.  General intelligence will require computers to be able to generate new problem representations and new algorithms to solve new problems, but we have little idea of how to get there.

Even this new method faced a huge combinatorial challenge.  The number of candidate triples grows without bound as the search extends to larger integers.  No amount of intelligence may ultimately be able to eliminate this combinatorial problem.  If even the problem of finding three numbers can be combinatorially challenging, what will a general intelligence system face when trying to solve problems with even more variables?  The time required to test a large number of integers and their cubes may be reduced, but it cannot be eliminated.
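To see how quickly the search space grows, consider the number of triples within a search bound B (a simple count, not anyone's actual search strategy):

```python
# Number of candidate triples (x, y, z) with every entry between -B and B.
for B in (10, 1_000, 1_000_000):
    print(B, (2 * B + 1) ** 3)
# 10 -> 9,261;  1,000 -> about 8.0e9;  1,000,000 -> about 8.0e18 candidates.
# The space grows with the cube of the bound, so even a large constant-factor
# improvement (such as a 20% reduction) leaves an enormous number of cases to check.
```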

To this point, no one has come up with a computer system that can design its own models.  Deep learning systems that are said to come up with their own representations actually still work by adjusting parameters in a prestructured model.  The transformations that occur within the model (moving from one layer to the next) are determined by the architecture of those layers.  For example, a linear autoencoder does not learn an arbitrary representation of the data; it “learns” the same subspace that principal component analysis, a well-known statistical technique, identifies.  So far, someone still has to come up with the design of the network and the optimization methods used to solve the problem.
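A quick numerical illustration of that point (a toy example with made-up data, not code from any particular deep learning library): a linear autoencoder trained by plain gradient descent ends up reconstructing the data about as well as, and through the same subspace as, ordinary PCA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 500 points in 10 dimensions whose variance lies mostly in 3 directions.
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(500, 10))
X -= X.mean(axis=0)          # center the data, as PCA assumes

k = 3                        # bottleneck width / number of principal components

# PCA: rank-k reconstruction from the top-k right singular vectors.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pca_error = np.sum((X - X @ Vt[:k].T @ Vt[:k]) ** 2)

# Linear autoencoder: encoder E and decoder D trained by gradient descent on the
# squared reconstruction error, with no constraint that it match PCA.
E = 0.1 * rng.normal(size=(10, k))
D = 0.1 * rng.normal(size=(k, 10))
lr = 5e-3
for _ in range(30_000):
    R = X @ E @ D - X                 # reconstruction residual
    gE = X.T @ R @ D.T                # gradient with respect to the encoder
    gD = E.T @ X.T @ R                # gradient with respect to the decoder
    E -= lr * gE / len(X)
    D -= lr * gD / len(X)

ae_error = np.sum((X - X @ E @ D) ** 2)
print(pca_error, ae_error)   # the two reconstruction errors come out nearly identical
```

The autoencoder was free to learn any rank-3 linear map, but optimization drives it to the same solution PCA computes in closed form.  The representation was implicit in the architecture, not invented by the learning algorithm.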

The sum of three cubes problem could be solved by simple brute force if we were to allocate sufficient resources to its solution.  With other kinds of problems, even the space in which to apply a brute-force search may be obscure.  Some insight problems, for example, are difficult until the solver finds the right representation, at which point they are typically easy to solve.  Like the sum of three cubes problem, insight problems do not admit of partial solutions that can be selected through optimization.  The key is to think about the problem in the right way; solving it requires a switch in how the problem is represented.

Here’s an insight problem whose solution may be familiar:  Curt and Goldie are lying dead on a wet rug in a locked room.  The room is in an old house near some railroad tracks.  How did they die?

Once you come up with the right model of this situation, solving it is trivial, but the difficult part is often coming up with the right representation.  There are many other insight problems, but so far as I am aware they have not been studied by computer scientists.  Yet the problem of coming up with good representations has been the very mechanism of progress in artificial intelligence.  So far it has been done slowly and painstakingly by people.

There are many other problems that a system will have to address if it is ever to achieve general intelligence, let alone superintelligence.  We may someday be able to create artificial general intelligence systems that can address these problems, but it will require a different computational approach than any we have available today.


Monday, February 18, 2019

The Singularity Called: Don't Wait Up


Dylan Azulay at emerj has just published another in a series of surveys that have been conducted over the last several years by different groups about when the technological singularity is likely to happen.  The singularity is the idea that computers will get so smart that their intelligence will grow explosively.

The notion of a technological singularity was initially proposed by Vernor Vinge in 1993, expanding on ideas from I. J. Good and John von Neumann.

Good wrote:
“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an "intelligence explosion," and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.”  Good, I. J. (1965). Speculations Concerning the First Ultraintelligent Machine, in Advances in Computers, vol 6, Franz L. Alt and Morris Rubinoff, eds., 31-88, Academic Press.

According to Vinge: “It's fair to call this event [the explosion in machine intelligence] a singularity (‘the Singularity’ for the purposes of this piece). It is a point where our old models must be discarded and a new reality rules, a point that will loom vaster and vaster over human affairs until the notion becomes a commonplace.”

The notion of the singularity combines the idea of artificial general intelligence with the idea that such a general intelligence will be able to grow at an exponential rate.  General intelligence is a difficult enough problem, but I think it is solvable.  Contrary to the speculations of Good, Vinge, Bostrom, and others, however, it will not result in an intelligence explosion.

To understand why there will be no explosion, we can start with the 18th-century philosophical conflict between Rationalism and Empiricism.  Simplifying somewhat, the rationalist approach assumes that the way to understanding, that is, to intelligence, lies principally in thinking about the world.  The empiricist approach says that understanding comes from apprehension of facts gained through experience with the world.  In order for there to be a singularity explosion, the rationalist position has to be completely correct, and the empiricist position has to be completely wrong, at least so far as computational intelligence is concerned.  If all it took to achieve explosive growth in intelligence were thinking about it, then the singularity would be possible, but it would leave a system lost in thought.

If understanding depends on gleaning facts from experience, then a singularity is not possible, because the rate at which facts become available is not changed by increases in computational capacity.  In reality, neither pure Rationalism nor pure Empiricism is sufficient, but if we view intelligence as including the ability to solve physical-world problems, not just virtual ones, then a singularity of the sort Vinge discussed is simply not possible.  Computers may, indeed, increase their intelligence over time, but well-designed machines and skill at designing them are not sufficient to cause an explosive expansion of intelligence.

Imagine, for example, that we could double computing capacity every few (pick one) months, days, or years.  As time goes by, the growth curve becomes indistinguishable from vertical, and an explosion in computing capacity can be said to have occurred.  If all the computer had to do was process symbols or mathematical values, then we might achieve a technological singularity.  The computer would think faster and faster and be able to process more propositions more quickly.  Intelligence, in other words, would consist entirely of the formal problem of manipulating symbols or mathematical objects.  A computer under these conditions could become superintelligent even if the entire universe around it somehow disappeared, because on this view it is the symbols that are important, not the world.  But the world is important.
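As a toy illustration of that doubling arithmetic (the 18-month doubling period is an arbitrary assumption, chosen only to make the numbers concrete):

```python
# Capacity after repeated doublings, assuming one doubling every 18 months.
months_per_doubling = 18
for years in (3, 15, 30):
    doublings = years * 12 / months_per_doubling
    print(years, "years ->", 2 ** doublings, "times the starting capacity")
# 3 years -> 4x, 15 years -> 1,024x, 30 years -> about 1,000,000x.  On any fixed scale
# the curve soon looks vertical, which is the intuition behind an "explosion" in capacity.
```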

The board game go is conceptually very simple, but because of the enormous number of possible moves, winning the game is challenging.  Go is a formal problem, meaning that one could play go without actually using stones or a game board, just by representing those parts symbolically or mathematically.  It is the form of the problem, not its instantiation in stones and boards, that is important.

In fact, when AlphaGo played Lee Sedol, its developers did not even bother to have the computer actually place any stones on the board. Instead, the computer communicated its moves to a person who placed the stones and recorded the opponent’s responses.  It could have played just as well without a person placing the stones, because all it really did was manipulate symbols for those stones and the board.  The physical properties of the stones and board contributed nothing to its ability to play.  The go board and stones were merely a convenience for the humans; they played no role in the operation of the computer.

AlphaGo was trained in part by having two instances of the program play symbolically against one another. With more computing power, it could play faster and thus, theoretically, learn faster.  Learning to play go is the perfect rationalist situation.  Improvement can be had just by thinking about the game; no experience with a physical world is needed.  With enough computing power, its ability to play go might be seen to “explode.”

But playing go is not a good model for general intelligence.  After playing these virtual games, AlphaGo knew more because of its ability to think about the game, but intelligence in the world requires capabilities beyond those required to play go.  Go is a formal, perfect-information problem.  The two players may find it challenging to guess what the future state of the game will be following a succession of moves, but there is no uncertainty about the current state of the game.  The positions of the stones on the playing grid are perfectly known by each player.  The available moves at any point in time are perfectly known, and the immediate consequences of each move are also perfectly known.  Learning to play consisted entirely of learning to predict the future consequences of each potential move.

Self-driving vehicles, in contrast, do not address a purely formal problem.  Instead, their sensors provide incomplete, faulty information about the state of the vehicle and its surroundings.  Although some progress can be made by learning to drive a simulated vehicle, there is no substitute for the feedback of driving a physical vehicle in a physical world.  Learning to drive is not a purely rationalist exercise; rather, it depends strongly on the system’s empirical experience with its environment.

At least some of the problems faced by an artificial general intelligence system will be of this empiricist type.  But a self-driving vehicle that computed twice as fast would not learn at twice the rate, because its learning depends on feedback from the world, and the world does not deliver feedback any faster no matter how fast the computer is.  This is one of the main reasons why there will be no intelligence explosion.  The world, not the computer, ultimately controls how fast such a system can learn.

Most driving is mundane.  Nothing novel happens during most of the miles driven, so there is nothing new for the computer to learn.  Unexpected events, the reason simulation is not enough, occur with a frequency that is entirely unrelated to the speed or capacity of the computer.  There will be no explosion in the capabilities of self-driving vehicles.  They may displace truck and taxi drivers, but they will not take over the world, and they certainly will not do so explosively.

There are other reasons why the singularity will be a no-show.  Here is just one of them.  Expanding machine intelligence will surely require some form of machine learning.  At its most basic, machine learning is simply a method of modifying the values of certain parameters to find an optimal set of values that solves a problem.  AlphaGo was capable of learning to play go because the DeepMind team structured the computational problem in an important new way.  Self-driving cars became possible because the teams competing in the second DARPA Grand Challenge figured out a new way to represent the problem of driving.  Computers are great at finding optimal parameter values, but so far they have no capability at all for figuring out how to structure problem representations so that they can be solved by finding those parameter values.

Good assumed that “the design of machines is one of these intellectual activities” just like those used to play go or drive, but he was wrong.  Structuring a problem so that a computer can find its solution is a different kind of problem, one that cannot be reduced to parameter adjustment, at least not in a timely way.  Until we can come up with appropriate methods for designing solutions, artificial general intelligence will not be possible.  Albert Einstein was not renowned for his ability to solve well-posed problems; he was renowned for his ability to design new approaches to certain physics problems, that is, new theories.  Today’s computers are great at solving problems that someone has structured into equations, but none is yet able to create new structures.  General intelligence requires this ability, and it may be achievable, but as long as general intelligence depends on empirical feedback, the chances of a technological singularity are nil.