Thursday, May 13, 2021

Expected impact of the executive order improving cybersecurity

The President of the United States has issued an executive order concerning cybersecurity.  https://www.whitehouse.gov/briefing-room/presidential-actions/2021/05/12/executive-order-on-improving-the-nations-cybersecurity/

This is my rapid-response analysis.

The order was undoubtedly in preparation for a considerable amount of time, but it was released following the ransomware attack on the Colonial Pipeline in the US, which carries petroleum products along most of the East Coast.

Technically, the executive order applies only to US government contractors, but many of the provisions apply to the entire supply chain leading up to these directly specified customers.  As a result, most of the orders will impact, to varying degrees, any government contractor or supplier as well as any company that does business with one.  An executive order cannot compel companies that are not government contractors to change what they do, but those that do not comply may be excluded from doing business with the government and any of its contractors or suppliers, so it, in effect, applies to almost all companies.

The executive order is intended to do several things:

·       Remove barriers to sharing threat information, primarily between the government and private entities, but it will also have the effect of making it easier to share information among private entities.

·       Strengthen cybersecurity standards.

·       Mandate the wider use of zero-trust methods and architectures.

·       Require software developers to maintain greater visibility into their software.

·       Make security information public so that consumers can evaluate the security of a software system.  As an outcome, it establishes an “Energy Star”-like program for rating software security.

·       Mandate the use of multi-factor authentication where appropriate.

·       Strengthen the requirements around encryption at rest and for data in motion.

·       Establish a cybersecurity review board.

·       Create a standard playbook for responding to cyber-incidents. I predict that this will end up being a mandate that each company have a standard procedure for dealing with cyber-incidents.

·       Improve capabilities to detect cybersecurity incidents.

·       Improve investigative and remediation capabilities.

Analysis

The order provides a lot of common sense ideas for how to improve cybersecurity—common sense, that is, if you spend your time thinking about cybersecurity.  Nothing in the order seems outlandish or overly burdensome.  Cybersecurity is the grand challenge of the 21st Century and it is increasingly obvious that we need to pay a lot more attention to it.  Cybersecurity failures are expensive and highly damaging to the reputations of those organizations that are attacked.

The order discusses removing the contractual barriers that prevent companies from sharing information about cyberattacks.  Strictly speaking, these barriers include only those in US federal contracts, but there will be increasing pressure to share information among all concerned parties.  Any information relevant to cyber incidents or potential incidents must be reported promptly to relevant government agencies, using industry-recognized formats. The extent of sharing will certainly increase, but it will still require a careful balance among business interests, privacy, and coordinated defense.

The focus of the order is to bring systems up to modern cybersecurity standards. NIST, the National Institute of Standards and Technology, has been very active in creating these standards.  Organizations may need to review their security practices to be sure that they meet current standards.  I would expect, in addition, that future standards will be developed that will require additional investments.  The order contains an intention to invest in technology and personnel to match the modernization goals.  It will require congressional action, however, to actually fund these good intentions.

The order mandates transitioning to Zero Trust Architecture.  The order defines Zero Trust Architecture as “a set of system design principles, and a coordinated cybersecurity and system management strategy based on an acknowledgement that threats exist both inside and outside traditional network boundaries.”  This framework allows users full access, but only to the specific resources that they need to perform their jobs.  Traditional security architectures put all of their effort into defending the perimeter of a network.  Once through the firewall, an attacker would essentially have free rein because all machines within the firewall were considered fully protected.  Zero Trust Architecture reverses that assumption.  Every machine is suspect, no matter where it is located, until it is verified that the machine has a need for access to a resource and permission to access it.
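
To make the contrast concrete, here is a minimal sketch, in Python, of the kind of per-request decision a zero-trust system makes.  The names, policy fields, and checks are hypothetical illustrations of the principle, not any particular product's API: every request is evaluated against identity, authentication, device state, and a per-resource policy, and the requester's position on the network counts for nothing.

```python
# Minimal, hypothetical sketch of a zero-trust access decision in Python.
# Field names and the policy table are illustrative, not any product's API.
from dataclasses import dataclass

@dataclass
class Request:
    user: str
    mfa_passed: bool        # multi-factor authentication succeeded
    device_verified: bool   # device passed posture/health checks
    network: str            # "internal" or "external"; deliberately not a trust signal
    resource: str           # the resource being requested

# Per-resource policy: who may access what.  Being "inside" the network grants nothing.
POLICY = {
    "payroll-db": {"alice"},
    "build-server": {"alice", "bob"},
}

def allow(req: Request) -> bool:
    """Grant access only if identity, authentication, device state, and policy all check out."""
    if not (req.mfa_passed and req.device_verified):
        return False
    return req.user in POLICY.get(req.resource, set())

# An attacker already "inside" the perimeter still gets nothing:
print(allow(Request("mallory", False, False, "internal", "payroll-db")))  # False
print(allow(Request("alice", True, True, "external", "payroll-db")))      # True
```

The point of the sketch is only the shape of the decision: nothing is trusted by default, and network location never substitutes for verification.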

Defenders have to defend their systems correctly every time, but attackers need only succeed once.  It is no longer a matter of whether attackers will pierce the firewall, but of when and how they will find a way to do it.  Therefore, internal as well as perimeter defenses are necessary, and Zero Trust Architectures provide a framework for that combined internal and perimeter protection.

The order requires new documentation and compliance frameworks, which may impose additional requirements on how companies document their processes and products.

One of the most impactful features of the new order is its focus on preventing supply chain attacks.  It requires software that can resist attacks and detect tampering.  Each provider will be required to verify that its software has not been compromised, including any software that is used for development and deployment as well as in the components that are used.  The government, with the involvement of the relevant parties, will be developing guidelines that can be used to evaluate software security, including the practices of developers and suppliers.  These parties will need to demonstrate their conformance with secure practices.  The guidelines are expected to include (quoting from the order):
          (i)     secure software development environments, including such actions as:
              (A)  using administratively separate build environments;
              (B)  auditing trust relationships;
              (C)  establishing multi-factor, risk-based authentication and conditional access across the enterprise;
              (D)  documenting and minimizing dependencies on enterprise products that are part of the environments used to develop, build, and edit software;
              (E)  employing encryption for data; and
              (F)  monitoring operations and alerts and responding to attempted and actual cyber incidents;
          (ii)    generating and, when requested by a purchaser, providing artifacts that demonstrate conformance to the processes set forth in subsection (e)(i) of this section; 
          (iii)   employing automated tools, or comparable processes, to maintain trusted source code supply chains, thereby ensuring the integrity of the code;
          (iv)    employing automated tools, or comparable processes, that check for known and potential vulnerabilities and remediate them, which shall operate regularly, or at a minimum prior to product, version, or update release;
          (v)     providing, when requested by a purchaser, artifacts of the execution of the tools and processes described in subsection (e)(iii) and (iv) of this section, and making publicly available summary information on completion of these actions, to include a summary description of the risks assessed and mitigated;
          (vi)    maintaining accurate and up-to-date data, provenance (i.e., origin) of software code or components, and controls on internal and third-party software components, tools, and services present in software development processes, and performing audits and enforcement of these controls on a recurring basis;
          (vii)   providing a purchaser a Software Bill of Materials (SBOM) for each product directly or by publishing it on a public website;
          (viii)  participating in a vulnerability disclosure program that includes a reporting and disclosure process;
          (ix)    attesting to conformity with secure software development practices; and
          (x)     ensuring and attesting, to the extent practicable, to the integrity and provenance of open source software used within any portion of a product.

Companies will need to provide a software bill of materials that reflects all of the components included in the code.  Modern code often contains many components, some of which are purchased, some are open source, and some are developed in house. Each of those components could introduce malware in what is called a supply chain attack.  The attacker corrupts the component during its development, without the producer’s knowledge.  The producer then distributes this corrupted component and certifies that it actually comes from the producer, lulling its users into a false sense of security.
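
A software bill of materials makes one simple check possible: verifying that each component on hand matches what the producer published.  The sketch below is a hypothetical illustration of that idea in Python; the component names, paths, and digests are made up, and a real SBOM format (such as SPDX or CycloneDX) carries far more information.

```python
# Hypothetical sketch: check local components against SBOM-style entries.
# Component names, paths, and digests below are invented for illustration.
import hashlib
from pathlib import Path

sbom = [
    {"name": "libfoo",    "path": "vendor/libfoo.so",    "sha256": "9f2b..."},  # placeholder digest
    {"name": "bar-utils", "path": "vendor/bar_utils.py", "sha256": "c41a..."},  # placeholder digest
]

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file on disk."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def verify(entries) -> bool:
    """Report components that are missing or do not match their recorded digest."""
    ok = True
    for entry in entries:
        try:
            actual = sha256_of(entry["path"])
        except FileNotFoundError:
            print(f"MISSING   {entry['name']}")
            ok = False
            continue
        if actual != entry["sha256"]:
            print(f"TAMPERED  {entry['name']}")
            ok = False
    return ok

if __name__ == "__main__":
    verify(sbom)
```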

The order includes an effort to build a security rating system that can be applied to IoT (Internet of Things) and other systems.  This rating system is intended to mimic the Energy Star ratings and make it easy for customers (individuals and government agencies) to determine that a system has been evaluated for security and what its status is.

The order mandates the development of a standard set of procedures for dealing with cybersecurity incidents.  From the private sector point of view, I expect that this mandate will end up being a requirement that each party have in place standard operating procedures for dealing with these attacks and for communicating about them with the government and the public. 

An important opportunity for innovation is the mandate to improve detection of cybersecurity vulnerabilities.  Right now, we are very effective at blocking malicious activity at the perimeter (e.g., at the firewall), but we have seen that not all attacks come in through the same channels.  We would benefit from a capability that identifies evidence of an attack from within the network. Eventually attackers will get into the internal systems, and we will need Endpoint Detection and Response measures to detect the presence of attackers and remove them.
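
As a toy illustration of internal detection (nothing like a real EDR product), the sketch below flags hosts whose failed-login counts stand out from the rest.  The event format, the data, and the threshold are all assumptions made for the example.

```python
# Toy sketch of internal detection: flag hosts with unusually many failed logins.
# Event format, data, and threshold are assumptions made for illustration only.
from collections import Counter
from statistics import mean, pstdev

# Each event: (host, outcome).  In practice these would come from authentication logs.
events = [
    ("host-a", "fail"), ("host-a", "ok"),   ("host-b", "ok"),
    ("host-c", "fail"), ("host-c", "fail"), ("host-c", "fail"),
    ("host-c", "fail"), ("host-b", "fail"),
]

failures = Counter(host for host, outcome in events if outcome == "fail")
counts = list(failures.values())
mu, sigma = mean(counts), pstdev(counts) or 1.0   # avoid dividing by zero

for host, n in failures.items():
    if (n - mu) / sigma > 1.0:                    # crude z-score threshold
        print(f"suspicious: {host} had {n} failed logins")
```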

The order is very clear about the need for security-related logs to be protected by cryptographic methods.  Providers may need to adjust some logging procedures to meet this requirement.
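
One common way to satisfy that kind of requirement is to make logs tamper-evident, for example by chaining entries with a keyed hash so that any alteration breaks verification.  This is a minimal sketch, assuming a shared secret key held by the verifier; it is not tied to any particular logging product, and a real deployment would also protect the key and forward logs to write-once storage.

```python
# Minimal sketch of tamper-evident logging using an HMAC hash chain.
# Key handling is deliberately simplified for illustration.
import hashlib
import hmac

KEY = b"demo-secret-key"   # placeholder; never hard-code keys in practice

def chain(entries):
    """Return (entry, tag) pairs where each tag covers the entry and the previous tag."""
    prev = b""
    out = []
    for entry in entries:
        tag = hmac.new(KEY, prev + entry.encode(), hashlib.sha256).hexdigest()
        out.append((entry, tag))
        prev = tag.encode()
    return out

def verify(chained):
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev = b""
    for entry, tag in chained:
        expected = hmac.new(KEY, prev + entry.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, tag):
            return False
        prev = tag.encode()
    return True

log = chain(["user alice logged in", "config changed", "user alice logged out"])
assert verify(log)
log[1] = ("config unchanged", log[1][1])   # tamper with one entry
assert not verify(log)                     # verification now fails
```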

Conclusion

Overall, many of the mandates of this order involve features that are already known or are in development.  The mandates for how suppliers develop and deliver software are likely to be the most impactful.  If nothing else, this order highlights the need for enhanced cybersecurity, which should make it easier to persuade organizations of the importance of these measures. 

Tuesday, May 4, 2021

Regulating artificial intelligence: EU Proposal for a regulation laying down harmonised rules on artificial intelligence

In late April, the European Commission proposed new rules intended to support transforming Europe into a global hub for artificial intelligence (AI).  Unfortunately, from my perspective, these proposed rules fall far short of the mark.  The authors of this report do not seem to have an adequate understanding of what AI does, how it does it, or how to evaluate it.  They make supportive comments about funding future AI research, but the bulk of the document seems to be about regulating it, rather than supporting it, and they miss the mark on those regulations.

The report begins with four examples of practices that would be prohibited under the regulation.  These include systems that seek “to manipulate persons through subliminal techniques beyond their consciousness” or to “exploit vulnerabilities of specific vulnerable groups such as children or persons with disabilities in order to materially distort their behaviour,” as well as systems that would “classify the trustworthiness of natural persons based on their social behaviour in multiple contexts or known or predicted personal or personality characteristics.”  They also highlight a prohibition against “the use of ‘real-time’ remote biometric identification systems in publicly accessible spaces for the purpose of law enforcement.”

The first of these highlighted prohibitions has nothing to do with artificial intelligence and is a long-debunked practice in any case.  Subliminal messaging is the idea, introduced by James Vicary in 1957, that one could present messages so briefly that they fall below the threshold (the meaning of subliminal) of conscious detection, presumably so that the message could avoid filtering by the conscious mind and directly control behavior through the subconscious.  Vicary eventually admitted that he made up the original experiment on which this idea is based, and no one has ever demonstrated that it works.  Subliminal messaging does not work and never has.  How artificial intelligence would be used to disseminate it is a mystery to me, as is their focus on this pseudo-phenomenon in this report.

Similarly, the second practice that they highlight, exploiting vulnerabilities of a person due to their age or mental disability, also does not materially involve artificial intelligence.  First, it is insulting to imply that age alone makes one cognitively disabled.  Second, it is hard to see what role artificial intelligence might play in taking advantage of that disability.  Third, there are other laws already available to protect cognitively vulnerable individuals, and it is unclear why new laws, let alone new laws about artificial intelligence, are needed in this context.

Their third example also does not depend on artificial intelligence.  It prohibits the use of a social scoring system for evaluating the trustworthiness of people, in which the social score could lead to unjustified or disproportionately detrimental treatment.  One could use artificial intelligence, I suppose, to keep track of these evaluations, but they do not depend on any specific kind of computational system.  We have had creditworthiness scoring for more years than we have had effective computational systems to implement it.

Finally, their fourth highlighted example seeks to outlaw the use of “real-time” remote biometric identification. That may be a worthy goal in itself, but again, it does not require artificial intelligence to implement it.

Except for the subliminal messaging concern, one could argue that artificial intelligence makes these to-be-prohibited activities more efficient, and therefore, more intrusive.  One could employ teams of expert face recognizers, so-called “super recognisers,” to monitor a public space, but even their prodigious capabilities are limited, and the number of people they can monitor simultaneously is small relative to a machine’s. I don’t think that we know how the accuracy of these super-recognisers compares with that of machines, but the machines never need breaks and never take vacations.  But that is not really the point: the proposed regulation should aim to control surveillance, not the technology. My point is that it is possible, and I think preferable, to make laws directly about prohibited activities rather than to try to regulate them through regulation of technology.

Every form of communication has the potential to exploit people’s vulnerabilities, sometimes without them noticing it.  Influencing people’s behavior through communication is often called advertising.

Another problem with this report is its definition of artificial intelligence.  They note that the “notion of AI system should be clearly defined to ensure legal certainty.”  They then provide a definition in Annex I that is so broad that it would encompass every form of computer programming and statistical analysis.  They include machine learning, logic- and knowledge-based approaches, knowledge representation, knowledge bases, deductive engines, statistical approaches, and search and optimization methods.  That’s just about anything done with a computer.

All programming can be categorized as using logic methods.  All forms of knowledge must be stored in some kind of structure, which could be called a knowledge base.  Every form of statistical analysis is necessarily a statistical approach.

There are many other issues with these proposed regulations, but I want to focus on just one more here.  The authors intend for the proposed legislation to make sure that AI can be trusted.  But there is actually very little here on trusting AI, and much of what there is concerns trusting the people who develop and use the software.  The software is only as good as the uses to which it is put by the people who deploy it. The report discusses several general principles with which AI usage should comply, but again these principles concern the use of the software, rather than the software itself.  The closest they come, I think, to suggesting how the trustworthiness of software can be assessed is this: “For high-risk AI systems, the requirements of high quality data, documentation and traceability, transparency, human oversight, accuracy and robustness, are strictly necessary to mitigate the risks to fundamental rights and safety posed by AI and that are not covered by other existing legal frameworks.”  Other than saying that artificial intelligence systems should be trustworthy by complying with EU standards for privacy and so on, they offer almost nothing about how that trustworthiness could be effectively assessed.  They argue (correctly, I agree) that the use of such systems should not disadvantage any protected groups, but they do not offer any suggestions for how that disadvantage might be identified, and so offer no advice about how it might be mitigated.  If they want to build trust among the public for the use of AI and build some level of legal certainty to promote the development of cutting-edge technology (which the regulation is intended to support), then they should provide more advice about how to determine the compliance of any system, with its human appliers, to the principles that are widely articulated.

The problem with this report is not any antipathy toward artificial intelligence, but rather a lack of understanding of the problems that the government really does need to solve and the means by which to solve them.  As it sits, its definition of artificial intelligence includes just about anything that can be done using rules, programs, automation, or any other systematic approach.  It seeks to regulate things that are unrelated to artificial intelligence.  I don’t argue that those things should not be regulated, but they should be regulated for their own consequences, not for the means by which they are implemented. 

The key is to find a balance between the risks and benefits of any process.  How can the process be used for good while protecting the people who will use the system or be subjected to it?  How can we support innovation while providing protections? One’s obligations to society do not depend on the technology used to identify or meet them.  On the other hand, it is very clear that artificial intelligence and innovation in artificial intelligence are likely to be huge economic drivers in the coming years.  If the EU, any of its member states, or any other country in the world is to succeed in that space, it will require government support, encouragement, and, again, a balance between opportunity and responsibility.

Artificial intelligence in this proposal is a kind of MacGuffin.  It is a “plot device” that brings together several disparate human ethical and legal responsibilities.  But it is the humans’ behavior that is regulated by law.  This proposal would benefit from a more clear-eyed analysis of just how that behavior should be guided for the benefit, rather than the detriment, of society.

Sunday, April 7, 2019

The sum of three cubes problem and why it is interesting for artificial intelligence


The holy grail of artificial intelligence is the quest to develop artificial general intelligence.  A considerable amount of progress has been made on specific forms of intelligence. Computers are now able to perform many tasks at superhuman levels.  But extending these specific capabilities to more general intelligence has so far proven elusive.

One of the most important barriers to extending computational intelligence from specific to general capabilities is an inadequate understanding of the different kinds of problems that a general intelligence system would have to solve.  Research has focused on one narrow range of problems with the apparent expectation that solving those problems will lead eventually to general problem solving.  These are problems that can be solved by parameter optimization.  Optimization means that the learning algorithm adjusts the values of its model parameters to gradually approximate the desired output of the system.  But there are problems that cannot be solved using these methods.

The sum of three cubes problem is one of these problems.  Conceptually, it is not very complicated. It could straightforwardly be solved by brute force, that is, by trying numbers until a solution is found.  Still, it has resisted solution for some numbers despite more than half a century of effort.
 
In general form, the three cubes problem is this: For any integer k, express that number as the sum of three integers cubed.  For example, the integer 29 can be expressed as 29 = 3³ + 1³ + 1³ (29 = 27 + 1 + 1). It is easy to determine that some numbers cannot be represented as the sum of three cubes; any number that leaves a remainder of 4 or 5 when divided by 9, such as 32, cannot be.  But until just recently, no one knew whether the integer 33 could be.  Is there some set of three integers that satisfies the equation 33 = x³ + y³ + z³?  In fact, until recently, there were only two numbers below 100 for which a solution was unknown, 33 and 42.  All of the others were either known to be impossible or the three integers were known.
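
To see why the problem is easy to state but expensive to solve, here is a naive brute-force sketch in Python.  It searches x, y, and z over a small bounded range (including negatives, which the problem allows); this is enough for easy cases like k = 29, but hopeless for hard cases like 33 or 42, whose known solutions involve integers with well over a dozen digits.

```python
# Naive brute-force search for k = x**3 + y**3 + z**3 over a bounded range.
# Fine for easy values of k; utterly inadequate for hard cases like 33 or 42.
def three_cubes(k, bound=100):
    for x in range(-bound, bound + 1):
        for y in range(x, bound + 1):          # y >= x skips duplicate orderings
            for z in range(y, bound + 1):      # z >= y likewise
                if x**3 + y**3 + z**3 == k:
                    return x, y, z
    return None

print(three_cubes(29))   # finds a valid triple; several exist within this range
print(three_cubes(33))   # None -- no solution exists anywhere near this bound
```

Notice that there is nothing to optimize here: a candidate triple is either exactly right or simply wrong, so the only general strategy is to search more cleverly, which is essentially what Booker did.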

There is no known optimization method for finding the three numbers that when cubed sum up to 33 or 42.  There are no known methods to gradually approximate a solution.  Once the correct three integers have been found, it is easy to verify that they are, in fact, correct, but there is no solution that is partially correct, only solutions that are correct or incorrect.  The best that one can do is to guess at likely numbers.  Andrew Booker, at the University of Bristol, was recently able to solve the problem for k = 33 by improving somewhat the methods used to guess potential solutions.  His method reduced the number of integers that needed to be searched by an estimated 20%, but even after this improvement, his solution consumed 23 core-years of processing time.  That is a substantial amount of effort for a fairly trivial problem.  According to Booker, “I don’t think [finding solutions to the sum of three cubes problems] are sufficiently interesting research goals in their own right to justify large amounts of money to arbitrarily hog a supercomputer.”

Why this problem is interesting for artificial intelligence

The sum of three cubes problem has resisted solution for over half a century.  This problem is very easy to describe, but difficult, or at least tedious, to solve.  Understanding the difficulty posed by this kind of problem and how that challenge was addressed is, I think, important for understanding why general intelligence is a challenge and what can be done to meet that challenge.

Current versions of machine learning can all be described in terms of three sets of numbers.  One set of numbers maps properties of the physical world to numbers that can be used by a computer.  One maps the output of the computer to properties of the physical world.  The third set of numbers represents the model that maps inputs to outputs.  Machine learning consists of adjusting this set of model numbers (using some optimization algorithm) to better approximate the desired relation between inputs and outputs.  This kind of framework can learn to recognize speech, to compose novel music, and to play chess, go, or Jeopardy.  In fact, some version of this approach can solve any problem that can be represented in this way.
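
As a minimal sketch of that framework (using numpy, with made-up data), the three sets of numbers are all just arrays: an encoding of the inputs, the targets the outputs are compared against, and the adjustable parameters that an optimization algorithm nudges toward a better fit.

```python
# Minimal sketch of the "three sets of numbers" view of machine learning:
# encoded inputs, target outputs, and adjustable model parameters, with an
# optimization algorithm (here plain gradient descent) closing the gap.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                     # set 1: the world encoded as numbers
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)       # set 2: the desired outputs

w = np.zeros(3)                                   # set 3: the model's adjustable parameters
learning_rate = 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)         # gradient of the mean squared error
    w -= learning_rate * grad                     # nudge parameters toward lower error

print(np.round(w, 2))                             # ends up close to true_w
```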

But it is still the case that the success of these systems relies heavily on the ability of human designers to construct these three sets of numbers and select optimization algorithms to adjust the model.  The sum of three cubes problem is not amenable to an optimization approach because there is no way to determine which changes to x, y, and z will bring it closer to the desired solution.  There is no way to define closer.

In 1965, I. J. Good raised the possibility of an ultraintelligent computer system that would surpass human intelligence:

“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an 'intelligence explosion,' and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.”

Presumably, solving the sum of three cubes problem would also be among the intellectual activities that such a machine would address, since it continues to be a problem addressed by humans.  This problem is conceptually much simpler than designing intelligence programs, but it may be even less tractable. 

Booker’s improved algorithm was not discovered automatically.  There is no algorithm that we know of that can produce new algorithms like the one he produced.  It took humans over 64 years to come up with one even this good, despite fairly widespread interest in the problem.  We do not know how Booker came up with the insight leading to this new algorithm, nor do we know how to go about designing a method that could do so predictably.  General intelligence will require computers to be able to generate new problem representations and new algorithms to solve new problems, but we have little idea of how to get there.

Even this new method faced a huge combinatoric challenge.  There are a vast number of combinations of three numbers that could be the solution to the problem.  No matter how intelligent a system is, there may ultimately be no way to eliminate this combinatoric problem.  If even the problem of finding three numbers can be combinatorically challenging, what will a general intelligence system face when trying to solve problems with even more variables?  The time required to test a large number of integers and their cubes may be reduced, but it cannot be eliminated.

To this point, no one has come up with a computer system that can design its own models.  Deep learning systems that are said to come up with their own representations actually still work by adjusting parameters in a prestructured model.  The transformations that occur within the model (moving from one layer to the next) are determined by the architecture of those layers.  For example, a linear autoencoder layer does not learn an arbitrary representation of the data, it “learns” to perform a principal component analysis, a well-known statistical technique.   So far someone still has to come up with the design of the network and optimization methods used to solve the problems.

The sum of three cubes problem could be solved by simple brute force if we were to allocate sufficient resources to its solution.  With other kinds of problems even the space in which to apply the brute-force search may be obscure.  Some insight problems, for example, are difficult until the solver finds the right representation, at which point they are typically easy to solve.  Like the sum of three cubes problem, insight problems do not admit of partial solutions that can be selected through optimization.  The key to solving is to think of the problem in the right way.  Solving these problems requires a switch in how those problems are represented. 

Here’s an insight problem whose solution may be familiar:  Curt and Goldie are lying dead on a wet rug in a locked room.  The room is in an old house near some railroad tracks.  How did they die?

Once you come up with the right model for this situation, solving it is trivial, but the difficult part is often coming up with the right representation.  There are many other insight problems, but these have not been studied at all by computer scientists, so far as I am aware.  Yet the problem of coming up with good representations has been the very mechanism of progress in artificial intelligence.  So far it has been done slowly and painstakingly by people.

There are many other problems that a generally intelligent system will have to address if it is ever to achieve general intelligence, let alone superintelligence.  We may someday be able to create artificial general intelligence systems that can address these problems, but it will require a different computational approach than any we have available today.  


Monday, February 18, 2019

The Singularity Called: Don't Wait Up


Dylan Azulay at emerj has just published another in a series of surveys that have been conducted over the last several years by different groups about when the technological singularity is likely to happen.  The singularity is the idea that computers will get so smart that their intelligence will grow explosively.

The notion of a technological singularity was initially proposed by Vernor Vinge in 1993, expanding on some ideas from I. J. Good and John Von Neumann.

Good wrote:
“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an "intelligence explosion," and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.”  Good, I. J. (1965). Speculations Concerning the First Ultraintelligent Machine, in Advances in Computers, vol 6, Franz L. Alt and Morris Rubinoff, eds., 31-88, Academic Press.

According to Vinge: “It's fair to call this event [the explosion in machine intelligence] a singularity (‘the Singularity’ for the purposes of this piece). It is a point where our old models must be discarded and a new reality rules, a point that will loom vaster and vaster over human affairs until the notion becomes a commonplace.”

The notion of the singularity combines the idea of artificial general intelligence with the idea that such a general intelligence will be able to grow at an exponential rate.  General intelligence is a difficult enough problem, but it is solvable, I think.  But, contrary to the speculations of Good, Vinge, Bostrom, and others, it will not result in an intelligence explosion.

To understand why there will be no explosion, we can start with the 18th Century philosophical conflict between Rationalism and Empiricism.  Simplifying somewhat, the rationalist approach assumes that the way to understanding, that is, to intelligence, lies principally in thinking about the world.  The empiricist approach says that understanding comes from apprehension of facts gained through experience with the world.  In order for there to be a singularity explosion, the rationalist position has to be completely correct, and the empiricist position has to be completely wrong, at least so far as computational intelligence is concerned.  If all it took to achieve explosive growth in intelligence was to think about it, then the singularity would be possible, but it would leave a system lost in thought.

If understanding depends on gleaning facts from experience, then a singularity is not possible because the rate at which facts become available is not changed by increases in computational capacity.  In reality, neither pure Rationalism nor pure Empiricism is sufficient, but if we view intelligence as including the ability to solve physical-world problems, not just virtual ones, then a singularity of the sort Vinge discussed is simply not possible.  Computers may, indeed, increase their intelligence over time, but well-designed machines and skill at designing them are not sufficient to cause an explosive expansion of intelligence.

Imagine, for example, that we could double computing capacity every few (pick one) months, days, or years.  As time goes by, the growth curve becomes indistinguishable from vertical, and an explosion in computing capacity can be said to have occurred.  If all the computer had to do was to process symbols or mathematical values, then we might achieve a technological singularity.  The computer would think faster and faster and faster and be able to process more propositions more quickly.  Intelligence, in other words, would consist entirely of the formal problem of manipulating symbols or mathematical objects.  A computer under these conditions could become super-intelligent even if the entire universe around it somehow disappeared, because it is the symbols that are important, not the world.  But the world is important.

The board game go is conceptually very simple, but because of the number of possible moves, winning the game is challenging.  Go is a formal problem, meaning that one could play go without actually using stones or a game board, just by representing those parts symbolically or mathematically.  It is the form of the problem, not its instantiation in stones and boards that is important.

In fact, when AlphaGo played Lee Sedol, its developers did not even bother to have the computer actually place any stones on the board. Instead, the computer communicated its moves to a person who placed the stones and recorded the opponent’s responses.  It could have played just as well without a person placing the stones because all it really did was manipulate symbols for those stones and the board.  The physical properties of the stones and board played no role and contributed nothing to its ability to play.  The go board and stones were merely a convenience for the humans; they played no role in the operation of the computer.

AlphaGo was trained in part by having two instances of the program play symbolically against one another. With more computer power, it could play faster and thus, theoretically, learn faster.  Learning to play go is the perfect rationalist situation.  Improvement can be had just by thinking about it. No experience with a physical world is needed.  With enough computer power, its ability to play go might be seen to “explode.”

But playing go is not a good model for general intelligence.  After playing these virtual games, the system knew more because of its ability to think about the game, but intelligence in the world requires capabilities beyond those required to play go.  Go is a formal, perfect-information problem.  The two players may find it challenging to guess what the future state of the game will be following a succession of moves, but there is no uncertainty about the current state of the game.  The positions of the stones on the playing grid are perfectly known by each player.  The available moves at any point in time are perfectly known, and the consequences of each move, at least the immediate consequences, are also perfectly known. Learning to play consisted completely of learning to predict the future consequences of each potential move.

Self-driving vehicles, in contrast, do not address a purely formal problem.  Instead, their sensors provide incomplete, faulty information about the state of the vehicle and its surroundings.  Although some progress can be made by learning to drive a simulated vehicle, there is no substitute for the feedback of driving a physical vehicle in a physical world.  Learning to drive is not a purely rationalist problem. Rather, it depends strongly on the system’s empirical experience with its environment.

At least some of the problems faced by an artificial general intelligence system will be of this empiricist type.  But a self-driving vehicle that computed twice as fast would not learn at twice the rate, because its learning depends on feedback from the world, and the world does not increase its speed of providing feedback no matter how fast the computer is. This is one of the main reasons why there will be no intelligence explosion.  The world, not the computer, ultimately controls how fast it can learn.

Most driving is mundane.  Nothing novel happens during most of the miles driven, so there is nothing new for the computer to learn.  Unexpected events (which is why simulation is not enough) occur with a frequency that is entirely unrelated to the speed or capacity of the computer.  There will be no explosion in the capabilities of self-driving vehicles.  They may displace truck and taxi drivers, but they will not take over the world, and they will not do it explosively.

There are other reasons why the singularity will be a no-show.  Here is just one of them.  Expanding machine intelligence will surely require some form of machine learning.  At its most basic, machine learning is simply a method of modifying the values of certain parameters to find an optimal set of values that solve a problem.  AlphaGo was capable of learning to play go because the DeepMind team structured the computational problem in an important new way.  Self-driving cars became possible because the teams competing in the second DARPA grand challenge figured out a new way to represent the problem of driving.  Computers are great at finding optimal parameter values, but so far, they have no capability at all for figuring out how to structure problem representations so that they can be solved by finding those parameter values.

Good assumed that “the design of machines is one of these intellectual activities,” just like those used to play go or drive, but he was wrong.  Structuring a problem so that a computer can find its solution is a different kind of problem that cannot be reduced to parameter value adjustment, at least not in a timely way.  Until we can come up with appropriate methods to design solutions, artificial general intelligence will not be possible.  Albert Einstein was not known as brilliant for his ability to solve well-posed problems; rather, he was renowned for his ability to design new approaches to solving certain physics problems: new theories.  Today’s computers are great at solving problems that someone has structured into equations, but none is yet able to create new structures.  General intelligence requires this ability, and it may be achievable, but as long as general intelligence depends on empirical feedback, the chances of a technological singularity are nil.

Monday, October 22, 2018

Discriminate for fairness


As machine learning methods come to be more widely used, there is a great deal of hand-wringing about whether they produce fair results.  For example, ProPublica reported that a widely used program intended to assess the likelihood of criminal recidivism, that is, whether a person in custody would be likely to commit an additional crime, tended to over-estimate the probability that a black person would commit an additional crime and under-estimate the probability that a white person would.  Amazon was said to have abandoned a machine learning system that evaluated resumes for potential hires because that program under-estimated the likely success of women and therefore recommended against hiring them.

I don’t want to deny that these processes are biased, but I do want to try to understand why they are biased and what we can do about it.  The bias is not an inherent property of the machine learning algorithms, and we would not find its source by investigating the algorithms that go into them. 

The usual explanation is that the systems are trained on the “wrong” data and merely perpetuate the biases of the past.  If they were trained on unbiased data, the explanation goes, they would achieve less biased results.  Bias in the training data surely plays a role, but I don’t think that it is the primary explanation for the bias.

Instead, it appears that the bias comes substantially from how we approach the notion of fairness itself.  We assess fairness as if it were some property that should emerge automatically, rather than something that must be designed in.

What do we mean by fairness?

In the Pro Publica analysis of recidivism, the unfairness derived largely from the fact that when errors are made, they tend to be in one direction for black defendants and in the other direction for white defendants.  This bias means that black defendants are denied bail when they really do not present a risk, and white defendants are given bail when they really should remain in custody.  That bias seems to be inherently unfair, but the race of the defendant is not even considered explicitly by the program that makes this prediction. 

In the case of programs like the Amazon hiring recommendation system, fairness would seem to imply that women and men with similar histories be recommended for hiring at similar rates.  But again, the gender of the applicant is not among the factors considered explicitly by the hiring system.

Race and gender are protected factors under US law (e.g., Title VII of the Civil Rights Act of 1964).  The law states that “It shall be an unlawful employment practice for an employer … to discriminate against any individual with respect to his compensation, terms, conditions, or privileges of employment, because of such individual’s race, color, religion, sex, or national origin.”

Although the recidivism system does not include race explicitly in its assessment, it does include such factors as whether the defendant has any family members who have ever been arrested, whether they have financial resources, etc.  As I understand it, practically every black person who might come before the court is likely to have at least one family member who has been arrested, but that is less often true for whites.  Black people are more likely than whites to be arrested, and once arrested, they are more likely than whites to be convicted and incarcerated.  Relative to their proportion in the population, they are substantially over-represented in the US prison system compared to whites.  These correlations may be the result of other biases, such as racism in the US, but they are not likely to be the result of any intentional bias being inserted into the recidivism machine learning system.  Black defendants are substantially more likely to be evaluated by the recidivism system and were more likely to be included in its training set because of these same factors.  I don’t believe that anyone set out to make any of these systems biased.

The resumes written by men and women are often different.  Women tend to have more interruptions in their work history; they tend to be less assertive about seeking promotions; they use different language than men to talk about their accomplishments.  These tendencies, associated with gender, are available to the system, even without any desire to impose a bias on the results.  Men are more likely to be considered for technical jobs at Amazon because they are more likely to apply for them.  Male resumes are also more likely to be used in the training set because, historically, men have filled a large majority of the technical jobs at Amazon.

One reason to be skeptical that imbalances in the training set are sufficient to explain the bias of these systems is that machine learning systems do not always learn what their designers think that they will learn.  Machine learning works by adjusting internal parameters (for example, the weights of a neural network) to best realize a “mapping” from the inputs on which it is trained to the goals it is given.  If the system is trained to recognize cat photos versus photos of other things, it will adjust its internal parameters to most accurately achieve that result.  The system is shown a lot of labeled pictures, some of which contain cats, and some of which do not.  Modern machine learning systems are quite capable of learning distinctions like this, but there is no guarantee that they learn the same features that a person would learn.

For example, even given many thousands of training examples to classify photographs, a deep neural network system can still be “duped” into classifying a photo of a panda as a photo of a gibbon, even though both photos look to the human eye very much like a panda and not at all like a gibbon.  All it took to cause this system to misclassify the photo was to add a certain amount of apparently random visual noise to the photograph.  The misclassification of the picture when noise was added implies that the system learned features, in this case pixels, that were disrupted by the noise and not the features that a human used.

The recidivism and hiring systems, similarly, can learn to make quite accurate predictions without having to consider the same factors that a human might.  People find some features more important than others when classifying pictures.  Computers are free to choose whatever features will allow correct performance, whether a human would find them important or not.

In many cases, the features that a system identifies are also applicable to other examples that it has not seen, but there is often a decrease in accuracy when a well-trained machine learning system is actually deployed by a business and applied to items (e.g., resumes) that were not drawn from the same group as the training set.  The bigger point is that for machine learning systems, the details can be more important than the overall gist, and the details may be associated with the unfairness.

Simpson’s paradox and unfairness

A phenomenon related to this bias is called Simpson’s paradox, and one of the most commonly cited examples of this so-called paradox concerns the appearance of bias in the acceptance rate of men versus women to the University of California graduate school. 

The admission figures for the Berkeley campus for 1973 showed that 8442 men applied, of which 44% were accepted, and 4321 women applied, of which only 35% were accepted.  The difference between 44% and 35% acceptance is substantial and could be a violation of Title VII.

The difference in proportions would seem to indicate that the admission process was unfairly biased toward men.  But when the departments were considered individually, the results looked much different.  Graduate admission decisions are made by the individual departments, such as English or Psychology.  The graduate school may administer the process, but it plays no role in deciding who gets in.  On deeper analysis, it was found (P. J. Bickel, E. A. Hammel, J. W. O'Connell, 1975) that 6 of the 85 departments showed a small bias toward admitting women and only four of them showed a small bias toward admitting men.  Although the acceptance rate for women was substantially lower than for men, individual departments were slightly more likely to favor women than men. This is the apparent paradox: the departments are not biased against women, but the overall performance of the graduate school seems to be.

Rather, according to Bickel and associates, the apparent mismatch derived from the fact that women applied to different departments on average than the men did.  Women were more likely to apply to departments that had more competition for their available slots and men were more likely to apply to departments that had relatively more slots per applicant. In those days, the “hard” sciences attracted more male applicants than female, but they were also better supported with teaching assistantships and so on than the humanities departments that women were more likely to apply to. Men applied on average to departments with high rates of admission and women tended to apply to departments with low rates.  The bias in admissions was apparently not caused by the graduate school, but by the prior histories of the women, which biased them away from the hard sciences and toward the humanities.
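
The arithmetic behind the reversal is easy to reproduce.  The two-department numbers below are invented for illustration (they are not the actual Berkeley figures), but they produce the same pattern: each department admits women at a slightly higher rate, yet the aggregate rate favors men because women apply disproportionately to the more competitive department.

```python
# Invented two-department illustration of Simpson's paradox (not the real
# Berkeley data): each department slightly favors women, yet the aggregate
# admission rate favors men because of where each group applies.
depts = {
    #          (male applicants, male admits), (female applicants, female admits)
    "Dept A": ((800, 480), (100, 62)),
    "Dept B": ((200, 20),  (800, 96)),
}

m_app = m_adm = f_app = f_adm = 0
for name, ((ma, mx), (fa, fx)) in depts.items():
    print(f"{name}: men {mx / ma:.0%} admitted, women {fx / fa:.0%} admitted")
    m_app += ma; m_adm += mx
    f_app += fa; f_adm += fx

print(f"Overall: men {m_adm / m_app:.0%} admitted, women {f_adm / f_app:.0%} admitted")
# Dept A: men 60% admitted, women 62% admitted
# Dept B: men 10% admitted, women 12% admitted
# Overall: men 50% admitted, women 18% admitted -- the aggregate reverses the pattern
```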

A lot has been written about Simpson’s paradox and even whether it is a paradox at all.  The Berkeley admissions study as well as the gender bias and recidivism bias can all be explained by the correlation between a factor of interest (gender or race) and some other variable.  Graduate admissions were correlated with patterns of department selection; gender bias in resume analysis is correlated with such factors as work history, language used to describe work, and so on.  Recidivism predictors are correlated with race.  Although these examples all show large discrepancies in the size of the two groups of interest (many more men applied to graduate school, many more of the defendants being considered were black rather than white, and many more of the Amazon applicants were men), these differences will not disappear if all we do is add training examples.

These systems are considered unfair, presumably because we do not think that gender or race should play a causal role in whether people are admitted, hired, or denied bail (e.g., Title VII).  Yet, gender and race are apparently correlated with factors that do affect these decisions.  Statisticians call these correlated variables confounding variables.  The way to remove them from the prediction is to treat them separately (hold them fixed).  If the ability to predict recidivism is still accurate when considering just blacks or just whites, then it may have some value.  If hiring evaluations are made for men and women separately, then there can be no unintentional bias.  Differences between men and women, then, cannot explain or cause the bias because that factor is held constant for any predictions within a gender.  Women do not differ from women in general in gender-related characteristics, and so these characteristics are not able to contribute to a hiring bias toward men.

We detect unfairness by ignoring a characteristic, for example, race or gender, during the training process and then examining it during a subsequent evaluation process.  In machine learning, that is often a recipe for disaster.  Ignoring a feature during training means that that feature is uncontrolled in the result.  As a result, it would be surprising if the computer were able to produce fair results.
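
Here is a minimal sketch of the alternative: keep the protected attribute alongside the evaluation data, even if the model never sees it, and compare error rates group by group.  The data and the deliberately biased "model" below are synthetic, and the metric choices are just one reasonable option.

```python
# Sketch: evaluate predictions separately by protected group.
# The labels, predictions, and groups are synthetic and deliberately biased.
import numpy as np

rng = np.random.default_rng(2)
group = rng.integers(0, 2, 500)                    # protected attribute (0 or 1)
y_true = rng.integers(0, 2, 500)                   # true outcomes
flip = rng.random(500) < np.where(group == 1, 0.3, 0.1)   # more errors for group 1
y_pred = np.where(flip, 1 - y_true, y_true)

for g in (0, 1):
    m = group == g
    fpr = np.mean(y_pred[m][y_true[m] == 0])       # false positive rate
    fnr = 1 - np.mean(y_pred[m][y_true[m] == 1])   # false negative rate
    print(f"group {g}: FPR={fpr:.2f}  FNR={fnr:.2f}")
# A gap in these rates between groups is exactly the kind of unfairness, as in
# the recidivism example, that an aggregate accuracy number hides.
```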

Hiring managers may or may not be able to ignore gender.  The evidence is pretty clear that they cannot really do it, but US law requires that they do.  In an attempt to make these programs consistent with laws like Title VII, their designers have explicitly avoided including gender or race among the factors that are considered.  In reality, however, gender and race are still functionally present in the factors that correlate with them.  Putting a man’s name on a woman’s resume does not make it into a male resume, but including questions about the number of a defendant’s siblings who have been arrested does provide information about the person’s race.  The system can learn about these characteristics.  But what really causes the bias, I think, is that these factors are not included as part of the system’s goals.

If fairness is really a goal of our machine learning system, then it should be included as a criterion by which the success of the system is judged.  Program designers leave these factors out of the evaluation because they mistakenly (in my opinion) believe that the law requires them to leave them out, but machines are unlikely to learn about them unless they are included.  I am not a lawyer, but I believe that the law concerns the outcome of the process, not the means by which that outcome is achieved.  If these factors are left out of the training evaluation, then any resemblance of a machine learning process to a fair one is entirely coincidental.  By explicitly evaluating for fairness, fairness can be achieved. That is what I think is missing from these processes. 

The goals of machine learning need not be limited to just the accuracy of a judgment.  Other criteria, including fairness, can be part of the goal for which the machine learning process is being optimized.  The same kind of approach of explicitly treating factors that must be treated fairly can be used in other areas where fairness is a concern, including the mapping of voting districts (gerrymandering), college admissions, and grant allocations.  Fairness can be achieved by discriminating among the factors that we use to assess fairness and including these factors directly and explicitly in our models.  By discriminating, we are much more likely to achieve fairness than by leaving these factors to chance in a world where factors are not actually independent of one another.
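
One way to make that concrete, as a sketch rather than a prescription, is to put a fairness term directly into the objective the optimizer minimizes.  The example below trains a small logistic regression on synthetic data, once for accuracy alone and once with an added penalty on the gap in average predicted scores between groups; the data, the penalty, and the weighting are all illustrative assumptions.

```python
# Sketch: make fairness part of the objective the optimizer sees.
# Logistic regression on synthetic data with a demographic-parity penalty;
# the data, penalty choice, and weighting are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n = 1000
group = rng.integers(0, 2, n)                       # protected attribute
X = rng.normal(size=(n, 2)) + group[:, None]        # features correlated with group
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0.5).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(lam):
    """Minimize logistic loss + lam * (gap in mean predicted score between groups)**2."""
    w, lr = np.zeros(2), 0.1
    for _ in range(3000):
        p = sigmoid(X @ w)
        grad_loss = X.T @ (p - y) / n               # gradient of the logistic loss
        gap = p[group == 1].mean() - p[group == 0].mean()
        s = p * (1 - p)                             # derivative of sigmoid outputs
        grad_gap = (X[group == 1] * s[group == 1, None]).mean(axis=0) \
                 - (X[group == 0] * s[group == 0, None]).mean(axis=0)
        w -= lr * (grad_loss + lam * 2 * gap * grad_gap)
    p = sigmoid(X @ w)
    return p[group == 1].mean() - p[group == 0].mean()

print("gap with accuracy-only training:", round(train(lam=0.0), 3))
print("gap with fairness in the objective:", round(train(lam=5.0), 3))
```

The design choice is the point: once the gap between groups is part of the quantity being minimized, the optimizer is forced to trade accuracy against fairness explicitly instead of leaving the outcome to whatever the correlated features happen to produce.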