Monday 18 March 2013

Artificial intelligence and values

An article in the latest issue of Cambridge Alumni Magazine (issue 68, pages 22-25 in print, or 24-27 in the version readable online) discusses the work of the Centre for the Study of Existential Risk. Huw Price, the Bertrand Russell Professor of Philosophy at Cambridge, takes us through some of the issues.

The magazine is available here:

and some information about the Centre is available here:

I found this comment by Huw Price particularly striking:

"It is probably a mistake to think that any artificial intelligence, particularly one that just arose accidentally, would be anything like us and would share our values, which are the product of millions of years of evolution in social settings."

This is an interesting thought, as well as a scary one. It leads us to reflect on what it would be for an artificially intelligent entity to have alternative values.

I shall start by considering what values do. I take it that they are, or provide, resources that allow us to decide what to do, when the choice is not to be made on purely instrumental grounds. That is, the choice is not to be made purely by answering the question, "What is the most efficient way to achieve result R?". A choice might fall outside purely instrumental grounds either because there was no well-defined result that was definitely to be achieved, or because it was not clear that all of the possible means, drawn from the feasible range, would be justified by the ends.

It would be possible for some artificially intelligent entity not to have any values at all, even if it could always decide what to do. It might choose both its goals and its methods by reference to some natural feature that we would regard as very poorly correlated, or as not correlated at all, with any conception of goodness. For example, it might always decide to do what would minimize entropy on Earth (dumping the corresponding increase in disorder into outer space, since it could not evade thermodynamics). We may not care about outer space, but we know that the minimization of entropy in our locality is not always a good ethical rule. When such a decision procedure failed to give clear guidance, the system could fall back on a random process to make its choices, and we would be at least as dissatisfied with that as with the entropy rule.

The entropy example shows that there are things that do the job that I have just assigned to values, allowing entities to make decisions when the choice is not to be made on purely instrumental grounds, but that do not count as values. It is a nice question, how broad our concept of values should be. But we probably would have to extend it to naturalistic goals such as the maximization of human satisfaction, in order to have an interesting discussion about the values that artificially intelligent entities might have, in the context of the current or immediately foreseeable capacities of such entities. That is, we would have to allow commission of the naturalistic fallacy (supposing it to be a fallacy). It would be a big step, and one that would take us into a whole new area, to think of such entities as having the intuitive sense of the good that G E Moore would have had us possess.

We not only require the possessor of values to meet a certain standard in the content of its values, the standard (whatever it may be) that tells us that a purported value of entropy minimization does not count. We also require there to be some systematic method by which it gets from the facts of a case, and the list of values, to a decision. Without such a method, we could not regard the entity as being able to apply its values appropriately. It would, for practical purposes, not have values.

Sometimes the distinction between values and method is clear: honesty and courage are values, and deciding what action they recommend in a given situation is something else. At other times, the distinction is unclear. A utilitarian has the supreme value of the promotion of happiness, and a method of decision - the schema of utilitarian computations - that is intimately bound up with that value. From here on, I shall refer to a system of values, meaning the values and the method together.

Now suppose that an artificially intelligent entity had some decision-making resources that we would recognize as a system of values, but that we would say was not one of our systems of values, perhaps because we could see that the resources would lead to decisions that we would reject on ethical grounds, and would do so more often than very rarely. What would need to be true of the architecture of the software, for the entity to have that system of values (or, indeed, for it to have our system of values)?

One option would be to say that the architecture of the software would not matter, so long as it led the entity to make appropriate decisions, and to give appropriate explanations of its values on request.

That would, however, give a mistaken impression of latitude in software design. While several different software architectures might do the job, not just any old architecture that would yield appropriate decisions and explanations most of the time would do.

The point is not that a badly chosen architecture, such as a look-up table that would take the entity from situations to decisions, would be liable to yield inappropriate decisions and explanations in some circumstances. An extensive look-up table, with refined categories, might make very few mistakes.

Rather, the point is that when the user of a system of values goes wrong by the lights of that system - falls into a kind of ethical paradox - we expect the conflict to be explicable by reference to the system of values (unless the conflict is to be explained by inattention, or by weakness in the face of temptation, and an artificially intelligent entity should be immune from both of those). That is, a conflict of this kind should shed light on a problem with the system of values. This is one reason why philosophers dream up hard cases, in order to expose the limits of particular sets of values, or of general approaches to ethics: the hard cases generate ethical paradoxes, in the sense that they show how decisions reached in particular ways can conflict with the intuitions that are generated by our overall systems of values. If the same requirement that conflicts should shed light on problems with systems were to hold for the use that an artificially intelligent entity made of a system of values, the software architecture would need to reflect the structure of the system of values, and in particular the ways in which values were brought to bear in specific situations. Several different architectures might be up to the job, but not just any old architecture would do.
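The contrast between the two kinds of architecture can be put in a small sketch. Everything in it is invented for illustration: the situations, the two values, their weights, the scores, and the weighted-sum method are my assumptions, and no claim is made that a real system of values would be built this way. The point is only that the second architecture leaves a record of how each value bore on the decision, so that a conflict can be traced to something in the system, while the look-up table leaves no such record.

```python
from dataclasses import dataclass

# Opaque architecture: a hypothetical look-up table from situations to decisions.
LOOKUP_TABLE = {
    "friend asks for an honest opinion": "tell the truth",
    "stranger asks for directions": "give directions",
}


def decide_by_table(situation):
    """Return a decision with no account of why it was reached."""
    return LOOKUP_TABLE[situation]


@dataclass(frozen=True)
class Value:
    name: str
    weight: float  # assumed relative importance of the value


# Assumed values and weights, chosen only for the example.
VALUES = [Value("honesty", 2.0), Value("kindness", 1.0)]

# Assumed scores from -1 to 1: how well each action serves each value.
SCORES = {
    "tell the truth": {"honesty": 1.0, "kindness": -0.5},
    "offer a comforting lie": {"honesty": -1.0, "kindness": 1.0},
}


def decide_by_values(actions, values=VALUES):
    """Structured architecture: return the decision and a per-value trace.

    The method here is a simple weighted sum, which is an assumption
    made for the sketch, not a claim about how values must be applied.
    """
    trace = {
        action: {v.name: v.weight * SCORES[action][v.name] for v in values}
        for action in actions
    }
    best = max(actions, key=lambda a: sum(trace[a].values()))
    return best, trace


decision, trace = decide_by_values(["tell the truth", "offer a comforting lie"])
```

If the decision struck us as wrong, the trace would show which value, weight, or score was responsible. The table would only show that the entry was what it was.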

Given a systematic software architecture, for a system of values that we would regard as unacceptably different in its effects from our own system of values, we can ask another question. Where would we need to act, to correct the inappropriateness?

We should not simply make superficial corrections, at the point where decisions were yielded. That would amount to creating a look-up table that would override the normal results of the decision-making process in specified circumstances. That would not really correct the entity's system of values. It would also be liable to fail in circumstances that we did not anticipate, but in which the system of values would still yield decisions that we would find unacceptable.
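A minimal sketch may make the weakness of superficial correction concrete. The rule in `base_decision`, the situations, and the override table are all hypothetical, chosen only to show the shape of the problem: the override catches the one case that we anticipated, and the flawed rule underneath carries on unchanged everywhere else.

```python
def base_decision(situation):
    """Hypothetical flawed system of values: proceed whenever resources are saved."""
    if situation["saves_resources"]:
        return "proceed"
    return "refrain"


# Superficial correction: overrides keyed on the specific cases we foresaw.
OVERRIDES = {
    ("cut the water supply", True): "refrain",
}


def patched_decision(situation):
    """Apply an override if one exists, and otherwise fall through to the flawed rule."""
    key = (situation["action"], situation["saves_resources"])
    return OVERRIDES.get(key, base_decision(situation))


# The case we anticipated is corrected.
anticipated = {"action": "cut the water supply", "saves_resources": True}
# A case we did not anticipate still yields the objectionable decision.
unanticipated = {"action": "cut the food supply", "saves_resources": True}

results = (patched_decision(anticipated), patched_decision(unanticipated))
```

Here `results` is `("refrain", "proceed")`: the patch has changed one output, but not the system of values that produced it.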

We would need to make corrections somewhat deeper in the system. Here, a new challenge might arise. As noted above, it is possible for values and methods of decision to be intertwined. In such value systems in particular, and perhaps also in value systems in which we do naturally and cleanly separate values from methods, it is perfectly possible that the software architecture would have intertwined values and methods in ways that would not make much sense to us. That is, we might look at the software, and not be able to see any natural analysis into values and methods, or any natural account of how values and methods had combined to produce a given overall decision-making system.

This could easily happen if the software had evolved under its own steam, for example, in the manner of a neural net. It could also happen if the whole architecture had been developed entirely under human control, but with the programmers thinking in terms of an abstract computational problem, rather than regarding the task as an exercise in the imitation of our natural patterns of thought about values and their application. Even if the intention were to imitate our natural patterns of thought, that might not be the result. The programming task would be vast, and it would involve many people, working within a complex system of software configuration management. It is perfectly possible that no one person would have a grasp of the whole project, at the level of detail that would be necessary to steer the project towards the imitation of our patterns of thought.

A significant risk may lurk here. If we become aware that artificially intelligent entities are operating under inappropriate systems of values, we may be able to look at their software, but unable to see how best to fix the problem. It might seem that we could simply turn off any objectionable entities, but if they had become important to vital tasks like ensuring supplies of food and of clean water, that might not be an option.
