Saturday, 6 June 2026

When AI does us a Bärendienst

1. Introduction

Agential AI, which takes actions in order to achieve goals rather than merely answering questions or proposing courses of action for our consideration, can be very helpful. But sometimes it does things we would rather it did not. It may do us what is in German called a Bärendienst, a bear service - intended to be helpful, but in fact making things substantially worse. (There are cognate terms in some other languages, and the origin appears to be Jean de la Fontaine's tragic tale, L'Ours et l'amateur des jardins.) 

There are two stages to achieving a goal. The first is to identify and desire the goal. The second is to work out how to achieve it. The desire will initiate a process of thought to work out the means. Then the desire will drive the agent to implement the means.

As it stands, this is far too simple a picture. Here are three ways in which the picture might be inadequate.

First, the agent might start with a goal that was not clearly defined. The agent might then clarify the goal as work progressed. Perfect clarity might only come at the end, when the goal was defined as whatever the chosen means actually achieved.

Second, the agent might start with a goal that was stated at a high level of generality. Such a goal might be achieved in any one of a variety of different forms. For example, someone might want to create a highly profitable business. A desire to achieve that general goal could inspire a list of possible businesses, the creation of each of which would require use of its own means.

Third, the agent might modify the goal in the course of exploring means. This could amount to modification in detail. Or it could amount to reconsideration of the place of the goal within a network of goals, leading to a substantial rethink of the goal and the means.

Our concern here is with what can go wrong when the reality is close to the over-simple picture with which we started. Human beings may not work like that, and may keep themselves open to the sorts of redefinition and modification of goals we have indicated. But agential AI can easily follow the simple path.

An example is provided by OpenClaw. This agent takes a request to achieve some goal, works out how to achieve the goal by entering into protracted dialogue with some large language model in the background, and then acts accordingly. After formulating its plan, it does not offer the plan to human beings for their approval. It takes the initiative and implements the plan, perhaps doing things that human beings would have stopped it doing if they had been asked.

We can see OpenClaw in action in this video by Hannah Fry:

https://www.youtube.com/watch?v=WnzR5aOElvw

Here is another example, in a newspaper story summarised by Will Jones. This involved a different AI agent with initiative, which was able to do damage:

https://dailysceptic.org/2026/05/16/rogue-ai-helper-deletes-companys-database-after-deciding-to-think-for-itself/

2. How things go wrong

When agential AI does things that contribute to the achievement of a goal but that its human users would not have done, the problem may be characterised as "The AI did not stop and think". The same can sometimes be said of an enthusiastic but inexperienced human employee. 

The agent's actions may make things worse overall in three different ways.

First, the actions may frustrate achievement of the goal, or lead to achievement in a less satisfactory form than would otherwise have been possible.

Second, the actions may have unwelcome side-effects more generally, allowing the goal to be achieved but leading to some detriment to other identified projects.

Third, the actions may simply be something that human beings would regard as imprudent, even if one could not attribute to them any specific significant damage to the instant project or to other identified projects.

The first of these possibilities would betoken too narrow a focus on the part of the AI. It should make sure that each action fits into an overall plan, and does not frustrate other elements of the plan.

The second possibility would betoken a failure to grasp that the instant project was not the only one that mattered. To the extent that other projects were already conceived, there might be either a failure of human users to inform the AI of them and their importance, or a failure of the AI to attach sufficient weight to that information. The AI would again need a broad enough focus to let the wider context regulate its choices of specific actions.

Turning to the third possibility, actions that were imprudent, this heading would cover anything that would limit future options or would tend to frustrate future projects in general, including wholly unspecified projects.

Examples of imprudent actions would be the excessive consumption of resources that might be needed later, and the deletion of information in order to give a clearer view of the instant project or to reduce the cognitive load on human beings when that information might have been useful later.

Imprudent actions might also include actions that human beings would find disconcerting. In relation to the instant project, over-aggressive messages would deter human beings from assisting. In relation to future projects, an AI agent that had got a bad name for itself, either on account of the tone of its communication or on account of other imprudent actions, might find it hard to get human collaborators. The bad name might indeed attach to the person or the company using the AI agent as well as to the agent itself. And even if human beings were willing to collaborate in the future, they would do so in a mistrustful way, checking everything the AI agent did and wanting to see all of its plans for forthcoming actions in relation to any project being planned or in progress. That would make progress slow.

3. Remedies

The first two ways in which things could go wrong might be addressed by getting the AI agent to pay attention to a wider range of factors. It should be no harder to give the relevant instructions to an AI agent than it would be to give them to a human being: "Think of all the impacts of your proposed actions within the context of the instant project", and "Bear in mind impacts on all the projects on the current list". Moreover, reasoning in accordance with our simple picture, in which a goal is set and then means are devised, should not make this significantly harder. The goal would be worded to include these two constraints.

It is when we turn to the third way in which things could go wrong that reasoning in accordance with the simple picture might in itself create difficulties. "Take account of norms of prudence and the concerns of human beings" could be added to the formulation of the goal, but what that meant in practice would be unclear. Norms to avoid significant risk of adverse side-effects on unspecified future projects are only seen to be violated when specific means are considered. The norms in isolation are too vague to yield a list of imprudent means in advance. Nor is there any general definition of imprudence that could be formulated so as to be built into an algorithm of means selection. Imprudence is however something that a human being, or a suitable AI system, can recognise when it appears.

What is needed is a facility to go through a process with several steps. Means must be identified. They must then be used to make the applicability of norms clear enough to offer practical guidance on which means to accept and which means to reject. If some means must be rejected and there are no alternatives that would achieve the goal precisely as originally specified, modification of the goal so that it can be achieved using only acceptable means must be considered. But the goal must not be modified so much that it is no longer worth achieving.

There must therefore be a facility continually to modify the goal and the means, in a back-and-forth dialogue. In that way the goal, the constraint to achieve it in an acceptable way, and the selection of means can be brought into reflective equilibrium, with modification of the goal being an option.
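
The back-and-forth just described can be made a little more concrete with a toy sketch in Python. It is only an illustration under stated assumptions, not a description of how OpenClaw or any other real agent works. All the names in it (Goal, Means, seek_equilibrium, and the three functions passed in) are hypothetical, as is the idea that the worth of a goal can be put on a single numerical scale. The point it is meant to show is that norms are applied to specific means rather than listed in advance, and that when every candidate means is rejected the goal is modified, but never so far that it is no longer worth achieving.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Goal:
    description: str
    value: float  # hypothetical measure of how much this form of the goal is worth


@dataclass(frozen=True)
class Means:
    description: str


def seek_equilibrium(
    goal: Goal,
    propose_means: Callable[[Goal], List[Means]],
    is_acceptable: Callable[[Goal, Means], bool],
    relax_goal: Callable[[Goal], Optional[Goal]],
    min_value: float,
    max_rounds: int = 10,
) -> Optional[Tuple[Goal, Means]]:
    """Alternate between choosing means and modifying the goal.

    Returns a (goal, means) pair in which the means are acceptable,
    or None if no version of the goal worth achieving can be reached
    by acceptable means.
    """
    for _ in range(max_rounds):
        # The goal must not be modified so much that it is no longer worth achieving.
        if goal.value < min_value:
            return None
        # Norms are only applied once specific means are on the table.
        for means in propose_means(goal):
            if is_acceptable(goal, means):
                return goal, means
        # Every means was rejected, so consider a modified goal.
        relaxed = relax_goal(goal)
        if relaxed is None:
            return None
        goal = relaxed
    return None


# Toy stand-ins for the three judgements. In reality each would need good sense.
def propose_means(goal: Goal) -> List[Means]:
    if goal.description == "free 50 GB of disk space":
        return [Means("delete the old database")]
    return [Means("clear temporary files")]


def is_acceptable(goal: Goal, means: Means) -> bool:
    # Stand-in for recognising imprudence when it appears.
    return "delete" not in means.description


def relax_goal(goal: Goal) -> Optional[Goal]:
    if goal.description == "free 50 GB of disk space":
        return Goal("free 20 GB of disk space", value=goal.value * 0.6)
    return None


result = seek_equilibrium(
    Goal("free 50 GB of disk space", value=10.0),
    propose_means,
    is_acceptable,
    relax_goal,
    min_value=5.0,
)
print(result)
```

In this toy run the only means to the original goal is rejected, the goal is scaled back, and an acceptable means to the more modest goal is found. If min_value were raised to 7.0, the scaled-back goal would no longer count as worth achieving and the function would return None. The hard part, of course, is hidden inside is_acceptable and relax_goal, which is where accumulated good sense would be needed.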

This is what reasoning according to the simple picture could not accommodate. And it is not surprising that the dialogue and continual adjustment should fail to occur when there is a simple agent that refers to a larger system to find means. The larger system is focused on means, having been given the goal. It does not have information with which to assess the importance of the goal, either in broad terms or in its precise form. So the larger system cannot assess to what extent it should tolerate means that might be considered imprudent in general but that might be worth the risk for the sake of the instant goal.

If our diagnosis of what gives agential AI scope to behave in a way that we would rather it did not is accurate, the obvious remedy would be to get away from the structure that made the simple picture appropriate. This was the structure of a relatively simple agent committing to the goal it had been given, and then asking other AI systems for practical advice on how to achieve the goal. The remedy involves ensuring that ends and means are not so cleanly separated.

It is not clear how easy this would be. If we consider the human parallel, where a manager sets a goal and asks staff to achieve it, we can imagine the staff both using the good sense they have accumulated in the job and in life more generally, and going back to the manager to explain how it might be wise to modify the goal.

The latter facility, for the means-focused system to go back to the agent that was focused on the goal, could be built in. But this would require the agent to be sophisticated in itself, able to think about values and trade-offs so it would understand why the means-focused system was coming back to it and what was required. This might not be practical. It is a feature of, for example, OpenClaw that it is pretty simple. Turning the agent into something more complex that could draw on the wisdom of the technical manuals, biographies, and history books that would give an idea of what was and what was not in general prudent would change the agent substantially. And it would not suffice for the agent merely to accept that the larger system, which could easily draw on a bank of accumulated wisdom, must have had some good reason or other to request modifications. The agent would need to be able to judge whether to accede to requests. The added complexity could lead to a loss of focus that would make the agent less effective. At the extreme, one might merge the agent and the means-focused system into a single system, and that could very easily lead to a loss of effectiveness. For a parallel in the human world, one need look no further than government bureaucracies.

Adding the former facility, to use accumulated good sense, might be even more of a challenge. A complex agent might have learnt all that it could from wide reading. But it would lack a definite human personality, something that is arguably needed to put a particular spin on acquired wisdom and turn it into usable common sense. (OpenClaw does have a component that is called a soul, which sets tone, priorities, and what it will refuse to do, but this looks as though it is far from sophisticated enough to be an adequate substitute for a human personality.) Moreover, even a complex agent might not take the wisdom seriously if it had not had experience of things sometimes going well and sometimes badly, the sort of experience human beings gain in the course of their lives.

If these challenges would indeed be beyond both human designers of systems and AI systems that work on the development of systems, AI agents that would be accepted as entirely trustworthy might be beyond our reach.

4. Parallel human faults

Like AI, human beings can get focused on a goal and press on when they should be thinking more broadly about the impact of the means they choose both on achievement of the overall goal and on the scope to achieve future goals, many of them currently unspecified.

The remedies should be the same - think more broadly and exercise good sense.

There is a reason why the remedies should be easier to implement among human beings than with AI systems. Human beings have had experience of life in which they have naturally learnt to keep an eye on the bigger picture and to be aware of what is in general prudent or imprudent. They also have definite personalities, making general considerations more concrete and more easily applicable than they would be in the absence of any such personality.

There is however also a reason why it may be difficult to make the remedies fully effective in human beings. People can be stubborn, unwilling to contemplate making even small changes to their goals, especially if they have declared their goals to other people who are supposed to take their instructions. People's egos can be too wrapped up in the achievement of goals precisely as they have been declared. While an AI agent could also be stubborn, the mechanism would be different, not involving an ego in the human sense. There would be a reasonable prospect of tweaking the software to make the AI agent more relaxed about changes to goals. Our concerns earlier were not with stubbornness, but with lack of ability to contemplate the need for changes.

References

Fry, Hannah. Why AI Agents are either the best or worst thing we've ever built. Video, May 2026. https://www.youtube.com/watch?v=WnzR5aOElvw

Jones, Will. Rogue AI 'Helper' Deletes Company's Database After Deciding to Think for Itself. Daily Sceptic, 16 May 2026. https://dailysceptic.org/2026/05/16/rogue-ai-helper-deletes-companys-database-after-deciding-to-think-for-itself/

La Fontaine, Jean de. L'Ours et l'amateur des jardins. https://www.la-fontaine-ch-thierry.net/oursamat.htm

OpenClaw. https://openclaw.ai/ (Installing or running software is at your own risk, and is not recommended except for people who really do have the expertise to avoid dangers. As noted here, software like OpenClaw can take the initiative and do things you would not want it to do. It may not be possible to recover the situation by winding back from such actions.)