Thursday 16 February 2023

Chatbots, AI, education, and research

Some sophisticated chatbots are now available. One is ChatGPT, which is being incorporated into Microsoft's search engine Bing. Another is Bard, which is being incorporated into Google's search engine. Connection with web searches allows bots to do more than write in a human style. They can also gather information on which to base what they write.

This post discusses some implications of chatbots, and of artificial intelligence (AI) more generally, for education at the university level and for academic research. We shall start with education, because that is the area in which the greater number of people will be affected directly. Some of what we shall say in relation to education will carry over to research. This is a natural consequence of the fact that at the university level, students should develop skills of enquiry, analysis and the reporting of conclusions that are also needed in research.

We shall assume that AI will get a good deal better than it is at the moment. In particular, we can expect systems that meet the needs of specific disciplines to develop. Such systems would draw on appropriate corpora of material both in their training and to respond to queries, and would reason their way to conclusions and present the results of their work in ways that were appropriate to their disciplines.

References to publications cited are given at the end of the post.

Education - developing students' minds

When a student finishes a course of education, they should not only have acquired information. They should also have developed ways to think that would be effective if they wanted to respond to novel situations or to advance their understanding under their own steam. Even if they did not continue to work in the disciplines in which they had been educated, such skills would be useful. Many careers demand an ability to think clearly and to respond intelligently to novel challenges. And there is also the less practical but still vital point that the ability to think for oneself is important to a flourishing life.

Ways to think are developed by making students go through the process of finding information, organizing it, drawing conclusions, and writing up the results of their work. They must start with problems to solve or essay questions. Then they must work in laboratories, or use libraries and online repositories to find relevant source material (whether experimental data, historical evidence, or academic papers). They must think through what they have found, draw conclusions, and produce solutions to the problems set or essays on the relevant topics.

If critical stages in this process were outsourced to computers, educational benefit would be lost. But which stages should not be outsourced, and which stages could be outsourced harmlessly?

The traditional function of search engines, to show what material is available on a given topic, seems harmless. It looks like merely a faster version of the old custom of trawling through bibliographies or the footnotes in books and articles which surveyed the relevant field. 

Search engines do however have an advantage besides speed over old methods. A search engine will typically put what it thinks will be the most useful results first. This is helpful, so long as the search engine's criteria of usefulness match the student's needs. But even then, it can detract from the training that the student should receive in judging usefulness for themselves.

The latest generation of search engines, with ChatGPT, Bard or the like built in, takes this one step further. If students can express their requirements with sufficient clarity and precision to prompt the search engines appropriately, the results can be well targeted. Rather than giving several pages of links, the search engines can provide statements in response to questions. References to support those statements could be supplied along with the statements, but if not, they would be reasonably easy to find by making further searches. The assistance of search engines in going directly to answers to questions might be helpful, but it would also take away practice in reviewing available items of material, judging their relative reliability and importance, and choosing which items to use.

Moving on to putting material to work, developing arguments, and reaching conclusions, there is not much sign that the new generation of chatbots will in their current form be helpful.

This reflects the way in which they work. (A good explanation is given by Nate Chambers of the US Naval Academy in his video ChatGPT - Friendly Overview for Educators.) When working out what to say, they rely on knowledge of how words are associated in the many texts that can be found online. They get the associations of words right, but they do not first formulate sophisticated ideas and then work out the best ways to express them. Texts they produce tend to be recitals of relevant bits of information in some reasonably sensible order, rather than arguments that move from information to conclusions which are supported by the intelligent combination of disparate pieces of information or by other sophisticated reasoning.
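The idea of producing text from associations between words can be illustrated with a toy example. The sketch below is in Python and is emphatically not how ChatGPT or Bard is built. Those systems use large neural networks trained on vast amounts of text. Here we assume a made-up three-sentence corpus and simply count which word most often follows each word. The function names build_bigram_table and continue_text, and the corpus itself, are our own inventions for the purpose of illustration.

```python
from collections import Counter, defaultdict


def build_bigram_table(text):
    """Count, for each word in the text, which words follow it and how often."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table


def continue_text(table, start, length):
    """Extend `start` by repeatedly appending the most frequent following word."""
    output = [start]
    for _ in range(length):
        followers = table.get(output[-1])
        if not followers:
            # The last word was never followed by anything in the corpus.
            break
        output.append(followers.most_common(1)[0][0])
    return " ".join(output)


# A made-up corpus, assumed purely for illustration.
corpus = "the cat sat on the mat . the cat sat on the rug . the dog ate the fish ."
table = build_bigram_table(corpus)
print(continue_text(table, "the", 5))  # prints "the cat sat on the cat"
```

Every pair of adjacent words in the output is a familiar association, but the whole says nothing sensible. Real chatbots are vastly better at producing fluent text, but the example gives a sense of why fluency alone need not amount to argument.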

So training in the important skill of constructing arguments would not seem to be put at risk by students' use of chatbots. But there is other software to consider, software which can engage in sophisticated reasoning. AI is for example used in mathematics (a web search on "theorem prover" will turn up examples), and in chemistry (Baum et al., "Artificial Intelligence in Chemistry: Current Trends and Future Directions").

If students relied on software like that to respond to assignments set by their professors, they would not acquire the reasoning skills they should acquire. And it would be no answer to say that if they went on to careers in research, they would always be able to rely on such tools. If someone had not in their training reasoned their own way through problems, they would not understand what the conclusions provided by AI systems really meant. Then they would not be able to appreciate the strengths and the weaknesses of those conclusions. They would also be unable to assess the level of confidence one should have in the conclusions, because they would not have a proper grasp of the processes by which the conclusions were reached and what false steps might have been made.

Software that does all or most of the reasoning required to reach interesting conclusions is not to be expected everywhere. We can expect it to be far more widespread in mathematics and the natural sciences than in other disciplines. In the social sciences and the humanities, there may be data which are open to being marshalled into a form that is suitable for mathematical analysis, and such analysis may yield robust conclusions. Some such analyses may be found under the rubric of digital humanities. But while such analyses might be devised and conducted entirely by AI systems, and their results might be eminently publishable, those results would be unlikely to amount to conclusions that conferred insights of real value, at least not in the humanities and to some extent not in the social sciences either. Humane interpretation of the sort that may confer Verstehen is still needed, and that does not yet appear to be a forte of AI.

Having said that, where the thinking that AI might be expected to do would suffice to reach interesting conclusions, the combination of such a system with a language system to write up the results could be powerful. All the work could be done for the student. 

Such a combination will surely be feasible in the near future in mathematics and the natural sciences, where the results of reasoning are unambiguous and there are standard ways to express both the reasoning and the results. There are already systems, such as SciNote's Manuscript Writer, which will draft papers so long as their users do some initial work in organizing the available information. That package is, as its maker states, not yet up to writing the discussion sections of papers, but we should not suppose that accomplishment to be far away. To write discussion sections, a system would need to have a sense of what was really interesting and of how results might be developed further. But we should not suppose such a sense to remain beyond AI systems for long.

In other disciplines, and particularly in the humanities, it is much less clear that such a combination of AI reasoner and AI writer would be feasible in the near future. The results of reasoning are more prone to ambiguity, and ways to express those results are less standardized. It might also not be feasible to break down the task of building a complete system into two manageable parts. The distinction between reasoning and final expression for publication is never absolute, but outside the natural sciences it is decidedly hazy, with substantial influences in both directions.

Another issue is that in work of some types the expression of results, as well as the reasoning, needs to be apt to confer Verstehen. As with reasoning, that does not yet appear to be a forte of AI. The main obstacle, in relation both to reasoning and to expression, may be that AI systems do not lead human lives. They do not have a grasp of humanity from the inside. Human experience might one day be fabricated for them, but we are some way off that at the moment. (There is more on the themes of Verstehen and the human point of view in Baron, Confidence in Claims, particularly section 5.6.) Having said that, AI may well improve in this respect. Systems may come to align their ways of thinking with human ways by having human beings rank their responses to questions, a method that is already in use to help them to get better at answering questions in ways that human beings find helpful.

We may conclude that AI could put the development of any of the skills that students should acquire, from gathering and organizing information to drawing conclusions and expressing them, at risk, but that the relevant software would vary from skill to skill and the threat would probably become serious in the natural sciences first, then in the social sciences, and finally in the humanities.

Education - grading

There has been concern among professors that students will use ChatGPT to write essays. So far, the concern seems to be exaggerated. If ChatGPT is offered a typical essay title, the results are often poor assemblies of facts without any threads of argument, and are sometimes laughably incorrect even at the merely factual level. But chatbots will get better, and may merge with the reasoning software we have already mentioned to yield complete systems which could produce work that would improperly earn decent grades for students who submitted it as their own.

Some remedies have been suggested. One remedy would be to make grades depend on old-fashioned supervised examinations. These might be taken on computers provided by universities, so as to compensate for the modern loss of the skill of handwriting while not allowing the use of students' own computers, which could not reliably be checked for software that would provide improper assistance. Another remedy would be to make students write their assignments in documents that automatically tracked the history of changes, so that professors could check that there had been a realistic amount of crossing out and re-writing, something which would not be seen if work produced elsewhere had simply been pasted in. A third remedy would be to quiz students on the work submitted, to see whether they had actually thought about the ideas in their work. A fourth remedy would be to set multi-part assignments, with responses to each part to be submitted before the next part was disclosed to students. This idea relies on the fact that current software finds it difficult to develop arguments over several stages while maintaining coherence. Finally, anti-plagiarism software is already being developed to spot work written by chatbots, although it is not clear whether it will be possible for detection software to keep up with ever more sophisticated reasoning and writing software.

Alternatively, AI might shock educators into the abolition of grading. It is not that AI would adequately motivate the abolition of grading. Rather, it could merely be the trigger for such a radical move.

There are things to be said against such a move. Grades may be motivators. Employers like to see what grades potential employees have achieved. And there are some professions, such as medicine, in which it would be dangerous to let people work if their knowledge and skills had not been measured to ensure that they were adequate.

There are however things to be said in favour of the abolition of grading. Alfie Kohn has argued that a system of grading has the disadvantage of controlling students and encouraging conformity (Kohn, Punished by Rewards). Robert Pirsig puts into the mouth of his character Phaedrus an inspiring account of how students at all levels can actually work harder and do better when grades are abolished. As he puts it when commenting on the old system:

"Schools teach you to imitate. If you don’t imitate what the teacher wants you get a bad grade. Here, in college, it was more sophisticated, of course; you were supposed to imitate the teacher in such a way as to convince the teacher you were not imitating, but taking the essence of the instruction and going ahead with it on your own. That got you A's. Originality on the other hand could get you anything - from A to F. The whole grading system cautioned against it." (Pirsig, Zen and the Art of Motorcycle Maintenance, chapter 16)

So the end of grading could have its advantages, particularly in the humanities where there is, even at the undergraduate level, no one right way to approach a topic. (This is not to say that all ways would be acceptable. Some would clearly be wrong. And at the detailed level of how to check things like the reliability of sources, there may be very little choice of acceptable ways to work.)

Research - the process

AI that collates information, reasons forward to conclusions, and expresses the results can be expected to play a considerable role in research in the reasonably near future. There is however some way to go. Current systems seem to be good at editing text but not so good at generating it in a way that ensures the text reflects the evidence. Their specialist knowledge is insufficient. And as noted above, they cannot yet write sensible discussion sections of scientific papers, let alone sensible papers in the humanities. (For an outline of current capabilities and shortcomings see Stokel-Walker and Van Noorden, "What ChatGPT and Generative AI Mean for Science".)

Influences of AI on research are likely to parallel influences on education. The burdens of tracking down, weighing up and organizing existing information, analysing new data, reasoning forward to interesting conclusions, and expressing the results of all this work might be taken off the shoulders of researchers in the near future. That would leave them only with the jobs of deciding what to investigate and then reviewing finished work to make sure that the AI had considered a suitably wide range of evidence, that it had reasoned sensibly, and that the conclusions made sense. Having said that, the issues raised would differ from those that arise in education.

There could be very great benefit if more research got done, particularly when research addressed pressing needs such as the need for new medical treatments or for greater efficiency in engineering projects. There would be no such benefit in the context of education rather than research, although if the use of AI made education more efficient students might progress to research sooner.

There is also the point that if an AI system absorbed the content of all the research being done in a given area and interacted with human researchers, this could create a hive mind of pooled expertise and knowledge which would be more effective than the hive mind that is currently created by people reading one another's papers and meeting at conferences. (We here mean a hive mind in the positive sense of sharing expertise and knowledge, not in the negative sense of conformity to a majority view.)

The development of minds by thinking through problems would be less important at the level of research, because minds should already have been developed through education. The loss of opportunities for development on account of the use of AI would however still be a loss. Every mind has room for improvement. In addition to concern about the continued development of general skills, there is the point that not actively reasoning in the light of new research would reduce the extent to which someone came to grasp that research and its significance. Finally, only a researcher who had a firm grasp of the state of the discipline, including the most recent advances, would be able to judge properly whether the results of AI's work passed the important test of making sense.

A concern that is related to the development of minds is that there would be a risk that any novel methods of reasoning by AI would not be properly understood, leading to the misinterpretation of conclusions or a failure to estimate their robustness properly. Well-established techniques should have been covered in a researcher's student days, but new techniques would be developed. A researcher who had never gone through the process of applying them laboriously using pen and paper might very easily not have a proper grasp of what they achieved, how reliably they achieved it, and what they did not achieve.

Another concern is that the processes of reasoning by AI systems could be opaque to human researchers. This would be an instance of the AI black box problem. Reasoning might be spelt out in the presentation of work, but there would be a risk that the reasoning as represented was not the actual reasoning. If satisfactory reasoning were set out, that might appear to address the issue. But that reasoning might not in fact be related appropriately to the evidence (while the internal reasoning was so related), and this might not be noticed by human researchers reviewing the work.

One specific form of opacity that should concern us is the risk that when AI systems search for material, they may be influenced by inappropriate criteria. Search engines can already give high rankings to links that it is in their commercial interests to favour, or can push down the rankings material that is disfavoured by the political establishment (the practice of shadow-banning). If AI used by researchers did the same sort of thing, research could be skewed in wholly improper ways.

Research - credit

Researchers like to get credit for their work, and are annoyed when other people take credit for work that is not their own. Names on publications need to be the right ones, and if any material is taken from someone else's work it must be attributed in a footnote. One reason for this ethos is that non-compliance would be considered to amount to bad manners, or theft, or something in between these extremes. Another reason is that jobs, promotion and funding depend on the work one has done, so each person needs to be able to take credit for their own work, whether it is published by them or used by others.

Now suppose that a researcher had relied on an AI system to organize material and reason the way to conclusions, rather than merely to find material. And suppose compliance with the minimal requirement that the use of AI should be disclosed. How should its use affect the allocation of credit?

One might argue that the use of AI was not in principle different from the use of any other tool. In many disciplines, the use of sophisticated computer systems is routine and is not thought to give rise to special issues of attribution. Even in the pre-computer age, people relied on bibliographies and on other people's footnotes to track down material, and that was not thought to lessen the credit due to researchers who relied on such aids.

On the other hand, AI systems that were good enough to help with reasoning would learn as they went along, pooling knowledge gained in their use by all researchers in a given field. Then any particular user would rely indirectly on the work of other researchers, and might easily be unaware of which other researchers' work was involved. There could be no more than a general acknowledgement, enough to tell the world that not everything was the author's own work but not enough to give credit to specific previous researchers.

We should not however think that such a failure to credit previous researchers would be entirely new. At the moment, identifiable contributions by others are expected to be acknowledged. But there is also the accumulated wisdom of a discipline, which may be called common knowledge or prevalent ways to think. That wisdom depends on the contributions of many researchers who are not likely to be acknowledged. One may stand on the shoulders of people of middling stature without knowing who they were, and it is not thought improper to fail to acknowledge them by name. The new difficulty that would be created by the use of AI to produce reasoning would not lie there. It would instead be that contributions which would be easy to acknowledge if one worked in traditional ways might accidentally go unacknowledged. On the other hand, one might get an AI system to track such contributions and generate appropriate footnotes.

We now turn to the significance of credit when allocating jobs, promotion and funding. The basic idea is a sensible one. The aim should be to employ, promote and fund people who produced the best work, and ensure that it was their own work without undisclosed reliance on other people's work. How might the use of AI to reason the way to conclusions matter? We shall again assume that its use would be disclosed.

Given that the difference made by the use of AI would vary from one piece of work to another, and that a researcher who routinely relied on AI might in fact have the talent to do just as well without its help, it might become harder to decide reliably between candidates. On the other hand, such decisions are unlikely to be particularly reliable as it is, at least not when choices are made between several candidates all of whom are of high quality. So any loss might not be great.

Finally, there is the question of credit for the clear and elegant description of work and expression of conclusions. AI that was rather more sophisticated than what is currently available could do this work, and a human author might take credit. Fortunately it is not normal to credit other researchers for one's style of writing anyway, so there would be no appropriate footnotes giving credit to others to be omitted even if the AI had developed its style by reviewing the work of many researchers. And when it comes to the allocation of jobs, promotion and funding, reasoning should in any case be a good deal more important than style.


References

Baron, Richard. Confidence in Claims. CreateSpace, 2015.

Baum, Zachary J., Xiang Yu, Philippe Y. Ayala, Yanan Zhao, Steven P. Watkins, and Qiongqiong Zhou. "Artificial Intelligence in Chemistry: Current Trends and Future Directions". Journal of Chemical Information and Modeling, volume 61, number 7, 2021, pages 3197-3212.

Chambers, Nate. ChatGPT - Friendly Overview for Educators.

Kohn, Alfie. Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A's, Praise, and Other Bribes, with a new afterword by the author. Boston, MA, Houghton Mifflin, 1999.

Pirsig, Robert M. Zen and the Art of Motorcycle Maintenance: An Inquiry into Values, 25th anniversary edition. London, Vintage, 1999.

SciNote Manuscript Writer.

Stokel-Walker, Chris, and Richard Van Noorden. "What ChatGPT and Generative AI Mean for Science" (corrected version of 8 February 2023). Nature, volume 614, number 7947, 2023, pages 214-216.
