Friday 9 February 2024

Publication and the shape of the knowledge base

Introduction

This post is about the dissemination and use of pieces of work that set out the results of research, that review specific pieces of research, or that survey fields of research.

Journals are the established form of dissemination. But technology has facilitated both new ways to make works available and new formats for individual works.

Old and new ways

The traditional journal is managed by an editorial team, printed on paper, and sent out by post (as well as often being available online). The number of papers per year is for practical reasons tightly limited. Submissions routinely far exceed capacity. Decisions as to what to publish are taken by editors, either at sight or on the basis of reports received from referees.

Consequences of this system are a degree of prestige for those authors whose papers are published, the rejection of a great deal of work that is of a perfectly good standard, and a degree of confidence among readers that published papers have been scrutinised and meet appropriate standards of quality. Readers do not however have grounds to think that published papers are better than at least a substantial number of the papers that are not accepted for publication.

This long-standing system remains significant, with the modification that some journals now only exist in online format. But original research of perfectly good quality can now appear in other ways.

Most conspicuously there are online repositories such as the arXiv and the Social Science Research Network. These repositories accept papers subject only to very light quality control, and they do not limit the numbers of papers accepted. Some of the papers will be modified by their authors, and the arXiv in particular has a system for keeping successive versions available. Some of the papers will go on to be published in more traditional journals, again perhaps in modified form. Such a traditionally published version is then regarded as the version of record, the one to cite when others refer to the research.

There are also online overlay journals which pick out papers in online repositories and link to them, sometimes with comments from the editors of the overlay journals. The idea is to bring together some papers particularly worth reading in a given field, and to add a layer of quality appraisal that is not provided by the repositories.

An environment of repositories and overlay journals separates the two tasks of making work available and directing attention to work of high quality, tasks which are combined in traditional journals. Far more work is made available, but those who want to have their attention directed to work that might well have made it into traditional journals can rely on overlay journals. Breaking down a single process into two tasks that can be performed separately has its advantages.

There are also ways to make work available that exist outside even the structure provided by organised repositories, although they carry a correspondingly low (but not zero) probability that links to the work will appear in overlay journals. Authors may post work on their own websites or blogs, and may draw attention to the work by posting links on social media.

Finally, long-standing journals may find themselves in competition with new ones that retain some of the features of their forebears. A recent example is the new journal Political Philosophy, which has been created by former editors of the Journal of Political Philosophy. The new journal is published online only, by the Open Library of Humanities, but it is too early to know whether this means that far more papers will be accepted than would be practical with a paper journal. Peer review is carried over from the traditional journal model. One thing that is facilitated by this new journal's online model, and by the fact that no commercial publisher is involved, is that articles will be free to access while no publication charge is imposed on authors. 

Publication and citation

There are those who do not regard papers placed on websites or in online repositories as published. They reserve that term for papers which have appeared in journals of the traditional sort, where quality control is expected to come through peer review, through a limit on the number of papers accepted (so that even papers which meet a standard that can be applied without full peer review may be turned away), or through both. The view is that publication requires not only dissemination, but also a gatekeeper.

It is tempting to see this as a grumpy old establishment keeping control of its territory. But there is another aspect. Works get cited as support for arguments in later works. Earlier works present arguments and results on which the authors of later works seek to rely. There is a case for saying that only works which have got past a gatekeeper should be considered respectable enough to be cited in this way. And it might be thought convenient to take publication in the traditional sense as the primary indicator that works are good enough to be cited.

There is however scope to challenge the view that the gatekeepers of traditional publication are the ideal source of quality control. One may doubt that they always do their job well. A mere restriction to papers which strike the editors as good enough to outrank others in the competition for space in journals is unlikely to weed out all, or only, the papers on which others should not rely. And peer review can vary greatly in quality, both because reviewers may lack expertise in some aspects of a paper's topic and because time-pressured and unpaid reviewers may not devote great effort to the task. It does not help that reviewers' reputations are usually not on the line: it is normal for them to remain anonymous, and for their reports not to be shared at all widely.

An alternative that would address such concerns in the new world of online repositories would be to impose no control over initial acceptance, but then to allow open comment, either on papers as wholes or on specific points, by named individuals whose expertise could therefore be assessed. So long as the comments were collected in repositories alongside the relevant papers, everyone could benefit. And an adverse comment on one element of a paper would not deprive people of access to other elements which might be of considerable value to them, whereas in a traditional journal such a comment might have led to rejection (or to a verdict of revise and resubmit that amounted to rejection, if the author could not or would not revise to meet the objection). Moreover, a simple way to add comments would make it easy to bring in new material which might change one's view of the original paper or of parts of it. In the traditional system, drawing the attention of readers of the original paper to relevant new material may have to await the publication of a whole new paper which comments on the original. And the authors of such later papers may be so concerned to convey their own views that they do not bother to make all the points about earlier papers that they could make.

Overlay journals provide an additional quality control mechanism in the world of online repositories. They may suffer from the same disadvantages as traditional journals. Publication in an overlay journal is neither necessary nor sufficient for a paper to be worth reading. But overlay journals can provide the kind of gatekeeping that traditional journals provide, while the repositories over which they lie ensure that papers which would not pass a gatekeeper still become available.

Finally, we should note the different degrees of significance of the controls discussed here in different areas of work.

Suppose that earlier work contains mistakes which better quality control might have caught. The risk that later authors will be driven to inappropriate conclusions by relying on that work arises most frequently, and on the whole most severely, in mathematics and the natural sciences, less so in the social sciences, and least in the humanities. One reason is that deductive or nearly deductive chains of reasoning are most common in mathematics and the natural sciences, and least common in the humanities. When a chain of reasoning is deductive or nearly so, a mistake in earlier work may drive the author of later work directly to a mistaken conclusion. When the links in the chain are weaker, an earlier mistake is less likely to have that effect. And if the links are nowhere near deductive and a prospective conclusion happens to look doubtful on some other ground, there will often be scope to place little weight on the earlier content even when it is not recognised as mistaken. This is however only how things stand on the whole, not universally. For example, a historian might misdate a document, and that mistake might deductively rule out certain analyses by other historians of events in which the document played a role.

Turning to the use of research to choose practical actions, the greatest risk of serious mistakes through relying on mistaken content of earlier work again arises when the earlier work was in mathematics or the natural sciences. Conclusions of such work come across as more solid than other considerations when actions are being chosen, so they are likely to carry considerable weight. Such conclusions can also rule out certain actions absolutely. So undetected mistakes can have significant consequences. In the social sciences, there is likely to be more wariness about relying on conclusions, because it is clear that they are less soundly based than those in the natural sciences. There is also less likelihood that a conclusion will rule out an action absolutely. Finally, conclusions in the humanities are not of a kind that should lead directly to choices of action, although they may well inform the outlooks of people who make such choices.

The shape of the knowledge base

Traditional publishing gives us individual papers which contain references to other papers. Within each paper one finds a large number of pieces of information. They are not explicitly isolated from one another as detachable items. Isolation might in any case be impractical without extensive rewriting, because pieces of information are stated in ways that rely on the context of the rest of the paper. And connections between pieces of information are not usually set out explicitly in a web of links. Thus the largely implicit connections between pieces of information within a paper are mostly of a different nature from the connections to other papers that are given by explicit reference.

None of this need change with a move to online publishing in repositories, with or without overlay journals. But online technology does allow changes. In particular, papers no longer need to be confined to the linear form.

Let us present a radical possibility, while acknowledging that actual developments may turn out to be less radical. A paper could comprise a number of different files, which we shall call notes even though they might be in a final and polished form. Each note would set out one piece or a few pieces of information, with links between the notes and a contents list which would include links to all of them. That list could optionally set out the notes in a hierarchical structure or provide a suggested order in which to read them. Each note would have its own public identifier to facilitate linking to individual notes by other authors.
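The arrangement just described amounts to a simple data structure. What follows is a minimal Python sketch under stated assumptions: each note has a globally unique public identifier and a list of outgoing links, and the contents list is modelled only as a suggested reading order (the optional hierarchy is left out). The class and method names, and identifiers such as "smith/p1/question", are hypothetical and do not describe Obsidian or any existing repository.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """A note: one piece, or a few pieces, of information."""
    note_id: str  # hypothetical public identifier, assumed unique across all vaults
    text: str
    links: list[str] = field(default_factory=list)  # identifiers of notes linked to


@dataclass
class Paper:
    """A paper as a collection of notes plus a contents list."""
    notes: dict[str, Note] = field(default_factory=dict)
    contents: list[str] = field(default_factory=list)  # suggested reading order

    def add(self, note: Note) -> None:
        self.notes[note.note_id] = note

    def missing_from_contents(self) -> set[str]:
        """Notes in the paper that the contents list does not yet link to."""
        return set(self.notes) - set(self.contents)

    def external_links(self) -> set[str]:
        """Links that point outside this paper, e.g. to notes in other vaults."""
        return {target
                for note in self.notes.values()
                for target in note.links
                if target not in self.notes}

    def read_in_order(self) -> list[str]:
        """Texts of the notes in the order given by the contents list."""
        return [self.notes[i].text for i in self.contents if i in self.notes]


if __name__ == "__main__":
    paper = Paper()
    paper.add(Note("smith/p1/question", "The question addressed.", ["jones/p7/method"]))
    paper.add(Note("smith/p1/evidence", "The evidence obtained.", ["smith/p1/question"]))
    paper.contents = ["smith/p1/question", "smith/p1/evidence"]
    print(paper.external_links())         # {'jones/p7/method'}
    print(paper.missing_from_contents())  # set()
    print(paper.read_in_order())
```

Storing links as public identifiers rather than file paths is what would allow a note to point into another author's vault, as "jones/p7/method" does here; the sketch merely lists such links rather than resolving them.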

The overall effect would be that of a published Obsidian vault for each paper, or for all the work of a given author, with links between notes (whether in the same vault or in different vaults) as well as links to papers that were in more traditional forms. Ultimately, if this style became the norm and many links were also made between notes created by different authors in their own vaults, the effect would be much the same as that of one giant Obsidian vault which was made up of all the vaults of individual papers or authors.

One would want to maintain separate vaults for separate authors within such a giant vault, both to assign authorial control and to allow attribution. So the giant vault would be a notional one. But others could send an author their suggestions for amendments, which the author could accept or reject. Systems for recording versions and handling suggestions could be added, perhaps using software on the lines of Git.

There would be an effect on the ways in which authors used other authors' work. The notional giant vault would be full of notes that captured single thoughts or small groups of thoughts; notes that each capture a single thought are called atomic notes. Searches might tend to lead to the thoughts of several authors on a specific point, rather than to the thoughts of one author on that point and related points. On the one hand this would be an advantage. On the other hand there would be an increased risk of gathering atomic notes from several authors and misinterpreting them because they were viewed outside their original contexts. The new context of a collection of notes on a single point would tend to drive the contexts of the original papers out of readers' minds.

Correspondingly, there would be less focus on individual papers. An imperative to read papers by authors B, C, and D on some topic would be replaced by an imperative to comb the giant vault for whatever had been said on particular points. One might gain in depth on specific points, but lose an appreciation of overall ways to handle topics.

Attribution

The traditional approach of complete papers which have named authors and which are accessed as wholes gives clarity of attribution. If a paper includes a piece of information, an idea, or an overall approach to a topic, then the information, idea, or approach should be attributed to that paper's author unless he or she acknowledges or should have acknowledged another source.

Attribution could be preserved even if all work was within the notional giant vault. Individual authors would have their own vaults, and new vaults could be created for specific groups of authors working together on single projects. There would be a minor complication when a team wanted a single vault but it was thought appropriate to attribute some particular piece of work within the project only to selected members of the team. Notes within the vault that related to such a piece of work could, however, be tagged appropriately. The difficulty would be no greater than it is in the traditional system when a team produces papers that are to be attributed to selected members.

Clarity of attribution might however suffer, whether work was produced by one author working alone or by all or part of a team. When an idea is seen within a whole paper, the context can serve to distinguish it from similar ideas in other papers. If however ideas tended to be seen atomically, detached from the contexts of the vaults in which they occurred, similar ideas might seem so close that it became unclear whether one author or another could really claim ownership. There might be visible priority in time, if all notes were time-stamped and the history of amendments to them could be viewed. But such temporal priority would be of little significance for attribution when three conditions held: the interval between two notes was small; the authors were all working within the same intellectual context (a context which the notional giant vault would itself enlarge, merging contexts that might otherwise have been seen as different); and both ideas could easily have developed from that context as it stood shortly before either of them was written down.

(We may add that when notes written by different authors supplied the same idea, rather than similar ideas, or the same piece of information, and copying could be excluded, it would be appropriate simply to attribute the idea or the information to each of the authors independently.)

So a move from complete papers to atomic notes within a notional giant vault could at least sometimes reduce clarity of attribution. Would this matter?

From the point of view of individual authors affected, it might matter a great deal. People like to be given credit for their own ideas. On the other hand, if attribution ever came to be seen as unimportant, more useful work might be produced because producers would not devote any time or mental capacity to tracking down the sources of ideas and giving due credit except when that was necessary in order to show that some claim on which the new work relied had indeed been established by earlier work. More generally, if attribution came to be seen as unimportant, the focus would be on the corpus of knowledge that had been generated and was being enlarged all the time rather than on the contributions of particular people.

A bright future

If the dissemination of work were to develop along the lines indicated here, at least in the direction of a notional giant vault of atomic notes, then the future could very well be on balance brighter than it would have been in the absence of such development, whether or not the extreme of a notional giant vault was ever reached. Searches for material relevant to new work would be faster and more comprehensive, new work could be made available quickly and without being compiled into the currently recognised form of a complete paper, and gaps in knowledge could be filled in one by one as and when material to fill them occurred to individual authors.

On the other hand, the discipline imposed by the recognised form of a complete paper might be lost. There is something to be said for requiring an author to set out in sequence the question addressed, how evidence was gathered, what evidence was obtained, the argument to conclusions, the conclusions themselves, and a discussion of their significance. Such discipline could however be restored by a norm that a batch of notes would be accompanied by a table of contents with links to the individual notes, such that when the notes were read in the order given the traditional sequence would be followed.

One danger to avoid would be a slide towards centralisation, with some authoritative figures seeing a notional giant vault as requiring management and directing individuals to work on certain topics in certain ways. Fortunately software like Obsidian works perfectly well with individuals all doing their own thing, even if there is notionally a single giant vault. Equally fortunately, academics and the like are strongly inclined to do the work they choose and to do it in their own ways. If there is a danger, it comes from people in authority threatening non-conformists with obstruction to their career progression. But any such danger should not be allowed to obstruct the spread of new ways to disseminate work done, or the growth of banks of work in forms that may be more useful than the traditional forms.

References

arXiv: https://arxiv.org/

Git: https://git-scm.com/

Journal of Political Philosophy: https://onlinelibrary.wiley.com/journal/14679760

Obsidian: https://obsidian.md/

Open Library of Humanities: https://www.openlibhums.org/

Political Philosophy: https://politicalphilosophyjournal.org/

Social Science Research Network: https://www.ssrn.com/