TEI Models for the Publication of Social Sciences and Humanities Journals: Opportunities, Challenges, and First Steps Toward a Standardized Workflow

The TEI Guidelines are developed and curated by a community whose main purpose is to standardize the encoding of primary sources relevant for humanities research and teaching. But other communities are also working with TEI-based publication formats. The first goal of this paper is to raise awareness of the importance of TEI-based scholarly publishing as we know it today.

This paper draws on the questions that were at the core of Holmes and Romary (2011) and that initiated the creation of the jTEI format, now used among others by the Journal of the Text Encoding Initiative. 1 It is not our purpose to discuss this schema or possible amendments to it, nor to discuss the respective technical benefits of TEI vs. JATS (Journal Article Tag Suite, http://jats.niso.org/1.1/). We have two distinct goals in building upon Holmes and Romary (2011). The first goal is to raise awareness of the importance of TEI-based scholarly publishing as we know it today. The second is to contribute to a reflection on the development of a TEI customization that would cover the whole authoring-reviewing-publishing workflow and guarantee archiving options that are as solid for journal publications as what we now have for primary sources published in TEI. The encoding ideas we propose are to be considered as explorative.

In a first step, we will present a panorama of the use of TEI in social sciences and humanities (SSH) scholarly publishing and consider the advantages and challenges of using TEI-based formats in scholarly publishing in the humanities. Which organs are currently doing so and how much text does this represent? Why are these journals using a TEI format? We will then relate our experience as editors, with the main aim of initiating a discussion within the community on the role of TEI-based formats for scholarly publishing at large, focusing on the copyediting and reviewing process. We will argue that such formats have the potential to be a powerful lever to increase the TEI's impact on the scholarly community and to empower humanities scholars for better dissemination of their own research. We will propose some preliminary ideas for encoding a complete publishing workflow of secondary sources that includes the reviewing and copyediting process.
The following is an overview of TEI-based journal publications that was gathered mainly by initiating an informal survey on the TEI-L discussion list. 2 Since there is no other overview of this type known to us in this area, we assume it is the best summary of the current situation. 3

In Europe
The largest actor in Europe is France, where a long history of political centralization led to the development of national infrastructures and services that has allowed the deployment of TEI-based publishing formats and TEI-compliant platforms on a national scale. France has three main providers for secondary sources that rely on the TEI: OpenEdition, HAL, and Istex.

OpenEdition 4 is a platform hosting four different services: an academic calendar announcing events such as conferences and calls for papers (Calenda), a scholarly blog platform called Hypotheses, a scholarly book service (OpenEdition Books), and a scholarly journals platform (OpenEdition Journals, called revues.org in the past). The Journal of the Text Encoding Initiative is hosted by OpenEdition. Almost all of the journals published on the journals.openedition platform are TEI-based. 5 The articles published as of February 15, 2021 amount to a total of 310,515 documents. 6 The book platform also provides many TEI-based documents (204,869 as of February 15, 2021), 7 which adds up to a total of over half a million documents, some of them being books, that is, rather large documents.

Although the TEI files for these articles and books are theoretically available for reuse and can be used for research purposes, there is currently no direct access, such as a download link or button, that would make harvesting easy. When asked about the feasibility of such an endeavor, OpenEdition mentions local legal issues but also a clear willingness to support research projects that would need to download these files, should such a project issue a request (which has not happened so far). 8

The second French infrastructure to be TEI-based is not strictly a platform for scholarly publishing but a publication repository that can be used for either preprint or postprint open access (OA) publication. Unlike OpenEdition, which was initiated by a research project and gained momentum because it met the needs of the scholarly community and the national political agenda, the open archive HAL 9 was from its onset a national infrastructure, conceived as a service to the French scholarly community at large, with a specific effort directed towards humanities disciplines that had resisted noncommercial archiving strategies even long after other communities relied on them. Scholars can either register metadata concerning their publication or additionally archive one or several preprint versions. It is the metadata that are stored in a TEI format, potentially allowing a wide overview of French scholarly publications.

The significance of the HAL data has grown since July 2018, when the French Ministry for Higher Education and Research issued an Open Science Plan that led the main funding agency, ANR, to require HAL publication of the research output it funds from then on. 10 It was followed in this constraining requirement by the national research evaluation organ HCERES, which issued a statement that they would only take into consideration full-text HAL publications when evaluating universities. 11 As for the CNRS (a national organization employing only researchers and research assistants) and Inria (national organization for research in informatics), they also require researchers to use HAL for evaluation purposes on both the lab and the individual level. At the CNRS, HAL import functionalities are currently being transferred into the internal databases that harvest annual reports. 12 Moving to the TEI-based HAL is a political evolution that took some time to be implemented throughout the research ecosystem, but as of 2020 there is virtually no publication produced in France that will not have at the very least a set of TEI metadata associated with it. The centralization of information which these evaluation and archiving constraints have imposed on a wide range of scholarly communities has not always been well received. In the humanities and social sciences, it was occasionally considered to have the potential for enabling political control, and hence to be detrimental to the freedom of research. 13 This was especially a concern for those who were not familiar with the underlying technologies or the values of the community that develops them.

The third French publication organ hosting TEI-based information is ISTEX. 14

Germany has a different strategic approach, but it still presents a series of initiatives in the field of TEI-based scholarly publishing.
Until recently, Germany relied solely on the good will (political and economic) of its regions, and more specifically of their universities, to build and maintain the infrastructures that are necessary for hosting reliable publication platforms. A funding scheme for a national infrastructure has recently been negotiated for hosting scholarly data at large in a stable environment. 16 Building such an infrastructure will certainly change the overall approach of scholarly publishing in the medium and long term. At this stage though, it has not been fully

After the institute concentrated for several years on teaching and research, its publication organ R-I-D-E has gained momentum over the past several years. All of their publications are TEI-based and the data can be retrieved easily. 23

Finally, the Jahrbuch für historische Bildungsforschung 24 (a publication backed by a solid institution, in this case the Leibniz Institute for the History of Education) has also been preparing to convert to a fully TEI-based, OA publication in an eXist database. While it will use the jTEI Article schema for the scholarly journal, the goal is to generate continuity between other scholarly resources that will also be published in TEI, and the journal itself.

Publishing scholarly journals in TEI offers several advantages on different levels.

On an economic level, using the jTEI Article schema on top of an OJS workflow allows low-budget production of scholarly publications like journals. OJS is an open-source platform that can be customized and maintained with the support of an active user community. The jTEI Article schema is available for any journal to reuse, and there exist transformation scenarios for the OpenEdition platforms. In this process, the development needed is limited. In terms of output, the result is clean, and in case of problems or bugs, the editors can mostly rely on the community to tweak minor issues. In other words, you do not need to invest a lot to publish your journal using such an infrastructure and in return, there is little economic gain to be expected from it.

The low cost and the easy transfer to open science allowed by such an infrastructure can appeal to decision makers like university presidents, especially for scholarly domains that are comparatively not too impact factor-dependent. There is, on a more political level, a growing interest in stable, sustainable publishing solutions that are now increasingly being recognized as economically valuable. 28 These arguments can explain why convincing decision makers, and more generally people from outside the TEI community, to "invest" in a TEI-based workflow for scholarly publishing has become a worthwhile effort, especially in times when the values of open science (a philosophy with which the TEI technology is compatible) are being promoted.

We base this assessment on the existence of still-isolated but real attempts, which show the ability of TEI schemas to propose a satisfactory processing that complies with the main principles of accessibility and interoperability 29 of academic data and metadata. 30

Two major needs can indeed be identified in the context of SSH journals. The first is the need for an interoperable and stable workflow that would be integrated into open source publication infrastructures. The second need is for an evaluation and publication process that would include transparency as a core virtue. While our proposition, described in more detail in the third part of this paper, aims at massively improving interoperability and stability by relying on certain characteristics of TEI particularly adapted to these problems (use of a controlled vocabulary, possibilities of semantic and descriptive markup), it is also our goal to encourage discussion of transparency and interoperability as key quality criteria in SSH journals. In that sense, the use of TEI technology in an editorial process is likely to affect not only workflows but also evaluation criteria in general. It is indeed intended to increase the consideration for editorial tasks in the context of Open Science in particular, but also of SSH research in general.

For the TEI community, pushing forward secondary sources in TEI (as opposed to primary sources) presents several advantages. The first aspect worth mentioning is an assessment of current realities such as was provided in the first part of this paper: we have reached more than a critical mass already, one that calls for better coordination and sustainability of how the community integrates its outputs.

Why does TEI encoding work so well for scholarly publishing of secondary sources? There are several possible reasons. First, the TEI is flexible in its vocabulary, which means that it allows us to manage and bring together heterogeneous sources of information. Second, the TEI is not closed. On the contrary, it is conceived so as to allow resources to communicate: it enables us to avoid silos of internal formats developed for the use of one specific publication platform that will need further specific development to remain sustainable, costing a lot to be maintained and stay in use. What makes the TEI valuable for secondary scholarly publications is what makes it valuable for any publication: its stability, its interoperability, its openness, its reusability. TEI-based scholarly publications are made available in a nonproprietary format, which can also easily be transferred to OA publication models.

There is one final argument, proposed by Laurent Romary in the course of the TEI-L discussion on the topic, that is less obvious than the previous ones. To him, the main advantage of the TEI being the same base format for primary and secondary sources is that it allows scholars to use the same format for primary and secondary scholarly publications: for example, for digital editions and articles on the digital editions. This continuity between the two major publication dimensions of SSH research activity allows both fluidity and solidity. Fluidity means here that integrating elements from primary resources to secondary resources (and reciprocally) is made particularly easy. Solidity means that the same people who have the expertise in one field can contribute to the other: the brainpower available is considerable. But it also means that issues of nesting TEI structures are of central importance in this context.

In terms of research content, the continuum between primary and secondary scholarly publications could have another consequence: the TEI community could (or should?) become more attractive for scholars interested in less philological questions than the core community is: for instance, sociological aspects of knowledge transfer, community building, or evolution of research strategies. Journal material could easily be turned into a primary source. And there are enough documents available at this point to make this corpus interesting to sociologists, sociolinguists, and historians of science. What is still missing most of the time is a more obvious download button, that is, a structural incentive to use TEI corpora.

Expanding TEI-based models to secondary publications would not only benefit the TEI community by widening it. It would also make it possible to bypass the dead end of reputation mechanisms in the SSH at large. We are currently in a situation where scholars, assistants, and research engineers are sacrificing a great deal of time and work for the profit of publishing houses that are negotiating reputation for money. Coming up with a TEI-based format has the potential to break these reputation rules because it is nonprofit, requires little technical maintenance, and allows scholars to dedicate more time to reading papers than evaluating their impact factor or abiding by publishers' editorial guidelines (see Kosmopoulos and Pumain 2008).
What are TEI-based scholarly publications aiming at in general? Mainly, stable and wide dissemination. What makes reputation in the realm of such values is neither primarily quantity nor established publishing houses but mostly a dissemination strategy based on core virtues like FAIR (Findability, Accessibility, Interoperability, and Reuse), 31 and a common set of values and improvements carried out by the community. Researchers need relevant papers to be accessible and easy to find, to evaluate and to reuse; TEI-based formats can help meet these needs. They are already doing so for publishing in the situations described in part 1, and sometimes also for authoring, but never so far for reviewing and copyediting.

TEI-based Workflow Improvements
To be able to provide a complete publication workflow in TEI, we would not only need to improve and develop customizations for publishing and for authoring as they already exist. 32 We need to conceive a complete TEI-based workflow, taking into account the reviewing and copyediting phases as well, which are currently dealt with in other formats.

Complex workflows are at the core of editorial work for scholarly journals. The coordination chain usually includes authors, editors, reviewers, and copyeditors. Depending on the journal and the reviewing format (single-blind, double-blind, open), this can easily add up to ten people working on one text. Our own editorial experience, respectively at the Journal of the Text Encoding Initiative and Philosophie Antique, can give significant insight into such workflows. JTEI, for instance, foresees three different reviewers for each paper and three rounds of copyediting (some of which are done by the editors, but not all). Philosophie Antique 33 has a printed edition in addition to the digital one; this implies, in addition to double-blind review for each article, a double copyedit for each of the two formats. With different issues running in parallel as is now the case in most online publications, this means having an editorial interface that makes it possible to deal with different workflow timelines at the same time and to assign different editorial roles to one single person.

JTEI is working with OJS, as are many TEI-based journals. The recent update to version 3.0 has made some improvements to OJS's functionalities and interface, but OJS still suffers from being developed for too many different uses, making it occasionally tricky to tailor for specific needs. For years, the overall management of JTEI was actually dealt with not in OJS itself, but in tables archived in separate Google documents (one Google document for each issue of the journal), because OJS proved unable to offer such an overview in the way that was needed. This type of management has the inconvenience that the text and the information on the status of the text are separated, making an overview difficult to gain for the different actors involved in the process of text production. Philosophie Antique is even worse off. The workflow does not use any content management interface. The editorial team works by document exchange, archiving successive versions, with two parallel workflows for the preparation of the PDF to be printed and the uploading of texts in HTML, using the Lodel tool on the OpenEdition interface.

These two examples are symptomatic of the situation in many journals. Compared to this reality, it is clear that, to be able to deal with the complexity of workflows as we know them today, the texts should ideally contain metadata allowing one to see at first glance, as well as to process automatically, their editorial status. This is one of the many improvements that the TEI can provide.
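As a minimal sketch of what automatically processable status metadata could look like, the following Python snippet reads an editorial status from a TEI header. It assumes, purely for illustration, that the status is recorded with the @status attribute on <revisionDesc> and <change>; the sample document, its dates, and the function name editorial_status are hypothetical and not part of any existing journal schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical article header: status values and dates are illustrative only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample article</title></titleStmt>
      <publicationStmt><p>Unpublished draft</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
    <revisionDesc status="candidate">
      <change when="2021-01-10" status="submitted">Submitted by the author</change>
      <change when="2021-02-01" status="candidate">Sent out for review</change>
    </revisionDesc>
  </teiHeader>
  <text><body><p>Body of the article.</p></body></text>
</TEI>"""


def editorial_status(tei_xml):
    """Return the current status and the status history of one article."""
    root = ET.fromstring(tei_xml)
    revision = root.find(".//tei:revisionDesc", TEI_NS)
    if revision is None:
        # No status metadata recorded in this file.
        return {"current": None, "history": []}
    # Collect (date, status) pairs in document order.
    history = [
        (change.get("when"), change.get("status"))
        for change in revision.findall("tei:change", TEI_NS)
    ]
    return {"current": revision.get("status"), "history": history}


status = editorial_status(SAMPLE)
print(status["current"])
for when, state in status["history"]:
    print(when, state)
```

Run over a folder of article files, a function of this kind could produce the per-issue overview that is currently kept in separate tables, with the status stored in the same file as the text.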

To conceive a TEI encoding that would fulfill the reviewing and copyediting function and allow for an overview of the editorial status of the text, let us first consider the different roles (author, reviewer, copyeditor) and the different types of interventions in the text. In an ideal world, each role would correspond to a type of intervention, but in actual editorial practice, it might well be that the copyeditor who checks for content coherence also finds typos to correct. One can divide roughly into two categories the types of interventions that will be done on the text: one intervention type encompasses content (editorial requirements and their application), while the other one deals with the form (ortho-typographical aspects). This corresponds to two workflows that run in parallel: one checking for the scholarship, adequacy, and coherence of the article, and one checking for its conformance to the typographical guidelines of the journal. The TEI encoding for reviewing and copyediting should reflect these two aspects as well as the two work phases that are the reviewing and the copyediting processes.

The first step in a TEI-based workflow will be to define on the one hand roles (e.g., reviewer A, reviewer B, and reviewer C) and on the other hand types of modifications that can be undertaken: an editorial schema (EdSchema), a tagset for modifications of formal aspects, and a redactorial schema (RedSchema) that allows the tagging of content-related modifications. To illustrate the prototype we have in mind, we used existing examples of articles already published or in the process of being published and converted the copyediting/reviewing process from the Word document to a TEI-based version. 34

Figure 1. Workflow pattern.

The graph in figure 1 is a simplified representation of the annual workflow of the journal Philosophie Antique. Each article submitted and published by the journal goes through the same process.

The first part of the process is dedicated to the scholarly evaluation of the paper's content; it concerns the reviewers (whose names remain unknown to the authors), the authors, and the journal's editors. All this part would be encoded using the RedSchema subscheme.

The work of formal preparation for publication takes place in a second stage that partially overlaps the first one. It partly concerns the authors (at two different moments: delivery of the last version of the text, and proofreading) and the editorial team. It would be encoded using the EdSchema subscheme.

We find a strong advantage to this visualization of the workflow by task cycle and not by agent because it allows us to de-individualize the different interventions on the text, which must essentially be approached not according to their source (who intervenes) but according to their nature (what type of intervention). The actors within each stage are then only differentiated by a @resp attribute, which can be anonymized according to the editorial needs (especially in the first stage for a double-blind review), and their interventions fit into one or the other of the subschemes depending on whether they concern content, or form and presentation.

EdSchema allows the tagging of elements from the review process as well as from the copyediting process. The tags are attributed to the different <resp>s defined in the <respStmt> part of the header and associated with an @xml:id (see figures 2 and 3 and figures 10, 11, and 12). EdSchema contains primarily the <lem>, <add>, <del>, and <choice> elements.
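The link between roles declared in the header and interventions in the text can be illustrated with a short Python sketch. It is only an approximation of the encoding shown in the figures: we assume here that the @xml:id sits on each <respStmt> and that interventions point to it through a @resp value such as "#ce1". The identifiers revA and ce1, the sample sentence, and the function interventions_by_role are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical article: two roles in the header, three interventions in the body.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample article</title>
        <respStmt xml:id="revA"><resp>reviewer</resp><name>Reviewer A</name></respStmt>
        <respStmt xml:id="ce1"><resp>copyeditor</resp><name>Copyeditor 1</name></respStmt>
      </titleStmt>
      <publicationStmt><p>Unpublished draft</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body>
    <p>The text <del resp="#ce1">were</del><add resp="#ce1">was</add> revised<add resp="#revA">, as requested</add>.</p>
  </body></text>
</TEI>"""


def interventions_by_role(tei_xml):
    """Count tagged interventions per role, resolving @resp against the header."""
    root = ET.fromstring(tei_xml)
    # Map pointers such as "#ce1" to the role label given in <resp>.
    roles = {}
    for stmt in root.iter(TEI + "respStmt"):
        ident = stmt.get(XML_ID)
        resp = stmt.find(TEI + "resp")
        if ident and resp is not None:
            roles["#" + ident] = (resp.text or "").strip()
    counts = {}
    body = root.find(".//" + TEI + "body")
    if body is None:
        return counts
    for element in body.iter():
        pointer = element.get("resp")
        if pointer in roles:
            name = element.tag.replace(TEI, "")
            counts.setdefault(roles[pointer], Counter())[name] += 1
    return counts


for role, tally in interventions_by_role(SAMPLE).items():
    print(role, dict(tally))
```

Because the role labels are resolved through the header, replacing the <name> content with a neutral label would anonymize the report without touching the interventions themselves.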

Both types of interventions are likely to involve short alterations (changes in punctuation marks, for instance), bibliographical elements (see figures 8 and 9), 35 and longer text passages that need re-writing, this last category being more likely to be relevant to RedSchema.

To address remarks that require the rewriting of a longer text passage, RedSchema needs to include an anchor-based tagset that allows pointing to a comment, which in turn should allow the author(s) and/or editor(s) to answer this comment (see figures 6 and 7, figures 8 and 9, figures 10, 11, and 12 and figures 13 and 14).

To generate a clean text, in the end, the last modification would be considered as final, which means that the last copyeditor should validate with their @resp attribute the earlier modification layers according to the final editorial decision.
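How a clean text might be derived from such layers can be sketched in a few lines of Python. The sketch simplifies the validation step considerably: it assumes that the final editorial decision is available as a set of accepted @resp values, that accepted <del> elements are removed, that non-accepted <add> elements are discarded, and that comments encoded as <note> never reach the clean text. The sample paragraph, the identifiers, and the function clean_text are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical paragraph with a copyediting layer (#ce1) and a reviewing layer (#revA).
SAMPLE_P = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">The text '
    '<del resp="#ce1">were</del><add resp="#ce1">was</add> revised'
    '<add resp="#revA"> twice</add>'
    '<note type="comment" resp="#revA">Please clarify.</note>.</p>'
)


def clean_text(element, accepted):
    """Flatten an element to plain text, applying only the accepted layers."""
    name = element.tag.replace(TEI, "")
    resp = element.get("resp")
    if name == "note":
        return ""  # comments are never part of the clean text
    if name == "del" and resp in accepted:
        return ""  # an accepted deletion removes its content
    if name == "add" and resp not in accepted:
        return ""  # a rejected addition is left out
    parts = [element.text or ""]
    for child in element:
        parts.append(clean_text(child, accepted))
        # The tail belongs to the parent's text flow and is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


paragraph = ET.fromstring(SAMPLE_P)
print(clean_text(paragraph, accepted={"#ce1"}))           # The text was revised.
print(clean_text(paragraph, accepted={"#ce1", "#revA"}))  # The text was revised twice.
```

Passing an empty set reproduces the submitted wording, which shows that every earlier state of the text remains recoverable from the same file.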

The main advantage of such an encoding system is that it sums up in one file all the editorial evolution of a text, from submission to publication, displaying precisely its evolution and the contribution of each one of those who were involved in this process. It is, therefore, a question of including within the chain that structures and edits content the part corresponding to the evaluation and formatting, and thus making it visible and shareable. In this way, the data are made open to the point of the preparation of the data themselves. It is worth noting that recent efforts made by scientific publishers to automate workflows have focused on publishing content but did not include the preparation of such content. 36

conversion without information loss. It allows splitting the editorial work on form and content, giving the editor the final hand on the last version of the text. And finally, it should be conceived as a fairly minimal combination of tagsets, meaning that these schemas should be easy to share with other scholars and journals.

It would have, on the downside, the inconvenience that comes with its advantages: being multilayered, such a document might quickly become complex. Transformation scenarios filtering specific tagsets to gain readability will be made necessary: both readability for the human eye and information extraction for digital tools would rely on the development of such transformation scenarios. But all in all, the development of such schemas and transformation scenarios seems in the realm of the doable considering what the TEI has been able to develop over the last three decades.
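One such filtering scenario could isolate the comment layer from the rest of the document. The Python sketch below is a rough illustration under stated assumptions: comments are taken to be <note> elements whose @target points to an <anchor> in the text, and replies are taken to be <note> elements whose @target points to the comment they answer. These conventions, the identifiers, and the function comment_threads are ours and are not drawn from an existing schema.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: one anchored comment by a reviewer and one reply by the author.
SAMPLE_BODY = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <p>First claim.<anchor xml:id="a1"/> Second claim.</p>
  <note xml:id="c1" type="comment" resp="#revA" target="#a1">Please add a source.</note>
  <note xml:id="c2" type="reply" resp="#author" target="#c1">Reference added.</note>
</body>"""


def comment_threads(tei_xml):
    """Return each anchored comment together with the replies that target it."""
    root = ET.fromstring(tei_xml)
    notes = list(root.iter(TEI + "note"))
    anchors = {
        "#" + anchor.get(XML_ID)
        for anchor in root.iter(TEI + "anchor")
        if anchor.get(XML_ID)
    }
    threads = []
    for note in notes:
        if note.get("target") not in anchors:
            continue  # replies and unrelated notes are handled below or ignored
        pointer = "#" + note.get(XML_ID, "")
        replies = [
            (reply.get("resp"), "".join(reply.itertext()).strip())
            for reply in notes
            if reply.get("target") == pointer
        ]
        threads.append({
            "anchor": note.get("target"),
            "by": note.get("resp"),
            "comment": "".join(note.itertext()).strip(),
            "replies": replies,
        })
    return threads


for thread in comment_threads(SAMPLE_BODY):
    print(thread["anchor"], thread["by"], thread["comment"], thread["replies"])
```

A complementary scenario would drop these notes and anchors altogether to give copyeditors a view that shows only formal corrections.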

Structural Workflow Improvements beyond the TEI
The main reason why corrections are implemented, and the correction process hidden from the public eye, is a need for control. Editors and authors want to have control over each modification in the text, be it punctuation, bibliography formatting, or a sentence that seems a bit difficult to understand. This is all the more important in the case of papers written by nonnative speakers. These need close linguistic attention to reach the same level of readability as those written by native speakers.

But looking at it more closely, the whole workflow inherent to this control of modifications and corrections is based on reputation control. It is based on editorial "black boxes" that keep articles away from the public eye as long as they are not "perfect" or "finished." It is based on the idea that work in progress can damage reputation. This is exactly what started to change with the publishing of digital scholarly editions online. The fact that it is possible to update a digital scholarly edition suggests that we could at least imagine that such plasticity can be envisioned for journal articles too. Making it possible to work with sources that are not fixed in time is one of the greatest instances of intellectual progress not only made possible but actually realized by the TEI. Taking it one step further for scholarly journals is a fascinating intellectual challenge in terms of data dissemination quality assurance. 37 It is also the logical next step in terms of data empowerment.
This could easily be encouraged by two (infra)structural lines of action. The first one is to further foster pre-print publications (JTEI encourages such publications for papers submitted to it). The second one consists in improving the academic recognition of editorial expertise. There are already many experts in TEI working as research engineers or editorial assistants. They are in general working for specific journals: for example, for an editor or publisher. This expertise should be better recognized than it is now, and valued more explicitly in advertisements for editorial jobs.
The TEI community should encourage this expertise to be better represented, for instance in the special interest groups (SIGs) or through the awarding of prizes. The academic publishing market has been intensely professionalized in terms of digital competence for several years, particularly through the emergence of networks such as the Medici network, 38 which has helped in the continued training of editors. The profession is now ready to integrate such a workflow and support its development.

If we were to alter the workflow in such a way that the versioning occurring alongside reviewing and copyediting was easily manageable, it would mean that successive updated versions could be

In this situation where the dialogue between authors, reviewers, and editors can be made transparent within the text, scholars would be in a different position than that of accepting or refusing a correction suggestion. The dynamics of the relationships between those involved in the process of generating the text could benefit from this change. The TEI community could be actively involved in providing schematrons, stylesheets, and publishing environments for journals that would allow researchers to access TEI documents directly, be it their own for additional editorial work or others' for queries.

One relatively easy way to implement such a workflow consists in building overlay journals like those hosted on the Episciences platform, 39 which only provides an additional review layer on top of preprint publications. The interface allows a journal to set up a review process and link to the evaluated and selected preprints to make up a journal issue. While this greatly simplifies the review process, especially because of the easy-to-adjust interface, it does not really address the online presentation of publications in a reader-friendly way, for which other solutions have to be implemented and which would require an additional investment.

Conclusion
Workflows are complicated because, in theory, they address all the needs of the editorial process. In terms of scholarly publishing workflows, there is some development work to do before we will be able to disseminate an encompassing TEI schema for authoring, reviewing, and publishing. The encoding we have proposed here could serve as a basis; potential users are already working with a TEI-based publishing format, and the TEI community has a great deal of expertise to build upon.
Looking at the many journals that are already working with a TEI-based publishing format, there is one major thing required to take it to the next step: good documentation everywhere.

The TEI community is in a position to impact access to knowledge for future generations, not only through our digital editions of manuscripts, but also through the way we disseminate all the information we gather from working with manuscripts and with digital editions. Fostering TEI-based scholarly publications is worthwhile: first, it has been done at different scales for publishing and is working; second, our knowledge of it allows us to reflect on the specificities of the different disciplines we come from; and third, it is up to us to initiate this change because nobody else can do it the way the TEI community can. There is no reason why the coming generations should be plagued by time- and money-consuming requirements from publishing houses now that there is a real political awareness of the need for solid archiving formats and now that the TEI has established itself as a standard in so many fields.

29 On the TEI's proposed compromise between precision and ability to share, see Bauman (2011). On the challenge posed to the TEI by the notion of interoperability and on the need to build a general ecosystem conducive to achieving the full ability to share primary or secondary documents, see Unsworth (2011). Automatic conversion of SSH documents into a common form of markup is currently being explored by the MONK project, as described in Pytlik Zillig (2009
32 See also Thoden (2019) on the strategy of converting the workflow and document basis from a proprietary format to a fully standards-compliant system in the context of a publishing platform.
34 We would like to thank the authors, reviewers, and copyeditors who allowed us to use for this purpose data that are usually not made public but considered part of the "black box" of academic publishing.
35 It should be possible to add a bibliographical entry using the <resp> to indicate that it comes from a specific reviewer or a copyeditor.
36 See, for instance, the METOPES XML editorial workflow, which is based on the principle of single-source publishing but leaves aside the preparation of content itself: "Environnement Métopes," pôle Document numérique, Maison de la recherche en sciences humaines (CNRS / Université de Caen Normandie), accessed February 25, 2021, http://www.unicaen.fr/recherche/mrsh/document_numerique/outils/metopes, which is not making its schemas openly available at this stage.
37 Easy access to all the data from the editing process in a structured format could be a decisive step in promoting open peer-review practices. Langlais (2016)