Medieval Glosses as a Test Subject for the Building of Tools for Digital Critical Editions

The thirteenth-century Latin corpus of the Oxford gloss has challenging features. A digital scholarly edition of this corpus necessitates innovative solutions that on the one hand account for the complex structure and content of the text and its manuscript transmission, and on the other hand convey these data comprehensibly to the reader. There are still few user-friendly tools which allow scholars to encode a digital critical edition easily, particularly its critical apparatus, in TEI XML. In this article, I document my approach to editing a digital corpus of philosophical glosses, and explain the tools I am using and customizing in order to make them available to a wider public of digital scholarly editors, especially in the field of medieval studies.

The Aristotelian text was written in the middle of the page, in a large module. Most of the gloss was copied in the margins, which were designed to accommodate several columns of those notes, but a significant portion was placed between the lines. Each manuscript therefore has both marginal and interlinear annotations, which are referred to universally as glosses, no matter the length or content. Different techniques were used, even on the same page, to link a marginal gloss to its lemma, the word or phrase in the central text. Either an attachment sign was copied above the lemma and reproduced beside the gloss, or the lemma was repeated at the beginning of the gloss.
The fluid and variant character of the glosses and of their links to the text makes them difficult to edit. The few paper editions are partial (Burnett and Mendelsohn 1997; R. K. French 1997; E. J. French 1998; Galle 2008). Representing the links between the glosses and the text and the variant character of the glosses and the lemmata proves to be a problem in all of them. The most accurate edition is not very easy to read (Galle 2008). The corpus is huge, as the twenty-six or so manuscripts span a dozen different treatises. There is no electronic edition yet. My current project aims to make an electronic edition of the gloss on De plantis. The edition will provide a critical text, but also let the reader follow text and gloss as easily as a medieval reader could do, that is, with text and gloss immediately connected to one another.

Reproducing the medieval layout seems a difficult goal to reach, whether in print or digitally. The medieval annotators arranged the material to fit the space left on the page. To spare writing space they could abbreviate some places, or tighten the script. They could adapt the writing and disposition of the glosses according to the situation and context, because they could see and reflect on the text as readers. Tailoring an algorithm to resolve contextual and potentially subjective issues seems rather cumbersome, and to promise uncertain results. New solutions must be found to represent glossed corpora, other than the paper or digital editions tried so far. 9

The high degree of variation between witnesses, both in content and in presentation of the glosses, as well as the uncertainty about the transmission of the text (oral, written, or both), make the idea of a stemmatic or reconstructed edition irrelevant, if not wrong-headed. Furthermore, the very instability and mutability of the manuscript tradition best helps us understand medieval teaching; differences are windows into the lively and social context which generated them. This is perhaps one of the main values of an edition based on the manuscript tradition, unlike the sole publication of a reconstructed text, as the content of the gloss is mainly a compilation of commentaries, and can probably not be retraced to an archetypal authorial version. 10 On the contrary, the Oxford gloss is the result of gradual, collaborative work by Masters of Arts of the University of Oxford throughout the second third of the thirteenth century (Kuhry 2019). 11 The ideal edition should thus represent the peculiarities of each of several manuscripts that carry a given treatise, to help readers identify those layers. 12

On the other hand, an edition which provides merely facsimiles or document-centered editions of each manuscript would not be a satisfactory scholarly edition. 13 Born-digital critical editions are also different from print critical editions that have been digitized. 14 They must offer features not found in a traditional critical edition, including means of reading, exploring, and analyzing the text. 15 Document-centered editions and digitized critical editions are thus two models from which any digital critical edition should be distinguished.

Features of the Corpus
In the edition I am preparing of De plantis, I have determined that the corpus has the following main features:

1. From one glossed manuscript to another, the central text can be variant, with missing, variant, or additional lemmata, which means that the central text should display textual variance. The glosses being fluid, it is not possible to restrict the encoding to the lemmata: the entire Aristotelian text must also be encoded. On the other hand, the purpose is not to redo the Aristoteles latinus. 16 Only the glossed manuscripts are transcribed and encoded.
2. At a given passage from De plantis, a gloss might be in one manuscript but not another, or may come in different versions. Because I wish to show such variation, the glosses' variants must be encoded too.
3. A given gloss can be linked to different lemmata from one manuscript to another, or to a lemma not existing in the manuscript, which means that the design of the encoding should take into account the fluidity of the glosses, to reflect accurately the state of the manuscript tradition.
4. One gloss in one manuscript can be two or three in another.
5. One gloss can be interlinear in one manuscript but marginal in another.
6. Glosses in the same manuscript might be written by several hands.

These last two characteristics imply that any encoded gloss must allow additional levels of description, besides the text and its variants.

Editorial Principles
For this corpus, which is far less standardized than the Biblical Glossa ordinaria, stemmatical reconstruction of the text is irrelevant, even if it were possible, as I have already said. 17 Before the digital edition can be published, several steps must be completed, according to a specific methodology. 18 The manuscript witnesses must be classified according to the glosses' typology. In the absence of precise dating for the copying of the glosses, a typology can be achieved only after significant samples have been collated for each one. Collected samples should help to identify layers of composition in the glosses' content, and thus to define stages in the development of the glossed corpus in the thirteenth century. 19 Then, a base manuscript containing an extended version of the glosses is chosen and the variants of the most significant manuscripts of each type are collated against its text. This comparative activity will enable the creation of a thesaurus of the glosses, each gloss archetype (a gloss in an abstract sense, instantiated in different versions) receiving a unique identifier.

Creating Versatile Tools for Digital Scholarly Editions
In her paper presented at the 2015 TEI Conference, M. Burghart (2016) describes three areas in which the digital scholarly editor of medieval texts could need help: detection of human errors in the apparatus, ability to display different versions of the work-in-progress edition (corresponding to the text of different manuscripts), and handling of editions encoded with <rdg> or <lem> elements in the apparatus. Those needs are met by the TEI Critical Apparatus Toolbox, which she created. 20 I would add another prior and very basic need: finding help to encode the edition. Indeed, the task of encoding is described as the main difficulty confronted by TEI users, partly because of the lack of user-friendly tools (Burghart and Rehbein 2012). One can find tools to annotate images and transcribe sources, 21 but these tools do not enable the scholar to encode variants and the critical apparatus as specified in the TEI Guidelines. They are therefore purely document-centered tools, and we have seen above (see note 13) that there is a general demand for text-oriented digital scholarly editions designed to be critical. Collation of several manuscripts can also produce an automatically encoded critical apparatus using the TEI parallel segmentation method, by means of CollateX or Juxta software, provided one already has the separate transcriptions at one's disposal. 22 As such, the result is not a proper critical edition, as no selection of variants has been made to produce a critical text with an apparatus. Critical editors therefore still need a user-friendly tool which allows them to establish the edited text, select variants, and introduce all the necessary critical annotation.
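To make concrete what an automatically generated parallel-segmentation apparatus looks like, here is a minimal Python sketch that aligns two transcriptions word by word and wraps every difference in an <app> entry with one <rdg> per witness. It relies on the standard library's difflib, not on CollateX or Juxta, whose alignment methods are more sophisticated; the function name collate_pair, the sigla A and B, and the sample sentences are assumptions made for the illustration.

```python
from difflib import SequenceMatcher
from xml.sax.saxutils import escape


def collate_pair(sigil_a, text_a, sigil_b, text_b):
    """Align two transcriptions word by word and return a
    parallel-segmentation string with <app>/<rdg> entries."""
    words_a, words_b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared text is copied once, outside any apparatus entry.
            parts.append(escape(" ".join(words_a[i1:i2])))
        else:
            # Any difference (replacement, addition, omission) becomes
            # one <app> with a reading per witness; an omission yields
            # an empty <rdg>.
            rdg_a = escape(" ".join(words_a[i1:i2]))
            rdg_b = escape(" ".join(words_b[j1:j2]))
            parts.append(
                f'<app><rdg wit="#{sigil_a}">{rdg_a}</rdg>'
                f'<rdg wit="#{sigil_b}">{rdg_b}</rdg></app>'
            )
    return " ".join(parts)


if __name__ == "__main__":
    # Placeholder sentences; the variant in witness B is invented.
    print(collate_pair(
        "A", "vita in animalibus et plantis inventa est",
        "B", "vita in animalibus et in plantis inventa est",
    ))
```

Every difference, however trivial, ends up as an apparatus entry with no <lem>, which is why such output still calls for the editor's selection of variants.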

As a matter of fact, many scholars are invited by the TEI to encode their research data but they have neither the time nor the opportunity to get proper training in TEI XML encoding. So the second part of the project consists in creating a panel of tools for the encoding of ancient textual sources in TEI XML. These tools will enhance existing software rather than be built from scratch.

The tools include XSLT stylesheets that convert styled Word or LibreOffice documents to TEI XML. 23 Pre-encoding a text in a word processor can be useful, but frequently a deeper level of encoding is needed, which is difficult to reach working only on the text document. So a second category of tools is a series of frameworks or encoding environments made through customization of two widely used XML editors:
• XMLmind XML Editor (XXE), 24 which I used to build a critical edition encoding environment.
The encoding environment allows one to display the document in different views thanks to the use of CSS stylesheets, and to automate and speed up the encoding tasks with custom commands. Customized frameworks in existing XML editors present multiple advantages:
• The use of CSS stylesheets makes the appearance of the document being encoded significantly more user-friendly, which helps scholars unfamiliar with TEI encoding. 27 Moreover, in both applications, views corresponding to different CSS configurations allow one to highlight groups of elements and attributes in different ways depending upon the work's progress, thus facilitating the encoding tasks.
• In both applications, custom commands, whether called from the CSS stylesheets or combined with CSS extensions that enable advanced encoding features, allow the automation of a great number of tasks, many of which are tiresome and error-prone when done by hand, like stand-off annotation, linking, or indexing.
• A framework can be adapted very easily to any other scholarly editing project. Each framework can be customized to the structure specified by the project, which is helpful for scholars with limited previous knowledge of XML and TEI, who could be confused by the myriad possibilities of the TEI All schema. Nevertheless, the raw encoding can be verified at any moment by switching to the text mode in Oxygen or to the tree view in XXE.
the use of custom commands and of features of CSS stylesheets like drop-down menus, checkboxes, and pop-ups.

For instance, the attribute values of a number of elements can be listed and explained in the <teiHeader> by means of several <valItem> elements grouped in a <valList>. These values are supplied by the editor at the beginning of the work and can be completed (but preferably not changed) at a later stage. In the Oxygen framework, inside the <text> element, the insertion of tags for which the typology of attribute values has been described in the <teiHeader> (and registered as available in drop-down menus) generates a drop-down menu listing these values, thanks to the use of XPath. The same principle is used for references to witnesses in the critical apparatus, by means of the description of witnesses in the <sourceDesc>, and for references to particular hands described in <handNote>, inside a <handDesc>, which is a component of the <msDesc>.
That way, the editor can customize the semantic part of the encoding (namely, the attribute values), in a "go with the flow" mode, that is, after the formalization of the structure. This "open" characteristic is essential when it comes to adapting the framework to different projects. On the other hand, the frameworks do not prevent the user from inserting unexpected tags or from coming up with different attributes than the ones suggested. Ultimately, a restriction of the schema is desirable, notably for validation purposes (Burnard 2019).
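The principle of drawing menu entries from the header can be sketched outside any XML editor. The following Python fragment is only an approximation of what the framework does with XPath: it reads a <valList> from a header and returns the values a drop-down menu would offer. The placement of the <valList> inside <encodingDesc>, the use of @n to name the list, the second <valItem>, and the function name menu_values are assumptions for the example; the fragment has not been validated against a TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical header fragment. Only the value "lemme" comes from the
# encoding described in this article; the rest is placeholder content.
HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <encodingDesc>
    <p>
      <valList n="term-type">
        <valItem ident="lemme"><desc>word or phrase to which a gloss is attached</desc></valItem>
        <valItem ident="technical"><desc>technical term (placeholder value)</desc></valItem>
      </valList>
    </p>
  </encodingDesc>
</teiHeader>"""


def menu_values(header_xml, list_name):
    """Return (value, description) pairs for the <valList> whose @n
    equals list_name, in document order."""
    root = ET.fromstring(header_xml)
    # ElementTree supports a subset of XPath, enough for this lookup.
    path = f".//{{{TEI_NS}}}valList[@n='{list_name}']/{{{TEI_NS}}}valItem"
    values = []
    for item in root.findall(path):
        desc = item.find(f"{{{TEI_NS}}}desc")
        values.append((item.get("ident"), desc.text if desc is not None else ""))
    return values


if __name__ == "__main__":
    for value, description in menu_values(HEADER, "term-type"):
        print(f"{value}: {description}")
```

Because the list lives in the document itself, adding a <valItem> to the header extends the menu without any change to the framework.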

The challenge in developing encoding frameworks is not so much technical, as they use features available in each application. Frameworks consist of command configuration and CSS files, sometimes XSLT files. They offer an economical, versatile way to provide scholars with customized tools for digital scholarly editing. 28 The challenge is rather about ontology and modeling. 29 Among the difficulties or constraints are thus:
1. the need to offer tools specialized enough to fill the particular needs of each scholarly edition project, especially in the medieval field, but also generic and customizable enough to be used by other projects. 30 A solution is to identify a core of basic needs likely common to many projects, and create a library of specialized commands for each scholar to apply as they like to their own project. 31 The publication phase, whether in print or in digital form, can be supported by existing tools which the scholar can configure. 32 The tools described in this paper are meant primarily for my colleagues in medieval studies. Collaboration over several months has allowed me to collect information about the precise needs of several digital scholarly edition projects. 33 The functionality and user-friendliness of the tools can be improved by enhancing the underlying code in accordance with the needs expressed during this test phase. Having been tested through use by scholars with little previous knowledge of digital editing, the tools can be reused by any other project.
2. the need to tailor the tools to the practice of scholars with little or no previous training in encoding and to anticipate potential issues they might confront while encoding. Regarding this point, collaboration with my colleagues leading projects has also allowed me to draw a more precise sketch of their needs and of their behavior when switching from the creation of a traditional critical edition to that of a digital one.
The tools will be available online as soon as they are tested and stable. Currently two different frameworks are in testing phases in several projects. One of these frameworks is for critical editions and one for genetic editions, the latter being more document-centered.
The user base includes all digital scholarly editors, especially in medieval studies.

I was able to address the constraints imposed by the nature of the glossed corpus thanks to TEI encoding and to the support of frameworks. To test the selected solutions, I have encoded a small part of the text, the prologue of De plantis, as follows:
• The "parallel segmentation" 34 method has been used to encode the variant readings of the central text and of the glosses. 35
• Text and glosses have been separated into two files. Both are linked to one another via stand-off encoding, where the @target attribute on the central text's lemmata (i.e., the words to which the glosses refer, which are encoded in a <term> element) points to the @xml:id attribute of the gloss (see figure 4). A complete stand-off approach would be more difficult because the central text is not fixed: it receives more variant readings as new manuscripts of the Oxford gloss are collated. 36 Under such an approach, tokenization is more complicated, since the target text is constantly moving, and therefore the tokens would be as well. Besides, encoding the glosses in the same file allows one to compare the different versions of a particular gloss while encoding, thanks to the parallel segmentation, and thus to facilitate part of the philological analysis. This arrangement accommodates the fluid nature of the glosses, which cannot be encoded in a fixed point in the text. To anchor a gloss to the text while expressing its fluidity from one manuscript to another, the <term> element is set inside an apparatus entry: in a <rdg> element contained in an <app> element.

The consequence of encoding the lemma in a <term> is that an identical reading shared by two manuscripts can produce two separate <rdg> entries, depending on whether the word is a lemma in one of them or not. This is a real problem because it overloads the critical apparatus: the shifting of the lemma to another word from one manuscript to another is very common, and does not represent a variant reading in a strict sense. When the lemma is encoded in a <term> element, which is itself hosted in a <rdg>, actual variant readings are mixed with lemmata, the latter being very numerous. 37 The problem might be solved by moving the lemma information somewhere else, possibly into another type of the <witDetail> element mentioned below, which should be located in the text after the word bearing the lemma quality, as if it were a note. The <witDetail> can have @wit and @target attributes, which point respectively to the witness in which the word is a lemma, and to the related gloss in the glosses' thesaurus. The + command designed in the CSS to be available on the lemma (element <term type="lemme">) opens a transformed version of the glosses' thesaurus and allows one to choose the appropriate gloss archetype toward which the @target attribute must point (XMLmind framework).
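The link between a lemma and its gloss, and the apparatus overload it causes, can be illustrated with a small Python sketch. The two TEI fragments are hypothetical reconstructions of the encoding described here, not extracts from the edition: the sigla A and B, the file name glosses.xml, the identifier gl1, the gloss text, and the function name lemma_links are all assumptions. In the sample, both witnesses read the same word, yet two <rdg> elements are needed because the word is a lemma only in witness A.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Central text: the lemma sits in a <term> inside a <rdg>.
TEXT = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <app>
    <rdg wit="#A"><term type="lemme" target="glosses.xml#gl1">Vita</term></rdg>
    <rdg wit="#B">Vita</rdg>
  </app>
  in animalibus et plantis inventa est.
</p>"""

# Separate glosses file: each gloss carries an @xml:id.
GLOSSES = """<list xmlns="http://www.tei-c.org/ns/1.0">
  <item xml:id="gl1">placeholder text of the gloss</item>
</list>"""


def lemma_links(text_xml, glosses_xml):
    """Return (witness, lemma, gloss text) for every <term> found in a
    <rdg>; the gloss text is None when the target cannot be resolved."""
    glosses = {
        item.get(XML_ID): "".join(item.itertext()).strip()
        for item in ET.fromstring(glosses_xml).iter(f"{TEI}item")
    }
    links = []
    for rdg in ET.fromstring(text_xml).iter(f"{TEI}rdg"):
        for term in rdg.iter(f"{TEI}term"):
            # Keep only the fragment identifier after "#".
            gloss_id = term.get("target", "").split("#")[-1]
            links.append((rdg.get("wit"), term.text, glosses.get(gloss_id)))
    return links


if __name__ == "__main__":
    print(lemma_links(TEXT, GLOSSES))
```

A check of this kind also reports unresolved targets, which is one of the errors that manual linking tends to produce.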

• In the file containing the glosses (the glosses' library or thesaurus), each gloss archetype is encoded in an <item> element, inside a first-level apparatus entry containing two readings: the first lists the witnesses not containing the gloss and has a @type attribute with value "omission" (see figure 5). This feature allows one to see easily, in the framework, a list of the witnesses transmitting the gloss or not. After the second <rdg> element, which contains the gloss's text with internal variant readings, a <witDetail> element describes the location of the gloss in each witness (marginal or interlinear), by means of @n, and specifies, if needed, the hand which copied the gloss, thanks to @corresp. The @target attribute identifies the <rdg> at issue, and @wit points to the precise witness(es).
• <lem> elements are not used because there is no putative original text. Nevertheless, offering a single base text for a gloss archetype is a future possibility. A single reconstructed text seems to be necessary in a paper edition because of layout limitations. But it is also needed in a digital critical edition, which should provide a "representative text version with a canonical work structure" (Fischer 2017, S281), allowing citation and reuse both inside and outside the project. Translations, semantic analysis, inclusion in larger corpora, and so forth all need a canonical version.
• A future encoding layer will deal with the identification of the sources, probably with stand-off markup pointing to <milestone> elements framing the identified segment and referred to by the @target of a <note>. Each <note> will contain the references to the source of the passage, possibly through @source on <author> and <ref>, within a <bibl>.
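The structure of a thesaurus entry described above can be sketched as follows. The fragment is one possible reading of that description (an <item> containing the first-level <app>), with invented sigla, identifiers, hand reference, and gloss text; it has not been validated against a TEI schema. The hypothetical function transmission extracts what the framework is said to display: which witnesses transmit the gloss, which omit it, and where and by which hand it was copied.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical thesaurus entry for one gloss archetype.
ENTRY = """<item xmlns="http://www.tei-c.org/ns/1.0" xml:id="gl1">
  <app>
    <rdg type="omission" wit="#C"/>
    <rdg xml:id="gl1-text" wit="#A #B">placeholder gloss
      <app><rdg wit="#A">textus</rdg><rdg wit="#B">textum</rdg></app>
    </rdg>
    <witDetail target="#gl1-text" wit="#A" n="marginal" corresp="#hand1"/>
    <witDetail target="#gl1-text" wit="#B" n="interlinear"/>
  </app>
</item>"""


def transmission(entry_xml):
    """Summarize one gloss archetype: witnesses with and without the
    gloss, its location per witness, and the copying hand if given."""
    item = ET.fromstring(entry_xml)
    top_app = item.find(f"{TEI}app")
    report = {"present": [], "absent": [], "location": {}, "hand": {}}
    # findall() without ".//" inspects direct children only, so the
    # internal variant readings of the gloss text are ignored here.
    for rdg in top_app.findall(f"{TEI}rdg"):
        key = "absent" if rdg.get("type") == "omission" else "present"
        report[key].extend(rdg.get("wit", "").split())
    for detail in top_app.findall(f"{TEI}witDetail"):
        for wit in detail.get("wit", "").split():
            report["location"][wit] = detail.get("n")
            if detail.get("corresp"):
                report["hand"][wit] = detail.get("corresp")
    return report


if __name__ == "__main__":
    print(transmission(ENTRY))
```

Keeping location and hand in <witDetail> rather than in the readings themselves means that a gloss which is marginal in one witness and interlinear in another does not generate an additional variant.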

I have tried to show that it is possible to build flexible tools for digital scholarly editing, namely building frameworks within existing XML editors. Quick-tech solutions like these are critical to a future that seems to promise ever-decreasing funding in the humanities.
He also calls for ways to search corpora of digital critical editions from a single gateway in addition to the possibilities offered on each distinct digital critical edition website. F. Duval (2017)
14 Digital scholarly editions cannot be printed without loss of content and functionality, and depend on a digital paradigm in their conception: see Sahle (2016).
15 See the state of the art about functionalities, including the distinction between the concepts of "scriptons" (strings as they appear to the readers) and "textons" (strings as they exist in the text),
17 A. Andrée draws the same conclusion about the Glossa ordinaria and the lack of "authorial" character in this type of text (2016, 9-10).
18 I have described the methodology (Kuhry 2020).
to collect and analyze editorial and encoding practices when it comes to ancient sources, in order to model a common core of editorial features and to make encoding and publishing tools more interoperable and complementary.
30 About tools, E. Pierazzo expresses the need for sustainability and ability to meet the needs of other projects, which implies the modeling of the working methods and concepts of a potentially large community of textual scholars (2016a, 111).

31 This solution could be compared to T. Andrews's and E. Pierazzo's different conceptions of a "third way" besides "traditional" or "manual" philology and use of the computer to satisfy precise needs without involvement from the scholar (which E. Pierazzo calls "computer-assisted philology"). T. Andrews's idea of a third way involves complete digital philology actively invested by the scholar in every step of the editorial task, from transcription to analysis, which eventually produces unexpected results, thanks to innovative tools and methods crafted by philologists themselves, and finally to electronic publication (Andrews 2013; Pierazzo 2016a, 109-16). E. Pierazzo doubts that philologists can acquire such a highly technical profile in the near future, with exceptions. She advocates instead for a "bricks approach": the definition of microtasks common to most text editing projects and for which individual tools could be built, customized, and combined in different ways according to the needs of each project (2016a, 109-17). The approach described in the present paper is closer to Pierazzo's view.

EMMANUELLE KUHRY
Emmanuelle Kuhry is a postdoctoral researcher in digital humanities applied to medieval manuscript studies at Institut de Recherche et d'Histoire des Textes, CNRS (Centre National de la Recherche Scientifique).