The TEI Critical Apparatus Toolbox: Empowering Textual Scholars through Display, Control, and Comparison Features

The various mechanisms oered by the TEI schema and Guidelines for the encoding of critical editions suer from one major shortcoming: the lack of user-friendly tools allowing philologists and their readers to display and process TEI-encoded editions. After witnessing –and personally experiencing– this frustration, I decided to develop an application especially dedicated to supporting philologists in their work, and helping them to fully benet from their encoding work. The TEI Critical Apparatus Toolbox is now available online

Yet very few tools have been designed to facilitate the work of the editor during the production phase, while the edition is still in the making. 2 By its very nature, building a critical apparatus is a difficult and error-prone task, especially as the number of witnesses grows. As a scholarly editor of medieval texts, I especially wished to be aided in three areas:
• Detection of human errors in the apparatus: it is quite easy for the editor to forget to record the reading of a manuscript for a particular locus, or to make a typo in the witness siglum while recording it, which will result in a loss of information. In a print edition, or even in a digital edition prepared with text-processing software, such errors are extremely difficult to spot and correct since they require a human reader to closely double-check every apparatus entry. The careful encoding of a critical edition, however, offers the possibility of automatic notification of such errors, among other advantages.
• Displaying the work-in-progress edition: in the early stages of a critical edition, it is quite common to be undecided about the manuscript (if any) on which you are going to base your edition, so the possibility of easily displaying the different versions of the text according to various witnesses, and in different states, is quite helpful. Regular tools designed for the display of critical editions in their final stage can be helpful; however, a work-in-progress edition may have characteristics which make those tools difficult or impossible to use.

A Survey of Existing Tools
While looking for such aid in existing tools, I surveyed the most likely candidates. 3 Three major TEI display tools have been evaluated using three criteria: their ability to detect human errors in the apparatus; their ability to display the text according to a particular manuscript; and their ability to display the critical text in a traditional fashion. I have not considered tools designed for the display of a single witness or documentary edition, since the needs of those types of edition are different.

None of the tools provided automatic detection of errors, while the style of display offered differed:
• The TEI Stylesheets 4 offer a simple display for critical editions, but do not offer a way to display a single manuscript.
• TEI Boilerplate, 5 for all its other qualities, is not suitable for critical editions: it displays both lemma (if any) and readings one after the other, without giving any information on the witness where they appear.
• The Versioning Machine 6 does not display a critical text, but is especially designed to display a text in parallel versions according to its various witnesses. It suffers from a minor shortcoming, though: it only works with a positive apparatus, where even lemmata have a list of witnesses attached. If an edition is encoded with a negative apparatus, 7 the witnesses whose siglum does not appear in an apparatus element are considered to bear no text at all at this place, instead of bearing the same text as the lemma for this entry.

It goes without saying that since those tools are open source, they could be customized or modified to address the needs listed above. I have only considered off-the-shelf features, available to any scholar without programming skills or even advanced digital skills. After this survey, I started writing my own scripts in order to support my editorial work, as a set of separate tools. It quickly occurred to me that those scripts could easily be grouped together in a "toolbox" that, with a little bit of packaging, could be of help not only to me but to many other editors. By focusing on the processing and display of the critical apparatus, it was possible to find common features useful to most TEI critical editions, therefore going beyond a project-specific tool. I opted for a design that would be as user-friendly as possible, not requiring users to know any XSLT or CSS, nor even to download and configure anything.

The TEI Critical Apparatus Toolbox is an online application for the quick and easy visualization and processing of TEI XML critical editions. It is not meant to be a publication tool: the Critical Apparatus Toolbox specifically targets the needs of editors during the preparation of their ongoing work, allowing them to perform quality controls on their TEI files and to display their work-in-progress text in the style of a "traditional" critical edition, in parallel versions corresponding to each witness, or both.

The requirements are very basic: no account, download, installation, or configuration is needed.
The users are simply invited to upload the TEI XML file of a critical edition through a web browser.
The only requirement is that this edition must be encoded using the Parallel Segmentation method.
As long as this method is used, any style can be used in the file: positive or negative apparatus, use of <lem>, use of <rdg> only, or a mix of different styles, which is common in work-in-progress editions.
• In each case, the content of <lem> and <rdg> is highlighted, with a white background.
• When an <app> element contains a <lem> and one or more <rdg>s, there is an easily identifiable critical text: only the content of the <lem> is displayed in the text, and the readings appear in a pop-up note. To see the note, users simply click the upward-pointing arrow sign (↑) following the content of the lemma.
• When an <app> element has only <rdg> children, there is no identifiable critical text. It is therefore impossible for the application to decide which reading should be the critical text, and which other(s) should be variants only mentioned in a note. In this case, the content of each <rdg> is displayed, in order of appearance in the <app>. To make the text more readable, curly brackets open and close each series of <rdg>s belonging to the same <app> element. To make the presence of empty readings clearer, they are represented by a minus sign (-), which was deemed more neutral in this case than using the notion of "omission."
• The use of reading groups is also supported: the content of each <rdgGrp> element is displayed between bold double parentheses. If the <rdgGrp> contains a <lem>, its text is underlined.

This traditional critical edition view also offers the option to show or hide page breaks, if they have been encoded with the <pb> tag. If the user chooses to display all page breaks, the page or folio numbers appear in blue, between square brackets, and inline for better readability of the text (especially when the page breaks of multiple witnesses have been recorded). Alternatively, it is possible to display page breaks only for a particular witness (relying on the value of the @edRef attribute, which must refer to the siglum of the witness). In this case, it is assumed that the page breaks of this witness are of particular interest to the user, and they are displayed in a more prominent fashion, as blocks with a thin blue line representing each break.

So far the Critical Apparatus Toolbox is not very different from other TEI display tools, except perhaps that it can handle a great variety of encoding styles within the Parallel Segmentation method. But its most distinctive feature is the ability to perform automated controls of the encoding.

Controlling the Consistency of Your Encoding
The preparation of a critical edition involves many sessions of meticulous proofreading, especially to check the accuracy of the apparatus. Although the Critical Apparatus Toolbox cannot replace the careful eye of the editor, it offers an efficient way to control the consistency of the encoding by detecting small, inevitable mistakes, like a typo in the list of sigla or the failure to record the reading of a particular witness in an apparatus entry.

To perform those controls, the Critical Apparatus Toolbox will scan the <teiHeader> and <front> sections of the TEI file for a <listWit>, and find all the sigla of the witnesses. Then, it will compare this list to the manuscripts appearing in the @wit attribute of <lem> and <rdg> elements. The nature of the controls will depend on the type of apparatus used in the edition.
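
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Toolbox's own code (which is written in XSLT and JavaScript): the function names are hypothetical, the sketch looks for <listWit> anywhere in the document rather than only in <teiHeader> and <front>, and the sample deliberately contains a mistyped siglum (#K3) that is not in the original examples.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: four declared witnesses, one mistyped pointer (#K3).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc>
    <listWit>
      <witness xml:id="F"/>
      <witness xml:id="K1"/>
      <witness xml:id="K2"/>
      <witness xml:id="V"/>
    </listWit>
  </sourceDesc></fileDesc></teiHeader>
  <text><body><p>Nos quoque
    <app>
      <rdg wit="#F">oramus</rdg>
      <rdg wit="#K1 #K3">eramus</rdg>
      <rdg wit="#K2">obsecramus</rdg>
    </app> ut servo</p></body></text>
</TEI>"""


def declared_sigla(root):
    """Collect the xml:id of every <witness> found inside a <listWit>."""
    sigla = set()
    for list_wit in root.iter(TEI + "listWit"):
        for witness in list_wit.iter(TEI + "witness"):
            if witness.get(XML_ID):
                sigla.add(witness.get(XML_ID))
    return sigla


def referenced_sigla(root):
    """Collect every siglum pointed to by the @wit of a <lem> or <rdg>."""
    found = set()
    for tag in ("lem", "rdg"):
        for element in root.iter(TEI + tag):
            # @wit is a whitespace-separated list of pointers such as "#F #K1".
            found.update(p.lstrip("#") for p in element.get("wit", "").split())
    return found


root = ET.fromstring(SAMPLE)
declared = declared_sigla(root)
unknown = referenced_sigla(root) - declared  # pointers with no matching witness

print(sorted(declared))  # ['F', 'K1', 'K2', 'V']
print(sorted(unknown))   # ['K3']
```

A pointer that matches no declared witness, like #K3 here, is the kind of siglum typo that would otherwise silently lose a reading.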

Positive Apparatus
In a positive apparatus, the reading of each witness considered for the edition is explicitly mentioned in each <app>: even lemmata have a @wit attribute, listing all the witnesses bearing this text. This type of apparatus may be more verbose, but it is a useful practice at least during the preparation of the edition (whether or not this will be the style used in the final version of the edition) because it forces the editor to be more accurate and makes verifications easier. It is the type of apparatus that allows for the most efficient consistency checks.

The Critical Apparatus Toolbox can:
• Highlight apparatus entries that do not use all witnesses: the content of incomplete <app> elements, which do not explicitly give a text for each witness listed in the <listWit>, is highlighted in red.

Example 4. Encoding of readings in a positive style of apparatus, in an edition considering four witnesses (F, K1, K2, and V), offering no reading for V.

    Nos quoque
    <app>
      <rdg wit="#F">oramus</rdg>
      <rdg wit="#K1">eramus</rdg>
      <rdg wit="#K2">obsecramus</rdg>
    </app>
    ut servo

• Highlight apparatus entries that do not use a specific witness: with the previous feature, the entry is highlighted if any witness is missing. In some cases, for instance an incomplete collation for one of the witnesses, this could generate a great deal of unwanted information. With this function, the user can choose to highlight only the <app> elements that do not explicitly give a text for a specific witness that they select. Each witness is assigned a different highlight color (there are twenty different colors available).
• Highlight apparatus entries where no witness at all is mentioned: not giving any @wit for a <lem> or <rdg> (or giving an empty @wit) may be a choice by the encoder, but it may also result from a mistake. With this function, the editor can easily map and control those entries.
• Highlight apparatus entries where a witness is mentioned more than once: a careless mistake or a typo may cause the same witness to appear more than once in the @wit of the <lem> and <rdg> children of the same apparatus entry. This results in a confusing situation that must be corrected by the editor, since the application cannot determine which reading actually belongs to the witness.
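
The four controls listed above can be approximated as follows. This is a minimal sketch in Python, not the application's XSLT and JavaScript: the function names and the "no-witness" label are hypothetical (only "incomplete" and "doubles" appear as class names in the Toolbox), only the direct <lem> and <rdg> children of an <app> are examined (so <rdgGrp> is ignored), and "no witness mentioned" is interpreted here as any <lem> or <rdg> lacking a usable @wit.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
NS_DECL = 'xmlns="http://www.tei-c.org/ns/1.0"'


def witness_counts(app):
    """Count the sigla in the @wit of the <lem>/<rdg> children of one <app>.

    Also report whether some child has a missing or empty @wit.
    """
    counts = Counter()
    unattributed = False
    for child in app:
        if child.tag in (TEI + "lem", TEI + "rdg"):
            pointers = child.get("wit", "").split()
            unattributed = unattributed or not pointers
            counts.update(p.lstrip("#") for p in pointers)
    return counts, unattributed


def check_app(app, sigla):
    """Return the set of problems found in one positive-apparatus entry."""
    counts, unattributed = witness_counts(app)
    problems = set()
    if any(s not in counts for s in sigla):
        problems.add("incomplete")   # some declared witness has no reading
    if any(n > 1 for n in counts.values()):
        problems.add("doubles")      # a witness is mentioned more than once
    if unattributed:
        problems.add("no-witness")   # a <lem>/<rdg> with no (or empty) @wit
    return problems


def lacks_witness(app, siglum):
    """True when the entry gives no explicit reading for the chosen witness."""
    counts, _ = witness_counts(app)
    return siglum not in counts


SIGLA = ["F", "K1", "K2", "V"]

# Example 4 from the text: no reading is recorded for V.
example4 = ET.fromstring(
    f'<app {NS_DECL}><rdg wit="#F">oramus</rdg>'
    '<rdg wit="#K1">eramus</rdg><rdg wit="#K2">obsecramus</rdg></app>'
)
print(check_app(example4, SIGLA))    # {'incomplete'}
print(lacks_witness(example4, "V"))  # True
```

Keeping the per-witness test separate from the global one mirrors the two highlighting modes: the first flags any gap, the second only gaps for the witness the user selects.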

It is worth noting that the application is capable of dealing with lacunae encoded with <lacunaStart> and <lacunaEnd>. When one of the witnesses is lacunary, the apparatus entries appearing between the <lacunaStart> and <lacunaEnd> tags marking up this lacuna will of course not give any reading for this witness, but the application will not consider that the lacunary witness is erroneously missing, and will not highlight those entries if the user checks incomplete <app> elements. Conversely, if an apparatus entry gives a text for this witness within the lacuna, the application will consider that the witness is mentioned more than once and highlight this entry accordingly.
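
One way to picture this behavior is to walk the document in order while keeping track of the witnesses that are currently lacunary. The Python sketch below is an illustration only, under several assumptions that the text does not confirm: <lacunaStart> and <lacunaEnd> are taken to be standalone elements placed between apparatus entries and carrying their own @wit, the function names are hypothetical, and ordinary duplicate detection is left out so that only the lacuna logic is shown.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample: witness V is lacunary for the second and third entries.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <app><lem wit="#F #K1">Nos</lem><rdg wit="#V">Vos</rdg></app>
  <lacunaStart wit="#V"/>
  <app><lem wit="#F">oramus</lem><rdg wit="#K1">eramus</rdg></app>
  <app><lem wit="#F #K1">ut</lem><rdg wit="#V">et</rdg></app>
  <lacunaEnd wit="#V"/>
</p>"""


def sigla_in(element):
    """Return the sigla listed in the @wit attribute of an element."""
    return {p.lstrip("#") for p in element.get("wit", "").split()}


def check_with_lacunae(root, sigla):
    """Return, for each <app> in document order, the problems detected."""
    lacunary = set()   # witnesses inside an open lacuna at this point
    report = []
    for element in root.iter():          # iter() walks in document order
        if element.tag == TEI + "lacunaStart":
            lacunary |= sigla_in(element)
        elif element.tag == TEI + "lacunaEnd":
            lacunary -= sigla_in(element)
        elif element.tag == TEI + "app":
            mentioned = set()
            for child in element:
                mentioned |= sigla_in(child)
            problems = set()
            # A lacunary witness is not expected to have a reading here.
            if (set(sigla) - lacunary) - mentioned:
                problems.add("incomplete")
            # A reading given inside the lacuna is treated as a duplicate.
            if mentioned & lacunary:
                problems.add("doubles")
            report.append(problems)
    return report


report = check_with_lacunae(ET.fromstring(SAMPLE), ["F", "K1", "V"])
print(report)  # [set(), set(), {'doubles'}]
```

The second entry is not reported as incomplete even though V is absent, while the third is flagged because it gives V a reading inside its own lacuna.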

Negative Apparatus
In a negative apparatus, only witnesses with a reading differing from the lemma are explicitly mentioned, the <lem> element usually having no @wit. Witnesses whose siglum does not appear in the @wit of a <rdg> are assumed to bear the same text as the lemma.

Other Controls
The Critical Apparatus Toolbox can also highlight apparatus entries that contain a <lem>, or that contain only <rdg> elements.

Parallel Versions View
As an alternative to the traditional critical edition view, users can display their edition in parallel versions corresponding to the text of each witness. After uploading the TEI XML file, the user is asked to choose the sigla of the witnesses they want to see displayed, from a list automatically built from the <listWit> found in the document. The user also has the option to display the critical text alongside the text according to individual witnesses.

The parallel versions view is available for a positive as well as a negative apparatus. In the case of a negative apparatus, the application will reconstruct the text of each witness by using the text of the lemma when a reading is not explicitly given for the witness.
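
The reconstruction rule for a negative apparatus can be sketched as follows. This is a hedged Python illustration rather than the Toolbox's XSLT: the function names are hypothetical, @wit pointers are assumed to use the "#siglum" form, nested apparatus entries inside a reading are simply flattened, and "???" is a placeholder chosen here for entries that yield no text for the witness.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
APP, LEM, RDG = TEI + "app", TEI + "lem", TEI + "rdg"


def text_of(element):
    """Return all the text contained in an element, markup removed."""
    return "".join(element.itertext())


def reading_for(app, siglum):
    """Return the text of one witness for one apparatus entry.

    An explicit reading wins; otherwise the lemma is used, as in a
    negative apparatus. None means no text can be determined.
    """
    pointer = "#" + siglum
    for child in app:
        if child.tag in (LEM, RDG) and pointer in child.get("wit", "").split():
            return text_of(child)
    lem = app.find(LEM)
    return text_of(lem) if lem is not None else None


def witness_text(element, siglum):
    """Rebuild the running text of an element as carried by one witness."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == APP:
            reading = reading_for(child, siglum)
            parts.append("???" if reading is None else reading)
        else:
            parts.append(witness_text(child, siglum))
        parts.append(child.tail or "")   # text following the child element
    return "".join(parts)


paragraph = ET.fromstring(
    '<p xmlns="http://www.tei-c.org/ns/1.0">Nos quoque '
    '<app><lem>oramus</lem><rdg wit="#K1">eramus</rdg></app> ut servo</p>'
)
print(witness_text(paragraph, "K1"))  # Nos quoque eramus ut servo
print(witness_text(paragraph, "F"))   # Nos quoque oramus ut servo
```

Witness F is never named in the entry, so it inherits the lemma, which is exactly the inference a tool limited to positive apparatus would fail to make.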

After submitting the choice of witnesses, the parallel versions will be displayed in columns. The look and feel of those parallel versions is similar to the traditional critical edition view, with the difference that for each apparatus entry, the text displayed is that of the current witness. The text of the lemma and of the other manuscripts is still presented, but only in an apparatus note.

A color code helps the user to situate each witness in relation to the critical text:
• when the current witness has the same reading as the lemma, the text of the apparatus entry is highlighted in white;
• when this witness has a different reading from the lemma, the text is highlighted in orange;
• when there is no lemma for an apparatus entry, the text of the witness is highlighted in yellow;
• when an apparatus entry does not give any intelligible reading for this witness, question marks highlighted in red are displayed. This means there is probably something wrong with the way this apparatus entry is encoded.
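
The color code above amounts to a small decision procedure, sketched here in Python. This is an interpretation, not the application's code: the function name is hypothetical, and one rule is an assumption of this sketch, namely that a witness absent from an entry is taken to follow the lemma only when the <lem> carries no @wit (negative style), and is otherwise treated as having no intelligible reading.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
LEM, RDG = TEI + "lem", TEI + "rdg"
NS_DECL = 'xmlns="http://www.tei-c.org/ns/1.0"'


def highlight_colour(app, siglum):
    """Return the highlight colour of one entry for the current witness."""
    pointer = "#" + siglum
    lem = app.find(LEM)
    # First <lem> or <rdg> child that explicitly names the witness, if any.
    own = next(
        (c for c in app
         if c.tag in (LEM, RDG) and pointer in c.get("wit", "").split()),
        None,
    )
    if lem is None:
        # No lemma: yellow if the witness has a reading, red otherwise.
        return "yellow" if own is not None else "red"
    if own is lem:
        return "white"    # the witness is listed on the lemma itself
    if own is not None:
        return "orange"   # the witness has a variant reading
    # Witness not mentioned at all (assumption of this sketch, see lead-in).
    return "white" if not lem.get("wit") else "red"


positive = ET.fromstring(
    f'<app {NS_DECL}><lem wit="#F #K1">oramus</lem>'
    '<rdg wit="#K2">obsecramus</rdg></app>'
)
negative = ET.fromstring(
    f'<app {NS_DECL}><lem>oramus</lem><rdg wit="#K1">eramus</rdg></app>'
)
rdg_only = ET.fromstring(
    f'<app {NS_DECL}><rdg wit="#F">oramus</rdg><rdg wit="#K1">eramus</rdg></app>'
)

print(highlight_colour(positive, "K2"))  # orange
print(highlight_colour(negative, "F"))   # white
print(highlight_colour(rdg_only, "V"))   # red
```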

Application Design
The Critical Apparatus Toolbox is an online application built on a set of XSLT stylesheets served through PHP files, the output of which is made interactive thanks to JavaScript and CSS. It makes use of some parts of TEI Boilerplate, most notably its web design. But despite the similar look and feel, the core functions are very different: all the parts of the TEI Boilerplate stylesheets pertaining to critical edition elements have been overridden.
The XSLT stylesheets analyze the TEI XML edition, and determine its characteristics. 10 When they transform each <app> into <span> HTML elements, this information is used to assign one or more CSS classes to each <span>: all are assigned the "app" class; the ones not explicitly giving a text for each witness are assigned the class "incomplete," those giving more than one reading for the same witness are assigned the class "doubles," and so on. A list of all the witnesses mentioned in the apparatus entry is also created by the XSLT and stored in the @title attribute of the HTML <span>.
Those @class and @title values are used by the JavaScript functions to control the consistency of the encoding and highlight elements according to the user's requests. The default TEI Boilerplate XSLT and CSS are used for all the TEI elements which are not directly related to critical editions.
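
The division of labor between the transformation step and the client-side step can be illustrated with a short Python sketch. It stands in for both the XSLT and the JavaScript, which are the languages actually used; the function names are hypothetical, and the space-separated format of the @title list is an assumption, since the text only says that a list of witnesses is stored there.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def to_span(app, sigla):
    """Transformation step: turn one <app> into an annotated HTML <span>."""
    mentioned = []
    for child in app:
        if child.tag in (TEI + "lem", TEI + "rdg"):
            mentioned += [p.lstrip("#") for p in child.get("wit", "").split()]
    classes = ["app"]                          # every entry gets "app"
    if any(s not in mentioned for s in sigla):
        classes.append("incomplete")           # a declared witness is missing
    if len(mentioned) != len(set(mentioned)):
        classes.append("doubles")              # a witness appears twice
    return ET.Element(
        "span",
        {"class": " ".join(classes), "title": " ".join(mentioned)},
    )


def should_highlight(span, selected):
    """Client-side step: is the selected witness absent from @title?"""
    return selected not in span.get("title").split()


app = ET.fromstring(
    '<app xmlns="http://www.tei-c.org/ns/1.0"><rdg wit="#F">oramus</rdg>'
    '<rdg wit="#K1">eramus</rdg><rdg wit="#K2">obsecramus</rdg></app>'
)
span = to_span(app, ["F", "K1", "K2", "V"])
print(span.get("class"))            # app incomplete
print(span.get("title"))            # F K1 K2
print(should_highlight(span, "V"))  # True
```

Because the analysis is precomputed into attributes, the interactive layer only has to read @class and @title, which is consistent with the observation that adapting to changes in the TEI module should mainly require updating the XSLT.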

Within the Parallel Segmentation method, the latest developments of the TEI are implemented, like the possibility for <lem> or <rdg> to contain model.divLike and model.pLike elements. 11 In the future, keeping up with the developments of the Critical Apparatus module will be a priority. We hope that the Critical Apparatus Toolbox will be able to adapt to these changes: since the functions of the interface are powered by JavaScript, updating the XSLT should be enough to adapt to new rules or elements in the module.

Future Developments
The beginning of the development of the Critical Apparatus Toolbox was a lonely endeavor, but the project has since benefited from the collaboration of Magdalena Turska, 12 who wrote a prototype for the integration of the Toolbox into an oXygen framework. Decisive help was also found via a collaboration with the Erasmus SP+ DEMM program (Digital Edition of Medieval Manuscripts). 13 For the three years beginning in June 2015, DEMM is holding an annual hackathon event where the Critical Apparatus Toolbox is the base application that small, mixed teams of textual scholars and computer scientists try to enhance to meet their particular needs. 14 These events will play an important role in the future developments of the Toolbox, since they confront us directly with the real-life experience and needs of editors.

New Controls and Features
During the rst hackathon, the students and computer scientists were divided into four groups working on three themes: one worked on features linked to the representation of named entities, two others on various ways of representing the variance of an edited text (from various states of transcription showing either abbreviated or expanded words to parallel versions of a text with potentially dierent branches), and the last concentrated on the relationship between the edited text and images. These themes could serve as general directions for enhancing the Critical Apparatus Toolbox: • oering visualization options for named entities, from a simple index to more elaborate links to maps, when possible; • taking into account the visualization of transcription features like abbreviations/expanded words; • allowing the display of parallel branches of a tradition, beyond the mere display of parallel witnesses; • adding some options to link the text to its representation, or to images generally. This poses the problem of access to the images: in the current state of the Toolbox, users upload their TEI XML edition but not the other les potentially linked to it, like images.

A Basic Web-publication Kit
Another direction would be a feature similar to the "Web View" output system proposed in Martin Holmes's Image Markup Tool. 15 Even if the Critical Apparatus Toolbox is not a publication application, such an output would provide users with a ready-to-use static version of their edition, a set of files (HTML, CSS, JavaScript, etc.) that they could publish on their website or show in a demo session.
While complex projects will always need a proper publication framework, this sort of lightweight publication output would provide a simple tool for basic self-publication. 16

Printing Editions
The printed page is one of the many possible outputs for a critical edition, even if it is only a scaled-down version of the proper digital edition. In many cases, a printed output is not only a desirable option, but a necessity: think, for instance, of the PhD students preparing a critical edition who have to turn in a printed version of their work. Many of them turn to LaTeX instead of TEI, therefore losing the benefits of semantic encoding just to be able to print their dissertation in the required fashion.

It is of course extremely dicult, if not impossible, to propose a generic model for such diverse objects as TEI XML critical editions. Yet, a useful compromise can be found by concentrating only on the critical apparatus layout, and also by oering the user some simple interactive customization options (for instance, page size, page and line numbering, and content of the apparatus notes) to deliver as useful an output as possible.

I am preparing a generic TEI-to-LaTeX and TEI-to-PDF conversion feature that will be implemented in the Critical Apparatus Toolbox. I chose LaTeX as an intermediary format because it offers all the desired options, thanks to the reledmac package especially designed for typesetting critical editions. 17 It is better suited to the specific needs of critical editions than XSL-FO. Another advantage of an intermediary file is that it leaves users the opportunity to edit the LaTeX code to obtain a better PDF result, which they might prefer over a modification of the XSLT templates, depending on their skillset.

This feature, still a work-in-progress but well advanced, lets the user customize many parameters of the output through a graphical interface, without requiring any knowledge of LaTeX. When users need heavy customization of the default settings, they can easily override the templates transforming the TEI into LaTeX (although this requires some understanding of XSLT and LaTeX).