Signs of the Times: Medieval Punctuation, Diplomatic Encoding and Rendition

Digitally managing punctuation in the editions of medieval manuscripts is one of those issues that initially looks like a minor detail, but later reveals itself as a tangled web of problems spanning from computer science (how to represent punctuation signs?) to philology (what types of signs exist?) through epistemology (is the processing of punctuation a mere technical transformation or a valuable part of the scholarship?). The aim of this paper is to address the theoretical aspects of these questions and their practical implications, providing a couple of solutions fitting the paradigms and the technologies of the TEI. This paper describes how we dealt with the encoding and transformation of the punctuation in the Early New High German edition of Marco Polo’s travel account. Technically, we implemented a set of general rules (as XSLT templates) plus various exceptions (as descriptive instructions in XML attributes), and applied them in an automated fashion (using XProc pipelines). In addition to this, we discuss the philological foundation of this method and, contextually, we address the topic of the transformation of a single original source into different transcriptions: from a highly diplomatic edition to an interpretative one, going through a spectrum of intermediate levels of normalization. We also reflect on the separation between transcription and analysis, as well as on the role of the editor when the edition is the output of a semi-automated process.


1. it was not possible to find an underlying system governing the use of punctuation in the source;
2. using the standard approach to normalization in TEI (using <choice> with, e.g., <orig> and <reg>) turned out to be impracticable for two main reasons: the verbosity of the encoding and the lack of correspondences between the medieval and the modern systems (<orig> would very often be empty), both complicated by the presence of three levels of edition (diplomatic, semidiplomatic, and interpretative).

To address these issues, we devised and implemented a novel approach that made it possible for a single scholar to edit each witness in three different levels of edition in the allotted time span of eighteen months, while keeping the master TEI file lean in its structure and readable even by scholars who are not well versed in XML technologies. The key innovation of this approach is the encoding of many textual phenomena (and of how these phenomena should be handled) as formal, machine-readable rules, separate from the edited text. An automated workflow is then used to process the master file according to the rules and to produce each of the three levels of edition, depending on which set of rules has been selected. In contrast with other similar systems, there is a clear distinction between rules and exceptions in our approach. This is particularly important in the ENHG Marco Polo edition because the language it is written in has never been standardized and thus the system of rules has been organically derived during the editorial work (as described in detail in section 2).

Background and issues
2.1 Marco Polo and ENHG

The Devisement dou Monde is a travel narrative written by Rustichello da Pisa and Marco Polo, most probably in 1298, while they were both in prison in Genoa. It narrates Marco Polo's travels to and within the Eastern World: the traveler was a Venetian who, following his father and uncle on a business trip to Asia, ended up spending almost twenty-five years in the East, at the court of Kublai Khan. The account of his journey is preserved in more than 140 manuscripts which were produced over two centuries. The original version composed by the two prison companions is unfortunately lost. The text was translated into German twice during the Middle Ages. While one of these translations (version VG3) was already edited by Tscharner (1935), version DI is being edited for the first time in our project, which serves as the backdrop for the present study.

… and dialectal variance have been regularized and eroded. This desire has had different motivations in the history of editorial practice. When philology was still considered an ancillary discipline, the role of the editors was to prepare the texts for historians and literary scholars, who were mainly interested in the contents. Diacritics, for instance, and other linguistic peculiarities were considered on a par with ein verdammtes Unkraut (a damned weed) in a tidy flower garden, as Julius Weizsäcker wrote in 1867.

With Karl Lachmann, normalization became a necessity for philologists themselves: since a whole textual tradition had to be summarized in an apparatus at the bottom of the page, it was no longer feasible to take into account all variants and punctuation. Collation, in particular, implies normalization: the establishment of a traditional stemma is based on Leitfehler ("guiding errors"), not on equivalent variants. The formal appearance of the reconstructed text (for example, the dialectal variety chosen for the text) would then be given the shape considered to be closest to the language of the author. It is, however, commonly assumed that editors of medieval texts are free to decide when, how, and how much their texts should be normalized.

For many languages the exact procedures used to normalize a text have been established a posteriori by historical linguists. For instance, normalized Middle High German was first established by Karl Lachmann (see Maas 1950) and then accepted by the scholarly community.

However, no such normalization standards currently exist for ENHG. As Ebert et al. (1993, 7) write: "Until the sixteenth century and for many aspects also later, there is no variety of Early New High German that … had a generally accepted and undisputable prestige over all the others and which can, on the basis of such sociolinguistic fact, be considered as a reference variety for synchronic writing and speaking or for diachronic use."

A survey of existing editions of ENHG texts shows that, while internal coherence in normalization practices is desired and documented, not only are normalization standards missing, they are also not actively sought after. Focusing on the Ingelheimer Haderbücher project (Marzi and Sprenger n.d.), for instance: although the editors were extremely accurate in outlining their principles for textual transcription, they did not provide any specific bibliographical reference in support of their decisions on how the texts were transcribed and normalized. In fact, they only tangentially address this concern by writing that "for the orthography of the single letters and the creation of the transcriptions we mostly welcomed and observed the suggestions of Germanists, linguists, and paleographers."

An interesting case is the Frühneuhochdeutsches Wörterbuch (FWB), a dictionary of ENHG, started in the 1980s and expected to be completed in 2027. The editors of the FWB did propose a set of rules, a necessary precondition for such a large-scale project. The principle at the basis of the orthography chosen for the lemmata in the dictionary is a phonological one: after having established an ideal phonological system of ENHG, the corresponding graphemes were chosen. The editors themselves, however, state very clearly that such a systematization represents only an ideal (in their words ausgezeichnet) language. We are not using the FWB as a reference for our normalization, first, because this ideal language is very distant from the actual language and orthography we find in our witnesses, and, second, because the dictionary is still not complete (many letters are completely missing, and many are incomplete: only six letters of the alphabet are fully represented).

Normalization Choices Are Editorial Choices
For a scholar editing a medieval text, "to normalize" means making a series of editorial decisions. Some examples of the kinds of decisions that need to be made are: how to substitute modern equivalents for original textual features (e.g., if and when to substitute the modern s for the long s [ſ]); how to change certain signs to others (e.g., when to turn a virgula into a comma, when to turn it into a full stop, and when to eliminate it completely); when to omit features that are today considered unnecessary (such as decorations); when to add new features that were not yet fully developed in medieval times (like capitalization rules or spaces between different words or sections of text).

To an external observer, the act of normalizing a text can be seen as a sort of correction being imparted on the text. Scholars, in this respect, have often distinguished between "normalization" and "correction," between mechanical and less mechanical interventions: the former, normalization, being seen as an almost objective process, while the latter, correction, often considered as a subjective act that involves the editor's iudicium. In our view there is no strict dichotomy between normalization and correction, but rather a continuum. No normalization is entirely objective, no correction is entirely subjective. For example, one can argue that substituting u for v and vice versa is quite unproblematic and rather objective because it follows an established rule of the art in a certain school of philology. However, in our opinion, choosing which set of rules to follow is also a subjective choice of the scholar that cannot, and should not, be seen as independent of their work. In addition, we argue that even in the simplest cases the philologist runs the risk of concealing linguistic features of great importance, taking them for scribal negligence.

In our opinion no intervention pertaining to the normalization process can be considered objective or mechanical. The lack of a clear line of distinction suggests that as much information as feasibly possible should be preserved during the preparation of a scholarly edition.

Preserving information has, however, a cost: the editor must put in the effort to record every single correction, even the most minute. The method we present in section 3 aims to make it possible to preserve a vast amount of information with little effort on the part of the editor.

Normalization in TEI
In TEI most of these normalization actions are represented using <choice> elements. For example, the expansion of ᵽ into per would be encoded by pairing the abbreviated and the expanded form inside a single <choice>. It is also possible to use <choice> to encode more than one normalization form for the same word, for example, using the ad-hoc elements described by the Menota handbook (Example 2, Normalisation in the Menota project).

There are many TEI elements that are routinely used in combination with <choice> to express different kinds of normalization intents: for instance, <abbr>, <expan>, <sic>, <corr>, <orig>, <reg>, <supplied>, and, for punctuation, <pc>.

The use of <choice> has two main drawbacks: first, it causes a twentyfold expansion in character count (a five-letter word with two normalizations requires more than one hundred characters to be encoded), and, second, it forces what would be a single <w> element to grow into an eight-tag, two-level-deep markup structure. When each word in a manuscript needs such a treatment (as is the case in medieval texts), these two drawbacks quickly lead to the creation of a TEI file that is hard to navigate and to maintain.
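The scale of this expansion can be checked with a small Python sketch. It is only an illustration under stated assumptions: the word "groſſ", the two regularized forms, and the "n1"/"n2" values of the type attribute are made up for the example and are not taken from the project's encoding.

```python
import xml.etree.ElementTree as ET


def encode_with_choice(original: str, regularized: dict) -> str:
    """Wrap one word in <w><choice>...</choice></w>, with one <reg> per level."""
    w = ET.Element("w")
    choice = ET.SubElement(w, "choice")
    ET.SubElement(choice, "orig").text = original
    for level, form in regularized.items():
        # The 'type' values are illustrative labels (assumption).
        ET.SubElement(choice, "reg", {"type": level}).text = form
    return ET.tostring(w, encoding="unicode")


word = "groſſ"  # hypothetical five-letter token
encoded = encode_with_choice(word, {"n1": "gross", "n2": "gross"})
expansion = len(encoded) / len(word)

print(encoded)
print(f"{len(word)} characters become {len(encoded)} (factor {expansion:.1f})")
```

The exact factor depends on the element and attribute names chosen, but even this compact variant multiplies the length of the bare word many times over.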

Punctuation
Punctuation is one of the many aspects of a text that undergoes normalization during the creation of a scholarly edition. In most cases normalizing punctuation is considered a secondary task and very quickly brushed off in the introductions of many editions.

In our opinion, the normalization of punctuation is a critical step in the creation of any non-diplomatic edition. Punctuation is not only a reading aid: it delivers meaning, suggests interpretations, creates structures, and changes the importance of words. It is the responsibility of the editor of a non-diplomatic edition to translate ancient punctuation systems into something that a modern reader can fully understand.

From an operational point of view, the normalization of punctuation marks is profoundly different from the normalization of words, although the two are often treated in the same way.

The normalization of words is usually quite simple and has only minimal impact on the markup structure (when done without preserving information, as described in the previous section). In the most common case, some letters in a word are changed, requiring the replacement of that word.
The more convoluted cases require splitting or merging words, but those still have, at most, a linear effect on the markup structure (for example, turning <w>aberes</w> into <w>aber</w><w>es</w>).

In contrast, the normalization of punctuation is often more complex, leads to modifications in the hierarchical structure of the markup, and even has a ripple effect on the surrounding text. For example, normalizing the middle dot in "<s>… geben ‧ von stund er gesund wirt …</s>" into a full stop will not only change the markup structure, but also require the capitalization of the following word, now the first word of the following sentence: "<s>… geben.</s><s>Von stund er gesund wirt …</s>."
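The ripple effect can be made concrete with a short Python sketch. It assumes that the editor has already decided that every occurrence of the given mark closes a sentence, which is precisely the kind of decision that cannot be taken mechanically; the function name and the representation of a sentence as a list of tokens are assumptions made for the illustration.

```python
def split_at_mark(sentence, mark="‧"):
    """Split one tokenized sentence at each `mark`.

    Every part closed by a mark gets a full stop, and the first word
    of each following part is capitalized.
    """
    sentences, current = [], []
    for token in sentence:
        if token == mark:
            if current:
                sentences.append(current + ["."])
            current = []
        else:
            if not current and sentences:
                # First word after a new sentence boundary.
                token = token[:1].upper() + token[1:]
            current.append(token)
    if current:
        sentences.append(current)
    return sentences


source = ["geben", "‧", "von", "stund", "er", "gesund", "wirt"]
result = split_at_mark(source)
print(result)
```

One token of punctuation thus changes three things at once: the sign itself, the number of sentence containers, and the spelling of a neighboring word.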

The additional complexity of normalizing punctuation is especially prominent when dealing with medieval texts. As a matter of fact, punctuation in the Middle Ages responded to needs that were completely different from what we are used to in modern times, and different rules were applied based on many factors: the scriptorium where the manuscripts were produced, the scribe, the language, the destination of the text, and the local customs.

This is the case in medieval German, where punctuation was usually not driven by syntactic principles, but instead often had a rhetorical function: it was meant as an aid to those reading the text aloud (Digilio 2008, 373). For instance, an important element in the clause could be put between two commas to signal that it was to be read louder; today this would have the opposite result.
Take, for example, "Mary, his sister, is a doctor": if we were to follow some medieval punctuation conventions, like the ones often found in the witnesses represented in the ENHG Marco Polo, "his sister" would be the focus of the sentence, while according to the modern use of commas, the same phrase would be of secondary importance. An example of this is the following sentence, from the rubric of chapter 3 in München, BSB Cgm 696: "die zwen prüder … darnach chomen, zu dem groſſen hern, der gancȝen tartarey." Despite being the pragmatic focus of the sentence, "zu dem groſſen hern" is here between commas: the commas clearly emphasize the phrase in question. Moreover, modern punctuation imposed on medieval texts can obscure important linguistic phenomena: in the case of the ENHG Marco Polo, this relates to the presence or absence of structures similar to the so-called relative nexus in Latin and with the syntactic parameter of the null subject.
Unfortunately, things are even more complex than that; the sentence we have just analyzed proceeds as follows: "die zwen prüder … darnach chomen, zu dem groſſen hern, der gancȝen tartarey, gnant ẟ groſe cham, kayſer ʋon Chatay": the second part of the sentence, which translates "called the Great Kaan, emperor of Chatay," would seem to reflect instead a more modern use of commas.

The underlying truth is that the systems regulating punctuation in medieval texts are not yet fully understood. This is true for the German Middle Ages in general, as Digilio (2008, 373) observes, and for ENHG in particular. In this respect, Ebert et al. (1993, 29) write that "how far it is possible to describe rules and how much rulelessness reigns here, remains to be determined." The very meaning of the signs is often ambiguous: punctus and virgula, the two signs that appear in the witnesses at the basis of the edition, are, particularly in the fourteenth and fifteenth centuries, "polyfunctional" (Ebert et al. 1993, 29).

This lack of systematic rules and the high degree of ambiguity make it hard to understand which set of rules was being used in the first place and, in turn, almost impossible to transpose in a fully mechanical way the punctuation system used by a manuscript into one of the many currently used. Contini supports our observations when he writes that "the frictions due to the change of system are particularly visible in the case of punctuation [that in old texts mixes semantic and melodic functions], so that it is only rarely possible to insert or omit one and the same sign in the same place" (Contini 2014 †, 23). Instead of a translation from one system into another (which is the definition of normalization adopted by Contini), in the case of punctuation one would have to substitute the system in toto.

We are not the first to notice that normalizing punctuation differs profoundly from normalizing …

Among the different attempts on the part of scholarly editors to address this issue, we can identify three main tendencies:
1. the complete removal of the original punctuation system, replacing it with a modern one;
2. the decision to maintain punctuation exactly as it was in the source document;
3. an endeavor to normalize punctuation, while remaining as close as possible to the source.

An example of the first approach, removal and replacement of the original punctuation, is found in Corpus Rhythmorum Musicum (Stella 2020), a multimodal edition containing the earliest medieval Latin songs, together with their musical rendition. Figure 1 shows an excerpt from the edition alongside one of the witnesses.

An example of the second method, the faithful preservation of the original punctuation in the edited text, can be found in the Parzival-Projekt (Stolz 2020), depicted in figure 2. Here the transcription of the witness perfectly matches the original punctuation: the medieval dot is encoded using a modern full point, fully comparable from the point of view of its visual rendering.

… looking edited texts, such as that shown in figure 3. In this example, the dot preceding the string "achine" is maintained in its original position, but the two words composing the string, "ac" and "hine," are separated. Having a full stop still adjacent to the first letter of a word, though, is quite unfamiliar for modern readers, who might have difficulties parsing the text.

Figure 3. Extract from the Digital Vercelli Book. Some punctuation is simplified "in place," leading to unfortunate hard-to-read artifacts that do not match current punctuation conventions (e.g., ".ac hine").

If, on the one hand, we appreciate the attempt to normalize the original punctuation, on the other hand, this approach does not seem to improve the readability of the text, as one would expect from an interpretative edition. Moreover, it does not seem to help much in the analysis of the original punctuation either. However, it could represent one step in a potential spectrum of normalizations, a first step in reducing the complexity present in the original source.

Punctuation in the ENHG Marco Polo
In our project, punctuation will be handled in three different ways, one for each level of faithfulness: diplomatic, semidiplomatic, and interpretative.

At the diplomatic level punctuation will be fully preserved. Each punctuation mark will be encoded in its original position and using the nearest Unicode code point. Extracting precise information about the punctuation marks from the facsimile and encoding it in the TEI files will also have the positive side effect of unlocking important analyses in the fields of stylometry and phylogenetics (Darmon et al. 2020).

At the semidiplomatic level a reduced system of punctuation consisting of two signs, a middle dot (·) and a virgula (|), will be introduced. Similar systems are often used in the late Middle Ages (Parkes 1992, 46) and are often adopted in ENHG texts of the fourteenth and fifteenth centuries (Hartweg and Wegera 2005, 131). This reduced system based on unfamiliar signs will allow a smooth presentation of the text's structure, and, at the same time, reduce the intervention of the editor on the interpretation of the textual content. A similar approach was used by Mitchell and Robinson (1998), as discussed in the next section. In addition, in our case, the insertion of this minimal system of punctuation follows the motto "no punctuation where the sense is clear without any" (Mitchell and Robinson 1998, 313).

At the interpretative level the punctuation will be completely modernized, allowing the general public to easily read the text.

Multiple Editions
Notwithstanding the issues outlined in the previous section, the project behind the ENHG Marco Polo edition strives to match and surpass the state of the art for digital editions.
One particular aim of the ENHG Marco Polo is to be able to programmatically produce multiple editions, each with different notes and normalized texts, from a single master TEI file. In practice, this means:
The witnesses will be encoded using TEI-compliant XML markup.
The scholar will work by directly editing the TEI XML master file.
For each witness, there will be a single master file containing all related information and critical annotations (including the references for the editor's notes).
The same text will be edited in multiple levels of closeness to the source: diplomatic (with special characters to show allographic variance, colors, abbreviations, errors etc.), semidiplomatic (with expanded abbreviations, but still maintaining some relevant graphematic distinctions, a reduced punctuation system, and the correction of trivial mistakes), and interpretative (with the correction of all mistakes and the modernization of graphemes, punctuation, and other textual features). Given that the project is based on three main witnesses, this means producing a total of nine different editions.
Multiple editions will be generated automatically from the master TEI file, with no manual intervention on the resulting files.
The generated edition files will conform to the TEI subset understood by EVT.

Some of these desiderata clash with each other. For instance, the desire to directly edit the XML file makes it hard and error-prone to keep in a single master file all the information needed to generate the three editions. It would hardly have been possible to respect the desired level of explicitness in the normalization process due to the complexities emerging from the high level of granularity of the transcriptions combined with the goal of producing multiple levels of normalization.

Predigital Approaches
In predigital times, presenting the same text at different levels of rendition would of course mean a lot of work. The philologist, after completing the diplomatic transcription, would start all over again and proceed with another transcription and repeat the process for each level of normalization they wished to produce. Not only is this a very long and demanding workflow, but it also poses a number of problems: first of all, if an uncertain passage of the text gets clarified later on in the editing process, the changes need to be made on all the different editions separately. Moreover, inconsistencies can emerge between different edition levels, as they are created independently from each other.

An interesting example, in this respect, is that of Mitchell and Robinson (1998): in their edition of Beowulf, they decided to use modern punctuation. The text, in fact, needed to be accessible to a wide audience. At the same time, however, they expressed their reservations on the issue: modern punctuation would not suit Old English poetry and it would force the editors to make decisions in order to resolve ambiguities which cannot be resolved. Given these considerations, extensively explained in section IIIB of their edition, they wanted to offer a way out, or at least an alternative. They proposed a different system of punctuation, which inspired our intermediate level of normalization. However, they limited themselves to offering a mere taste of their solution in the appendix to the edition (pp. 313-18). In the appendix they edited verses 1-114 again, using a different punctuation system and inviting the reader to go back to the beginning of the full edition, where the first three folios of the manuscripts are reproduced in a black and white facsimile (pp. 44-48). Such an approach has many limitations: it is highly impractical, both from the point of view of the reader, who is forced to go back and forth from the beginning to the end of the book and who, in the end, only gets a small portion of the "best" version of the text, and from the point of view of the scholar, who cannot realize their ideas to the fullest, as that would probably cost too much time and space.

Digital Approaches
One of the main advantages of digital transcriptions of manuscripts is that they can be reprocessed and transformed quite easily, at least in theory. This lends itself to the idea of having a single container that includes both a diplomatic transcription and hints on how to normalize certain pieces of text.

For example, it is quite common for philologists who prepare their editions using TEI to make use of elements such as <choice>, <orig>, and <reg> to indicate how to normalize certain words, or <sic> and <corr> to encode corrections. A software tool will then process these hints and, for example, allow the reader to choose between a version that has been created on the fly by selecting the text marked with <orig>/<sic> and another version created by selecting the text in the <reg>/<corr> elements.

…

orthographic corrections after changes in punctuation (e.g., if a virgula has been normalized into a full stop, then the first letter of the following word must be capitalized);
methods to join words split at the ends of lines (e.g., if a token is at the end of a line and ends with one of the characters used to denote a joining, then merge this token with the following token);
replacement rules for archaic letters (e.g., replace a long s (ſ) with a lowercase s);
expansion rules for abbreviations (e.g., replace all occurrences of "vñ" with "vnd").
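The last two kinds of rule are the closest to plain substitutions and can be sketched in a few lines. The following Python illustration is not the project's XSLT: the function names are hypothetical, and the decisions to match "vñ" only as a whole token and to normalize the input to Unicode NFC first are assumptions added for the example.

```python
import re
import unicodedata


def replace_long_s(text: str) -> str:
    """Replacement rule: long s (ſ) becomes lowercase s."""
    return text.replace("ſ", "s")


def expand_vn(text: str) -> str:
    """Expansion rule: the abbreviation 'vñ' becomes 'vnd'.

    NFC normalization lets 'n' + combining tilde match the precomposed
    'ñ'; the lookarounds restrict the rule to whole tokens (assumption).
    """
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"(?<!\w)vñ(?!\w)", "vnd", text)


sample = "zu dem groſſen hern vñ der gancȝen tartarey"  # made-up test string
print(replace_long_s(expand_vn(sample)))
```

Each function is independent of the other, so either can be left out when a given edition level should keep the original grapheme or abbreviation.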

The first two examples demonstrate why one should not think of these rules as simple textual substitutions. As described in previous sections, there is a deep connection between punctuation and the structure of the text. Changing punctuation can, and often does, change the structure of the text. At the same time, it is essential to represent such structures in a machine-readable way.
For example, without a proper representation of the concept of sentence it would not be possible to write rules such as "if a word appears at the beginning of a sentence…" What follows is that there is a continual interaction between the rules and the structure of the edited text. Rules, and in particular punctuation rules, often modify the structure of the text. A technical system that does not allow for this interaction to happen is not able to deal properly with normalization in general and punctuation in particular.
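A minimal Python sketch may help to show what such a machine-readable structure buys. It assumes a deliberately naive model: tokens are split on whitespace, a token ending in a full stop closes a sentence, and the field names eol and sentence_initial are chosen for the illustration (only the former echoes an attribute mentioned later in the text).

```python
def tokenize(lines, starts_sentence=False):
    """Annotate every token with its position in the line and in the sentence."""
    tokens = []
    sentence_initial = starts_sentence
    for line in lines:
        words = line.split()
        for position, word in enumerate(words):
            tokens.append({
                "text": word,
                "eol": position == len(words) - 1,
                "sentence_initial": sentence_initial,
            })
            # Naive boundary detection: a full stop ends the sentence.
            sentence_initial = word.endswith(".")
    return tokens


def capitalize_sentence_starts(tokens):
    """Rule that is only expressible because the structure is explicit."""
    return [
        {**t, "text": t["text"][:1].upper() + t["text"][1:]}
        if t["sentence_initial"] else dict(t)
        for t in tokens
    ]


tokens = tokenize(["geben. von stund", "er gesund wirt"])
normalized = capitalize_sentence_starts(tokens)
print(" ".join(t["text"] for t in normalized))
```

The capitalization rule never inspects punctuation itself; it only reads the annotation, which is why an earlier rule that changes the punctuation must also update the structure.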

Each rule is implemented as a small and self-contained XSLT transformation. At the time of writing, the ENHG Marco Polo project comprises about a hundred rules, grouped in twenty macro categories. On average, the core of each rule is implemented in less than three lines of XSLT.

To give the readers an impression of the simplicity of the rule implementation, we show here the main parts of the XSLT that implement one of the example rules described above.

Example: Rule to Join Words Split at the End of a Line
In ENHG, a punctuation sign that we nowadays call a double oblique hyphen was used to mark that a word has been split at the end of a line. In the diplomatic rendition we want to preserve this word division and the forced line break, while in other renditions we want to reconstruct the complete word.

The XSLT excerpt in Example 3 shows how split words are joined when a middle double oblique hyphen is found. The joining is performed in a lossless way: all information present in the original witness is preserved. This is possible because this step operates on a supertextual structure that contains information about the structure of the text and the positioning of a token (e.g., @eol="true"). It must be stressed that this last piece of information, and the supertextual structure in general, are not part of the master TEI file and have been added in the preceding steps. It should also be noted that this rule does not delete any text, it just marks that two words have been joined and what the result of this operation is. Another rule will take care of modifying the structure, and yet another will remove the now superfluous partial word in the next line before creating the final edition file. However, while removing and carrying out all these changes, comments about what is being done will be added as an aid to the readers of the final edition file.

    <xsl:copy>
      <!-- the enclosing xsl:template and the definitions of $part1-raw, $part2, and $joined-parts are not shown -->
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="joined" select="true()"/>
      <xsl:attribute name="part1" select="$part1-raw"/>
      <xsl:attribute name="part2" select="$part2"/>
      <xsl:value-of select="$joined-parts"/>
    </xsl:copy>

The rule in Example 3 is independent from other rules in the pipeline. The scholar is free to use this rule in the generation of a specific edition or to leave it out. In the case of the Marco Polo project, this rule is employed in the pipelines that produce the semi-diplomatic and the interpretative editions, but not in the diplomatic edition.
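For readers less familiar with XSLT, the same idea can be sketched in Python. This is an illustration of the lossless marking described above, not a port of the project's template: tokens are modeled as dictionaries, the sample word is invented, and the double oblique hyphen is assumed to be encoded as U+2E17.

```python
JOIN_MARK = "\u2e17"  # DOUBLE OBLIQUE HYPHEN (assumed encoding of the sign)


def mark_joined(tokens):
    """Mark words split across a line break without deleting anything.

    The first half receives the joined form plus both raw parts; the
    second half is left in place for a later step to remove.
    """
    out = [dict(t) for t in tokens]  # work on copies, keep the input intact
    for first, second in zip(out, out[1:]):
        if first.get("eol") and first["text"].endswith(JOIN_MARK):
            part1_raw = first["text"]
            part2 = second["text"]
            first.update(
                joined=True,
                part1=part1_raw,
                part2=part2,
                text=part1_raw[: -len(JOIN_MARK)] + part2,
            )
    return out


tokens = [
    {"text": "wun" + JOIN_MARK, "eol": True},  # invented example word
    {"text": "der", "eol": False},
    {"text": "ding", "eol": False},
]
joined = mark_joined(tokens)
print(joined[0])
```

Because both raw parts survive as annotations, the diplomatic reading can still be recovered from the output of this step.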

Because it is independent from the other rules used in the pipeline, other scholars can reuse this rule in their projects without being forced to adopt the Marco Polo pipelines, and all their rules, in their entirety.

One-off Cases, Exceptions, and Manual Corrections
Before a rule can be written, a textual phenomenon must first be recognized as such and then a pattern must be identified. At the beginning of the editorial process, when the textual materials are not yet well known and the normalization rules are not yet clear, it is simpler and more productive to address one-off textual phenomena manually. This is done by annotating the word to be normalized or the sign to be changed in the master TEI file. Once a pattern is recognized, a rule can be written and these annotations removed from the master TEI file.
Similarly, there are cases where the conditions of a rule would normally apply, but where the scholar does not want the transformation to happen. Rather than complicating the condition in the rule to exclude a specific case, it makes sense to mark the specific occurrence as an exception in the master TEI file.

Finally, the scholar may want to correct an obvious mistake, while keeping track of the original reading. This is done not by writing a rule, but by placing an annotation in the master TEI file.

All these issues (one-off normalization, exceptions, and manual corrections) are encoded using TEI elements directly in the master TEI file.

One-off normalizations are marked in the master TEI file using one of the non-standard attributes defined by the ENHG ODD. For example, attaching mp:n1-subst="foo" to any element will force the substitution of the word "foo" for the content of that element, but only at the normalization level denoted by N1 ("semi-diplomatic").

Such exceptions are marked using <w> or <pc> elements, together with the already cited project-specific mp:nX-subst attributes, as shown in Example 4.
The ability to quickly deal with occasional exceptions without having to resort to writing a new rule or making the existing ones more complicated greatly simplifies the day-to-day job of the scholar. It also allows for a cleaner separation between rules (theoretically shared by a community) and exceptions (the results of the editor's own decisions).
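How such attribute-driven exceptions might be resolved at one normalization level can be sketched as follows. The namespace URI bound to the mp prefix, the function name, and the n2 attribute are assumptions for the illustration; only the mp:n1-subst attribute and the use of <w> and <pc> come from the description above.

```python
import copy
import xml.etree.ElementTree as ET

MP_NS = "https://example.org/ns/mp"  # hypothetical URI for the mp prefix


def apply_subst(root, level):
    """Return a copy of `root` with every mp:<level>-subst exception applied."""
    result = copy.deepcopy(root)  # the master tree stays untouched
    attr = f"{{{MP_NS}}}{level}-subst"
    for element in list(result.iter()):
        if attr in element.attrib:
            for child in list(element):
                element.remove(child)
            # The attribute value replaces the element's content.
            element.text = element.attrib[attr]
    return result


master = ET.fromstring(
    f'<s xmlns:mp="{MP_NS}"><w>geben</w>'
    f'<pc mp:n1-subst=".">‧</pc><w mp:n2-subst="Von">von</w></s>'
)
n1 = apply_subst(master, "n1")
print("".join(n1.itertext()))
```

Exceptions recorded for other levels are simply ignored, so a single annotated file can feed every pipeline.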

Pipelines and Normalization Levels
Pipelines are used by the scholar to indicate which rules should be applied, and in which order, to turn the master TEI file into a complete edition at a specific level of normalization.

In the ENHG Marco Polo project there are three pipelines, one for each target level of faithfulness and readability. A specific level of faithfulness and readability is achieved by carefully choosing which rules to apply from the catalogue of available rules. Some steps are shared by all pipelines, while others are specific to certain faithfulness levels. Figure 4 illustrates this approach. The rules in a pipeline are applied in sequence (i.e., the output of a rule is the input for the next rule). From a methodological point of view, the XProc pipeline is a record of all the operations that the scholar performs on the transcription. The creation of an edition level is equivalent to replaying this record. Example 6 shows an excerpt of the XProc pipeline used to generate the semidiplomatic edition.
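The idea of a pipeline as a replayable record can be illustrated with a short Python sketch. It stands in for XProc only for the sake of a self-contained example; the step names other than tokenize, the two normalization rules, and the sample text are invented and do not reproduce the project's catalogue.

```python
from functools import reduce


def tokenize(text):
    """Infrastructural step: split the transcription into tokens."""
    return text.split()


def virgule_to_comma(tokens):
    """Hypothetical punctuation rule: render the virgule as a comma."""
    return ["," if t == "/" else t for t in tokens]


def initial_v_to_u(tokens):
    """Hypothetical spelling rule: word-initial v becomes u."""
    return ["u" + t[1:] if t.startswith("v") else t for t in tokens]


# One ordered list of steps per edition level; some steps are shared.
PIPELINES = {
    "diplomatic": [tokenize],
    "semidiplomatic": [tokenize, virgule_to_comma],
    "interpretative": [tokenize, virgule_to_comma, initial_v_to_u],
}


def run(pipeline, master):
    """Replay the record: the output of each step is the input of the next."""
    return reduce(lambda doc, step: step(doc), pipeline, master)


MASTER = "vnd do / vns"  # invented sample text
editions = {name: run(steps, MASTER) for name, steps in PIPELINES.items()}
```

Each edition is derived from the same master text, so regenerating all levels after a correction amounts to calling run again.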

It is important to note that pipelines comprise three kinds of steps:
1. infrastructural steps: for example, the tokenize step that creates the textual super-structure to which all other steps refer;
2. rule-based steps: for example, the adjust-capitalization step that implements the set of rules to fix the capitalization of words after changes in punctuation; and
3. exception-related steps: for example, the apply-punct-n1-subst step that handles all the exceptions to the punctuation rules that are relevant for the N1 level (i.e., semi-diplomatic).
The fact that the editorial workflows for all the editions are formalized in XProc pipelines makes it possible, for instance, to compare these pipelines and see in detail (and with utmost precision) how they differ and what is, in this project, the difference between the processes needed to establish a diplomatic, a semi-diplomatic or an interpretative edition. Breaking down the traditional analogue processes into unambiguous discrete steps can contribute to the scholarly debate on edition typology.
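Once the workflows are available as ordered lists of named steps, comparing them becomes a mechanical operation. The Python sketch below illustrates this under explicit assumptions: the step names tokenize, adjust-capitalization, and apply-punct-n1-subst come from the text above, whereas modernize-punctuation, apply-punct-n2-subst, and the ordering of the steps are hypothetical.

```python
# Each step is recorded as (kind, name); the order is an assumption.
SEMIDIPLOMATIC = [
    ("infrastructural", "tokenize"),
    ("exception", "apply-punct-n1-subst"),
    ("rule", "adjust-capitalization"),
]
INTERPRETATIVE = [
    ("infrastructural", "tokenize"),
    ("rule", "modernize-punctuation"),     # hypothetical step
    ("exception", "apply-punct-n2-subst"),  # hypothetical step
    ("rule", "adjust-capitalization"),
]


def compare(first, second):
    """Report which step names are shared and which are specific to one pipeline."""
    names_first = [name for _, name in first]
    names_second = [name for _, name in second]
    return {
        "shared": [n for n in names_first if n in names_second],
        "only_first": [n for n in names_first if n not in names_second],
        "only_second": [n for n in names_second if n not in names_first],
    }


report = compare(SEMIDIPLOMATIC, INTERPRETATIVE)
```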

Conclusions and Future Work
We have described the workflow and the methodological approach behind a digital scholarly edition of the ENHG translation of Marco Polo's work. The key idea of this approach is the encoding of all the scholar's knowledge (transcribed text, normalization rules for words and punctuation, exceptions, corrections) in a formalized way. Using this approach, it is possible to produce multiple editions at different levels of faithfulness and readability from a single master TEI file while keeping the master file lean and readable. In addition, the fact that the whole process is logged, and no manual modifications are performed in the generated edition files, allows the scholar to rapidly make fixes or test hypotheses. Finally, the sets of rules, being separated from the transcriptions, can be debated by the scholarly community independently from the edition.

Compared to other approaches, this pipeline-based approach greatly simplifies the programming of rules that modify the structure of the text, leading to simpler and more succinct rules. First-class support for structural modifications is necessary to properly handle the normalization of punctuation. While textual normalization can, in most cases, be carried out through simple textual substitutions, many cases of punctuation normalization instead require changes to the structure of the text, as well as follow-up adjustments, for instance when full stops are inserted and sentences split.
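The difference between a textual substitution and a structural change can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the project's XSLT: sentences are modelled as plain lists of tokens, the sample tokens are invented, and the function names split_at_virgule and adjust_capitalization are chosen for the example (the latter echoing the step name mentioned earlier).

```python
def split_at_virgule(sentence, index):
    """Structural change: split one sentence (a list of tokens) at the virgule
    found at `index`, closing the first half with a full stop."""
    if sentence[index] != "/":
        raise ValueError("no virgule at this position")
    first = sentence[:index] + ["."]
    second = sentence[index + 1:]
    return [first, second]


def adjust_capitalization(sentences):
    """Follow-up adjustment: capitalize the first token of every sentence."""
    return [
        [s[0][:1].upper() + s[0][1:]] + s[1:] if s else s
        for s in sentences
    ]


source = ["do", "kam", "er", "/", "vnd", "sprach"]  # invented sample
split = split_at_virgule(source, 3)
result = adjust_capitalization(split)
```

Replacing the virgule with a full stop changes the number of sentences, and only after that change can the capitalization step know which tokens are sentence-initial; a single string substitution cannot express this dependency.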

The adoption of our approach, as well as other similar programming-based approaches, forces a reflection on the role of the editor, or more precisely, on the many new roles that the editor must embrace. First, the editor stops being a mere transcriber: their main task is now to record all textual phenomena (each normalization action is explicitly recorded in the source files). With a collection of recorded phenomena in place, the editor can turn these single occurrences into systematic rules and catalogues of rules, becoming a pattern-spotter and knowledge synthesizer.
Once various such catalogues of rules have been established (e.g., for different languages, different time periods, and different scriptoria) and publicly shared, we envision future editors starting their editions by picking and choosing their preferred set of rules, thereby explicitly signaling in which philological tradition they are placing their work while maintaining the technical support needed to express their scholarly freedom to devise new exceptions and new rules. The publication of the rules and workflows used (in the form of steps and pipelines) would then become part of the expected content of a digital scholarly edition.

In the future, we would like to test various possible improvements. First, we would like to experiment with creating declarative rule generators. Many rules are repetitive in nature (for example, the normalization of single characters) and it should be possible to express them in a declarative fashion. These abstract rules would then be translated into XSLT transformations.
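What such a generator might look like can be sketched as follows. This is a speculative illustration, not an existing component of the project: the rule table, the function name generate_stylesheet, and the choice of Python as the generator language are assumptions. The emitted stylesheet uses only standard XSLT 1.0 constructs (an identity template and the XPath translate() function).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Declarative table of single-character normalizations (hypothetical rules).
CHAR_RULES = {"ſ": "s", "ꝛ": "r"}


def generate_stylesheet(rules):
    """Translate a character-to-character table into an XSLT 1.0 stylesheet."""
    for src, dst in rules.items():
        if len(src) != 1 or len(dst) != 1 or set(src + dst) & {"'", '"'}:
            raise ValueError(f"unsupported rule: {src!r} -> {dst!r}")
    sources = escape("".join(rules))          # escapes &, < and >
    targets = escape("".join(rules.values()))
    return f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">
  <!-- identity template: copy whatever no other template matches -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- generated rule; the explicit priority avoids a conflict with node() -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="translate(., '{sources}', '{targets}')"/>
  </xsl:template>
</xsl:stylesheet>"""


stylesheet = generate_stylesheet(CHAR_RULES)
parsed = ET.fromstring(stylesheet)  # fails if the output is not well-formed
selects = [e.get("select") for e in parsed.iter(f"{{{XSL_NS}}}value-of")]
```

The sketch covers only one-to-one character substitutions; rules that map one character to several, or that depend on context, would need a different generated template.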
Another aspect we would like to reflect on is how the transformation process directed by the pipelines influences the various levels of abstraction of the document being transformed, drawing parallels with stratified document models such as CMV+P (Barabucci 2019). A final thing we would like to test is the replacement of the XProc pipelines with pure XSLT pipelines (Birnbaum 2017).
Replacing XProc with XSLT pipelines would reduce the number of technologies that other scholars have to be familiar with in order to understand the editorial process in its entirety.

Another future development that we envision is the deconstruction of the visualization of the edition into a series of small, explicit steps, taking place one after the other, just like their counterparts in the pipelines: one click would show the effects of the normalization of the allographs, another click would expand the abbreviations, another one would introduce a different system of punctuation, etc., until the last click would finally reveal the scholarly edition in its completeness. Our approach, in which all the knowledge of the editor is formalized, recorded, and made actionable, would be a solid base for such a future fractally detailed edition.
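Since every step is explicit, the intermediate states needed for such a click-by-click visualization are already implied by the pipeline. A minimal Python sketch, assuming three invented string-level steps and an invented sample line, shows how all stages could be collected:

```python
from itertools import accumulate


def normalize_allographs(text):
    """Hypothetical step: long s becomes round s."""
    return text.replace("ſ", "s")


def expand_abbreviations(text):
    """Hypothetical step: expand the abbreviation dz."""
    return text.replace("dz", "das")


def modernize_punctuation(text):
    """Hypothetical step: render the virgule as a comma."""
    return text.replace(" /", ",")


STEPS = [normalize_allographs, expand_abbreviations, modernize_punctuation]


def stages(master, steps):
    """Return the text after 0, 1, 2, ... steps: one stage per click."""
    return list(accumulate(steps, lambda doc, step: step(doc), initial=master))


views = stages("dz iſt war / vnd gut", STEPS)
```

The first element of views is the untouched transcription and the last is the fully normalized text; a viewer would only need to move an index through this list.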

We hope to be an inspiration for other edition projects. We firmly believe that all medieval manuscripts should be accessible in this way. We need to escape the constraints of paper editions and to start thinking differently. As Sahle famously wrote, a digital scholarly edition is really digital if it cannot be printed without a loss of the original contents and functionalities. Only editions that fully embrace the digital medium and its tools, including formalization of knowledge and programming, can satisfy this condition.

In this environment where we have no space limits, why should we limit ourselves to the presentation of a predefined number of rendition levels? Why only diplomatic, semidiplomatic, and interpretative? We believe, in fact, that the granularity can increase even further.
6 "A light-weight, open source tool specifically designed to create digital editions from XML-encoded texts" (Rosselli Del Turco et al. 2013).
10 But things are different in digital stemmatology: see in this respect Andrews (2020).
11 After a long discussion on the importance of formal variance and on the overlapping between form and content, Contini (2014✝, 50) gives concrete indications on how formal variance should be dealt with in a critical edition: "a parità di condizioni, si adotta costantemente la forma di un testimone, scelto (ma per di solito apoditticamente) per ragioni o di antichità o di congruenza regionale o di sorvegliata organicità.… Non ci si sottrae all'impressione che la forma passi in seconda linea innanzi alla sostanza" ("all things being equal, one constantly adopts the form of one witness, chosen (but usually apodictically) for reasons either of antiquity, or regional consistency, or surveilled organicity.… One does not escape the impression that the form is in the second place with respect to the substance") (our translation).
12 "The extent of normalisation, as well as the rules followed by the editors, depends on their judgement and the methods they adopt." (Buzzoni 2020, 140)
13 "Es gibt bis ins 16. Jahrhundert und unter vielen Aspekten selbst darüber hinaus keine Varietät