Enabling the Encoding of Manuscripts within the DTABf: Extension and Modularization of the Format

This paper presents work in progress on the DTA “Base Format” for Manuscripts (DTABf-M), an extension to the DTA “Base Format” (DTABf) for the TEI-conformant annotation of manuscripts. The DTABf is a TEI subset for the consistent and unambiguous annotation of large amounts of historical text. During our work on the

Our approach is to base the DTABf for manuscripts on the established DTABf for printed texts, reusing the existing tagset wherever possible and only making changes (enhancements or reductions) when new tags are needed to represent phenomena exclusively found in manuscripts or printed texts, respectively. This is rather straightforward, since, as exemplified by figures 1, 2, 3, and 4, a large number of textual phenomena can be found in both handwritten and printed texts.

Figure 1 (Hufeland [ca. 1829], http://www.deutschestextarchiv.de/hufeland_privatbesitz_1829/149).

Figure 3. Example of textual structures in manuscripts and possible ways to annotate them with DTABf tags. (Hufeland [ca. 1829], http://www.deutschestextarchiv.de/hufeland_privatbesitz_1829/88).

(Hufeland [ca. 1829], http://www.deutschestextarchiv.de/hufeland_privatbesitz_1829/139).

As illustrated by figures 1, 2, 3, and 4, there are many structural similarities between handwritten and printed texts with regard to document structure, the arrangement of the text, and the meaning of layout specifics. Furthermore, inline phenomena may also display similarities. For instance, even certain types of emphasis are at least comparable, if not identical, between printed texts and manuscripts.

One example of change in the style of handwriting is illustrated in figure 5. The style changed from old German script (Kurrent) for the general text body to Latin script for distinct terms like proper names or foreign language material. We consider this to be analogous to the change of font from Blackletter (Fraktur) to Antiqua types in print. The DTABf solution for changes from Fraktur to Antiqua typeface is to use the <hi> element with @rendition="#aq". This tagging can thus be used similarly for changes from Kurrent to Latin script in manuscripts (figure 5).14

Figure 5. The proper names "Herschel," "Bonpland," and "Jupiter" are distinguished by the use of Latin script from the remainder of the text written in Kurrent.
[…] haben <hi rendition="#aq">Herschel</hi> und <hi rendition="#aq">Bonpland</hi> den <hi rendition="#aq">Jupiter</hi> noch 18<lb/>[…]

Another example of similar inline phenomena in manuscripts and printed texts is the underlining of important phrases or keywords, represented in the DTABf as <hi rendition="#u"> for printed texts and manuscripts alike. Furthermore, though this feature is far more frequent in prints, manuscripts may also contain catchwords or signature marks at the bottom of the page, which we tag as <fw> with @type="catch" or @type="sig", respectively.

(Anonymous [1827/28b], http://www.deutschestextarchiv.de/nn_n0171w1_1828/41).

<p>[…] des vergleichen.</p><lb/>
<p>Dieſe große Entdeckung trifft merkwürdiger<lb/>
<fw type="sig" place="bottom">Phyſiſche Erdbeſchreibung <hi rendition="#aq">e</hi>.</fw>
<fw type="catch" place="bottom"><hi rendition="#u">Weiſe</hi></fw><lb/>[…]</p>

representation within the DTABf-M. Subsequently, we will give an overview of those manuscript-specific TEI elements which we have identified so far as necessary to be included in a DTABf extension for manuscripts (DTABf-M). Compared to the DTA corpus of printed texts with its more than 3,000 works,15 the data basis for the DTABf-M is quite small (129 works).16 Thus, the number of manuscript-specific elements, attributes, and values illustrated in the examples is not exhaustive and will not fit any given manuscript precisely. However, in comparing the texts of the manuscript corpus it is possible to differentiate common from less common features. Thus, though the DTABf-M tagset might still have to be augmented in the future, the proposed tagging solutions do cover phenomena common in handwritten sources and therefore should be applicable to other manuscripts and for the integration of further handwritten sources into the DTA corpora.
The DTABf had to be extended especially with respect to traces of the writing process, such as ad hoc corrections by substitution, deletion, or addition of characters, words, or passages, and changes of hand or writing device. These phenomena can be observed in many manuscripts.
Manuscript-specific extensions to the DTABf were applied following the principles of simplicity, consistency, and avoidance of ambiguity, as established for the DTABf. To achieve the latter, DTABf-M specifications were created not only on element and attribute level, but also with regard to attribute values that utilize specified vocabularies for phenomena in manuscripts (e.g., for typical methods of deletion or addition).

Deletions and Additions
Common features of manuscripts are corrections to the original text, carried out by adding or deleting textual material. Unlike in printed texts, where manual corrections are usually made on a text which has previously been finalized for the printed publication, corrections in manuscripts are part of the text creation and amelioration process. The TEI elements relevant to these phenomena are the "Core Elements for Transcriptional Work" within the TEI Guidelines section on "Representation of Primary Sources" (TEI Consortium 2016, 11; 11.3.1.1). They include <add> (for additions) and <del> (for deletions), which can be grouped within <subst> to represent the substitution of a correct character or phrase for an erroneous one (TEI Consortium 2016, 11.3.1.4). Thus, while it was possible to leave these elements unconsidered for the DTABf for printed material, they became an immensely important part of the DTABf-M.

Superfluous textual material in manuscripts may be deleted without substitute. Similarly, characters, words, or phrases missing in a written text may be added with no need to erase text.

[…] die man mit grossen Winden <del rendition="#s">einander</del> näherte […]

Besides crossing out misspellings or mistakenly notated characters or words, it is also a common method within manuscripts to erase characters or passages by rubbing or scraping them out (figure 9).

Figure 9. The term initially spelled "Esquimeaux" was altered to the more common spelling "Esquimaux" by erasing the superfluous "e" (which has become almost invisible in the manuscript).
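
As a hedged illustration of the erasure described for figure 9, a transcription could mark the erased character with <del>. The value "#erased" for @rendition is an assumption made for this sketch; this section itself confirms only "#s" and "#ow" as @rendition values within <del>.

```xml
<!-- Sketch only: rendition="#erased" is an assumed value, not confirmed by the text -->
<p>Esquim<del rendition="#erased">e</del>aux</p>
```

Read without the <del> content, the fragment yields the corrected spelling "Esquimaux", while the erased "e" remains recorded in the transcription.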

The place where added text is meant to be inserted is often marked with some sign or arrow (see figures 10, 11, and 16). To enable the encoding of such signs in the manuscript text, we additionally included the TEI element <metamark> in the DTABf-M tagset; the above-mentioned figures provide transcriptions illustrating the usage of the <metamark> element.
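
Since the transcriptions of the figures are not reproduced here, the following minimal sketch shows how an insertion sign and the added text it points to might be encoded together. The German wording, the caret used as the sign, and the value place="superlinear" are all hypothetical and serve only as illustration.

```xml
<!-- Hypothetical line: the wording, the "^" sign, and place="superlinear" are assumptions -->
<p>ein <metamark>^</metamark><add place="superlinear">neues</add> Wort<lb/></p>
```

The <metamark> element records the sign as written by the scribe, whereas <add> carries the inserted text itself.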

Substitutions
Deletions and additions may also be parts of a substitution process, where erroneous text was deleted in favor of a correct version which was added to the original text. In such cases, <add> and <del> are grouped inside a <subst> element according to the TEI P5 Guidelines, as shown in figures 12, 13, 14, and 15 (TEI Consortium 2016, 11.3.1.5). Moreover, the following two examples illustrate the necessity of adding a value @rendition="#ow" (for "overwritten") to the DTABf-M which can be used in this context within <del>.

Figure 12. Substitution of the characters "d" and "t" by overwriting the former with the latter.20 (Hufeland [ca. 1829], http://www.deutschestextarchiv.de/hufeland_privatbesitz_1829/28).
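
The substitution described in the caption of figure 12 could be sketched as follows. The grouping in <subst> and the value @rendition="#ow" come from the text; the value place="across" on <add> is an assumption made for this sketch.

```xml
<!-- Sketch of figure 12: "d" overwritten by "t"; place="across" is an assumed value -->
<subst><del rendition="#ow">d</del><add place="across">t</add></subst>
```

Grouping both elements in <subst> makes explicit that the deletion and the addition belong to a single act of correction.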

Different Hands or Writing Devices
Another manuscript-specific phenomenon is the change of hands and/or writing devices. To distinguish the change of hands (i.e., different scribes) in the course of a writing process from the change of writing devices (i.e., the same scribe using, e.g., a pencil instead of their regular ink for certain alterations of the text), we use the <handNote> element in the TEI Header.22 There, an @xml:id is assigned to the respective scribe, scribal act,23 or writing device (in case the scribe cannot be identified). This @xml:id is then used as a referencing value of @hand in the transcription.

Figure 16. In the second line to be seen in the image scan, the word "ein" ("a") and the words "der Lichtstärke" ("of the light's intensity") have been inserted. Whereas the first addition was written using the same ink and by the same hand as the remainder of the passage, the second addition was clearly written with a different, darker ink, but still by the same hand, i.e., by the same scribe, in this case Gustav Parthey (therefore labelled as @hand="#Parthey_darkInk").

Furthermore, we found in our manuscript corpus various occurrences of the phenomenon that words or phrases from one scribe were underlined by another, using a different writing device, as shown in figure 17. Here, the person responsible for the underlining could not be identified. Therefore, the hand has the identifier "#pencil", referring to both the scribe and the writing device.25

Figure 17. The main body of the text was written with dark ink and two terms were underlined (here, presumably to mark them as questionable) by a different scribe using a pencil.26 (Anonymous [1827/28b], http://www.deutschestextarchiv.de/nn_n0171w1_1828/216).
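
An underlining by a second hand, as in figure 17, could be sketched like this. The values "#u" and "#pencil" are taken from the text; attaching @hand directly to <hi> is an assumption of this sketch, and "Begriff" is a hypothetical placeholder for one of the underlined terms.

```xml
<!-- Sketch only: "Begriff" is a placeholder term; @hand on <hi> is an assumption -->
<p>[…] <hi rendition="#u" hand="#pencil">Begriff</hi> […]</p>
```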

Different Types of Notes
Manuscripts, just like printed materials, may contain different types of notes, providing comments on the text or further information about it. While notes at the bottom of a page (footnotes) or at the end of a chapter or the text body (endnotes) are more typical for printed text types than for manuscripts, marginal notes at the right or left margin of a page may occur in both manuscripts and printed texts (figure 19).

In addition, notes in manuscripts may occur at several places other than the ones common for printed texts. They may, for example, be inserted inline or at the top or bottom of the text area of a page (figures 19, 20, and 21). We therefore introduced the additional @place values "mInline" | "mTop" | "mBottom" to the <note> element.28 The usage of @place="mInline" within <note> was already illustrated in figure 18, while figures 20 and 21 provide examples for the usage of @place="mBottom" and @place="mTop" within <note>.

Figure 19. A note has been added at the left-hand side of the manuscript stating that the document in question has been prepared according to the addressee's command.

<choice><orig>be fehl</orig><reg>befehl</reg></choice><lb/></note>

Figure 20. This note, stating that isotherm, isothere, and isocheim must be well differentiated from one another, is recorded at the very bottom of the page. However, it is not referring to a certain point in the text as a footnote would, but is rather a general comment on the topic of the page. Therefore, we use @place="mBottom".

[…]<note place="mBottom">(Isotherme, Isothere (von gleicher Sonnenwärme wie <hi rendition="#aq">Moskau</hi><lb/>u. der Ausfluß der <hi rendition="#aq">Loire</hi>) = isochaimone Linien sind wohl zu unter-<lb/>scheiden.)</note><lb/>[…]

As for metadata, the DTABf already makes quite extensive use of the TEI Guidelines. Thus, most of the metadata information necessary for manuscripts in text corpora was already covered by the existing DTABf metadata tagset. There was only one significant change to make: instead of <typeDesc> we added the <handDesc> element with its child <handNote>.29 The @xml:id in <handNote> identifies the hand or scribal act described and can be referred to from the @hand attribute within the document. The writing device of each hand or scribal act is specified within a @medium attribute.
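
A header fragment declaring hands in this way might look like the following sketch. The identifiers "Parthey_darkInk" and "pencil" are taken from the figure captions; the @medium values, the @scribe value, the descriptive wording, and the placement within <physDesc> are assumptions and may differ from the actual DTABf-M vocabulary.

```xml
<!-- Sketch only: @medium and @scribe values and the prose descriptions are assumptions -->
<physDesc>
  <handDesc>
    <handNote xml:id="Parthey_darkInk" scribe="Parthey" medium="darkInk">Gustav Parthey, dark ink</handNote>
    <handNote xml:id="pencil" medium="pencil">Unidentified hand, pencil</handNote>
  </handDesc>
</physDesc>
```

A transcription would then refer to these entries through @hand="#Parthey_darkInk" or @hand="#pencil".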

Overview: DTABf Extension for Manuscripts
In section 4.2 we presented some examples of typical structural and textual phenomena in manuscripts. Table 1 provides an overview of the elements, attributes, and values which were introduced to the DTABf for manuscript encoding and of how they are related to already existing DTABf tags.

The proposed format, DTABf-M, is still a work in progress and will be subject to continuous development based on further manuscripts added to the DTA corpus. Development will occur in a manner similar to the approach used with the DTABf for printed texts: extensions will be performed cautiously, in a restrictive and minimalistic manner, and based on actual phenomena observed in the historical text sources (Haaf, Geyken, and Wiegand 2014-15, §60-61).

DTABf-All contains all possible values of @rendition.

Changes of Element Features
In manuscripts the title of a chapter or section is sometimes written on the right or left margin of a page. Those instances cannot be considered marginal notes, but represent a type of heading. Therefore, we introduced @type="rightMargin" | "leftMargin" in <head>. Since this is a phenomenon we have not (yet) encountered in printed texts, the @type attribute in <head> is only needed for manuscript annotation.39
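
A heading written in the left margin could thus be encoded as in the following minimal sketch. The @type value comes from the text; the heading wording and the surrounding <div> are hypothetical.

```xml
<!-- Sketch only: the heading text is a placeholder -->
<div>
  <head type="leftMargin">Hypothetical section title</head>
  <p>[…]</p>
</div>
```

Encoding the title as <head> rather than as <note place="left"> keeps it available as a structural heading of the section.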

To address this phenomenon, the element <head> is provided with the necessary attribute-value pairs within its element specification in DTABf-All: handNote.html.
23 That is, different <handNote> elements can represent the same scribe using different writing devices.
24 Translation: "[…] show a bright center surrounded by an illuminated shell: in this illuminated shell an increasing and decreasing of the light's intensity can be observed, an ebb […]."