Collaborative Encoding of Text Genesis: A Pedagogical Approach for Teaching Genetic Encoding with the TEI

The paper discusses the question of how genetic encoding can be taught in order to introduce encoding strategies for text genesis to less technologically adept scholars. The TEI Guidelines have offered ways to encode manuscript genetics for almost a decade (chapter 11.7). However, this topic is considered relatively advanced and is rarely covered in introductory courses or teaching materials. The paper is based on two encoding workshops held at the University of Vienna in different settings. Participants produced a dossier génétique (genetic dossier) in a collaborative writing process. They had several writing tools (pens, typewriters) and different kinds of paper at their disposal in order to produce a text with several layers, while the whole process was recorded on video. The products were then digitized, and each group received digital copies of another group's dossier. Participants analyzed the specific phenomena of the primary sources, such as additions, corrections, deletions, or scribal hands, and were then introduced to the encoding strategies needed to represent the genesis of texts. After an introduction to the transcription tool Transkribus, the students transcribed one of the texts, exported it, and enriched it further in Oxygen Author using a dedicated framework.


Introduction
This paper presents the results of a pedagogical approach for teaching genetic encoding with TEI.
We report on two workshops, in which we introduced beginners in digital text encoding to the challenging task of textual criticism. In particular, we discuss the question of how encoding text genesis using TEI can be taught to technologically less-experienced students and scholars.

The TEI Guidelines have oered ways to encode manuscript genetics for almost a decade now, but even the more recent teaching materials for electronic text encoding and the TEI do not cover all aspects of genetic encoding, especially when it comes to the encoding of interrelated versions of a text or the identication of more complex revision campaigns. There may be two reasons for this: even though manuscripts are fascinating research objects, they are very hard to work with; and one has to know about the context of the manuscript and be able to reconstruct the writing process of its author. In short, it takes a lot of training and experience to be able to work with complex manuscript material properly.
[…] of genetic edition projects. To address this need, we tested an approach based on active learning that engages participants early in the process of learning text encoding with more advanced chapters of the TEI Guidelines.

Genetic Criticism
Genetic textual criticism, that is, the identification of textual variants and the reconstruction of the genesis of a (literary) text, is one of the more challenging and interesting tasks philologists currently face. The genetic approach aims to reconstruct the writing process and, in a way, to understand what might have happened in the author's mind while he or she was writing.
Therefore, scholars are not merely concerned with recording "what is on the page," as they do when producing a detailed diplomatic transcription of a document; they also use this evidence in order to understand how "what there is on the page" might have come into being. On the document level, genetic criticism therefore tries to capture the various traces of the writing process, to reconstruct the "micro genesis" of a document. But scholars are also interested in documenting and reconstructing the interrelations of several documents, forming a so-called "dossier génétique," a genetic dossier (see Grésillon 1999). They aim to understand the "macro genesis" of a whole text.

For ten years now, the TEI Guidelines have provided ways to encode manuscript genetics. The means to encode the relevant traces of the writing process can be found mainly in chapter 11, "Representation of Primary Sources." Textual variants can be encoded with the strategies described in chapter 12, "Critical Apparatus." Interrelations between documents, which form the basis for understanding the macro genesis of a text, are still seldom encoded. The critical apparatus is designed to record textual variance; however, if one wants to foreground not the differences between witnesses but the similarities between witnesses that vary to a great extent, the strategies described in that chapter are not feasible. There have been proposals to encode the interrelations of individual witnesses as graph structures, for which the TEI offers a set of dedicated elements in the chapter "Graphs, Networks, and Trees" (cf. TEI Consortium 2018).
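To make the graph-based option more concrete, here is a minimal sketch in Python (standard library only) that writes the witnesses of a small genetic dossier as a TEI <graph> with <node> and <arc> elements. The function name, the witness identifiers and labels, and the derivation order are hypothetical, chosen to mirror a handwritten draft, a corrected typescript, and a final version; the sketch covers only the basic structure, not every attribute the TEI graph elements offer.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Serialize TEI elements without a prefix (TEI as the default namespace).
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def witness_graph(witnesses: dict[str, str],
                  derivations: list[tuple[str, str]]) -> ET.Element:
    """Build a directed TEI <graph>: one <node> per witness and one <arc>
    per assumed derivation (source witness -> derived witness)."""
    graph = ET.Element(tei("graph"), {
        "type": "directed",
        "order": str(len(witnesses)),   # number of nodes
        "size": str(len(derivations)),  # number of arcs
    })
    for wit_id, label in witnesses.items():
        node = ET.SubElement(graph, tei("node"), {XML_ID: wit_id})
        ET.SubElement(node, tei("label")).text = label
    for source, target in derivations:
        if source not in witnesses or target not in witnesses:
            raise ValueError(f"unknown witness in arc {source!r} -> {target!r}")
        ET.SubElement(graph, tei("arc"), {"from": f"#{source}", "to": f"#{target}"})
    return graph


if __name__ == "__main__":
    dossier = {
        "draft": "Handwritten draft",
        "typescript": "Typescript with corrections",
        "final": "Final version",
    }
    g = witness_graph(dossier, [("draft", "typescript"), ("typescript", "final")])
    print(ET.tostring(g, encoding="unicode"))
```

The direction of each arc encodes only an assumed derivation between witnesses; other kinds of relations would need conventions of their own.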

More and more projects are focusing on the genetic aspects of texts in their editions. Several recent research projects demonstrate how text genesis can be encoded with the TEI, e.g., the Samuel Beckett Digital Manuscript Project, the Shelley-Godwin Archive, the digital Faustedition, and Theodor Fontanes Notizbücher.
Within the Guidelines, the chapter "Representation of Primary Sources," which describes the elements and methods for encoding phenomena of interest to scholars of genetic criticism, has an interesting genesis itself. To better understand why some of its concepts were not included, or were only partly included, in more recent teaching materials, we should consider how this chapter of the Guidelines was extended for the purposes of genetic editing. Tags used to represent typical characteristics of primary sources, such as additions (<add>) and deletions (<del>), were already part of P1 (TEI P1).

It took several years to revise the TEI Guidelines to cope with the demands of genetic criticism. The initiative to overhaul the chapter on "Primary Sources" resulted from work done by the Special Interest Group (SIG) on Manuscripts. A subgroup of this SIG, the "task force on Genetic Editions," proposed "An Encoding Model for Genetic Editions" as a TEI customization. The resulting ODD was written in 2009 by Elena Pierazzo and Malte Rehbein, with contributions by Lou Burnard, Gregor Middell, and Moritz Wissenbach, and proposed to the TEI Council in 2010 (Pierazzo et al. 2010).
It is helpful to go back to this draft to understand how recent versions of the Guidelines implement the elements proposed by the SIG in a more general manner. For example, the proposed <ge:stageNote>, which describes a writing stage (a concept familiar to scholars of genetic criticism), became the more general or "neutral" <change> in <listChange>. Whereas the names of some of the proposed elements were altered and their scope extended to a wider range of use cases, most of the examples put together by the authors of the proposal ended up in the Guidelines. This is problematic because, while some of the examples are taken from "real world" literary texts, for example the Walt Whitman Archive and the Manuscripts of Henrik Ibsen, others were constructed especially for illustrative purposes. Most notable in this regard is the example demonstrating the use of <undo>. The Guidelines demonstrate the application of this element to a handwritten sentence, an "imaginary" example, as the caption reads. While this example is perfect for illustrating the use of <undo>, on a meta level it touches on a crucial point that we want to explore further in this paper: the "imaginary" example was constructed especially for demonstrative and pedagogical purposes and stands in stark contrast to the other examples, which are taken from archival material.
While the above-mentioned sentence is very easy to read and shows only the relevant textual phenomena, the other illustrative material displays a variety of hands and comes from a variety of languages. This case shows how difficult it is to find illustrative material that demonstrates certain textual phenomena and possible encoding strategies.
When teaching encoding, one is always in need of illustrative examples and torn between two approaches. One may take "real world" material, which may overwhelm participants because they have to decipher a hard-to-read hand or read text in a foreign language or vernacular, and because they need to know a lot about the author's writing processes and the writing tools used. Alternatively, one may use examples constructed for teaching purposes, which are easier to work with but carry the risk that students will later be daunted when they have to apply their newly gained experience to "real world" material.

Teaching Materials for Text Genesis
There are several resources for learning text encoding with the TEI. In the following section, we discuss two outstanding courses that offer teaching materials for text genesis. TEI By Example, probably the most widely known resource introducing beginners to the TEI, uses an essay by the schoolchild "Hannah Renton" in its tutorial section to familiarize learners with digital textual criticism. Why did the authors of this tutorial choose this three-page essay as an example to introduce the encoding of primary sources? It obviously has some advantages over "real" examples of manuscripts taken from digital archives. The handwriting is relatively clear, and the genesis of the text is not too complex. Learners will most likely understand how the text was produced because they are familiar with the writing process: they have probably written similar essays themselves and had them returned with marks from their teachers, which in this case results in two identifiable scribal hands in the manuscript.

Whereas TEI By Example provides an excellent introduction to the encoding of manuscript material by explaining the use of the basic tagging in a transcription within <text>, the tutorial could not cover the above-mentioned more recent developments of the TEI in regard to genetic criticism.
The development of the tutorials began in 2006, and a sneak preview was published in mid-2009. TEI By Example was already finished when the work of implementing the means for genetic editing in the Guidelines was done.
[…] are not too difficult, but for inexperienced students, and sometimes even for scholars from non-English-speaking backgrounds, they are still challenging to read because they feature various historical writing styles and are in Latin or English. In general, genetic processes are not covered in great detail in the teaching materials. This fact served as the starting point for our workshop: is it possible to teach genetic editing with the TEI in an introductory course by focusing on the encoding of the writing process?

Teaching Genetic Encoding
In 2017, at the University of Vienna, we offered two workshops in which we taught genetic encoding to experienced practitioners as well as beginners. Since we agree with Brett Greatley-Hirsch, who stresses the importance of collaboration and a learning-by-doing approach in digital humanities teaching (Hirsch 2012), we centered the workshops on a process of collaborative text production. Participants produced their texts in small groups, in several writing stages, using a variety of writing tools. The resulting genetic dossiers were then exchanged between the groups, encoded, and their genesis reconstructed. The whole text production process was recorded on video, which was then used to compare the encoded genesis of the texts with the actual events.
Considering how the workshop participants could benefit most, we chose "active learning" as our pedagogical approach. Active learning is understood as a learner-centered rather than teacher-centered approach to teaching. Instead of passively listening to an instructor's input, students complete tasks on their own and even solve their problems themselves (Bonwell and Eison 1991). The instructor simply guides this process and supports the discovery of solutions. As Biggs and Tang argue, "Teaching is not a matter of transmitting but of engaging students in active learning, building their knowledge in terms of what they already understand" (Biggs and Tang 2011, 22).

In learning to encode, an active learning approach has many advantages. Firstly, it is learning by doing. Even students with little or no experience in text encoding can quickly become acquainted with the TEI and take their first steps through practice. Secondly, the group setting encourages students to help each other and to learn by explaining difficult issues to others. It is crucial that instructors are available to help and explain: students should have the opportunity to have their results checked by the instructor so that they do not learn incorrect solutions and problem-solving patterns. Thirdly, students work on a palpable problem, thus acquiring practical skills and a high level of problem-solving competency. Fourthly, students are continuously provided with scaffolding during the process, which reduces the complexity of the task. For example, when working with manuscripts of literary texts from digital edition projects, participants face many challenges at once; when working with their self-produced texts, some of these difficulties (for example, an unknown script) can be circumvented. In addition, the authors of the material are present and can be asked about the actual writing process.
In our case, we followed a bottom-up principle: participants were provided with solutions only after they had encountered challenges. On a meta level, we employed the method of "participatory observation" (Hennink et al. 2001, 179-185): we took part as instructors and later, with the help of several means of data collection, including audio and video recordings of the workshops, reflected on the whole process. We also interviewed the participants and asked them for feedback.
In German-speaking countries, comic strips are commonly used to teach story writing and serve as a writing stimulus. We wanted our participants to focus on text production and to keep them from writing texts that would be too complex or sophisticated. We opted for a well-known comic strip with six frames: the first comic from the series "Father and Son" by Erich Ohser. The strip is known under two titles, "Vater hat geholfen" [Father helped] or "Der schlechte […]" [The bad …]
The first workshop took place on May 12th, 2017. The turnout was of a manageable size, with six participants and three instructors. The attendees formed a heterogeneous group.
For example, half of the participants had prior experience in non-digital textual editing; a very experienced professor was the only one in the group who had actually worked with a typewriter before; a 25-year-old student had no prior editing or TEI experience. We asked the participants to install the transcription software Transkribus and the Oxygen XML Editor on their personal computers beforehand so that we could start without troubleshooting installation problems.

At the beginning of the workshop, we asked our participants to sign a consent form so that we could record and analyze the writing and encoding process as well as their results. We planned the workshop for four hours and finished it on time. After the workshop, we discussed the results in an informal setting.

The workshop was divided into five phases, which included practical aspects as well as theoretical input. Our idea was to start immediately with an active task, so we began with the writing process.
The first phase (1) focused on producing a text that served as the material for the encoding task.
The participants were divided into two groups of three. Both groups were given the same comic strip as a stimulus for producing a text. On the basis of the images, the participants were instructed to collaboratively write a text using different writing tools. We asked them to create several versions, including at least a handwritten draft, a typescript with corrections, and a final version. Each group was equipped with a typewriter and had various writing tools such as pens, crayons, markers, pencils, and highlighters at their disposal. They could also use carbon-copy paper to duplicate their typescript, which turned out to be a technical challenge for one group. The text production process was recorded on video with two cameras. At first, the participants reacted somewhat hesitantly to the unexpected task of producing a text based on material they knew from primary school, but they soon adjusted to the situation and began to enjoy it. Later they described working in a group as inspiring and helpful. We observed different approaches to text production: one group worked on their versions in parallel, while the other took turns. This resulted in two different kinds of genetic dossiers. After 25 minutes of text production, we collected the dossiers and scanned them while the participants had a break.
We then exchanged the texts and provided the groups with scans. They were asked to order the material and think about how the texts had been produced. The dossier of the group in which more participants had worked simultaneously was harder to reconstruct.
The second phase (2) of the workshop was dedicated to transcription. We introduced the participants to Transkribus, a platform for the transcription of handwritten documents released by the University of Innsbruck, which can be downloaded as a desktop version for free. We used this tool because it offers a quick way to produce a line-based transcription that is linked to the image files via coordinates. The tool also allows for collaboration, so that the group could work together on their transcriptions. Transkribus can export TEI-XML, making use of <facsimile> with <zone>, and includes the transcription in <text>. Another advantage of the tool is its tagging functionality, which we used later to add some basic markup to the transcription. However, at the time we used it, Transkribus did not support nested tags of the same type, which prevented us from doing most of the encoding within the graphical user interface. The tool is still under active development, and at that time a bug also caused some attributes added during tagging to be lost when exporting to TEI. Although it is easy for beginners to use, the tool is designed to place the transcription in <text> rather than to produce an embedded transcription in <sourceDoc>. We exported all texts with the option "Line tags," which resulted in an incorrect intermediary encoding of text lines as <l> (an element intended for verse lines), but this allowed for an easier transformation to our desired base encoding.
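The transformation from this intermediary encoding to an embedded transcription can be sketched in Python (standard library only). This is not the stylesheet used in the workshop, which was written in XSLT; it is a minimal illustration assuming that each <l> points via @facs to a <zone> inside a <facsimile>/<surface>, and that the new <surface> and <zone> elements link back to their facsimile counterparts through @corresp. The function name, the sample document, and its text are hypothetical, and real Transkribus exports may differ in detail.

```python
import copy
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"tei": TEI}
ET.register_namespace("", TEI)

# Hypothetical, simplified line-tag export: <l facs="#..."> pointing at zones.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="page1">
      <zone xml:id="r1l1" points="10,10 400,10 400,40 10,40"/>
      <zone xml:id="r1l2" points="10,50 400,50 400,80 10,80"/>
    </surface>
  </facsimile>
  <text><body><p>
    <l facs="#r1l1">Father helps his son</l>
    <l facs="#r1l2">with the <del>home</del>work</l>
  </p></body></text>
</TEI>"""


def t(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI}}}{tag}"


def lines_to_source_doc(tei_root: ET.Element) -> ET.Element:
    """Turn <l facs="#zoneId"> lines into <sourceDoc>/<surface>/<zone>/<line>."""
    # Index facsimile zones by xml:id and remember the surface (page) of each.
    zones, zone_surface = {}, {}
    for surf in tei_root.iterfind(".//tei:facsimile/tei:surface", NS):
        for zone in surf.iter(t("zone")):
            zones[zone.get(XML_ID)] = zone
            zone_surface[zone.get(XML_ID)] = surf

    source_doc = ET.Element(t("sourceDoc"))
    out_surfaces = {}  # facsimile <surface> (or None) -> new <surface>
    for src_line in tei_root.iterfind(".//tei:text//tei:l", NS):
        zone_id = (src_line.get("facs") or "").lstrip("#")
        src_surf = zone_surface.get(zone_id)
        if src_surf not in out_surfaces:
            surface = ET.SubElement(source_doc, t("surface"))
            if src_surf is not None and src_surf.get(XML_ID):
                surface.set("corresp", "#" + src_surf.get(XML_ID))
            out_surfaces[src_surf] = surface
        zone = ET.SubElement(out_surfaces[src_surf], t("zone"))
        if zone_id in zones:
            # Link back to the image region and copy its coordinates.
            zone.set("corresp", "#" + zone_id)
            points = zones[zone_id].get("points")
            if points:
                zone.set("points", points)
        # Copy text and inline markup (e.g. <del>, <add>) into a <line>.
        line = ET.SubElement(zone, t("line"))
        line.text = src_line.text
        for child in src_line:
            line.append(copy.deepcopy(child))
    return source_doc


if __name__ == "__main__":
    result = lines_to_source_doc(ET.fromstring(SAMPLE))
    print(ET.tostring(result, encoding="unicode"))
```

Copying the children of each <l> (rather than only its text) keeps any inline tagging added in the transcription tool, such as the <del> in the sample.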
The XSLT that handled this transformation took the lines from <facsimile> and produced the corresponding embedded transcription in <zone> and <line> within <sourceDoc>. It also added the base structure for encoding <handNote> and stages within <listChange> in the <profileDesc> of the <teiHeader>. The stylesheet was included as a transformation scenario in a framework for Oxygen XML Author, which the participants went on to use in the following step.
In the third phase (3), after having produced a basic transcription of the documents, we demonstrated how to export the transcription as a Word, PDF, and TEI-XML document. We then gave a short introduction to using the TEI to encode genetic phenomena, drawing on examples from the previously produced material, and explained the TEI Guidelines so that the participants could consult them if they had questions about how to encode certain phenomena. Some of the basic tags had already been introduced during tagging with the transcription tool: <add>, <del>, <retrace>, <undo>, etc. We pointed the participants to methods for encoding writing stages, which could be demonstrated on the typescript with added corrections, and we explained how metadata could be recorded in the <teiHeader>. We mentioned approaches to encoding information on scribal hands in <handNote> and on revision campaigns in <change>. To put this knowledge into practice, the participants encoded the texts using the distributed framework for Oxygen XML Author. They added scribal hands to the elements and tried to identify text stages. After that, they connected the genetic phenomena to <change> elements in the header with the corresponding attribute @change. The framework supported them while encoding by enforcing a predefined schema, and the encoders could switch between the Author and Text modes for visual feedback on their work.
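Once phenomena point to stages via @change and to hands via a pointer attribute, a simple cross-reference check can catch dangling pointers, which a grammar-based schema typically does not verify on its own. The Python sketch below is a hypothetical helper, not part of the framework described here. It assumes that hands are referenced with @hand and declared as <handNote xml:id>, and that stages are declared as <change xml:id> inside a <listChange> (placed here in <creation> within <profileDesc>); the sample document is deliberately simplified and, for example, omits the required <fileDesc>.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"tei": TEI}

# Hypothetical, simplified document with two deliberately dangling pointers.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <handNotes>
        <handNote xml:id="typewriter">typewriter, black ribbon</handNote>
        <handNote xml:id="pencil">pencil corrections</handNote>
      </handNotes>
      <creation>
        <listChange ordered="true">
          <change xml:id="stage1">typescript</change>
          <change xml:id="stage2">pencil revision</change>
        </listChange>
      </creation>
    </profileDesc>
  </teiHeader>
  <sourceDoc>
    <surface>
      <zone>
        <line>the <del hand="#pencil" change="#stage2">bad</del>
          <add hand="#pen" change="#stage3">poor</add> essay</line>
      </zone>
    </surface>
  </sourceDoc>
</TEI>"""


def unresolved_pointers(tei_root: ET.Element) -> list[tuple[str, str]]:
    """List (attribute, pointer) pairs in @hand or @change that do not point
    to a declared <handNote> or <change>, respectively."""
    hands = {h.get(XML_ID) for h in tei_root.iterfind(".//tei:handNote", NS)}
    stages = {c.get(XML_ID) for c in tei_root.iterfind(".//tei:listChange/tei:change", NS)}
    problems = []
    for element in tei_root.iter():
        for attr, declared in (("hand", hands), ("change", stages)):
            value = element.get(attr)
            if value is None:
                continue
            # @change may list several space-separated pointers; splitting is
            # harmless for @hand, which normally holds a single pointer.
            for pointer in value.split():
                if not pointer.startswith("#") or pointer[1:] not in declared:
                    problems.append((attr, pointer))
    return problems


if __name__ == "__main__":
    for attr, pointer in unresolved_pointers(ET.fromstring(SAMPLE)):
        print(f"@{attr}={pointer!r} has no matching declaration")
```

Run against the sample, the helper reports the undeclared hand #pen and the undeclared stage #stage3 on the <add>; in a workflow like the one described, such a check could complement the schema-based feedback in the editor.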

The fourth phase (4) consisted of a final discussion of the results with the whole group, during which we also watched some parts of the recorded video. We realized that one would need at least two cameras per group (one capturing a close-up of the writing process, and another recording an overview of the group work) to grasp all aspects of the writing process fully. Even though the recordings did not capture the whole process of text production, the groups still had the opportunity to discuss their reconstructed writing process with the authors in the other group and then to reflect on the means to encode their observations. We did not manage to actually encode the interrelations of the witnesses produced, but we informed the participants of possible strategies for encoding them as graph structures.
Six months later, on October 31st, 2017, we tested this pedagogical approach to genetic encoding in a different setting. The second workshop had to be adapted so that we could work with a group of 30 students for only 1.5 hours in an introductory course in Digital Humanities. In the weeks prior to the session, the students had worked with Transkribus and done some basic TEI encoding. We wanted to test whether and how we could fit the adapted workshop into a regular university lesson, which lasts 90 minutes in the Austrian university context. Because the group was larger, not all of the students could participate in the text production on the typewriter, so a small group of four people produced the text that served as material for the others to encode.
Meanwhile, we explained to the remaining students how to encode text-genetic phenomena. For this short introduction to the representation of primary sources, the encoding of writing stages, and the identification of scribal hands, we used examples of a typescript with handwritten corrections.
After the group had completed their text, they reported on their writing experience, especially on working with the typewriter, which is an unfamiliar tool for this generation. We had a portable document scanner at our disposal, so we produced scans of the material and distributed them to all participants. All students then tried to reconstruct and encode the material in smaller groups of four to five people. They discussed the typescript and produced a basic transcription in Transkribus before switching to the XML editor for further enrichment. The task was not to encode the whole text but to focus on the parts with traces of the writing process. We also had the group that produced the text encode their own material with the help of one instructor, so that it could later serve as a basis for comparison with the other groups' solutions.
We were pleasantly surprised that the shorter version of the teaching sequence seemed to be successful even in the limited time and with the bigger group, provided that more than one instructor was present to handle the introduction to the encoding of text genesis and the text production in parallel. However, this format could still be modified slightly, given that the results were not yet completely satisfactory. For instance, because of the limited time, the comparison of the groups' encoded texts with the model solution and the video of the writing process had to be given as homework. We also noticed that this kind of teaching approach needs to be well prepared in the previous lessons and reflected on in the following one. Students could be better prepared by introducing the means for encoding text-genetic phenomena beforehand, especially because the whole spectrum of genetic phenomena that might appear in the produced text cannot easily be handled within 90 minutes.
The presence of several instructors was helpful because they could support the students in looking up the elements in the Guidelines and applying what they learned to the encoding. Furthermore, it would be desirable for all students to participate in the writing process, but due to logistical limitations, this was not feasible in this particular case. The result of the writing process was only one witness, so it was only possible to focus on the micro genesis. Still, an advantage of having to deal with only one text is that instructors have time to produce a model encoding themselves or can at least address common mistakes in the group's solution in class.

Results and Outlook
During the workshops, we realized once again that encoding textual genesis is a very demanding and complicated task. It is therefore necessary to find ways to reduce its complexity for teaching purposes. In the given context, we tried to make the task easier by working with self-produced material, which eliminated the problem of not being able to read the language and/or handwriting. None of the participants reported any problems in deciphering the material produced by colleagues, nor were there any problems in understanding the language or the content of the text. Working with this kind of material thus reduces the typical barriers encountered with original material from editorial projects. Furthermore, when learning how to encode phenomena in a pedagogical setting, it helps when there is a definite solution or right answer.
In particular, students want to know whether the way they solved a task was correct. Having the authors of the text present can help clarify aspects of the textual genesis. As mentioned before, and as the second workshop showed, in a limited time it is easier to focus on micro genesis (at the document level) than on macro genesis (at the inter-document level). This may also be one of the reasons why the existing teaching materials discussed in section 3 focus on the encoding of micro-genetic aspects. Working with a typewriter allows students to produce a text with clearly distinguishable writing stages in a single document, thus allowing encoders to reconstruct the genesis of a text based on material evidence. The typewriter played a very important role in both workshops, at times seeming to be the real protagonist of our experiment. The very presence of this writing tool fascinated the younger participants, several of whom remarked that they had never used one before. At the same time, we noticed a pleasant surprise on the part of the more experienced participants, who enjoyed recounting their first encounters with this instrument in earlier days. Interestingly, then, the typewriter managed to capture the interest of all the learners, linking them with one another and giving the experiment a more craftsmanlike and authentic tone. This experience might be extended to contexts in which participants experiment with producing manuscripts using early writing tools such as quill pens.

Even if working with a typewriter did not directly benefit the participants' encoding skills, they still gained an understanding of writing processes with different media and of text production in general. This observation might be of some importance, especially when training encoders to work with material produced with historical writing tools: to really understand how what is on the page came into being, it is necessary to understand the writing tools used and how they function.

Our experiments have shown that having students produce the material for encoding themselves not only removes several obstacles, in the sense of pedagogical scaffolding, but can also be a motivating factor in such a teaching setting. It is essential to think not only about the editorial and technical aspects but also about conveying a deeper understanding of the writing processes. Allowing participants to experience the process themselves may support learning. The workshops also convinced us that with this approach even tricky aspects of encoding can be tackled with beginners. However, we think that a clearer methodology for conducting research on the pedagogy of text encoding in general is needed, especially for validating the effectiveness of a chosen pedagogical approach, since it is difficult to recreate a prior pedagogical setting under exactly the same conditions.

Our search for available teaching material on the encoding of text genesis has shown that there is apparently a need for more (online) teaching material on genetic encoding. Workshops of this sort will always reach only a small audience at a given venue or institution, are time-consuming to prepare, and are logistically demanding to realize. Bringing resources like TEI By Example up to date with the current state of the Guidelines, and perhaps supplementing them with video content (e.g., of a text production process that could later be encoded), would likely provide a more solid foundation for the pedagogy of genetic text encoding.

INGO BÖRNER
Ingo Börner studied Russian and German studies in Vienna and Moscow. He currently works at the Austrian Centre for Digital Humanities at the Austrian Academy of Sciences, where he is involved in the creation of the "Critical Edition of Arthur Schnitzler's Early Works." His research interests are in the areas of digital scholarly editing and digital humanities.

ANGELIKA HECHTL
Angelika Hechtl studied Russian, Ukrainian and German Studies in Vienna and Moscow. She currently works as a teaching and research associate at the Vienna University of Economics and Business.