Towards a Model for Encoding Correspondence in the TEI: Developing and Implementing <correspDesc>

The encoding of letters has a long tradition in the TEI, but there have been no official recommendations in the TEI Guidelines on how to deal with correspondence. Two TEI customizations present exemplary models: DALF: Digital Archive of Letters in Flanders and Carl Maria von Weber – Collected Works (WeGA). These were the basis of the work of the TEI Correspondence Special Interest Group, which formed a task force, consisting of the three authors of this article, to develop encoding guidelines. This article briefly discusses correspondence theory, letters as an act of communication, and how these aspects of correspondence can be expressed by TEI entities. It then discusses the development of the communication-oriented concept of correspondence and its direct implementation in correspondence-specific metadata structures. Central is the new wrapper element <correspDesc> (correspondence description), which stores key metadata about the encoded piece of correspondence. After addressing this first question of how one should encode correspondence with the TEI, we will discuss the question of linking and interchange between projects and editions dealing with correspondence material. To facilitate this, the Correspondence Metadata Interchange (CMI) format has been developed by the TEI Correspondence SIG’s task force as a subset of <correspDesc>. Finally, we will describe some organizational difficulties in implementing the new elements and encoding model into the TEI Guidelines in 2015.


• And, going a step further, how can correspondence editions most effectively be linked to one another?
1.2 Two Inspiring Examples: DALF and WeGA
When DALF and WeGA each started, the TEI Guidelines were extensive and covered a wide variety of ways to encode manuscripts, but specific encoding guidelines for epistolary material were still missing. Because of this, both projects tried to address the problem by developing their own customizations and documentation for encoding metadata and text transcriptions. The Belgian DALF project, a pioneer of encoding correspondence, was launched by the Centre for Scholarly Editing and Document Studies (CTB). In 2003, the DALF Guidelines for the Description and Encoding of Modern Correspondence Material (Vanhoutte and Van den Branden 2003) were published, with the encoding based on TEI P4. The DALF customization was extensively documented and served as a model for several correspondence projects thereafter. In 2012, a trial update to TEI P5 was tested in conjunction with a re-evaluation of the DALF guidelines. The update followed two principles: first, changing existing TEI P5 elements as little as possible, and second, isolating the project-specific changes from the official TEI P5 elements whenever possible. This new customization was documented in "DALF: A Preliminary P5 Proposal" (Van den Branden 2013).
In 2011, before DALF's trial update to P5, the Carl Maria von Weber – Collected Works (WeGA) project, founded at the University of Paderborn and funded by the Academy of Sciences and Literature Mainz, had launched with access to many manuscripts and letters. TEI P5 was used right from the beginning, and in the same year "TEI P5 for Correspondence: A Recommendation for the Encoding of Correspondence Material" (Stadler 2011) was developed and made available to the community.
Subsequently, all development of the WeGA correspondence schema was publicly accessible on GitHub and became a model for several correspondence projects that used TEI P5 from the outset.
In keeping with best practice, both projects tried to keep the changes to the existing TEI elements as minimal as possible. They defined a special wrapper element, firstly in order to keep correspondence metadata in a single place for convenient encoding and querying, and secondly to keep the rest of the TEI header untainted by these additions. In each case, the correspondence description was understood as forming a part of the description of the source (that is, a manuscript, print document, or digital file). Hence the wrapper element became part of the <sourceDesc> element.

Encoding Example of Digital Archive of Letters in Flanders
Comparing the correspondence wrapper elements in both DALF and WeGA shows that the structure and the introduced child elements differ slightly. In the DALF project, the wrapper element <dalf:letDesc> (letter description) "[g]roups together all letter-specific metadata for a DALF document." The main child element is <dalf:letHeading>, which "[c]ontains a structured description of bibliographical information of a letter." This information consists of four key metadata fields, each defined with a new element:
• <dalf:letAuthor>: the author of the letter (no differentiation was made between the notions of an "author" and a "sender"),
• <dalf:letAddressee>: the addressee of the letter,
• <dalf:letPlace>: the place where the letter was written, and
• <dalf:letDate>: the date of the letter's origin.
Information on parties other than the author who were responsible for the content of the letter is encoded within the already existing TEI element <respStmt> within <dalf:letHeading>.
Besides these basic characteristics of a letter in <dalf:letHeading>, additional information can be provided in other new elements. These are restricted to:
• <dalf:type>: the formal classification of the letter,
• <dalf:envOcc>: the occurrence of envelopes, and
• <dalf:figOcc>: the occurrence of illustrations in the letter.
Extra notes on the letter can be added in the TEI element <note>. All other information, for example on provenance, physical appearance, or history of the letter, is put in <msDesc> (manuscript description), as there are already sufficient TEI elements with which to encode such data.
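The grouping described above can be pictured with a minimal sketch. It is assembled only from the element names given in this section and is not an excerpt from the DALF documentation: the namespace URI, the placement of <dalf:type>, <dalf:envOcc>, and <dalf:figOcc> as siblings of <dalf:letHeading>, and all text content are placeholder assumptions.

```xml
<!-- Hypothetical sketch: the dalf namespace URI and all values are placeholders. -->
<sourceDesc xmlns="http://www.tei-c.org/ns/1.0"
            xmlns:dalf="http://example.org/ns/dalf">
  <dalf:letDesc>
    <!-- The four key metadata fields of the letter -->
    <dalf:letHeading>
      <dalf:letAuthor>Author Name</dalf:letAuthor>
      <dalf:letAddressee>Addressee Name</dalf:letAddressee>
      <dalf:letPlace>Place of writing</dalf:letPlace>
      <dalf:letDate>Date of writing</dalf:letDate>
      <!-- Other parties responsible for the content of the letter -->
      <respStmt>
        <resp>secretary</resp>
        <name>Secretary Name</name>
      </respStmt>
    </dalf:letHeading>
    <!-- Additional letter-specific information (position assumed) -->
    <dalf:type>formal classification</dalf:type>
    <dalf:envOcc>occurrence of envelopes</dalf:envOcc>
    <dalf:figOcc>occurrence of illustrations</dalf:figOcc>
    <note>Extra note on the letter.</note>
  </dalf:letDesc>
</sourceDesc>
```

The point of the sketch is the nesting: the bibliographic core sits one level down in <dalf:letHeading>, while provenance and physical description stay outside the wrapper in <msDesc>.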

Within <sourceDesc> the grouping is thus first <biblStruct>, second <dalf:letDesc>, and third […]
The WeGA offers a different, shallower structure, which groups the key metadata parallel to each other directly within the wrapper element called <wega:correspDesc> (correspondence description). The <wega:correspDesc> wrapper "groups together meta data about the (historic) correspondence such as sender, addressee etc." (Stadler 2011). The selection of metadata also differs slightly. Newly defined elements are:
• <wega:sender>: the sender (here, a differentiation was made between an "author" and a "sender"),
• <wega:addressee>: the addressee,
• <wega:placeSender>: the place where the letter was written,
• <wega:dateSender>: the date when the letter was written,
• <wega:placeAddressee>: the place where the letter was received,
• <wega:dateAddressee>: the date when the letter was received, and
• <wega:context>: the position of the letter within the whole thread of correspondence.
For capturing the beginning of the letter, the already existing element <incipit> was borrowed from the manuscript description and included within <wega:correspDesc>. As in the DALF guidelines, information on the source and its history is not encoded here but remains in the manuscript description.
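The flatter WeGA structure can be sketched in the same way. Again, this is an illustration built only from the element names listed in this section, not a letter from the WeGA edition: the namespace URI and all text content are placeholder assumptions.

```xml
<!-- Hypothetical sketch: the wega namespace URI and all values are placeholders. -->
<sourceDesc xmlns="http://www.tei-c.org/ns/1.0"
            xmlns:wega="http://example.org/ns/wega">
  <wega:correspDesc>
    <!-- Key metadata as direct children of the wrapper -->
    <wega:sender>Sender Name</wega:sender>
    <wega:addressee>Addressee Name</wega:addressee>
    <wega:placeSender>Place of writing</wega:placeSender>
    <wega:dateSender>Date of writing</wega:dateSender>
    <wega:placeAddressee>Place of receipt</wega:placeAddressee>
    <wega:dateAddressee>Date of receipt</wega:dateAddressee>
    <wega:context>Position within the thread of correspondence</wega:context>
    <!-- Existing TEI element borrowed from the manuscript description -->
    <incipit>Opening words of the letter</incipit>
  </wega:correspDesc>
  <!-- Bibliographic information on the source would follow here. -->
</sourceDesc>
```

Compared with the DALF sketch, there is no intermediate heading element, and the receiving side gets its own place and date fields.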
Within the source description, the first element is <wega:correspDesc>, and the second contains the bibliographic information (<msDesc>, <listWit>, <biblStruct>, or similar). Example 2 shows the encoding for a letter from WeGA.
We put the focus on standardization of the editorial information and its encoding within the <teiHeader>, and discussed and selected the "core" correspondence-specific metadata that should be encoded, and thus be conveniently identifiable, in one fixed location in the TEI header.
The development of this model for encoding correspondence-specific metadata was inspired largely by the work of DALF and WeGA, for these projects had offered the most successful and "reusable" customizations thus far. We built on these customizations rather than starting from scratch and began with an evaluation of the overlaps and differences between the two customizations. We followed both DALF and WeGA in their two basic assumptions: on the one hand, the TEI standard should be modified cautiously by adding only a few elements while not changing existing elements; on the other hand, there should be a new wrapper element to store the key metadata on the encoded piece of correspondence instead of scattering this information throughout the TEI header. This wrapper element is called <correspDesc> (correspondence description), and it covers correspondence-specific metadata only. All information describing the manuscript (or any other text-bearing object) still resides within the manuscript description <msDesc>. This also means that the element <correspDesc> alone does not provide a complete description of a letter or other piece of correspondence, but gives selected correspondence-specific data. A full description of a given letter is provided by <correspDesc> in conjunction with the manuscript description in <msDesc>.

During the course of developing <correspDesc> and refining the underlying definition of correspondence in general, the resulting correspondence-specific guidelines and the setup of the added TEI elements moved somewhat away from the concepts of DALF and WeGA. This, however, does not diminish their importance for conceptualizing <correspDesc> and the encoding guidelines for correspondence.

Theory
As outlined above, providing better support (and standardization) for the encoding of correspondence was one of the central goals of the task force. The process of conceptualizing was guided by two things: first, a theory of correspondence in general, and second, the possible application of existing TEI elements (as well as the need for new elements) for a given set of letters or other pieces of correspondence.

Theory of Correspondence in General
We rst had to settle on what constitutes a letter or any other piece of correspondence, e.g., a postcard or a telegram.Inspiration came from many dierent sources, the signicant ones being Halsband (1958), Bluhm and Meier (1993), Rolo (1998), Barton and Hall (2000), Zeller (2002), How   (2003), Stanley (2004), Wiethölter and Bohnenkamp (2010), Berg (2011), Bohnenkamp and Richter   (2013), and Hankins (2015).Three aspects emerged as the central ones: The materiality of a piece of correspondence, the text of the message, and the "eventness" of the communicative act.

The Letter as an Object
It seems obvious that any piece of correspondence is tightly bound to its materiality. First, the available space on a text-bearing object not only limits the amount of the text, it also puts stylistic constraints on the message; compare, for example, a postcard with a letter, where on the former you will not usually find such elaborate openers or closers as on the latter. Second, there are implications for the communicative act, since the object is part of the (social contexts of the) message (was expensive paper used or just some scrap of paper?), and some material forms of correspondence conceal their contents while others bear text visibly.
Since these material aspects play a distinct role for many editions of texts in general (editions of correspondence being a subset of these), the TEI fortunately already features a full-fledged module for the encoding of manuscripts (and other text-bearing objects in general) with <msDesc> (cf. Pierazzo 2011). The particular characteristics of correspondence, that is, those material aspects that are genuinely related to correspondence (e.g., attachments, enclosures, and envelopes), can also be encoded within <sourceDesc>. A taxonomy of media types (e.g., postcard, letter, or email) may be specified in the <profileDesc>.

The Text of the Letter
For the encoding of textual content of a message, the TEI Guidelines (TEI Consortium 2015) provide a most comprehensive tag set. One can elaborately encode names, dates, places, and their relationships. One can, as well, document all sorts of editorial interventions and features of the copy text. Again, particular textual characteristics of correspondence can already be dealt with.
For example, the existing elements <postscript>, <opener>, and <closer> provide for encoding the prototypic text structure of a letter. Of course, this prototypic structure does not apply to all letters, nor to correspondence in general, so the aforementioned elements and their applicability are constantly called into question. Yet, as pointed out in section 1.3, this issue was not part of the task force's mandate but needs to be addressed in a subsequent step.

The Letter as an Event
Besides the material and the textual features of a message, its "eventness" is of exceptional importance. In general, an event introduces change to the associated parties and to the particular communication continuum in which the correspondence takes place (Bohnenkamp and Richter 2013, 4). Each individual message is not a sole entity but a reaction to some previous message, and triggers a reaction itself. For some, this "temporal sequence of the Before and After with something happening in the middle of those" plays a predominant role: "More important than the question of what happens is the fact that something happens, that is the mere carrying out" (Stenger 2010, 30). In fact, the mere (written) text of a message was not emphasized until the seventeenth and eighteenth centuries, Stenger argues. Very often, "the surrounding circumstances – sent works, presents, works of art, other oral messages – show that the letter is embedded within a whole ensemble of communication media that made correspondence a multimedia process" (ibid., 32).
Correspondence is frequently called a "half dialogue," a "conversation amongst absentees"; Janet Gurkin Altman, for example, speaks about "the letter's function as a connector between two distant points, as a bridge between sender and receiver" (Altman 1982, 13). This view has much in common with the classic model of communication as laid out by Claude Elwood Shannon and Warren Weaver in the 1940s (Shannon and Weaver 1949). The so-called Shannon-Weaver model can be reduced to the entities sender, receiver, and a connecting transmission.
We therefore tried to incorporate the basic principles of this communication model into our concept of what constitutes correspondence. Although, of course, not every letter follows the same basic pattern, it nevertheless seemed reasonable to concentrate on these key points:
• sender
• receiver
• origin location
• destination location
• message(s) before and after
Hence, the proposed correspondence description needed to provide information about persons (or organizations) as sender, receiver, or messenger. In addition to this, it needed to support the encoding of the respective dates and places as well as to provide a mechanism that would point at (or reference) preceding and subsequent messages.

Theory of Correspondence and TEI Entities
The previously outlined "eventness" of correspondence, that is, the communicative act associated with correspondence, can be decomposed into various "actions." These actions (which include sending, receiving, and transmitting) form the atomic events of a communicative act and are associated with people, dates, and places.

Persons
An act of communication has a sender of a message who is not necessarily identical to the author of the text of the message. In the TEI Guidelines, "<author> in a bibliographic reference, contains the name(s) of an author, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority" (TEI Consortium 2015, "<author>"). Obviously, this definition is connected to specific concepts of "authorship" and "work" which cannot be discussed in great detail here. Of course, it is possible in an act of communication or correspondence to send a copy of a Shakespearean poem instead of writing an original love letter; it is possible for a writer to send a publishing contract in order to discuss it with a friend; and it is possible to send a disembodied ear in order to threaten an enemy instead of writing a message at all. Still, correspondence and communication happen in each of these instances. These sorts of possibilities make it crucial to differentiate between the "author" and the "sender" of a letter.
If author and sender are the same person (which in most letters will be the case), one can use a pointer to the corresponding part of the TEI header to indicate that.
Furthermore, correspondence generally has one or more intended (perhaps fictional) "addressees" or "receivers." It is important to point out that the addressee "spoken" to by the piece of correspondence is not necessarily the same person as the one who actually "receives" the letter.
One might also want to encode persons involved in the transmission of the letter, such as a dear friend entrusted with the letter, a personal messenger, or an official letter carrier.

Dates
A piece of correspondence is generally written and sent at particular points in time, which are not necessarily the same and may even be unknown, approximate, or indeterminate. One could also argue that the date of receipt of a letter is documented in some cases and should thus be encoded as well.
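As a minimal sketch of how exact and approximate dates might be distinguished, the fragment below uses the existing TEI <date> element with its @when, @notBefore, @notAfter, and @cert attributes. The dates themselves are invented placeholders, and wrapping them in a <p> is only a convenience for the illustration.

```xml
<!-- Hypothetical dates, for illustration only. -->
<p xmlns="http://www.tei-c.org/ns/1.0">
  Written on <date when="1900-03-05">5 March 1900</date>,
  posted <date notBefore="1900-03-06" notAfter="1900-03-08"
               cert="medium">a few days later</date>,
  and received at an undocumented time.
</p>
```

An unknown date is simply left unencoded here, while an approximate one is bounded by an earliest and latest possible day.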

Places
Similarly, a piece of correspondence is generally sent from a particular location, which may or may not be the same as the place(s) of composition. A letter is also typically sent to a specific location.
Very often these bits of information are evident in the text of the message, in an address line, postmark, or in the automatically generated header metadata in an email message, so they can and should be captured in the encoding.

Transmitting, Redirecting, Forwarding
A piece of correspondence generally has a process or a medium and/or one or more executor(s) of transfer (messenger, carrier, postman, carriage, fax machine, Internet) and may be redirected (usually unread/without acknowledgement of the content) or forwarded (usually read/with acknowledgement of the content). Again, this vital information should be recorded in the encoding.

Context
Typically, a piece of correspondence is not an isolated entity but a (written) act of communication within a communication continuum (see Bohnenkamp and Richter 2013, 4), in which the correspondence is defined by its relative position between messages sent "before" and "after." Frequently, a piece of correspondence is sent as a reaction to another piece of correspondence and triggers an answer itself. This communicative thread does not necessarily have a simple chronological order, but may contain overlaps due, for example, to postal delivery issues or procrastination. Establishing this succession is a common editorial task; thus, the envisioned encoding model should accommodate detailed description of context.

Putting Theory into Practice
Starting from the main question, "What changes to the current TEI Guidelines would be needed to support scholarly encoding of correspondence?," and after considering the abovementioned communicational aspects, we finally tried to define as few new correspondence-related elements as necessary while maximizing reliance on the already existing TEI framework. In spite of our initial idea of just implementing dedicated elements such as <sender>, <addressee>, <placeSender>, or <transmission> (based on the customizations used by DALF and WeGA), we were influenced by the arguments of some contributors on the TEI Correspondence SIG list.
These contributors suggested that it would be more appropriate to understand correspondence as an involvement of various persons and responsibilities rather than just sender, addressee, or transmitter, and, therefore, that it would be best to use one overall term (<participant>) with different roles (e.g., @role = "author", "sender", "signer", or "co-signer"). We created encoding examples with this model but soon reached its limitations because of the sole emphasis on persons.
However, in combination with a different suggestion, namely to wrap all information about the sending (or receiving) side of the correspondence in one element each and to encode names, dates, and places using existing TEI elements, we finally managed to develop a specification that seemed both theoretically justifiable and practically useful.

<correspDesc>
The direct implementation of our communication-oriented concept of correspondence starts with the new element <correspDesc> (correspondence description) that stores the key metadata about the communicative act. This element has one or more <correspAction> and <correspContext> elements as children. All other contents are supplied using already existing elements.
Initially, we wanted to include <correspDesc> in <sourceDesc> (and by that means make it model.biblLike in order to create, for example, <listBibl>s for collections of letters).
After discussions with the TEI Council, we finally decided that <profileDesc> would be more appropriate. In spite of the "bibliographical" character of information like sender, date, and place of writing, correspondence-related metadata puts the emphasis not on specifying the source but on describing the text and its several aspects. It is therefore better described by the "nonbibliographic aspects of a text, … the situation in which it was produced, the participants and their setting" (TEI Consortium 2015, "<profileDesc>"). Below is a very simple example of an encoded letter using the new <correspDesc>, which "contains a description of the actions related to one act of correspondence" (TEI Consortium 2015, "<correspDesc>"):

<correspAction>
<correspAction> is the "heart" of the new model for encoding correspondence with the TEI. It is the "atomic unit" relating to events of a given communicative act and gives flexible opportunities to include all the information about associated people, dates, and places as described in section 2 by using existing TEI elements. The <correspAction> element "contains a structured description of the place, the name of a person/organization and the date related to the sending/receiving of a message or any other action related to the correspondence" (TEI Consortium 2015, "<correspAction>"). Suggested values for its @type attribute are:
• "sent" (information concerning the sending or dispatch of a message),
• "received" (information concerning the receipt of a message),
• "transmitted" (information concerning the transmission of a message, that is, between the dispatch and the next receipt, redirect, or forwarding),
• "redirected" (information concerning the redirection of an unread message), and
• "forwarded" (information concerning the forwarding of a message).
Example 3 above shows how to encode a single letter with the actions "sent" (a known author/writer/sender sends the letter from a known place on a known date) and "received" (a known addressee/receiver receives the letter at a known place, but on an unknown date). Information about the context in which the letter was sent is also supplied (in this case by referencing the previous and next letters of the author/sender).
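A description with exactly these two actions plus context might look like the following minimal sketch. It is an illustration rather than the article's own example: all names, places, the date, and the file names in @target are placeholders, and the use of <ref> with @type values "prev" and "next" inside <correspContext> is an assumption about one common way to point at neighboring letters.

```xml
<!-- Hypothetical sketch: every name, date, and target is a placeholder. -->
<profileDesc xmlns="http://www.tei-c.org/ns/1.0">
  <correspDesc>
    <!-- Known sender, place, and date of sending -->
    <correspAction type="sent">
      <persName>Sender Name</persName>
      <placeName>Place of sending</placeName>
      <date when="1900-03-05">5 March 1900</date>
    </correspAction>
    <!-- Known receiver and place; no <date> because the date of receipt is unknown -->
    <correspAction type="received">
      <persName>Addressee Name</persName>
      <placeName>Place of receipt</placeName>
    </correspAction>
    <!-- Position of the letter in the sender's correspondence -->
    <correspContext>
      <ref type="prev" target="letter0041.xml">Previous letter of the sender</ref>
      <ref type="next" target="letter0043.xml">Next letter of the sender</ref>
    </correspContext>
  </correspDesc>
</profileDesc>
```

Each side of the communicative act gets one <correspAction>, and only existing TEI elements (<persName>, <placeName>, <date>) appear inside it.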

<correspContext>
The element "<correspContext> (correspondence context) provides references to preceding or following correspondence related to this piece of correspondence" (TEI Consortium 2015, "<correspContext>"). It therefore identifies the proper place of a particular piece of correspondence (a letter, for example) in the communication continuum, as it is defined by its relative position between messages sent "before" and "after." This may be very useful in capturing the correspondence network of a single person, where different letters written on the same day may be part of different discussions with different addressees, or where one and the same (forwarded) letter generates different answers from different writers.
Sometimes one will have to deal with pieces of correspondence within other texts (such as biographies) and use a combination of <div> and <correspDesc>. Very often it will be necessary to combine many different letters, for instance, while encoding an anthology. One could then encode the text of each letter in a single <div> in the <body> and link to the corresponding <correspDesc> (one for each letter) in the <teiHeader>. For facilitating this use case, <correspDesc> is made a "declarable element" (TEI Consortium 2015, sec. 15.3.2, "Declarable Elements"), and thus can be linked to by means of a @decls attribute on the corresponding element within <text>.
More examples of suggested encodings as well as unusual encoding challenges (such as emails, multiple senders/receivers, fictional letters, redirections, a single document comprising two acts of correspondence, or one act of correspondence with multiple witnesses) can be found at the SIG's GitHub repository.
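The @decls linking for an anthology can be sketched as follows. This is a minimal, hypothetical document: the title, the xml:id values, the @type value on <div>, and all names and letter texts are placeholders chosen for the illustration.

```xml
<!-- Hypothetical anthology: ids, names, and texts are placeholders. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical anthology of letters</title></titleStmt>
      <publicationStmt><p>Unpublished sketch.</p></publicationStmt>
      <sourceDesc><p>Placeholder source description.</p></sourceDesc>
    </fileDesc>
    <profileDesc>
      <!-- One correspondence description per letter, each with its own id -->
      <correspDesc xml:id="corresp-letter1">
        <correspAction type="sent">
          <persName>First Sender</persName>
        </correspAction>
      </correspDesc>
      <correspDesc xml:id="corresp-letter2">
        <correspAction type="sent">
          <persName>Second Sender</persName>
        </correspAction>
      </correspDesc>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- Each letter's <div> points to its description via @decls -->
      <div type="letter" decls="#corresp-letter1">
        <p>Text of the first letter.</p>
      </div>
      <div type="letter" decls="#corresp-letter2">
        <p>Text of the second letter.</p>
      </div>
    </body>
  </text>
</TEI>
```

Keeping all descriptions in the header and pointing to them from the body leaves the metadata in one fixed location while still tying each letter text to its own description.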

Interchange Format
A crucial aspect of the development of <correspDesc> has been the wish to facilitate interchange, and even interoperability, of the metadata from encoded correspondence texts, as there is a general and growing demand from correspondence projects for interchange and linked-data capabilities.
To provide for this, we developed a model of the <correspDesc> element in a concentrated form that is essentially a constrained subset of the full TEI standard. This Correspondence Metadata Interchange (CMI) format relies heavily on authority files and external standard formats. Authority files like the Integrated Authority File (GND) or the Virtual International Authority File (VIAF) are used for identifying persons and places, and the stricter W3C format (being a subset of ISO 8601) is used for the encoding of dates. Including such standards can introduce new problems, however, when there are several authority files for one piece of information or when there are none.
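The idea of a constrained, authority-based description can be illustrated with a short sketch. It is not taken from the CMI specification: the example.org URIs stand in for real authority identifiers (such as GND or VIAF URIs), the names are placeholders, and the choice of the @ref attribute for carrying the identifier is an assumption.

```xml
<!-- Hypothetical sketch: the example.org URIs are placeholders for authority-file URIs. -->
<correspDesc xmlns="http://www.tei-c.org/ns/1.0">
  <correspAction type="sent">
    <!-- In practice: a GND or VIAF identifier for the person -->
    <persName ref="https://example.org/authority/person/1">Sender Name</persName>
    <placeName ref="https://example.org/authority/place/1">Place of sending</placeName>
    <!-- Date in the W3C format (YYYY-MM-DD) -->
    <date when="1900-03-05"/>
  </correspAction>
  <correspAction type="received">
    <persName ref="https://example.org/authority/person/2">Addressee Name</persName>
  </correspAction>
</correspDesc>
```

Because persons and places are identified by URI rather than by name string, and dates follow one machine-readable format, descriptions from different editions can be matched without reconciling spelling variants.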
Nevertheless, along with the development of the CMI format, a corresponding Web service using this customization was created by Stefan Dumont at the Berlin-Brandenburg Academy of Sciences and Humanities. This Web service, called correspSearch, makes the correspondence-specific metadata of (to this day) seven German- and French-language correspondence editions searchable with one query. This service gives an idea of what is possible when correspondence projects are linked. The CMI format and the correspSearch Web service, still under development, aim to address in greater detail the question of linking and interchange between correspondence projects.

Organizational Difficulties and Achieving Official TEI Status
As already noted in section 1, a TEI letters/memos module was previously requested in 2004. It was deferred in 2007 (with the advent of TEI P5) with a note by Syd Bauman, a member of the TEI Technical Council: "I still think it is quite a good idea to have better support for letters & memos in TEI, and I am hopeful that the original poster, the DALF folks, and others will help make this a reality for a subsequent release of P5." There is almost no information documented on the TEI Sourceforge ticket tracker for the time between the creation of the ticket and the closing of the ticket in 2007, so most probably the initial momentum was lost and not much activity followed.
In 2008, the Correspondence SIG was established and held its first meeting during the TEI members' meeting in London. That meeting was rather successful in reaching out to and involving various people and projects. However, it took another year before the first steps were taken. At the 2009 TEI meeting in Ann Arbor, Michigan, an initial task force was set up consisting of Markus Flatscher, Bert Van Raemdonck, and Peter Stadler. Their goal was to map the DALF TEI P4 customization to TEI P5 as a basis for further work on a correspondence customization; hence the code name "Dalfy." One of the problems for this task force was its transatlantic makeup, with Markus Flatscher located in Virginia (USA) and the others in Belgium and Germany, respectively.
Modern telecommunication offers a variety of tools for conference calls, but different time zones limit the intersection of work hours, and face-to-face meetings remain a much more productive way to embark on such a venture. Efforts to acquire funding failed, and no "Dalfy" mapping was developed.
At the Correspondence SIG meeting during the 2012 TEI Conference in College Station, Texas, a spontaneous "hack session" led to the "second draft for a correspondence ODD." At the 2013 TEI members' meeting in Rome, the second task force, named "correspDesc," was established, consisting of (as mentioned above) the three authors of this paper. The organizational preconditions were more favorable this time since all members were native German speakers and several face-to-face meetings could be arranged in Berlin thanks to the generous funding of home institutions.
This time, the goal was explicitly to continue the work that had been started in College Station and to create a formal proposal for a <correspDesc> element for approval by the TEI Council as a new element of the TEI standard. A GitHub repository for developing the ODD customization and providing access to examples and documentation was set up, and the task force's work was documented on the SIG's wiki space.

The task force sought but did not receive substantial feedback from the wider TEI community during our work on the <correspDesc> proposal. This may have had to do with the communication channels we chose to use for interacting with the wider community: besides the SIG's wiki page and the GitHub repository, we used the SIG and the TEI mailing lists and were co-hosts of the first ever TEI tweet chat. Perhaps a series of workshops would have helped us acquire additional input from other domain experts. The feedback we did receive during the process was quite encouraging, though, and in June 2014 we had the proposal (that is, the formal specification with documentary prose and examples) in good enough shape to open a feature request on the TEI Sourceforge ticket tracker. This initiated the official process of integrating <correspDesc> and related elements into the TEI Guidelines. Fortunately, one member of our task force was also a member of the TEI Council during this period, so the Council was constantly reminded about this issue, and queries about the proposal could be answered instantly.
Nonetheless, the TEI Council had some concerns and suggestions, and the proposal required a few revisions before it could obtain approval. In November 2014, this issue was […]