TEI, the Walt Whitman Archive, and the Test of Time

“Diachronic Markup and Presentation Practices for Text Editions in Digital Environments,” a 2014–15 transatlantic collaboration funded jointly by DFG and NEH, provided an occasion to test TEI’s relatively new provisions for encoding temporality. My experiences in attempting to apply these provisions to Walt Whitman manuscripts led me to conclude that chapter 11 of the TEI Guidelines should be significantly revised and that the basic dichotomy it introduces between and deserves renewed scrutiny. Some of the problems in the chapter can be traced to the way the Guidelines have evolved over time through a series of choices motivated by expediency. The Whitman Archive has, since its first foray into TEI markup in 2000, been anxious to find a way to leverage its potential to encode the temporal characteristics and relationships among various manuscript and print instances. The customization we created, however, has shown itself inadequate to describe and make processable important genetic information. Unfortunately, TEI also continues to provide inadequate guidance in the area of encoding genetic relationships, both between documents and within them. My experiments during the Diachronic Markup project included using to encode the sequence in which the text of a single document was inscribed and show that, although the Guidelines need to be developed with specific advice and examples, existing elements and attributes can be used to posit claims about the “way the page was filled,” thereby enabling useful processing and display.

to carefully reread chapter 11 of the TEI Guidelines several times and to discuss it with colleagues on the Faust project. Based on those readings and discussions, Brüning and I submitted a series of revision suggestions to the TEI Technical Council. Many of these revisions were what might be called, in TEI parlance, "phrase-level" tweaks and corrections that were implemented quickly and without need for much discussion. Others, however, involved at least chunk-level units. The most radical suggestion and the one that has generated the most discussion and debate is titled, in the TEI GitHub repository, "Restructure chapter 11" (issue no. 1427). 4 The process of reading the Guidelines, testing what they recommend, discussing results, and then proposing changes (which have sometimes been further discussed and negotiated) has resulted or will likely result in incremental improvements that, taken together, should make the Guidelines significantly more usable. 5 That same process, however, has also convinced me that for the TEI Guidelines to become truly adequate to the task of creating digital genetic editions of various kinds, the content models for <sourceDoc> and <text> should be thoughtfully rationalized, so as to disentangle (or at least acknowledge the entanglement of) markup most appropriate to the document (<sourceDoc>) and that most appropriate to the text (<text>). 6 Doing so would require that the TEI Council be willing at least to consider the full range of options, not excluding those that would break backward compatibility. I say this knowing only too well that radical change is a sort of third rail for TEI, where, at least in recent years, incremental evolution has become the modus operandi. 
In most cases I believe this institutional conservatism to be a positive aspect of TEI and, in fact, important to its continued use and adoption, but in this case a fresh, large-scale reconsideration rather than a localized tinkering is in order, as it could help to solidify TEI's theoretical grounding, or at least better articulate it, for the benefit of both present and future practitioners. 5 With regard to temporal encoding specifically, however, there is ample scope to "build out" the Guidelines' facilities without attempting anything so fundamental and admittedly controversial as the redefinition of <text> and <sourceDoc>. In what follows, therefore, I will focus on chapter 11's provisions for encoding a text's diachronic dimensions, by which I mean its temporal characteristics both within a single document (that is, the posited sequence in which the textual content of a particular document was inscribed) and those between and among multiple documents (the time-sequenced relationships of text across the various components that an editor sees as constituting its genetic dossier). 7 During the Diachronic Markup project we sometimes talked about these two related but distinct types or levels of diachronic relationship as the "micro" and the "macro," respectively. 8

Much of the weakness of TEI's provisions for genetic editing generally, and of its inadequate treatment of the diachronic dimensions of texts specifically, can be traced to the peculiar historical circumstances in which chapter 11 has developed. I would suggest, then, that a consideration of the genesis of the Guidelines themselves is in order, so that ongoing discussions of their remediation can proceed with a shared understanding of their historical context.

A second, closer look at the P5 v. 2 release notes reveals several significant details. First, it is worth noting that although the Workgroup carried out its remit under the banner of genetic editing, the innovations that resulted from that work were largely aimed at, and conceived in terms of, better equipping TEI for the purposes of producing narrowly documentary transcriptions, with only a brief nod to an overarching genetic purpose: the lead story in the release notes is that "[s]everal new elements have been introduced to facilitate a more 'document-focussed' (as opposed to 'text-focussed') way of working" (TEI Consortium 2011). Of course, this characterization relies on the dichotomy between the semantic, intellectual signal (viz., the text) and the material, spatial properties of the carrier (viz., the document). In the French tradition of genetic criticism, diplomatic transcriptions have been a central feature. 9 It is probably for this reason that the transcriptions enabled by the revisions introduced in P5 v. 2 are frequently termed "genetic," even though they are for the most part concerned with the spatial layout of text without any necessary regard to genetic relationships. Also in this regard, I would point out that the release notes mistakenly (though tellingly) refer to the new container element as "<document>," making quite clear the connection between documentary editing and the sort of transcription provided for. The release notes' explanation of the primary function of this new element is similarly telling: "to organize digital images" in the way previously allowed by <facsimile>, but now possible in tandem with a transcription.

Another instructive oddity of the release notes is their surprising reference to chapter 11 as the "chapter on Physical Transcription," a slip that unwittingly evokes an earlier state of the Guidelines: The current chapter 11, "Representation of Primary Sources," began, in the P3 version of the TEI Guidelines, as chapter 18, "Transcription of Primary Sources" (Sperberg-McQueen and Burnard 1994), a title that it retained through the publication of P4 (TEI Consortium 2002). 10 The change from "transcription" to "representation" was necessitated by the inclusion, in the 2007 release of P5 v. 1, of the new elements <facsimile> (with its descendants <surface> and <zone>) and <msDesc> (with its descendants <msPart>, <physDesc>, etc.) (TEI Consortium 2007). Members of the TEI community had long lamented the Guidelines' deficiencies for doing codicological work, a shortcoming that the P5 v. 1 release began to address by adding both an entirely new chapter, "10 Manuscript Description," and a relatively modest section at the beginning of what had been chapter 18, now retitled "11 Representation of Primary Sources" (TEI Consortium 2007).
In its earlier incarnation this chapter was explicitly restricted in focus to "the transcription of primary sources, particularly manuscripts" (TEI Consortium 2002), but it was now expanded to demonstrate the use of <facsimile> as a way to provide a nontranscriptional representation of the object of interest either alone or as complement to a transcription.

Having broadened the purview of the chapter to accommodate such a treatment, the Council no doubt viewed the subsequent request for a method to allow specific components of a transcription to be topographically located (a need of those wishing to create more faithful diplomatic transcriptions) as most easily accomplished by a further retooling of the existing <facsimile> and associated elements, along with the addition of several new elements, such as <metamark>, <transpose>, and <retrace>. In this way, the chapter took on an emphasis on layout not so much through careful planning and building out of what was embryonically present to begin with, but by a sequence of choices characterized by expediency.

I recount these aspects of the historical development of chapter 11 not simply to bemoan problems that this evolution has wrought. Rather, I suggest that not only is there a logical explanation for the continuing (though unfortunate) lack of solid guidance for those wishing to encode the temporal aspects of a text's coming-into-being, but that the chapter's developmental history has helped to obscure the lack. In short, although chapter 11 was chosen as the location of the revisions that grew out of the Workgroup on Genetic Editions, and although its contents are largely discussed in reference to genetic editing, the chapter brings with it a continuing slant toward physical layout, with provisions for encoding temporality still relatively poorly developed. an investment in its use for a specific project. In the past two decades, much of my work has been devoted to using TEI, in its various iterations, to edit manuscripts for the Whitman Archive.
Whitman was, and is, many things to many people: formal innovator of free verse, one of the first "truly great" American writers, pioneer of literary gay identity, poet of the working person, and master of self-promotion, among others. Certainly part of Whitman's fame, in both his time and our own, can be attributed to the creation of an unusual public persona. Ed Folsom has claimed that "no author's life in the nineteenth century was more continuously photographed than Whitman's" (1986, 2–3), and it was partly through the widespread distribution of the photographs he particularly liked that Whitman worked to construct what he believed to be his best public self.
In a similar vein, I have come to believe that every presentation regarding the editing of Whitman's works should make an opportunity to present the following image, which provides visual evidence that the poet was accurate in describing his living situation: "The floor … is half cover'd by a deep litter of books, papers, magazines, thrown-down letters and circulars, rejected manuscripts, memoranda, bits of light or strong twine, a bundle to be 'express'd,' and two or three venerable scrap books" (Whitman 1892, 517). One of the ideas that this image vividly conveys is the multiplicity of Whitman's poetry collection, Leaves of Grass. In 1855 Leaves was an oddly tall volume of about one hundred pages, containing twelve poems and a long preface in prose. The last incarnation, which Whitman prepared shortly before his death in 1892, was a book more conventional in appearance, containing nearly four hundred poems on over four hundred pages and a (different) long epilogue in prose. Throughout the three and a half decades between those first and last versions, Whitman's masterwork was always a protean entity, moving through a series of transformations. Poems were revised; poems were added; poems were deleted; poems were retitled. The initial prose preface was turned into poetry. Poems were combined or shuffled into new groupings. Groupings were added, discarded, or retitled.

One of the rst conversations that I can recall from around the time that I joined the Whitman Archive in 2000 concerned the hope of leveraging the power of the computer to analyze these various and multiple ways in which Leaves of Grass and its constituent works transmuted over time, across various manuscript and printed instantiations. At least part of the appeal of using the Text Encoding Initiative's scheme (embodied in P3, which I came to call "the green books": see Sperberg-McQueen and Burnard 1994) was its potential utility for such analysis. We found, however, that the markup they oered was inadequate to the task of indicating what seemed to us the most basic facts about the relationships among the various iterations of Whitman's masterpiece: namely, that a certain poem in one edition developed from a certain poem in a previous edition or manuscript. Our desires to represent such relationships prompted a series of project-wide discussions that eventually led to the decision to add to our TEI DTD the custom elements <relations> and <work>, with the goal of specifying such genetic relationships. These elements we included in our transcriptions of textual instances. Here, for example, is a relevant snippet, slightly simplied, from one of our TEI-encoded transcriptions of a manuscript draft: resulted in a less-than-ideal record from which any would-be Whitman editor must work. Still, the genetic dossiers of some of Whitman's individual poems or essays-especially those that date from his last years-is relatively fully extant. Far fewer draft materials have survived from the 1850s than from the 1880s, and manuscripts that predate the rst edition of Leaves of Grass in 1855 are fewer still. It is, alas, these earliest manuscripts that are of greatest interest to most Whitman scholars, who would welcome evidence that might shed light on what has been one of the great "mysteries" of Whitman scholarship: what Whitman Archive editors Ed Folsom and Kenneth M. 
Price have described as "his transformation from an unoriginal and conventional poet into one who abruptly abandoned conventional rhyme and meter and … exploited the odd loveliness of homely imagery, finding beauty in the commonplace but expressing it in an uncommon way" (n.d.). As Ralph Waldo Emerson famously wrote, the remarkable poems of the 1855 edition "must have had a long foreground somewhere, for such a start," but we have rather scant concrete evidence about the specific ways in which the text of that edition developed. 12

Although not many of Whitman's early poems are attested extensively in the manuscript record, the manuscripts that do exist are often fascinating (sometimes in part because of the riddles that arise from their being partial). For the purposes of the Diachronic Markup project, I was fortunate that at least a passage from one of those first-edition poems exists in multiple manuscript forms.
That passage, from the poem eventually titled "The Sleepers," has the added distinction of being of particular interest to Whitman scholars generally as evidence of Whitman's treatment of racial issues. At least five different manuscripts related to this passage exist. 13 Four of these are individual leaves in the collection of the University of Virginia; the other is contained in a notebook at the Library of Congress that includes, besides the three consecutive pages of writing relevant to the passage from "The Sleepers," much writing that is unrelated.

After a brief examination of images of these documents, nearly anyone would be able to surmise that they are related to one another. However, the exact "nature" of those relationships is far less apparent. The finding guide for the collection that houses the four manuscripts at University of Virginia (probably originally written by the famous descriptive bibliographer Fredson Bowers) describes their relationships thus: the first one shown below (figure 3) "seems to have come first or second"; the next (figure 4) "was probably second," the next (figure 5) "appears to have come third"; and the one in figure 6 "seems to have been the last" (Special Collections, UVA Library 2011). These details, plus the relatively close resemblances between leaf 10 and the published passage, make the scenario outlined in the finding guide seem quite plausible. Nevertheless, other scenarios are also imaginable. In fact, the University of Virginia finding guide itself offers one of them: On leaf 10 "the word Sleepchaser's appears in the upper right corner," and this word's similarity to the title "Sleep-Chasings" (first used in the 1860 edition) might indicate that, instead of inscribing leaf 10 just after leaf 11 and just before the printed version appeared, Whitman wrote leaf 10 after the first printing, that is, that it "is in fact a reworking of the section for the 1860 edition" (Special Collections, UVA Library 2011). Ordering of the other four manuscripts similarly turns out to be not entirely straightforward (as one might infer from the speculation that leaf 9 came either "first or second").
My point here, fortunately, is not to argue for a particular genetic sequence but to illustrate the fact that the "dossier" with which the genetic editor must work often offers ambiguous evidence regarding the order in which documents were inscribed and, hence, might oblige one to offer multiple genetic descriptions as possibilities.

Our use of the home-grown elements <relations> and <work> to connect a particular documentary instance (manuscript or print) to a work has proven useful, most notably as the enabling presence behind our "Integrated Catalog of Walt Whitman's Literary Manuscripts" (EAD Project Team, UNL n.d.), a union finding guide generated from individual EAD-encoded finding guides for several dozen separate collections. 14 While it has been useful, however, our system is in no way adequate to the task of representing the sort of genetic details that I've outlined for the "Sleepers" manuscripts. Our encoding simply declares that the document currently under consideration (or some division of that document) is an instance of a specified work. That's all it does. Of the many things it does not do, perhaps the most fundamental is that it offers no machine-processable way of ordering the materials (of saying that a particular manuscript was the first draft, another was the second, etc.), let alone of indicating alternative possible orderings. I sometimes shake my head at our lack of foresight in neglecting to design such a provision, but then I remember that during the nineteen years we have been using this markup the TEI community has not developed any consensus facility for associating, in a documentary transcription, all of the various instances with a given work. If the Whitman Archive were coming to TEI for the first time today, in late 2020, we would search the Guidelines in vain for advice about encoding the genetic relationships among the various manuscript instances of "The Sleepers." The Dossier Level). Either approach, however, would require further development to specify such things as types of relationships, degrees of certainty, and alternative theories. 
No RDF vocabulary appropriate to describing genetic textual relationships seems to exist; likewise, to my knowledge, nothing like a universally applicable set of values for the components of a directed graph has been devised.
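The directed-graph idea can be made concrete with a minimal sketch. The Python below is an illustration only: it is neither the Whitman Archive's encoding nor a mechanism defined by TEI. The witness labels (leaf11, leaf10, print1855) and the hypothesis names are hypothetical shorthand for the two scenarios that the finding guide raises for leaf 10, and the sketch assumes that every hypothesis mentions the same witnesses. The only library used is graphlib from the Python standard library (version 3.9 or later).

```python
from graphlib import TopologicalSorter
from itertools import combinations

# Each hypothesis maps a witness to the set of witnesses it is claimed
# to derive from (its predecessors). Labels are hypothetical shorthand.
HYPOTHESES = {
    # leaf 11, then leaf 10, then the 1855 printing
    "finding_guide": {"leaf10": {"leaf11"}, "print1855": {"leaf10"}},
    # leaf 10 as a reworking made after the 1855 printing
    "later_reworking": {"print1855": {"leaf11"}, "leaf10": {"print1855"}},
}


def orderings(hypotheses):
    """Return one linear ordering of witnesses per hypothesis."""
    return {
        name: list(TopologicalSorter(graph).static_order())
        for name, graph in hypotheses.items()
    }


def contested_pairs(ordered):
    """Return pairs of witnesses whose relative order differs across hypotheses."""
    names = list(ordered)
    baseline = ordered[names[0]]
    pairs = set()
    for earlier, later in combinations(baseline, 2):
        for other in names[1:]:
            sequence = ordered[other]
            if sequence.index(earlier) > sequence.index(later):
                pairs.add((earlier, later))
    return pairs


if __name__ == "__main__":
    result = orderings(HYPOTHESES)
    for name, sequence in result.items():
        print(name, " -> ".join(sequence))
    print("contested:", contested_pairs(result))
```

Keeping each hypothesis as a separate graph, rather than merging them, is one way to let alternative theories coexist without forcing a single sequence; degrees of certainty and typed relationships would need further modelling that this sketch does not attempt.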
What the TEI Guidelines do have to offer for the encoding of diachronic information pertains to the "micro" level, that is, those traces of the passage of time that exist within the bounds of a single document. In addition to the use of the @seq attribute to specify an order among the components of <choice>, the P5 v. 2 Guidelines introduced a means of characterizing the temporal relationships among a broad range of textual alterations via <listChange>, whose <change> children can be cross-referenced via the @change attributes of the various elements used to mark up blocks of text in the transcription proper. This method (though, importantly, not the <change> element itself) was developed for the encoding of "revision campaigns" (i.e., the inferred rounds/sessions during which specific portions of a single document were inscribed), and the Guidelines discuss the markup of revision campaigns as the primary use of this set of elements and attributes. Of the four authors whose manuscripts formed the corpus under consideration in the Diachronic Markup project (Whitman, Goethe, Virginia Woolf, and James Joyce), Goethe's writing processes most easily lend themselves to such an approach. Whitman's individual manuscript documents rarely exhibit traces of having been inscribed in more than one writing session. That is, there are usually no changes of hand or medium to suggest that he made an initial inscription, paused, and then came back at a significantly later time or in a distinctly different "fit" to make changes. Unlike some other authors, he had no consistent habit of using, say, ink during an initial drafting phase and pencil during a proofreading one, nor did he employ scribes, as Goethe, for example, sometimes did. The lack of such typical kinds of evidence for "revision campaigns" per se, however, does not mean that the diachronic dimension is utterly imperceptible within Whitman's manuscripts, only that the time frame is collapsed. 
The manuscripts are very frequently a tangle of writing and revision, and therefore pose significant challenges to understanding the processes by which they came into being. On occasion, I have spent hours examining a single manuscript image, slowly developing a theory of how the page was incrementally inscribed. 15 Editors who want to record conclusions and/or posit conjectures about such processes might well wish to encode a sequential ordering of the parts of an inscription within a single "revision campaign." Although it is possible to use <listChange> and other elements to encode such time-dependent relationships, the Guidelines, as currently written, offer no clear direction for how this should be done.
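To make the <listChange> and @change mechanism more tangible, here is a minimal sketch of how a processor might recover a posited inscription order. It is an illustration under stated assumptions and does not reproduce the encoding used in the Diachronic Markup project: the XML fragment is deliberately stripped down (it is not a valid TEI document), its identifiers (step1 to step3) and its wording are invented placeholders, each @change is assumed to carry a single pointer, and the document order of the <change> elements is taken to express the sequence. Only Python's standard xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, stripped-down fragment; wording and ids are placeholders.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <listChange ordered="true">
    <change xml:id="step1">initial inscription</change>
    <change xml:id="step2">interlinear revision</change>
    <change xml:id="step3">marginal addition</change>
  </listChange>
  <line>
    <seg change="#step1">written first</seg>
    <add change="#step3">written last</add>
    <add change="#step2">written second</add>
  </line>
</TEI>
"""


def inscription_order(tei_xml):
    """Return (pointer, text) pairs sorted by the order of <change> elements."""
    root = ET.fromstring(tei_xml)
    # The position of each <change> in document order serves as its rank.
    rank = {}
    for position, change in enumerate(root.iter(f"{{{TEI_NS}}}change")):
        rank["#" + change.get(XML_ID)] = position
    # Collect every element whose @change points at a known <change>.
    tagged = [el for el in root.iter() if el.get("change") in rank]
    tagged.sort(key=lambda el: rank[el.get("change")])
    return [(el.get("change"), "".join(el.itertext()).strip()) for el in tagged]


if __name__ == "__main__":
    for pointer, text in inscription_order(SAMPLE):
        print(pointer, text)
```

The point of the sketch is simply that the sequence lives in one place (the list of <change> elements) while the transcription refers to it by pointer, so reordering a hypothesis means editing the list rather than the transcription.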
I am uncertain whether this logic is scalable to a larger set of Whitman documents, let alone to documents created by others.

More immediately, however, I was uncertain whether the logic actually was machine-readable, that is, whether it was both consistent and complete enough to allow some sort of algorithmic translation. Specifically, I hoped that two of my colleagues at the Center for Digital Research in the Humanities, Brian Pytlik Zillig and Jessica Dussault, could make use of the temporal information in my encoding to produce a meaningful display of the manuscript text. The grant period was very nearly at an end, so my expectations were modest, but in the few days that remained they did, in fact, create a JavaScript program that successfully uses the code to represent, albeit imperfectly, the temporal sequence of inscription via animation. Astute readers will quickly recognize that this animation doesn't get everything right, and I remain uncertain how much of the work to perfect it might depend on changes to the encoding and how much on further development of the script. 17 However, when I consider it as purely a proof of concept, I find this to be a surprisingly powerful and satisfying rendering, one that suggests how an editor's assertions about the time-dependent sequence of inscription within a document might be brought to life with an immediacy that no amount of apparatus criticus prose could. 18 I believe that work remains to more fully implement the aspirations and recommendations of the Workgroup on Genetic Editions, especially with regard to diachronic markup, and I will continue to advocate for further development of the TEI specification. 19
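The kind of step-by-step display described above can be approximated, in a much reduced form, by replaying the state of a line after each posited step. The following Python sketch is not the JavaScript program written at the Center for Digital Research in the Humanities, whose internals are not described here; it is a hypothetical stand-in. It assumes a flat line whose child elements each carry one @change pointer, treats a deletion and its replacement as a single step (both point to the same step), and assumes that deleted text was present from the first step onward. The sample wording and step identifiers are invented placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat line: every child carries one @change pointer.
LINE = (
    '<line xmlns="http://www.tei-c.org/ns/1.0">'
    '<seg change="#s1">first words </seg>'
    '<del change="#s2">old </del>'
    '<add change="#s2">new </add>'
    '<seg change="#s1">ending</seg>'
    '<add change="#s3"> (afterthought)</add>'
    "</line>"
)


def replay(line_xml, steps):
    """Return the visible text of the line after each step, in order."""
    line = ET.fromstring(line_xml)
    rank = {"#" + step: position for position, step in enumerate(steps)}
    states = []
    for current in range(len(steps)):
        visible = []
        for el in line:
            step_rank = rank[el.get("change")]
            local_name = el.tag.split("}")[-1]
            if local_name == "del":
                # Assumption: deleted text exists from the first step and
                # disappears at the step its @change points to.
                shown = current < step_rank
            else:
                # Other text appears at its step and stays visible.
                shown = step_rank <= current
            if shown:
                visible.append(el.text or "")
        states.append("".join(visible))
    return states


if __name__ == "__main__":
    for number, state in enumerate(replay(LINE, ["s1", "s2", "s3"]), start=1):
        print(number, state)
```

A real page would need nested elements, zones, and transpositions handled as well, which is presumably where much of the difficulty of perfecting such a rendering lies.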

NOTES
1 The term "diachronic" came into the field of editing from linguistics and signifies the historical dimension of texts (i.e., their change across time). In this way it contrasts with the "synchronic" dimension, which is static and ahistorical, like a snapshot. Marina Buzzoni has discussed the relevance of the diachronic and synchronic in digital editing in terms of Gianfranco Contini's requirement that an edition "aim at injecting history into the critically reconstructed text by taking into account the different synchronic states that make up its diachronic dimension" (2016, 62).
5 As I was revising this essay for publication in August 2020, in fact, I was notified that the TEI Council was planning to address issue no. 1427, our request that the chapter be restructured, by issuing a new release of the Guidelines with some significant changes to the prose of chapter 11.
Among the revisions, the several new introductory paragraphs are especially welcome, as they go some way toward making explicit the basic difference in approach between transcriptions within <sourceDoc>, which emphasize "spatial features," and those within <text>, which emphasize "logical textual structure" (TEI Consortium 2020).
6 The distinction that I am referring to (and advocating) here is the one perhaps most succinctly articulated by Peter Shillingsburg, who says that "document" refers to "the physical vessel … that contains (or incarnates) the text," whereas "text" refers to "the series of words and pauses recorded in a document" (1996, 174).
7 I use the term genetic dossier here as a synonym of avant-texte to refer to all of the documents that contributed to a published work. In doing so, I follow the usage of, for example, Dirk Van Hulle (2016, 46) and Elena Pierazzo (2015, 14; 2016, 49). Hans Walter Gabler notes, however, that the two terms are sometimes contrasted, the former used to mean that which comprises "transcription and image in conjunction," thus offering "the document perspective on what, from the text perspective, is named 'avant-texte'" (2018, 210). In fact, although in the hands of later practitioners of the French school of genetic criticism "avant-texte" has often been associated specifically with texts rather than documents, as originally coined by Jean Bellemin-Noël it seems to have been conceived as document-centric, referring to "the totality of the material written for any project that was first made public in a specific form" (2004, 31).
8 Note that I am using these terms to indicate temporal and notional relationships different from those indicated by the microgenetic and macrogenetic levels Van Hulle describes in his five-part scheme (the others being exogenetic, endogenetic, and epigenetic), where the contrast is essentially a matter of size or wholeness: the macrogenetic level involves "the genesis of the work in its entirety across multiple versions," and the microgenetic "one specific textual instance across … versions" (2016, 50).
9 Pierazzo calls attention to some of the problems of printed genetic editions, which have not only been "criticized as unreadable, unusable, time-consuming, and, in general deceptive" (2009, 171) but also as "incapable of represent[ing] the time flow" (2009,, even though, as she notes, the editors of Genetic Criticism: Texts and Avant-textes state that genetic criticism "aims to restore a temporal dimension to the study of literature" (Deppman, Ferrer, and Groden 2004, 2).
10 Equally meaningful is the interpolation in the mis-title of the word physical, which was not part of any actual version. The term physical transcription does, however, pretty accurately reflect the foregrounding of diplomatic transcription in the 2011 version of the chapter.
11 IDs are recorded, along with regularized titles, in separate files, one per work (e.g., poem or essay). A third custom element, <workParent>, is used in these files to indicate constituency relationships (for example, to declare that a poem is one of several component parts that constitute a series).
12 The quotation comes from a private letter to Whitman, which he published widely (without Emerson's permission), even including it in some of the copies of the first edition of Leaves of Grass.

13 The Whitman Archive has, in fact, recently completed work, funded by an NEH grant, to publish a "variorum" edition of the 1855 Leaves of Grass that connects each poetic line in the first printed edition to known manuscripts ("avant-texte") as well as to significant textual and spatial variations in different copies. In the course of that work, we identified two additional related manuscripts. These do not affect the substance of my point here, so I have not included them in the discussion. The Leaves of Grass (1855) Variorum is available at https://whitmanarchive.org/published/LG/1855/variorum/index.html.
14 See https://whitmanarchive.org/manuscripts/finding_aids/integrated.html. It is worth noting, however, that the finding guide draws on the connections between instance and work as recorded in EAD records rather than in TEI-encoded transcriptions.
15 Hans Walter Gabler has provocatively analyzed this process of "ascertaining the order in which the manuscript pages were filled" as essentially iterative, moving "loop by loop, from document to text in ever finer steps of granularity" (2007, 201).
16 It is important to acknowledge that, although populating <listChange> might be "simple," decisions about what constitutes a discrete "step" are never simple. For example, in the case of a substitution, should each deletion and each addition be a separate step or should the two be grouped as a single <change>? For my purposes, I chose the latter approach. It should also be noted that, in the encoding of the transcription, I chose to forgo <subst> altogether, since substitutions are already implied by the corresponding @change values on the <del> and <add> elements.
17 Dussault and I have only recently resumed conversations about this and have begun exploring other, non-JavaScript alternatives for processing the TEI. As we are both doing this work in our "free time," however, progress is slow. We would, of course, welcome suggestions or other constructive input.
18 I was, in part, inspired by work done by Elena Pierazzo and a group of collaborators to produce a prototype interactive rendering of pages from one of Marcel Proust's notebooks. By encoding <zone>s and assigning to them a relative order, the project team was able to "create an interactive, accessible interface" (2014b, 18). Unfortunately, their prototype appears to be no longer available. Attempts to visualize the development of a text across documents have been more common, though they face their own significant challenges, of course. Such projects as
19 In making my case for further development I am mindful of James Cummings's sage discussion of common "myths, misconceptions, and misunderstandings" about TEI, wherein he addresses "the criticism often levelled at the TEI" of being "too simple or too general." He rightly points out that such complaints are often born of a "superficial understanding of the scope of the TEI" or a lack of understanding of "the way in which any individual element can be further refined by using attributes or nested levels of encoding." Diachronic markup, however, is not such a case but, rather, an example of a "real instance[]" in which the TEI "provides only a general level of markup for some phenomena" because "the TEI community has not pushed the standard forward yet to greater detail in this area" (2019, i58, i61-i62). What I am calling for, then, is a harder collective push.