Curating Object-Oriented Collections Using the TEI

This article considers the possibilities and challenges in using TEI-based XML markup for curation of objects mentioned in historical documents such as catalogues and inventories, but also in unstructured forms such as diaries and personal correspondence. It takes as a case study documents related to early modern collections of curiosities. It first considers how far the current guidelines for manuscript description can be generalized for encoding other kinds of material objects and their contexts. It then examines what more is required for treating mentions and descriptions of objects in historical documents. It argues that the core affordance of curation for such materials is the ability to identify and select what constitutes a mention of an object and to relate that mention to its immediate context, including its relationships to object groupings.

narratives) are typically unstructured data, commonly discursive, often in continuous prose, but sometimes also in fragments. For this reason, the TEI is a good place to begin looking for metadata solutions for curating object-oriented documents and texts.

At this point, I need to return to the ongoing consideration of what the TEI might do to support representation and documentation of material objects. In recent years, the direction adopted by the Ontologies SIG has been to explore how far the current guidelines for representing the manuscript as material object could be extended to handle other kinds of objects: in the first instance, text-bearing objects (for example, a vase), but also other kinds of objects. This means that the first efforts are being applied to the header module of the TEI Guidelines (P5). What follows is an attempt to use an object-oriented case study to see how far the TEI as it presently stands could be generalized to apply certain Manuscript Description (<msDesc>) elements more broadly in a document to account for references to non-text-bearing objects. Although most of these elements belong to the Manuscript Description module and are intended for header metadata, some are also allowed within <body> in the context of elements that are commonly used for structured prose: <p>, <ab>, <seg>, or <list>.

The first consideration, then, is how generalizable the manuscript object is and how a TEI treatment of it maps onto other kinds of objects. There is a great deal that does not apply to other kinds of objects: catchwords, signatures, and watermarks, for instance, are all specific to manuscript and printed books. The entire sections on heraldry (10.3.8) and rubrics, incipits, explicits, and other quotations from the text (10.6.3) apply only to manuscripts.
There are, however, several elements that could be used to treat objects of any kind, and others could be employed if the rules governing their application could be modified:
• <material> "contains a word or phrase describing the material of which the object being described is composed." Of course, all material objects are composed of something.
• <desc> "contains a brief description of the object documented by its parent element." This is a better, more generalizable option than <objectDesc>, which is limited to text-bearing objects and must be contained by <physDesc> within a header.
• <dimensions> contains a dimensional specification. Dimensions can be further particularized with <height>, <width>, and <depth>, or more generally represented with <dim>. Similarly, the @quantity attribute applied to <dim> would be useful in cases where more than one object is concerned (e.g., "ten shells"). Another option for recording quantity of objects is <num> with @value.
• <objectType> "contains a word or phrase describing the type of object being referred to." This element could be useful for simple statements of a type or class of object: "a dog," "a necklace," "a monstrous fish."
• <origDate> "contains any form of date, used to identify the date of origin for a manuscript or manuscript part." A better treatment of dates associated with objects, however, might be the more general <date> element with an array of values for @type indicating, for example, date of "origin", "exchange", or "creation".
• <trait> "contains a description of some status or quality attributed to a person, place, or organization typically, but not necessarily, independent of the volition or action of the holder and usually not at some specific time or for a specific date range." Of course, this could apply equally well to a material object, but as suggested above, the more general <desc> is preferable regardless.
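To make the list above concrete, here is a minimal sketch, in Python, of how a mention marked with <num>, <objectType>, and <material> might be read back out of a document. The fragment is hypothetical: "ten shells" is the example given above, but the material and the function name describe_object are invented for illustration, the TEI namespace is omitted for brevity, and the fragment has not been validated against the TEI schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment (invented wording, no TEI namespace, not schema-validated).
FRAGMENT = """
<p><num value="10">ten</num> <objectType>shells</objectType>
of <material>mother-of-pearl</material></p>
"""


def describe_object(xml_text):
    """Collect type, material, and quantity from one marked-up mention."""
    root = ET.fromstring(xml_text)
    num = root.find("num")
    return {
        "type": root.findtext("objectType"),
        "material": root.findtext("material"),
        # @value on <num> carries the normalized quantity, if one is recorded.
        "quantity": int(num.get("value")) if num is not None else None,
    }


print(describe_object(FRAGMENT))
```

The point of the sketch is only that each element wraps a string already present in the text, so the encoder supplies a normalized value (here, @value on <num>) without replacing the document's own wording.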
This much applies to the object itself, but there are also elements that pertain to relevant context and circumstances: not just the what (the object itself), but also the who (<persName>), where (<placeName>), and when (<date>). It is also possible to associate @role with <persName> as with <placeName>. All this adds up to a structuring of an <event> in the life of an object; however, <event> is intended principally for use in the header within <listEvent>, and therefore its application is awkward within the body of a document. It must be nested in <listEvent>, but in the sort of documents I am modeling here, there is no list, only a mention of a single event. It also must contain a <label>, and therefore its application is not as nimble as <name>, for example.
An alternative to <event> is <incident>, which can be applied almost anywhere within the <body>, but the semantics of this element are not a good fit: "any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication." For documenting an event, then, one might best resort to a stand-off solution, either in the header of the TEI document, or in RDF, or in a database.

An important dierence needs to be observed in how these elements might be applied in the body of a document rather than in the metadata of the header. In the header, the values for these elements can be supplied by the encoder. In the body of a document, there must be strings of text already present to which these elements can be applied. In keeping with the general principles of the TEI Guidelines, my intention for the body of the document is, as much as possible, to mark the content as presented in the original document, to let the document's structure speak for itself.
I want to capture, in the rst instance, the way in which the document represents, expresses, articulates, and categorizes the objects it references. As we will see, this distinction leads to some complications in how we model the markup of mentioned objects in a document.

Now let us see how well these apply and to what degree they account for the qualities and circumstances of a mentioned object. The following is a fairly complex example of a documented object from an early modern collection of curiosities. It comes from a seventeenth-century collector's manuscript catalogue of his own collection. Charles II as financier: that could be represented with @role in association with <persName>. Still on the theme of <event>, we have a <date> for the rescue and, implicitly, for Bargrave's acquisition of the objects. There is no date of origin for the objects, but we could approximate one for this set of artifacts (mid-seventeenth century), though there is no text to that effect. With respect to place of origin, we could use <placeName> to mark "the north-west (whether passage or no passage) of America, in the West Indies" with a specification of @type or @role as <origPlace>. There is clearly an <event> alluded to here, or in fact two events: the freeing of slaves and Couley's gift of this set of artifacts to John Bargrave in gratitude. In this case, one might nest these two events in a <listEvent>, which makes some sense, given that they are related events; but a list implies parallel items, while in this case, the depiction of the event of gift-giving encompasses the mention of the rescue, so that there is at once a contained and a subordinate relationship, where the rescue is mentioned as a cause to explain an effect (the gift-giving). One can also imagine cases where an event occurs in the context of another, encompassing event, which would imply a nesting of <event> elements.

With respect to the objects themselves, <objectType> could be useful and appropriate for marking the "shass," "girdle," and "gaiters," as well as the "chains"; however, in early modern documents concerning curiosities, object type is not always stated. In many cases, it is not clear whether a mention of an object should be understood as a type of object. Often an object is mentioned by a name, such as "Nautilus," where the type would properly be understood as something like a "shell." In such cases, the type of object would have to be indicated as an attribute. We also have an indication of <material> in the porcupine quills that were used to make the "shass," "girdle," and "gaiters" as well as a <trait> in the adjective "curious."

A full accounting of the application of current TEI elements and attributes would require more extensive modeling than this paper can allow, but this is enough to demonstrate that much can be generalized and applied to mentions of or reference to objects, although there are some limitations in applicability. There is, however, one much more fundamental need in our adaptation of the TEI.
This need is for something analogous to the naming function of <name> and its variants as it is used to identify mentions of other real-world entities, but in fact, the need in this context extends far beyond simple naming: it involves the very function of selection described above.

Identifying the Span of a Mention of an Object
At the most basic level, for documents that refer to, mention, or describe real-world objects, we must be able to identify what constitutes a reference to an object, identify a string of text that can function as a verbal identifier (e.g., a "name" in the loosest sense), and then correlate with that named object all the relevant context. This is the first step in treating historical documents with a curatorial sensibility. This identification is crucial for the function of selection. We must be able to identify what constitutes an object (i.e., a mention of a real-world object) in a text so we can select it and then do things with it: correlate and compare it with other mentions of the same real-world object or with other mentions of similar objects; build events involving that object in relation to other entities (named people and places) and temporal indicators (dates); attach annotations to it; or simply represent it in some curated way for an imagined user.

The context of collections of curiosities highlights the challenges posed by diversity of materials and forms in this respect. The case of a simple mention of an object analogous to the mention of a person or place name in a novel, for example, is easily managed. A mention of the Eiffel Tower can be treated much like the mention of a named person with a <persName> as <name type="object">, but the forms of reference represented in the documents related to early modern collections are not so straightforward. Here we are not typically dealing with proper nouns. Objects are not typically named or even namable, but rather described. The closest we come to a name is the identification of a type or class of object: a loadstone; a canoe; a unicorn horn. A more flexible solution, given the semantic restriction of <name>, is the <rs> (referencing string) element, which will form the core of what I describe below.

The simplest case can serve as a starting point. An inventory of objects typically consists of little more than the mention of an object in a list. The TEI has semantics to handle this documentary form.
<list>
  <item>1 Siluer peece guilded of Charles I K of England.</item>
  <item>A Jubile peece of siluer, with the porta Santa.</item>
</list>
(Canterbury Cathedral Library and Archives 1658)

In each case the mention is discrete and self-contained; each item in the list constitutes a reference to a real-world object. The text string contained in each item is small enough that the user can quickly process and understand the sort of object that is being referred to. But, as noted above, lacking @type, there is no way to indicate this item is an object. So, we need some other way to identify what constitutes a treatment of or a reference to an object.
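A short Python sketch may help show what the list structure alone does and does not provide. It reads the two inventory entries quoted above (without a TEI namespace, as an assumption for brevity) and reports each entry's text together with its @type, which is absent. The function name inventory_entries is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two entries quoted above; the TEI namespace is omitted for brevity.
INVENTORY = """
<list>
  <item>1 Siluer peece guilded of Charles I K of England.</item>
  <item>A Jubile peece of siluer, with the porta Santa.</item>
</list>
"""


def inventory_entries(xml_text):
    """Return each <item> as a mention, with whatever @type it carries."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        # Collapse internal whitespace so each entry reads as one string.
        text = " ".join("".join(item.itertext()).split())
        entries.append({"text": text, "type": item.get("type")})
    return entries


for entry in inventory_entries(INVENTORY):
    print(entry)
```

Each <item> can be selected as a discrete mention, but nothing in the markup itself says that what it mentions is an object; that information has to come from somewhere else.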

An issue arising from this example is the question of when an inventory becomes a catalogue, and whether a catalogue should be considered a list. The question is pertinent in the context of seventeenth-century collections. According to Paula Findlen, the emergence of the catalogue in the early modern period went hand in hand with the development of collections of curiosities, marking a significant generic and epistemological departure from the inventory:

Inventories record the contents of a museum. They quantify its reality, listing the objects without attaching analytical meaning to them. Catalogues purport to interpret. Their appearance in the late sixteenth century further suggests how novel the practices of Renaissance collectors were. (Findlen 1994, 36)

The practical application of this distinction is difficult. In the case cited above, are we dealing with an inventory or a catalogue? Does the identification of a coin as gilded silver and as minted under the authority of Charles I constitute analysis? A related question is how to classify the structure of this document in TEI terms. Given Findlen's definition, an inventory is most certainly a list. But can a catalogue be a list? The Oxford English Dictionary considers the identification of "catalogue" with a "list, register, or complete enumeration" now obsolete or archaic, noting that a catalogue is "[n]ow usually distinguished from a mere list" (OED 2). This being the case, how do we identify what constitutes an entry in a catalogue, even if entries are numbered like items in a list? Unlike more familiar prose forms (e.g., the novel or the monograph), these documents tend to be fragmented and segmented.
Sometimes paragraphs are clearly identifiable in catalogues, but more commonly we are dealing with chunks of text for which <ab>, an "anonymous block" of text "analogous to, but without the semantic baggage of, a paragraph" (TEI 2013), is more appropriate, or perhaps <seg>, an "arbitrary segment" which "represents any segmentation of text below the 'chunk' level" (TEI 2013).

This entry, despite its length, seems to identify itself as an <item> belonging to a <list>, at least in TEI terms. In other cases, the scope of the text span treating an "item" in a catalogue might be so expansive as to constitute a full paragraph or even multiple paragraphs comprising an entire <div>.

Often the mention and description of an object, whether extensive or brief, are embedded in the context of continuous prose. In these cases the specific treatment of an object needs to be distinguished and separated from its broader context in order to define the limits of its full mention. So then, the first step is to identify the parameters of what constitutes a mention of an object, where a mention might include not only a reference to an object (analogous to naming) but also what is said about that object, whether as an item in a list (<item>) or in a full paragraph (<p>), an arbitrary, discrete chunk of text that is not exactly a paragraph (<ab>), or a segment within one of these larger elements (<seg>) or perhaps across more than one paragraph or anonymous block of text. We then need to indicate that each mention pertains to an object. We can use @type="object" to say, in effect, that this portion of text (as defined by one of the above elements) constitutes a complete mention of an object, but here we again run into limits in the TEI because the <p> and <item> elements cannot take @type. Moreover, semantically it might not make sense to give a type value of "object" to a paragraph. It might not make sense either to apply @type to an <ab> if it is understood to be a defined chunk of text "analogous to a paragraph." It does make semantic sense, however, to say that an "item" is an "object," and perhaps also that a <div> represents, structurally, the treatment of an object.

The most semantically neutral option for identifying the full context comprising a treatment of an object is the <seg> element with an @type attribute of "object". Another, less semantically appropriate, option would be <rs>, which seems to apply at the phrase level (as appropriate for a noun phrase referencing an object) rather than to a "chunk" of text as <seg> does. The <seg> element could be embedded in any of the structural elements identified above. One might attach the @type attribute to those structural elements that allow it, but for consistency's sake, it makes some sense to use an embedded <seg> in all instances. There are two advantages to doing so. First, it would allow distinctions between more than one object treated within a single <p>, <item>, or <ab>. Second, it would also be possible to link <seg>s that are part of a continuous or even discontinuous treatment of an object across document structures such as <p> or <div> (see TEI 2013, sect. 16. "Linking, Segmentation, and Alignment").
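The first advantage can be illustrated with a minimal Python sketch. It assumes a hypothetical paragraph (the connecting words are invented, and no TEI namespace is used) in which two objects are each wrapped in <seg type="object">, and it selects the two mentions separately even though they share a single <p>. The function name object_mentions is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph: the connecting prose is invented for illustration.
PARAGRAPH = (
    '<p>Here are <seg type="object">a very fair small ancient lamp</seg>, '
    'and beside it <seg type="object">a small bottle with a long neck</seg>.</p>'
)


def object_mentions(xml_text):
    """Return the full text of every <seg type="object"> in the fragment."""
    root = ET.fromstring(xml_text)
    return [
        "".join(seg.itertext()).strip()
        for seg in root.iterfind(".//seg[@type='object']")
    ]


print(object_mentions(PARAGRAPH))
```

Because the selection depends only on the embedded <seg>, the same query would apply whether the containing structure is a <p>, an <item>, or an <ab>.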

Another challenge is, again, in the semantics of the TEI. Exactly what might be meant by an "object" entity is somewhat ambiguous. In the clearest TEI manner, we are interested only in mentions of a real-world entity, as when "name" is used to identify a mention of a person. The work of correlating that mention with a real-world entity is managed either by a name list in the TEI header which provides commentary or, as in my case, by an authority database that correlates name (using @ref or, alternatively, @key) with biographical information about the referent. But such correlation is a challenge when we do not have named entities. And yet it is crucial to express, in some way, that the linguistic act in question is one of referencing.
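As a rough sketch of the correlation just described, the following Python fragment resolves @ref values on <persName> against a stand-in for an authority database. Everything here is hypothetical: the sentence, the identifier "bargrave", the contents of the AUTHORITY dictionary, and the function name resolve_mentions are invented for illustration, and a real project would query its own database.

```python
import xml.etree.ElementTree as ET

# Stand-in for an authority database (hypothetical identifier and record).
AUTHORITY = {
    "bargrave": {"label": "John Bargrave"},
}

# Hypothetical sentence; the second name has no @ref and cannot be resolved.
SENTENCE = (
    '<p>A gift made to <persName ref="#bargrave">Bargrave</persName> '
    "and to <persName>another person</persName>.</p>"
)


def resolve_mentions(xml_text, authority):
    """Pair each name mention with its authority record, or None."""
    root = ET.fromstring(xml_text)
    resolved = []
    for name in root.iter("persName"):
        key = name.get("ref", "").lstrip("#")
        resolved.append(("".join(name.itertext()), authority.get(key)))
    return resolved


print(resolve_mentions(SENTENCE, AUTHORITY))
```

The unresolved second mention is the situation that unnamed objects present as a rule: the act of referencing is marked, but there is no ready-made key with which to look up the referent.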

Creating a Handle
We need, then, to be able to identify the core element of the reference, the object as distinguished from the things that are said about it in context. We need to give it a "handle" by which it can be known and by which we can grab it (i.e., select it). For reference and analysis, it is helpful to be able to locate quickly and conveniently the core reference to the object in question by means of a string of text that can serve as a handle (analogous to a name), such as "small ancient lamp," or "small bottle with a long neck."

Following the principle of adopting (and if necessary, adapting) what is already available in the TEI, we might consider first the possibility of using <name> with a @type="object" as we do for the mentions of people and places by name. There are a few limitations here. Our needs for "naming" most certainly stretch the semantic intent of the <name> element: these are not names. Very often there is nothing even analogous to a name, so it will be necessary to use an arbitrary string for identification.

There are a couple of options for identifying a string of text to serve as a convenient "handle." One option, which semantically suits the present case, is <objectType> because most of these objects are not named so much as categorized. They really are "types": a lamp; a girdle; gaters; a loadstone; a bird's skull. However, <objectType> is not allowed a @type attribute (probably because it is itself a type), but in the case of collections of curiosities, this attribute would be convenient.
For example, objects in seventeenth-century collections were commonly classed into three types: natural, artificial, and objects that fall between these two categories (natural-artificial hybrids).
It would be useful to have the option of making such distinctions. Moreover, in some cases, the text string might not connote a type of object: for example, "a coin" or "the horn of a rhinoceros." A more general and flexible alternative is the <rs> (referencing string) element, which is more "general purpose" and perhaps less semantically specific than the <name> element. It is simply a string of text that references a real-world object, and this element can bear @type and @subtype attributes, enabling two levels of categorization for an object:

<rs type="object" subtype="artificial">a very ancient AEsculapius, in brass</rs>

But again we run into complications. How do we determine the extent of a handle, that is, what constitutes the minimal core that is the referencing string proper? In many of these cases, it will be difficult to determine what should be included: "a very fair small ancient lamp," "small ancient lamp," "ancient lamp," or just "lamp"? The question is crucial because everything else that comprises a mention or treatment of an object (minus the naming) is context. In other cases, it is hard to identify which string of text constitutes the core of the reference, and thus an appropriate handle. For example:

Leg with hoof of an animal of the deer family. Perhaps the Elk or Elend of the Germans (The Book of the Principal of Brasenose, in MacGregor et al. 2006, 97)

The handle should probably be <rs type="object">Leg with hoof of an animal of the deer family</rs>, which seems to be the core of the reference: the rest is added in apposition to this handle, even though it provides a more precise (if tentative) identification of "Elk or Elend." And then, how does one create a handle when the class of object is assumed (as in, for example, a list of coins) and where the contents of an entry are simple reports of what the object contains?

King James I on the obverse and his son on the reverse.

Can one create a handle that is meaningful in cases where there is no meaningful noun to serve as the core?
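The two levels of categorization offered by @type and @subtype on <rs> can be sketched in Python as follows. The grouping into an <ab>, the "natural" subtype assigned to the leg with hoof, and the function name handles_by_subtype are assumptions made for illustration; only the first <rs> is encoded as shown above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Handles drawn from the examples above. The wrapping <ab> and the "natural"
# subtype are assumptions; the third handle deliberately has no subtype.
HANDLES = """
<ab>
  <rs type="object" subtype="artificial">a very ancient AEsculapius, in brass</rs>
  <rs type="object" subtype="natural">Leg with hoof of an animal of the deer family</rs>
  <rs type="object">small ancient lamp</rs>
</ab>
"""


def handles_by_subtype(xml_text):
    """Group object handles by @subtype, using 'unclassified' when absent."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for rs in root.iterfind(".//rs[@type='object']"):
        subtype = rs.get("subtype", "unclassified")
        grouped[subtype].append("".join(rs.itertext()).strip())
    return dict(grouped)


print(handles_by_subtype(HANDLES))
```

The sketch does nothing to settle where a handle should begin and end; it only shows that, once the extent is decided, the string and its classification can be selected together.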

Relation between a Mentioned Object and its Context
For curating in TEI, then, we must be able to define what constitutes a mention of an object and be able to identify a handle for ease of reference and analysis, but we must also be able to relate each mention to its relevant contexts, most crucially to object groupings. Sometimes a mention pertains simply to a single object, but we have already seen that mentions of objects often come in the context of sets and groups. In the example above, we have parts ("Leg with hoof") forming a single referenced object, but understood in relation to the whole (the "deer"). These contexts are essential to understanding the individual objects.

To return to our first example, the whole mention, everything that is (misleadingly) identified as the "Item 22," comprises in fact a group of objects that includes the cravat, shass or girdle, and gaiters, as well as the chains of the redeemed merchant. The first three can also be properly considered a set. Collectively, this set, together with the chains, constitutes a group of objects. The set of objects belongs together by design or by some essential relationship. The group of objects belongs together by a looser association based on some common element (in this case, their connection with a common event) but also by virtue of Bargrave's document structure, which identifies the group as constituting an item in his catalogue. The semantic distinction is obvious enough here, but it can be complicated in other instances where one has, for example, a "box of shells." Adding further complication, there is a second mention of the set of objects ("cravat, etc.") along with a vague reference to another group of objects: "divers other things." Thus we have:

<seg type="objectGroup">
  <seg type="objectSet">
    <seg type="object">a <rs type="object">cravat</rs></seg>,
    <seg type="object">a <rs type="object">shass or girdle</rs></seg>, and
    <seg type="object">a small <rs type="object">pair of gaiters</rs></seg>…
  </seg>
  <seg type="objectGroup">
    <rs type="objectSet">
      <seg type="set"><rs type="object">cravat</rs></seg>, &amp;c.
    </rs>
    <rs type="objectGroup">divers other things</rs>
  </seg>
  <rs type="object">chains of the redeemed</rs>
</seg>

Here the reference strings are limited to the smallest unit of reference, analogous to <name>, but the vague group ("divers other things") and chains require a longer string of text. The larger context, that is, the entire context of a mention of either a group (as in this case) or a set or individual object, is delimited as a <seg>. Another option for indicating relationships between objects is, again, to use some form of stand-off solution, either in the header of the TEI document, or in RDF or in a database, which would enable explicit identification of relationships, rather than relying on parsing of containing structures to articulate relationships.
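What "parsing of containing structures" might involve can be sketched in Python. The fragment below is the markup given above (with the ellipsis dropped and no TEI namespace, both assumptions for brevity); the function name object_contexts is hypothetical. For each <rs type="object"> handle, the sketch walks up through its ancestors and records every enclosing set or group.

```python
import xml.etree.ElementTree as ET

GROUP = """
<seg type="objectGroup">
  <seg type="objectSet">
    <seg type="object">a <rs type="object">cravat</rs></seg>,
    <seg type="object">a <rs type="object">shass or girdle</rs></seg>, and
    <seg type="object">a small <rs type="object">pair of gaiters</rs></seg>
  </seg>
  <seg type="objectGroup">
    <rs type="objectSet">
      <seg type="set"><rs type="object">cravat</rs></seg>, &amp;c.
    </rs>
    <rs type="objectGroup">divers other things</rs>
  </seg>
  <rs type="object">chains of the redeemed</rs>
</seg>
"""


def object_contexts(xml_text):
    """Pair each object handle with the sets and groups that contain it."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    results = []
    for rs in root.iter("rs"):
        if rs.get("type") != "object":
            continue
        containers = []
        node = rs
        while node in parent_of:
            node = parent_of[node]
            if node.get("type") in ("objectSet", "objectGroup"):
                containers.append(node.get("type"))
        results.append(("".join(rs.itertext()).strip(), containers))
    return results


for handle, containers in object_contexts(GROUP):
    print(handle, containers)
```

The relationships recovered this way are only as explicit as the nesting allows: the second "cravat" is reported inside a set and two groups, but nothing states that it refers to the same object as the first. That is the kind of relationship a stand-off record could state directly.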

Once objects are identied with handles, one might select only the mentions of individual objects and present them in a browsable list: cravat; shass or girdle; pair of gaiters; chains of the redeemed.
Or one might elect to see the individual objects in their set and group contexts, such as the cravat in the context of its set: "cravat, shass or girdle, pair of gaiters." Here again questions arise regarding how to limit the identication of an object: • In considering the "pair of gaiters," we sidestep the issue of what constitutes a discrete object. Strictly speaking, there are two objects mentioned as one. In one sense the pair is considered a whole, but in an important sense they are two: it is impossible to mark them as two using an <rs> element. A compromise solution would be to add an indication of quantity, but the @quantity attribute is not allowed here. The only available solution is <num value="2">pair</num>.

• There are other objects here, but I elect not to identify them because they are not collected objects. These objects are: butcher's shops (mentioned twice, the second time in more general terms as "shops"); porcupines, both as animals and as a type of meat; money (10,000 pounds sterling); and Bargrave's grave. If one were to identify every mention of an object in these sorts of documents, they would be considerably more plentiful than either names of people or names of places. It would not be practical, nor to the purpose of identifying and selecting mentions of collected objects within and outside of their contexts. For the purposes of my project, these other elements are part of the context and need not be identified as objects in their own right.
• Finally, should the "&c." following "cravat" somehow be marked and identified as constituting an object, a group of objects, a subset, or should it not be marked at all?

In this instance we have a collective mention of the two objects as a group or class ("two large loadstones") and then commentary on the "one armed with steel, in a black velvet case" before turning to "another triangular, unequilateral, bumped-up, large loadstone" on the other side of the cabinet, followed by commentary pertaining to it. But then Bargrave turns to talking about the two stones together and their "antipathy" for each other, and then how he used to have fun with young gentlemen by manipulating the "hidden qualities of these 2 stones." Marking out mentions of these individual objects and their grouping is not exactly straightforward. The whole paragraph could constitute a mention of a group, and "two large loadstones" can serve as a handle for this group. But then this corporate handle must serve as an implied referent for the first mentioned loadstone: "one armed with steel …." The second yields a more complete handle ("another triangular, unequilateral, bumped-up, large loadstone"), although without the corporate referent, the pronoun "another" makes the reference incomplete: "another triangular, unequilateral, bumped-up, large loadstone." As difficult as it is to mark out the mentions of the individual objects in relation to the mention of the grouping of the two, it becomes even more difficult to sort out the rest of the paragraph, where the discussion of the two objects blends into a discussion of the two together.

Closely related to the set, but more theoretically and practically complicated, is the case of the composite object: an object that is one object comprised of two or more objects. This object is from a seventeenth-century inventory of John Paston's (1631-1693) collection at Oxnead Hall, Norfolk:

A shell cup engraven, with ivory handles, with a tortois-shell foot and cover. (Repton 1884, 150)

In this case we have an ambiguous relationship between constituent objects, parts of an object, and materials. The principal object is the whole cup, but the main part of the cup is also an object in its own right (a shell) and not simply "material." By this analogy, the two tortoise shells (one for the foot, and presumably another for the cover) are also objects and not simply materials. Should the "ivory" then also be considered an "object" even though it is stated adjectivally? If the tortoise shell is an object in its own right, should the "foot" and "cover" also be considered objects, or should they be considered parts of an object?

In addition to sets and immediate groupings, documents also sometimes provide categorical groupings of objects. The easiest case is when an object is situated within a defined structure in a document, as is often the case with early modern catalogues of curiosities, such as Nehemiah Grew's catalogue of the Royal Society's Repository, which provides a hierarchical structure beginning at the top level with "Animals," "Plants," "Minerals," and "Artificial Matters" (a fairly standard taxonomy for collections of the time):

<div type="catalogue">
  <div type="animals">
    <div type="quadrupeds">
      <div type="viviparous">
        <div type="object">

as Grew's taxonomic structure for his catalogue. Again, for these complications, stand-off markup is probably the best solution.
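Where an object does sit inside a nested <div> hierarchy of the kind described above, its categorical grouping can be read directly from its ancestors. The following Python sketch assumes a hypothetical, completed version of that hierarchy: the closing tags, the placeholder entry text, and the function name classification_paths are all invented for illustration, and no TEI namespace is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical completion of the nested <div> structure; the entry text is
# a placeholder, not a quotation from Grew's catalogue.
CATALOGUE = """
<div type="catalogue">
  <div type="animals">
    <div type="quadrupeds">
      <div type="viviparous">
        <div type="object"><p>Placeholder entry for one object.</p></div>
      </div>
    </div>
  </div>
</div>
"""


def classification_paths(xml_text):
    """Return, for each object <div>, the @type values of its ancestors."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(div, ancestors):
        kind = div.get("type")
        if kind == "object":
            paths.append(ancestors)
            return
        for child in div.findall("div"):
            walk(child, ancestors + [kind])

    walk(root, [])
    return paths


print(classification_paths(CATALOGUE))
```

This works only when the document's own structure carries the taxonomy; groupings that cut across that structure are not recoverable from containment alone.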

Conclusion
As Øyvind Eide argues, "it is hard to understand [a modeling standard] at a deeper level without using it in practical work" (Eide 2014-15, para. 12). The preceding examples are as much about elaborating the challenges as providing solutions, deriving from my experience in building an archive of primary materials related to what was arguably the first major cultural movement to pay close and critical attention to the world of material objects. In order to do things with this virtual museum, to be able to curate it in the way one might curate a material museum (that is, to package materials for public use), we must first be able to select objects for manipulation. I conclude with some initial requirements to facilitate this core function of selection in order to treat object entities using the TEI with a curatorial sensibility:

1. We must be able to determine and define the limits of what constitutes a mention of an object. This is crucial so that we retain all information that is immediately relevant to that