The Code of Maya Kings and Queens: Encoding and Markup of Maya Hieroglyphic Writing

Maya hieroglyphic script (300 BCE–1500 CE) is a semi-deciphered logographic and syllabic autochthonous writing system from the Americas and is one of the most significant writing traditions of the ancient world. Because of its incomplete state of decipherment, complexity and variation in graphematics, and partially lost lexicon, transliterations cannot be used within the encoding. The project Text Database and Dictionary of Classic Mayan approaches this challenge with an encoding strategy relying on stand-off markup, which is enriched with additional information sources. Using different formats (RDF, XML) and standards (CIDOC CRM, TEI P5), the inscriptions are encoded in a multilevel corpus: (1) a tei_all-compliant schema defining values and rules for the encoding of the text’s topological and structural features, (2) a “Sign Catalogue” for the classification of Maya hieroglyphs, and (3) the tool ALMAH (Annotator for the Linguistic analysis of MAya Hieroglyphs) for linguistic analyses. In this paper, we focus on the TEI schema and highlight our strategy for encoding hieroglyphs without using linguistic transliterations and transcriptions.


Text Database and Dictionary of Classic Mayan
In 2014, the project Text Database and Dictionary of Classic Mayan was established at the University of Bonn by the North Rhine-Westphalian Academy of Sciences, Humanities and Arts and the Union of the German Academies of Sciences and Humanities to research the written language of the pre-Columbian Maya. The project intends to use digital methods and technologies to compile the epigraphic contents and object histories of all known hieroglyphic texts. The aim is to create a dictionary of Classic Mayan, a language whose script has not yet been completely deciphered.
For this purpose, we will compile a machine-readable corpus of all known Maya texts, which are written on about ten thousand text carriers and four codices made of bark paper. To create a holistic environment that provides a solid information base for the dictionary, we have also developed additional resources and tools: a documentation of the text carriers, a classification of signs and graphs (the Sign Catalogue), and the tool ALMAH for linguistic analysis.

The signs were combined with each other to build roughly square blocks. One such hieroglyphic block usually corresponds to the concept of a word among the pre-Hispanic Maya. In most cases, these blocks were paired in double columns that were read from left to right and from top to bottom. Sentences were formed by the combination of hieroglyphic blocks indicating various parts of speech. Multiple sentences were combined to produce complex texts, whose syntax and textual structure are comparable to those of modern Mayan languages. The individual elements in the hieroglyphic block are traditionally subdivided into main and small graphs, with the main graphs being spatially larger and approximately square in shape, and the small elements attached to the periphery of the main characters and oriented along the axis of the main graph. Within the block, the individual graphs were not only arranged side by side or on top of each other. They could also be conflated into one single outline, similar to a ligature. In addition, two or more graphs could be partially or completely overlapped or inserted into each other (figure 3).

The altered shape of the individual graph within the block itself did not influence the spelling and meaning of the hieroglyph. The morphology of the graphs and their arrangement in a block is one of the main challenges for epigraphy in those cases where either all or individual signs are not deciphered or only tentatively deciphered and thus elude linguistic verification.
The documentation of the original spelling (the actual graph variation and the intrablock arrangement) using TEI XML is therefore essential for epigraphic work with hieroglyphic writing systems, since a simple, linear transliteration of a text no longer shows the original spelling or the placement of the glyphs within the block (Prager and Gronemeyer 2018, 145).

While the above-mentioned graphemic and graphotactic strategies only concern the graphic realization of words in written Mayan, the principle of underrepresentation of specific word endings also has an impact on the pronunciation of hieroglyphs in modern epigraphy. The omission of signs further enabled the scribes to vary individual words and texts (Zender 1999, 130-142). This scribal practice also has a high impact on the lexicography of Classic Mayan. By means of these strategies, Maya scribes were able to create a broad variety of texts while avoiding graphic repetitions. They were skilled calligraphers, sought a maximum of visual splendor, and designed texts and pictorial works as individual pieces of art. On the other hand, textual and pictorial contents are rather formulaic.

Encoding Allography and Graphotactics
Regular transliterations in TEI encoding would not be able to represent the graphemic variability of Maya writing even if all signs had been fully deciphered. Instead, we encode a proper description of each hieroglyphic block: the actual graphic variant and the spatial relation of each graph to another. A digital Sign Catalogue modeled in RDF provides the backbone of the TEI encoding. Each sign is indexed using a unique number, each type of graphic variation using a two-letter suffix (Prager and Gronemeyer 2018). The URI of each graph is referenced in the TEI. Furthermore, such digital approaches to epigraphy can help overcome some shortcomings of predigital epigraphy, such as dealing with ambiguity.

The encoding in TEI can be divided into a formal, descriptive part (the text layout and design) and a textual part (the content and the glyphic spellings). In addition to the TEI encoding itself, we needed to create an environment that supports the entire workflow through which different materials such as texts, photographs, drawings, artifact information, historical facts, and established research results are made available. In this virtual environment, a TEI-encoded corpus is enriched by other information resources which are themselves heavily annotated with metadata. The following sections present the key features of the encoding and enrichment strategy employed.

Photo courtesy Ancient Americas at LACMA (http://ancientamericas.org/collection/aa010566).

A Maya text consists of three units: fields, blocks, and glyphs (figure 4). Text fields are encoded using the element <tei:div> (text division) with a defined @type ("textfield"). Strictly speaking, the @type attributes would not be necessary, but they facilitate the human interpretation of generic TEI elements as representations of specific linguistic units. The same principle is used for encoding glyph blocks, for which the element <tei:ab> (anonymous block) is used. For encoding single glyphs, the element <tei:g> (character or glyph) is used with the attributes @ref and @n. The attribute @n is used to indicate the catalogue number of the sign with its graph variant according to the classification used by the project's Sign Catalogue. Instead of pointing to a <tei:glyph>, @ref points to the URI in the Sign Catalogue where the corresponding glyph is described. This mechanism is explained in more detail below.
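The three units described above can be illustrated with a minimal sketch in Python. The TEI fragment is hypothetical: the catalogue numbers, the URIs under example.org, and the @type value on <tei:ab> are assumptions made for illustration and are not taken from the project's data. Only the element names, the "textfield" value, and the roles of @n and @ref follow the description in the text.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment: numbers, URIs and the ab/@type value are invented.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="textfield">
  <ab type="glyph-block">
    <g n="0001br" ref="http://example.org/signcatalogue/graph/0001br"/>
    <g n="0002br" ref="http://example.org/signcatalogue/graph/0002br"/>
  </ab>
</div>
"""

def list_glyphs(xml_text):
    """Return (catalogue number, Sign Catalogue URI) for every glyph element."""
    root = ET.fromstring(xml_text)
    return [(g.get("n"), g.get("ref")) for g in root.iterfind(".//tei:g", TEI_NS)]

for number, uri in list_glyphs(SAMPLE):
    print(number, uri)
```

Because the glyph elements carry no text content, everything a later processing step needs has to be resolved through the @ref pointer.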

The textual elements are brought into a logical structure by nesting the elements (figure 5): <tei:div> for the text field is the parent element to the <tei:ab> elements of all the glyph blocks contained in that text field, in this example from A1 to D4, and the glyph block contains all glyphs belonging to that block (D1G1 to D1G3). Every element is provided with an @xml:id attribute for referencing purposes. In order to identify glyphs that form a group within a block, a fourth element, <tei:seg> (arbitrary segment), is used. This is needed to represent the positioning of the glyphs (see section 5).
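The nesting just described can be made concrete with a short Python sketch that walks a fragment and records, for each glyph, the chain of containers it sits in. The identifiers D1, D1G1 to D1G3, and D1S1 follow the example in the text; the field identifier "T1" and the fragment as a whole are assumptions, not the project's actual encoding.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment; "T1" is an invented identifier for the text field.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="textfield" xml:id="T1">
  <ab xml:id="D1">
    <g xml:id="D1G1"/>
    <seg xml:id="D1S1">
      <g xml:id="D1G2"/>
      <g xml:id="D1G3"/>
    </seg>
  </ab>
</div>
"""

def containment_paths(xml_text):
    """Map each glyph id to the ids of its ancestors, outermost first."""
    root = ET.fromstring(xml_text)
    paths = {}

    def walk(elem, ancestors):
        ident = elem.get(XML_ID)
        if elem.tag == f"{{{TEI}}}g":
            paths[ident] = ancestors
        for child in elem:
            walk(child, ancestors + [ident])

    walk(root, [])
    return paths

print(containment_paths(SAMPLE))
```

The grouping expressed by <tei:seg> shows up as one extra level in the path of the glyphs it contains.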

Note how the usage of TEI elements differs slightly from approaches with similar aims, such as Comic Book Markup Language (CBML). CBML is designed for the encoding of a "class of documents that tightly integrate pictorial images and text. Comic books are just one such type of complex graphic document; other examples include illuminated manuscripts; seventeenth-century alchemical manuscripts, with hand-drawn figures and graphic symbols; artists' books; artists' sketchbooks; illustrated poems like those of William Blake; letterpress productions like those of the Kelmscott Press; illustrated children's books; newspaper and magazine advertisements; and even Web pages and other born-digital media" (Walsh 2012). One can easily imagine that Maya texts fit within this scope as well. The main encoding difference is the introduction of new specific elements under the CBML namespace, most notably <cbml:panel>.
According to Walsh (2012), "<cbml:panel> is a modification of TEI's <div> element, which represents a generic subdivision of the text in the TEI model." The <cbml:panel> element and other custom elements in the CBML namespace are used alongside generic TEI elements such as <tei:div> and <tei:sound>; together they constitute the markup of comic book pages or entire comic book issues. In contrast, all of the XML elements used in the Maya encoding scheme presented here are taken from the TEI namespace. No custom elements or attributes are introduced. Instead, information specific to Maya texts is encoded in the values of attributes such as @type (see above) or @rend (see below). While these attribute values are controlled by project-specific taxonomies, this approach has the disadvantage that this information is not easily made machine-readable and interchangeable. The advantage, on the other hand, is that the markup as a whole validates to tei_all and may thus be processed and interpreted by generic tools. As explained above, the arrangement of glyphs within a block does not indicate a reading order.

Glyph Positioning in Glyph Block
When encoding the glyphs in the TEI document, only their arrangement is made explicit. The correct reading order is established later by the linguistic analysis. For this reason, it would be misleading to describe the relative positions of glyphs using the @next and @prev attributes. This is especially true when keeping in mind that Maya writing is not yet fully deciphered and there may be multiple possible reading orders. The advantage of keeping the encoding of arrangement and reading separate is that, in the process of analyzing a text, one can still be improved without having to change the other. Therefore, instead of @next and @prev, the @rend attribute is used with predefined values. This attribute, combined with the @corresp attribute, can indicate a glyph's position in relation to neighboring glyphs, logically creating a statement like "The current glyph is arranged in a specific manner (@rend) with respect to another glyph (@corresp)." The order in which the XML elements are encoded in the document, such as which <tei:g> element appears first inside <tei:seg>, is purely arbitrary and is not meant to imply any reading or writing order.

As the example in figure 6 shows, the <tei:seg> element is used to describe the positions more accurately. The first glyph, D1G1, is positioned on the left-hand side of ("left_beside") the corresponding segment, D1S1, which contains two glyphs that are stacked over each other, their positions indicated by "above" and "beneath".

Figure 7. Predefined values for @rend describing glyph positions in Maya writing. In transliteration practice, operators like ".", ":", and "+" are used to describe positions. In contrast to that, using TEI encoding is much more precise because each glyph is described on its own. Drawings by Christian Prager.

There are other positioning options, of which we shall mention only a few here: for instance, glyphs can be "conflated," that is, merged so that they share a mutual outline. As in the example on the right of the upper right side in figure 7, the hand-shaped glyph is used as an outline for the head-variant glyph. Glyphs can also be "infixed" by being completely enclosed by another glyph. In the example on the left of the upper right side in figure 7, the hand-shaped glyph is enclosed by the head-variant glyph.
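The statement pattern "glyph, @rend, @corresp" can be read off an encoded block mechanically. The Python sketch below does this for a fragment reconstructed from the prose description of block D1; the exact markup, and in particular the use of "#"-prefixed pointers in @corresp, is an assumption and may differ from the project's files.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Reconstructed from the prose description of block D1; not the project's file.
SAMPLE = """
<ab xmlns="http://www.tei-c.org/ns/1.0" xml:id="D1">
  <g xml:id="D1G1" rend="left_beside" corresp="#D1S1"/>
  <seg xml:id="D1S1">
    <g xml:id="D1G2" rend="above" corresp="#D1G3"/>
    <g xml:id="D1G3" rend="beneath" corresp="#D1G2"/>
  </seg>
</ab>
"""

def position_statements(xml_text):
    """Return (glyph, arrangement, reference element) triples for a block."""
    root = ET.fromstring(xml_text)
    triples = [
        (g.get(XML_ID), g.get("rend"), g.get("corresp").lstrip("#"))
        for g in root.iterfind(".//tei:g[@rend]", TEI_NS)
    ]
    # Sorting underlines that document order carries no reading order.
    return sorted(triples)

for glyph, arrangement, other in position_statements(SAMPLE):
    print(f"{glyph} is arranged {arrangement} with respect to {other}")
```

Each triple describes layout only; nothing in it commits the encoder to a reading sequence.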

In a way, these scribal phenomena are reminiscent of medieval European handwriting which, when encoded in TEI, requires special attention to be paid to special characters, abbreviation marks, punctuation, diacritical marks, and scribal sigla. Some of these medieval scribal phenomena may also be encoded using custom @rend values (Fredell, Borchers, and Ilgen 2013).

Our approach can deal with a variety of different arrangements and therefore overcomes the insufficiency of Ideographic Description Sequences (IDS) for describing Maya glyph block arrangement.

As mentioned above, the correct reading order of a glyph block is established by linguistic analysis, but we can also identify a reading direction on the level of a text field. To describe the writing mode, "vendor-specific" CSS definitions are used within the @style attribute in the <tei:div> element. In the example pictured, text field 1 is written left-to-right, text field 2 right-to-left, and text field 6 top-to-bottom.
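Reading the writing mode back out of a @style value is a small parsing task, sketched here in Python. The property name -idiom-writing-mode and the value tb are those the text gives for top-to-bottom fields; the value "lr" used in the second call is an assumed spelling for left-to-right and is not confirmed by the source.

```python
def writing_mode(style):
    """Return the value of -idiom-writing-mode in a CSS-like string, or None."""
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip empty or malformed declarations
        prop, value = declaration.split(":", 1)
        if prop.strip() == "-idiom-writing-mode":
            return value.strip()
    return None

print(writing_mode("-idiom-writing-mode:tb"))
# "lr" is a hypothetical value for a left-to-right text field.
print(writing_mode("color:black; -idiom-writing-mode: lr;"))
```

Keeping the property vendor-prefixed means that generic CSS tooling can ignore it safely while project tools can still interpret it.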

At rst glance, these @style values seem similar to some of the standard values of the CSS writing-mode property, namely horizontal-tb, vertical-rl, and maybe also vertical-lr (W3C 2019). However, on closer inspection, it turns out that their semantics are slightly dierent.
Consider, for instance, text field 6 from the example above. Whereas glyphs in a horizontal band are read from left to right (texts spelled from right to left are rare), blocks arranged in columns are in most cases read in pairs: the first two columns are read together from left to right and from top to bottom. Then the next two columns are read in the same order, and so on. Thus, to define any reading direction for sequences of vertical lines would be misleading here. Furthermore, one would have to additionally specify the orientation of the glyphs: without the "text-orientation" property, CSS assumes that "[t]ypographic character units from vertical scripts are typeset with their intrinsic orientation" (W3C 2019). "Vertical scripts" include Chinese, Japanese, and Korean. For Maya hieroglyphic writing, however, there is no universally accepted specification that defines whether or not it counts among those vertical scripts, which makes it necessary to explicitly indicate the character orientation as text-orientation:upright. Otherwise one might think that the glyphs would be rotated 90° clockwise (as are characters from so-called "horizontal-only" scripts, such as Latin, when typeset as "vertical-rl"). Therefore, in place of cumbersome and ultimately inappropriate standard CSS expressions such as writing-mode:vertical-lr; text-orientation:upright, the vendor-specific property -idiom-writing-mode:tb is a more concise way of expressing the same rendering while at the same time being a more meaningful classification of a Mayan writing mode.

When arranging text on a carrier, Maya scribes were very creative. Figure 10 shows Stela J from the archaeological site of Copán, Honduras. This stela has a very elaborate design: the sections of the text are braided like ribbons. The text arrangement is described within the <tei:sourceDoc> element using the elements <tei:surface> and <tei:zone>. With the @rotate attribute, we can indicate the relative orientation of <tei:zone>. In the example, the zone is rotated 45° clockwise from its upright orientation.
To relate the <tei:zone> elements to the logical text structure described in the <tei:body>, we use @xml:id attributes.
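How zones and text fields might be joined through identifiers can be sketched in Python as follows. The text states only that @xml:id attributes are used for this; the use of @facs on <tei:div> as the pointing attribute, all identifiers, and the abbreviated document (it has no <tei:teiHeader>) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, abbreviated document; @facs as the linking attribute is assumed.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <sourceDoc>
    <surface>
      <zone xml:id="zone1" rotate="45"/>
      <zone xml:id="zone2"/>
    </surface>
  </sourceDoc>
  <text><body>
    <div type="textfield" xml:id="T1" facs="#zone1"/>
    <div type="textfield" xml:id="T2" facs="#zone2"/>
  </body></text>
</TEI>
"""

def field_rotations(xml_text):
    """Map each text field id to the clockwise rotation (degrees) of its zone."""
    root = ET.fromstring(xml_text)
    rotation = {
        zone.get(XML_ID): int(zone.get("rotate", "0"))
        for zone in root.iterfind(".//tei:zone", TEI_NS)
    }
    return {
        div.get(XML_ID): rotation[div.get("facs").lstrip("#")]
        for div in root.iterfind(".//tei:body/tei:div", TEI_NS)
    }

print(field_rotations(SAMPLE))
```

A zone without @rotate is treated as upright, which matches the attribute's role as a deviation from the normal orientation.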

In practice, we use the virtual research environment TextGrid Lab (https://textgrid.de/) to create these data. TextGrid Lab comes with a tool called Text Image Link Editor that enables the user to mark areas on a picture and stores that positioning information in a separate XML file, the TEI/SVG hybrid document pictured above. This allows us not only to describe the text arrangement in <tei:sourceDoc>, but also to enrich it with actual positions on the image. This means that it is possible to indicate the appearance of every single glyph block or glyph.

By using stand-o markup, 4 we create a richly and deeply annotated text; by using the Text Image Link Editor, we refer to the appearance of the source text. By referencing the glyph in the Sign Catalogue, we also refer to the graph variants that are used.
7. TEI… and Beyond: Sign Catalogue Figure 12. Hieroglyphs in the TEI-encoded corpus referring to the Sign Catalogue, where the glyph is assigned to a transliteration value that is used for linguistic analysis.

Within the TEI document we encode the glyphs with the referring URI of the <idiom:Graph> recorded in our Sign Catalogue using a @ref attribute. The term graph denotes an abstract, typed form of an individually realized sign. The graph of /ja/ in the variation "bipartite right" (br) recorded in the catalogue represents a type which prototypically represents all individual writing variants and thus all actual occurrences for /ja/ (figure 12).
The Sign Catalogue is modeled as an ontology and stores information about the classification of Maya signs and graphs in RDF. The class <idiom:Graph> describes characteristics of the Maya glyphs belonging to the realm of graphematics, like the type of variation a "graph" can have. Every <idiom:Graph> is connected to an <idiom:Sign>. One <idiom:Sign> can have more than one <idiom:Graph>, but an <idiom:Graph> is only ever connected to one <idiom:Sign>. As Maya signs can have both a "logographic reading" and a "syllabic reading," we have to distinguish those first, before finally assigning a transliteration value to the <idiom:Sign>. As the script of Classic Mayan is not yet fully deciphered, there are multiple hypotheses for the readings of various signs. Therefore a sign can be assigned multiple transliteration values. The Sign Catalogue also provides a mechanism for dealing with contradictory reading proposals (Diehr, Gronemeyer, Wagner, et al. 2019; Diehr, Gronemeyer, Prager, et al. 2019).
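The cardinalities described here (one sign, several graphs; one graph, exactly one sign; several readings per sign) can be mirrored in a small Python data model. This is only an illustration using plain dataclasses: the actual catalogue is an RDF ontology, and the sign number, the URIs, and the second variation code "xx" are invented. Only /ja/ and the variation "br" come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Sign:
    """A sign with zero or more proposed readings (competing hypotheses allowed)."""
    number: str
    readings: list = field(default_factory=list)  # (reading type, value) pairs

@dataclass
class Graph:
    """A graph variant; it is connected to exactly one Sign."""
    uri: str
    variation: str
    sign: Sign

def graphs_of(sign, graphs):
    """All graphs attached to a sign (one sign may have several)."""
    return [g for g in graphs if g.sign is sign]

# Hypothetical records: number, URIs and the "xx" variation are invented.
ja = Sign("0001", [("syllabic", "ja")])
catalogue = [
    Graph("http://example.org/graph/0001br", "br", ja),
    Graph("http://example.org/graph/0001xx", "xx", ja),
]

print([g.variation for g in graphs_of(ja, catalogue)])
```

Storing readings as a list on the sign leaves room for several, possibly contradictory, proposals without changing the graph records that the TEI corpus points to.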

The TEI corpus and the Sign Catalogue provide information about the text as a linguistic and epigraphic subject. To bring that information together, the TEI document and the Sign Catalogue need to be processed further. To generate a transliterated text that can be analyzed, we add another component to the virtual research environment of the project: ALMAH, a tool for linguistic analysis of Classic Mayan, which has also been developed in the course of the project (Grube et al. 2018, 5-7).
ALMAH will provide mechanisms for multivariant text analysis, based on the TEI corpus and the linguistic information stored in the Sign Catalogue. The text structure will be extracted from the TEI document, and the (multiple) transliteration values are to be assigned from the Sign Catalogue.
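The idea of assigning multiple transliteration values to one sequence of glyph references can be sketched in Python as a Cartesian product. This is not ALMAH's implementation, which the text does not describe in detail. The references, the placeholder readings "X" and "Y", and the hyphen used to join values are assumptions; the glyph order passed in is taken to be one already chosen by the analyst, since document order implies no reading order.

```python
from itertools import product

def transliteration_variants(glyph_refs, catalogue):
    """Enumerate every combination of proposed readings for a glyph sequence."""
    options = [catalogue[ref] for ref in glyph_refs]
    return ["-".join(choice) for choice in product(*options)]

# Hypothetical lookup table: graph reference -> proposed readings.
# "X" and "Y" are placeholders for two competing proposals, not real readings.
catalogue = {
    "graph/0001br": ["ja"],
    "graph/0002br": ["X", "Y"],
}

print(transliteration_variants(["graph/0001br", "graph/0002br"], catalogue))
```

Because each variant is derived from references rather than typed in by hand, every output string stays traceable to the glyphs it was built from.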
The assembled text will inherit linguistic, epigraphic, and structural text information. By using referencing mechanisms and stand-off markup strategies, the text will always be traceable to the original writing.

8. TEI… and Beyond: Artifact Documentation

Figure 13. The text carrier referenced in the TEI document is further described by an ontology that provides a rich schema for an extensive documentation of the artifact. Drawing by Christian Prager.

We provide a rich documentation of the text carrier. In the <tei:teiHeader>, we refer to the artifact that is documented by using the Artefact Ontology of the project (figure 13). The ontology describes not only text carrier characteristics such as material and technique, but also the find context of the object and information regarding its scholarly documentation (e.g., measurements, designations). This Artefact Ontology consists of components from pre-existing vocabularies such as CIDOC CRM and Dublin Core Terms, as well as custom components in the idiom namespace. The information described by the Artefact Ontology is stored as RDF data. See section 10, "The Linked Corpus," below for details about the different components of our virtual environment.

It should be noted that this approach deliberately differs from ways of encoding text carrier information directly in TEI, as, for instance, discussed by Nelson (2017). The encoding of artifact-related data in TEI has been further facilitated by the recent introduction of <tei:object> and related elements in the TEI P5 Guidelines (see TEI Consortium 2020, 13.3.5).

The parser of the Classic Mayan TEI Generator converts a transliteration, alongside some metadata (e.g., editor of the document, information on the project), into a TEI document. The description of graphotactics and the relation of graphs to each other is realized by project-specific operators (see figure 7).

The Classic Mayan TEI Generator is specifically not designed as a fully functioning GUI for editing purposes. Once the parser has created a TEI document, it cannot be used to edit the document any further. In particular, encodings like editorial notes on lost and/or reconstructed glyphs are still made within the TEI code itself. While these cannot be generated from a transliteration, we believe that through working within the TEI XML structure, the editors get a better understanding of what and how they encode. Combining automatic and manual procedures ensures a high quality of encoding.
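The kind of document skeleton such a generator might hand to the editors can be sketched in Python. This is a strongly simplified stand-in, not the project's parser: it takes glyph counts per block instead of a transliteration, and the function name, the field identifier "T1", and the input format are hypothetical. Only the identifier pattern (block D1, glyphs D1G1 to D1G3) follows the text.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def build_skeleton(field_id, blocks):
    """Build a text field with pre-numbered blocks and glyphs.

    `blocks` maps a block label such as "D1" to the number of glyphs it holds.
    """
    ET.register_namespace("", TEI)  # serialize TEI as the default namespace
    div = ET.Element(f"{{{TEI}}}div", {"type": "textfield", XML_ID: field_id})
    for label, glyph_count in blocks.items():
        ab = ET.SubElement(div, f"{{{TEI}}}ab", {XML_ID: label})
        for i in range(1, glyph_count + 1):
            ET.SubElement(ab, f"{{{TEI}}}g", {XML_ID: f"{label}G{i}"})
    return div

skeleton = build_skeleton("T1", {"A1": 2, "D1": 3})
print(ET.tostring(skeleton, encoding="unicode"))
```

Editorial detail such as notes on lost or reconstructed glyphs would then be added by hand to the generated structure, in line with the division of labor described above.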

The parser functions as a helper for providing the editors with a basic document structure, already containing all identifiers, and a text structure. Even with its reduced functionality, the parser is a welcome help for the editors since there are about ten thousand texts to encode.

Figure 14. The virtual research environment of the project uses multiple information sources. With the TEI document as the central part, a kind of linked corpus is assembled.

In this paper we have laid out our strategy for encoding the hieroglyphic texts of Classic Mayan. The complexity of the writing system presents researchers with manifold obstacles and challenges. By using stand-off markup and other mechanisms to link data, a corpus is established that joins different information resources together in a virtual environment (figure 14). The TEI document forms the central part of this linked structure: the TEI encoding is used to describe the formal structure of the text, its appearance, and its layout. The encoding is enriched by multiple other resources to support specific functions and workflows for annotation, documentation, and analysis: a Sign Catalogue for the classification of signs and graphs as well as their variants; an ontology for the documentation of the text carriers; the tool ALMAH for linguistic annotation and analysis; a project bibliography (using Zotero); and documentation and organization of archival material (which is managed by the DARIAH-DE service ConedaKOR). Bringing all those information sources together provides a holistic research environment for analyzing and deciphering the script of Classic Mayan.

The data thus generated will be successively made available on the future project portal https://www.classicmayan.org/. Furthermore, the corpus data will also be made freely accessible in the TextGrid repository. All schemata created in the project (ODD, OWL) can be viewed in the public area of our Git repository and can be used under a CC BY 4.0 license: https://projects.gwdg.de/projects/documentations/repository.
Ontology-based, linked open data in a virtual research environment will for the first time provide a single point of reference in Maya studies. A corpus-based dictionary not only facilitates research on pending questions of historical linguistics but also enables a better alignment of theories with epigraphic data. As the meaning of a word is affected by its context, the database can also support efforts to identify metaphors or stylistic devices, or to reconstruct a vocabulary that was culturally lost. The encoding as applied by our project can also be modified to describe the graphotactics of other nonlinear scripts or of other (partially) undeciphered scripts, such as the rongorongo script of Rapa Nui (Easter Island). It can also be used to represent (undeciphered) scripts for which no Unicode block has yet been defined or approved.