Using ODD for HTML

Although the ODD (One Document Does it all) language is normally used to create TEI customizations or extensions, it is also a highly effective tool for editors working in other XML markup languages. This paper will discuss the use of ODD to define a highly constrained schema for HTML5 that will enforce stylistic rules and encoding practices, define custom attributes and value lists, and enable easier editing and validation of project content in the Oxygen XML Editor environment. I will provide a brief history of the project, whose first incarnation, created with the Dreamweaver HTML editor, was somewhat chaotically coded, and show how the implementation of an ODD-based schema provides huge advantages for authors, editors, and encoders, as well as substantially simplifying the code itself.


Introduction
Although ODD (One Document Does it all) is a feature of the TEI language, and used primarily for creating TEI schemas, ODD in fact "goes beyond this to provide a generic tool for the documentation and management of any XML encoding scheme, not necessarily one based on the TEI" (Burnard and Rahtz 2014). Syd Bauman (2019) points out that the TEI ODD language "can be used for two related but distinctly different purposes: (1) to create a markup language, including documentation and schemas; and (2) to customize a markup language that was already written in ODD." This paper describes a use case which does not quite fall into either of those categories: the use of ODD to create and document a highly constrained customization of a markup language not originally written in ODD. The language in this case is HTML5, in its XHTML serialization.

In 2017, our unit was approached by Kim Blank, a faculty member who had for some years been building a fascinating website called Mapping Keats's Progress. The website is in one aspect a biography of the poet John Keats, but it has many other features. Blank describes its purposes as follows:

• To map some of Keats's life in London
• To account for Keats's remarkable poetic development
• To re-imagine the book

and explains the third point in these terms:

The site's structure of progressive reduplication (between multiple, overlapping micro-chapters) acknowledges and attempts to embrace the fact that the dominant means to access information - via the technology you have in front of you right now - changes the way we find, look at, and engage such information. (Blank 2018)

Web pages are rarely consumed serially, but rather sampled somewhat at random, so each page must be to some extent self-contained. Balancing this requirement against the need to cater also to the more assiduous reader who may read many articles in sequence is a difficult feat of style and authorship.

The site had been developed by the author and a collaborator using the Dreamweaver web-authoring software. The researcher initially asked for help with a single problem: the fact that the site banner was slightly different in appearance on some pages than on others, and the effect of navigating through the site was slightly jarring as the banner shifted around from page to page.
Examination of the code quickly revealed that the code for the banner was actually different on almost every page on the site. A deeper investigation determined that this was merely the tip of a vast iceberg of source-code chaos. Although its interface and arrangement were functional and attractive (see figure 1), the HTML code had become a huge mass of incomprehensible nested structures, including twenty-four JavaScript and sixty-nine CSS files. A rather haphazard approach to development had resulted in a tendency to add new features as they occurred to either collaborator, usually by taking example code or pre-built JavaScript and CSS libraries from the Web and dropping them into the project. An example of the kind of needless complexity that had resulted from the dependence on a WYSIWYG tool to manage style and layout can be seen in figure 2. Most developers will have encountered projects like this frequently, and will be familiar with the bleak anguish that afflicts a coder who inherits such a codebase.

In the fall of 2017, I began the process of rewriting the site, with the aim of keeping it as simple as possible while reproducing and enhancing the design and functionality. The result has only one CSS file and a few hundred lines of JavaScript, none of which is essential.

The instinctive response of a seasoned TEI encoder to content like this is to get it into TEI as soon as possible, and then build a rendering toolchain to create a fresh website. However, while the original HTML was severely out of control, the content itself was already basically complete, and encoded in HTML. Project participants were quite comfortable with HTML and preferred it as their master format for both editing and final result. They were not interested in developing a system to export the same content in different formats, so for their needs, moving the master files to TEI would be a gratuitous impediment to their work.

I was able to use a toolchain consisting of HTML Tidy, XSLT, and Python to clean up and simplify the content to a point where it required only some proofing and enhancement. Since the project author was already familiar with HTML, but not with TEI, and the markup itself was relatively simple, it seemed easier to stick with HTML5 encoding. Given a sufficiently rigorous schema for that encoding, it would be trivial to generate TEI from the markup if we wanted it in the future.
I was also intrigued by the idea of using ODD for a non-TEI language, something that is rarely done, and that would provide an opportunity for testing the ODD processing toolchain in ways that it is not normally tested.

Why Use ODD?
The W3C provides an excellent validation tool for HTML5 in the form of the Nu Html Checker. This is the tool we use for final validation of all HTML sites we produce. However, it is a generic tool; it checks conformance against the entire schema (in fact, against any of multiple schemas, depending on the input document type). I wanted to constrain the HTML quite aggressively, provide closed value lists for normally open attributes such as @class, define custom attributes (as HTML5 allows) with closed value lists, and incorporate Schematron rules, to ensure that the site style and structure remain consistent throughout the document set. Good praxis also requires documentation of the rules, along with encoding guidelines and examples. ODD is the perfect choice for this (Bauman 2019; Romary and Riondet 2018). ODD files, being TEI files, are also easily processable, a very useful feature whose value will become apparent below.
In designing the structure of the document collection, the first decision I made was prompted by the problem which had given rise to this work in the first place. Editing a page in the original Dreamweaver setup had involved editing an entire page, including its banner, footer, and so on, and this had given rise to a sort of speciation whereby originally identical blocks of boilerplate code had gradually diverged, resulting in different versions of core site components. To avoid anything like this, it made sense to specify that content documents only have content, which is easy to achieve using the @start attribute on <schemaSpec> (example 1).

Analysis of the site content revealed that its features could actually be encoded using fewer than 20% of the elements available in HTML5, and only around 10% of the attributes (see figure 3).

To further support easy and efficient authoring and encoding, I created an Oxygen project which provides template files for creating new content documents, Author Mode CSS for rapid proofing of authored content, and a quick-and-dirty build process which creates a complete HTML file from the current content document in the form in which it will eventually appear on the site. This enables encoders to see their work rendered in two different styles and get a clear sense of what it will look like when it is published, without their having to deal with editing complete HTML files.
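The technique of rooting content documents on a non-root element can be illustrated with a minimal sketch. The schema identifier, the two element definitions, and their content models below are assumptions made for illustration and are not taken from the project's ODD file; only the use of @start on <schemaSpec> and the XHTML namespace come from the text above.

```xml
<!-- Sketch only: ident, descriptions, and content models are assumptions. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="keats_content"
            ns="http://www.w3.org/1999/xhtml"
            start="div">
  <!-- No moduleRef elements appear here: nothing is drawn from the TEI
       modules, so each HTML element the project needs is declared
       explicitly. Elements that are never declared are simply absent
       from the generated schema. -->
  <elementSpec ident="div" mode="add">
    <desc>Root of every content document. The banner and footer are not
      part of the document; they are supplied when the site is built.</desc>
    <content>
      <alternate minOccurs="1" maxOccurs="unbounded">
        <elementRef key="div"/>
        <elementRef key="p"/>
      </alternate>
    </content>
  </elementSpec>
  <elementSpec ident="p" mode="add">
    <desc>A paragraph of running text.</desc>
    <content>
      <textNode/>
    </content>
  </elementSpec>
</schemaSpec>
```

Because the only permitted start element is <div>, a document rooted on <html> would be rejected by the generated schema, which is what prevents boilerplate from creeping back into content files.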

HTML5 also allows the use of custom attributes. These are attributes whose names are prefixed with data-, and which are ignored by HTML5 validators. I was able to make use of this feature to provide a simple method of encoding a common scenario in the site, where a small graphic appearing in the text is paired with a larger version of the same graphic (typically not just a higher-resolution graphic but one which actually includes more content), by specifying a custom attribute for the <img> element in the ODD file.

Example 3. Defining a custom attribute for a larger variant of an image on the <img> element.

The TEI attribute class structure was used to create generic attribute classes such as att.classable, to which all HTML elements which need the @class attribute belong; then, at the element level, the definition of the @class attribute was overridden to constrain it to a fixed value list appropriate for that element:

Example 5. The class attribute, inherited from att.class, is overridden for the <img> element to allow only two values.

This approach provides a considerable advantage over regular HTML5 editing and validation with the W3C tools, because the latter would allow any value in the required form for @class; using ODD allows us to constrain the values at different points in the schema.
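The two techniques just described, a custom data- attribute and an element-level override of a class-supplied attribute, might be combined as in the following sketch. The attribute name data-large-src and the two class values are hypothetical, since the project's actual names are not given here, and the fragment is assumed to sit inside a <schemaSpec> whose @ns is the XHTML namespace.

```xml
<!-- Sketch only: data-large-src, floatLeft, and floatRight are
     hypothetical names chosen for illustration. -->
<specGrp xmlns="http://www.tei-c.org/ns/1.0" xml:id="img_sketch">

  <!-- Generic attribute class: any element that may carry @class joins it. -->
  <classSpec ident="att.classable" type="atts" mode="add">
    <desc>Provides the class attribute with an open list of tokens.</desc>
    <attList>
      <attDef ident="class" usage="opt">
        <datatype maxOccurs="unbounded">
          <dataRef name="NMTOKEN"/>
        </datatype>
      </attDef>
    </attList>
  </classSpec>

  <elementSpec ident="img" mode="add">
    <desc>An inline image. Other attributes such as src and alt are
      omitted from this sketch.</desc>
    <classes>
      <memberOf key="att.classable"/>
    </classes>
    <content>
      <empty/>
    </content>
    <attList>
      <!-- Element-level override: on img, class is limited to two values. -->
      <attDef ident="class" mode="change">
        <valList type="closed" mode="replace">
          <valItem ident="floatLeft">
            <desc>Image floats at the left of the text block.</desc>
          </valItem>
          <valItem ident="floatRight">
            <desc>Image floats at the right of the text block.</desc>
          </valItem>
        </valList>
      </attDef>
      <!-- Custom attribute pointing to the larger variant of the image. -->
      <attDef ident="data-large-src" usage="opt">
        <desc>Location of a larger version of this graphic.</desc>
        <datatype>
          <dataRef name="anyURI"/>
        </datatype>
      </attDef>
    </attList>
  </elementSpec>
</specGrp>
```

The <desc> elements do double duty: they feed the generated documentation, and they can be carried into the output schema, where an editor such as Oxygen can display them as prompts.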

A further advantage of ODD is that it allows us to include Schematron rules. A simple example of how effective this can be is shown by a constraint applied to the HTML5 <span> element. <span> is a generic inline element with no inherent semantic force at all. The only reason for using it is to apply some specific styling to a piece of inline text. Therefore we can formulate the following rule:

Example 6. Defining a Schematron constraint to control how <span> is used.
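The rule itself is not reproduced in this text, so the following is only a plausible reconstruction of the idea. It assumes that "styling" means the presence of a @class or @style attribute, and it assumes the prefix h for the XHTML namespace; the project's actual test may differ.

```xml
<!-- Sketch only: the test expression and the identifiers are assumptions. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             ident="span" mode="add">
  <desc>Generic inline element, used only as a hook for styling.</desc>
  <content>
    <textNode/>
  </content>
  <!-- Declare the XHTML namespace prefix used in the rule context. -->
  <constraintSpec ident="xhtmlNamespace" scheme="schematron">
    <constraint>
      <sch:ns prefix="h" uri="http://www.w3.org/1999/xhtml"/>
    </constraint>
  </constraintSpec>
  <!-- A span with no styling hook serves no purpose and is reported. -->
  <constraintSpec ident="spanMustBeStyled" scheme="schematron">
    <constraint>
      <sch:rule context="h:span">
        <sch:assert test="@class or @style">A span element must have a
          class or style attribute; if it has neither, it does nothing
          and should be removed.</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

A grammar-based schema could make @class required, but it could not express "one of these two attributes, at least"; this is the kind of co-occurrence constraint for which Schematron is well suited.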

Following the model of many TEI projects, we also separated out components resembling authority records such as the list of biographies, as well as index entries and bibliography lists, into separate files, all rooted on the <div> element and constrained by the same schema.
Building Output from the ODD

I created a build process, using Ant and Saxon, to turn our ODD file into the various products we need to create from it, as shown in figure 4. The five phases of this process are:

1. Refreshing the ODD content. This is a process we commonly use in our TEI projects, where specific attributes are intended to be used only as pointers to particular items elsewhere in the project. One example involves the tagging of people in the text. A file people.xml contains a list of brief biographies, each of which is a <li> element with a unique ID: <li id="keats_t_sen"> [...] </li> When a name is tagged in the text, it needs to be linked to one of these IDs, using the custom @data-id attribute. In order to ensure that links are only made to real IDs in people.xml, the "Refresh" part of the build process collects all those IDs, then uses them to [re]construct the <valList> for the @data-id attribute, so that it is impossible to link to a nonexistent ID. The encoder in Oxygen is also helpfully prompted with the available IDs when linking a name.

5. Compiling the Schematron. Although Schematron rules inside the RELAX NG file are enforced within the Oxygen editing environment, the automated site-build process needs to do a complete validation of all the files, and this is most effectively done by compiling the extracted Schematron into an XSLT file, again using XSLT from the Schematron project; the site-build process can then do automatic validation of the content documents before building the site.
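The "Refresh" phase described in the first item above could be sketched as an identity transformation over the ODD file that rebuilds a single <valList>. This is an illustration rather than the project's own stylesheet: the parameter name peopleFile is hypothetical, and the sketch assumes that people.xml is in the XHTML namespace and that the ODD already contains an <attDef ident="data-id"> with a <valList> child.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns="http://www.tei-c.org/ns/1.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="h">

  <!-- Hypothetical parameter: location of the biography list. A relative
       value is resolved against the location of this stylesheet. -->
  <xsl:param name="peopleFile" select="'people.xml'"/>

  <!-- Copy the ODD through unchanged except where a template matches. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Replace the value list for data-id with one built from the IDs
       actually present in the biography list. -->
  <xsl:template match="attDef[@ident = 'data-id']/valList">
    <valList type="closed">
      <xsl:for-each select="doc($peopleFile)//h:li[@id]">
        <xsl:sort select="@id"/>
        <valItem ident="{@id}"/>
      </xsl:for-each>
    </valList>
  </xsl:template>

</xsl:stylesheet>
```

Run with an XSLT 3.0 processor such as Saxon, taking the ODD as input, this yields a refreshed ODD from which the schema is then regenerated. A <desc> containing the person's name could be added to each <valItem> if richer prompts are wanted in the editor.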

Building the Site
Although the focus of this article is on the utility of ODD as a tool for managing encoding projects not based on TEI, it is important to include the final stage of the process, which builds the final HTML website from the fragmentary content documents created by the author and encoder. These are the stages in building the site, in a process which is managed by Ant and based largely on XSLT transformations:

1. Validate the content documents against the RELAX NG schema, using the Jing validator. Stop the build if anything is invalid.

2. Validate the content documents against the Schematron rules as extracted into XSLT, using [...]

[...] can run the site-build process and create a new version if they have the credentials to push it to the web server, but the build process cannot be completed unless both the edited content and the generated HTML and CSS are valid.
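The two validation stages named above might be wired together in Ant roughly as follows. Every path and property name here is a hypothetical placeholder, and the choice of iso_svrl_for_xslt2.xsl from the ISO Schematron skeleton implementation is an assumption about which Schematron project stylesheet is used; the sketch also assumes the Schematron file needs no include or abstract-pattern expansion before compilation.

```xml
<project name="keats-validate" default="validate" basedir=".">

  <!-- All locations below are hypothetical placeholders. -->
  <property name="jing.jar"       location="lib/jing.jar"/>
  <property name="saxon.jar"      location="lib/saxon-he.jar"/>
  <property name="schema.rng"     location="schema/keats.rng"/>
  <property name="schema.sch"     location="build/keats.sch"/>
  <property name="schema.sch.xsl" location="build/keats_sch.xsl"/>
  <property name="sch.compiler"
            location="lib/schematron/iso_svrl_for_xslt2.xsl"/>

  <!-- Stage 1: RELAX NG validation with Jing. All content files are
       passed in one call; a non-zero exit code stops the build. -->
  <target name="rng">
    <apply executable="java" parallel="true" failonerror="true">
      <arg value="-jar"/>
      <arg file="${jing.jar}"/>
      <arg file="${schema.rng}"/>
      <fileset dir="content" includes="*.xml"/>
    </apply>
  </target>

  <!-- Compile the extracted Schematron into an XSLT stylesheet. -->
  <target name="compile-sch">
    <xslt in="${schema.sch}" out="${schema.sch.xsl}" style="${sch.compiler}">
      <factory name="net.sf.saxon.TransformerFactoryImpl"/>
      <classpath location="${saxon.jar}"/>
    </xslt>
  </target>

  <!-- Stage 2: run the compiled Schematron over every content document,
       writing one SVRL report per file. -->
  <target name="sch" depends="compile-sch">
    <xslt basedir="content" includes="*.xml" destdir="build/svrl"
          extension=".svrl" style="${schema.sch.xsl}">
      <factory name="net.sf.saxon.TransformerFactoryImpl"/>
      <classpath location="${saxon.jar}"/>
    </xslt>
  </target>

  <target name="validate" depends="rng, sch"/>

</project>
```

Note that the Schematron stage only produces SVRL reports; it does not fail by itself. A further step, not shown, would need to inspect those reports for failed assertions and halt the build when any are found.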

Conclusion
The use of ODD provides a range of substantial advantages when using a non-TEI XML language, even when other validators exist for that language:
• A substantially reduced schema can be created, excluding elements from the larger schema which are not required for the project.
• Components such as attributes which are only loosely constrained in the main language can be aggressively constrained through the ODD file.
• Content documents can be rooted on elements other than the standard root element, so they can be simpler than full documents and contain only what is relevant.
• Intra-project linking can be constrained by the schema through processing of the site content itself to generate some of the schema specification components.
• Editors can be supported in their use of an XML editor such as Oxygen through the embedding of information from the <schemaSpec> into the output schema, providing popup prompts to help the encoder.
• Detailed project documentation can be integrated with the schema specification and built into a comprehensive guidelines document.

These advantages have paid off for the Mapping Keats's Progress project. The site has now been live for over a year at the time of writing, and the author is steadily adding to it and refining the existing content. This experience shows that, even where TEI is not used on a project, ODD may still provide a highly effective tool for managing schemas, encoding, and documentation.