Toward a Model for Marking up Non-SI Units and Measurements

This paper presents a markup model for encoding non-SI units and measurements. Historical texts contain many examples of compound measurements, composed of sets of units and numerical components. Instead of using the <measure> element, which requires a single set of @unit and @quantity, we propose a newly defined set of tags for encoding idiosyncratic measurement semantics, namely <unitDecl> (model.encodingDescPart), <unitDef> (model.global and contained by <unitDecl>), <unit> (model.measureLike), and a relevant attribute @factor (which shows factors of numerical values given in a referenced <unit> element). All of these elements and attributes will be included in the TEI P5 Guidelines, and they are especially useful when encoding units that are not based on the decimal system. Though this paper offers example encodings based on a Japanese historical source, the Engi-Shiki, this model is also applicable to the markup of units used for measurement within various cultural spheres other than Japan.

Sample description of a compound measurement for copper in the Engi-Shiki. National Museum of Japanese History.

In order to encode such compound units, we propose three new elements and one new attribute: <unitDecl> (model.encodingDescPart), <unitDef> (model.global and contained by <unitDecl>), <unit> (model.measureLike), and a relevant attribute @factor (which shows factors of numerical values given in a referenced <unit> element), so that we can clarify the relations among the units in the <teiHeader>. All of these will be included in the TEI P5 Guidelines in due course, and the ODD file for them will be described in detail later in this paper. Although this paper provides a markup model based on a Japanese historical source, this model also enables us to encode idiosyncratic systems of measurement within various cultural spheres other than Japan. This set of tags is useful for encoding units that are not based on the decimal system or whose unit conversion system changed over time. For instance, conversion relations among the English coal measures keel, chalder, and bushel changed between the fifteenth century and the implementation of the Imperial Weights and Measures Act of 1824 (figure 2). Since texts related to measurement indicate such regional and chronological differences, it is necessary to mark up the original historical figures as precisely as possible. The new markup model enables us to express the complex and changing semantics of the units and measurements.

Our Project
In this section, we outline our project, which marks up the texts of the Engi-Shiki, and compare it with other related projects and previous studies.

Delmer Brown, an American scholar of the Engi-Shiki, describes it as "a 50-volume work compiled between … 907 and 927 [CE]. The first 10 volumes are Imperial Shinto regulations (jingi 神祇) and the last 40 are codifications of criminal (ritsu 律) and administrative (ryō 令) law" (UC Berkeley 2010). According to one of the most influential researchers on ancient Japanese history, Toshiya Torao, the Engi-Shiki contains a wide range of detailed regulations on society and administration during the period from the Nara to the Heian era (eighth to thirteenth centuries). For instance, the text lists rituals and festivals held in various parts of Japan, the designation of offerings made at rituals, tributes and taxes paid to the Ritsuryō government, and allocation of Shōzei and Kugai (rice plants used as a kind of fund which were distributed to each administrative county). To put it simply, by consulting the Engi-Shiki, we can grasp various aspects of both state administration and daily life in ancient Japan (Torao 1995).

But there is a historiographical problem. Because so many elements of interest are described in such rich detail, previous research based on the Engi-Shiki has tended to be subdivided into very specialized and partial studies. For example, Sakamoto (1979–1980) comprehensively lists the products of each administrative county; Fukushima (1971) and Ōsumi (1996) focus on the aquatic products; Satō (2012) examines the dairy products; papers and paper manufacturing are the focal point of Ōkawa and Masuda (1981); and Miyahara (2014) traces the records related to abalone, which was the most circulated food at that time. This subdivision and specialization make it difficult to verify quantitative analyses, because there is no text database in which researchers could search for numerical values attached to specific products and items.

Markup Examples Based on TEI P5
In this way, relations among non-SI units of measurement might be described in a human-readable way, within the <catDesc> element. However, it is difficult to express the unit conversion system among the measurements in a machine-readable way, especially when those units are not based on the decimal system.
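
To make the difficulty concrete, the sketch below shows the kind of arithmetic that a machine-readable conversion system has to support. It uses pre-decimal English money (12 pence to the shilling, 20 shillings to the pound) rather than the Engi-Shiki units. The function name `to_base` and the factor table are illustrative assumptions and are not part of any TEI schema.

```python
# Illustrative only: factors are expressed relative to the smallest unit (pence).
# Pre-decimal English money: 1 shilling = 12 pence, 1 pound = 20 shillings.
PENCE_PER_UNIT = {"pound": 240, "shilling": 12, "penny": 1}


def to_base(components, factors):
    """Sum a compound measurement given (quantity, unit) pairs and a factor table."""
    return sum(quantity * factors[unit] for quantity, unit in components)


# A compound value of 5 pounds, 3 shillings, and 4 pence.
fee = [(5, "pound"), (3, "shilling"), (4, "penny")]
total_pence = to_base(fee, PENCE_PER_UNIT)
```

Because none of the factors is a power of ten, the total cannot be read off the written digits; a processor needs the factor table itself, which is the information a plain-prose description does not expose.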

Inline Markup
Instead of dening the unit conversion system in the <teiHeader>, we could encode the compound measurements in the form of inline markup, as Cummings and Wilcox (2013, sec. 7) show in this example of encoding a value in old English money: 4 Example 3. Old English money, as encoded by Cummings and Wilcox (2013).
<seg type="fee" rend="roman-numerals aligned-right">
  <num type="totalPence" value="1240">
    <!--orig: vli iijs iiijd -->
    <num type="poundsAsPence" value="1200">v<hi rend="superscript">li</hi></num>
    <num type="shillingsAsPence" value="36">iij<hi rend="superscript">s</hi></num>
    <num type="pence" value="4">iiij<hi rend="superscript">d</hi></num>
  </num>
</seg>

One of the important points of their markup is the use of nested <num> elements with @type attributes, indicating "vli iijs iiijd" as a compound unit, not as separated units. Although this example is inspiring for our project, we would prefer not to store converted numerical values in @value attributes; we would rather keep the original data when it comes to inline markup. The reason for this is that the unit conversion system has altered over time, as in the case of the British coal measures shown in figure 2.

In order to structure, in the <teiHeader>, a wide variety of measurement semantics within each of the historical documents, we propose a new set of elements and attributes. To take an example from the Engi-Shiki, the following is a definition of 斤, 両, 分, and 銖, all of which are units for measuring weight. For the sake of readability, we omit the compulsory <fileDesc> element.

ODD File
We propose the following TEI ODD specifications for marking up units:

Example 5. Proposed ODD specification for marking up units.

Markup Solutions
Finally, we present two markup examples of a passage in the Engi-Shiki. First, we encode the passage based on the current TEI P5 Guidelines, using the <measure> element and the @unit attribute, and then we demonstrate an example using <unit> elements within the <body> text. These two examples do not necessarily differ from each other except in encoding, but both refer to the @xml:id attribute defined in the <unitDef>.

The text shown below is a part of the section describing regulation of taxes imposed upon the two administrative counties, 備中 Bicchū and 長門 Nagato. In this case, the taxes were paid annually to the contemporary Ritsuryō government in copper 銅 and lead 鉛 mineral resources.

Or, to mark up each of the units separately, <unit> elements can be used in the inline markup, as follows:

Example 8. Using <unit> element.

Among the components of our proposal, @factor and attributes from the att.datable class (e.g., @from, @to) within the <unitDef> element are the most effective for the purpose of describing the measurement semantics. As the example of the English coal measures keel, chalder, and bushel shows, the unit conversion formula is subject to change over time even within the same country. Therefore, it would be a reasonable practice when encoding non-SI units and measurements to describe the unit conversion formula with @factor between variable units, capturing the dating context with @from and @to in the <unitDef>.
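
As a rough illustration of this practice, the following sketch models a conversion factor that is valid only within a date range, mirroring the roles of @factor, @from, and @to. The class name, the lookup functions, the unit names, and especially the numerical factors and years are hypothetical placeholders chosen for the example; they are not historical values for the keel or chalder, and the sketch does not reproduce the proposed ODD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatedFactor:
    """One rule: 1 `unit` equals `factor` of `base` from `start` to `end`."""
    unit: str
    base: str
    factor: float
    start: int           # first year in which the rule applies (cf. @from)
    end: Optional[int]   # last year in which it applies; None if open-ended (cf. @to)


# Hypothetical placeholder rules; the numbers are NOT historical values.
RULES = [
    DatedFactor("big", "small", 10, 1400, 1599),
    DatedFactor("big", "small", 8, 1600, None),
]


def factor_for(unit, base, year, rules=RULES):
    """Return the factor valid in `year`, or raise LookupError if none applies."""
    for rule in rules:
        in_range = rule.start <= year and (rule.end is None or year <= rule.end)
        if (rule.unit, rule.base) == (unit, base) and in_range:
            return rule.factor
    raise LookupError(f"no conversion from {unit} to {base} recorded for {year}")


def convert(quantity, unit, base, year):
    """Convert `quantity` of `unit` into `base` using the rule valid in `year`."""
    return quantity * factor_for(unit, base, year)
```

Under these placeholder rules, `convert(3, "big", "small", 1500)` and `convert(3, "big", "small", 1700)` apply different factors to the same recorded quantity, which is why the inline markup can keep the original figures and leave the dated conversion to the header.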

KIYONORI NAGASAKI
Kiyonori Nagasaki, PhD, is a senior fellow at the International Institute for Digital Humanities in Tokyo. His main research interest is in the development of digital frameworks for collaboration in Buddhist studies. He also investigates the significance of digital methodology in the humanities and in the promotion of DH activities in Japan. He has participated in various DH projects at several institutions in Japan and abroad. His activities also include postgraduate education in DH at the University of Tokyo, as well as administrative tasks at several scholarly societies including the Japanese Association of Indian and Buddhist Studies.

YUTA HASHIMOTO
Yuta Hashimoto, PhD, is an assistant professor at the National Museum of Japanese History, National Institutes for the Humanities. He is one of the co-directors of the largest crowdsourcing DH project in Japan, Minna-de-Honkoku. He majored in the history of mathematics and mathematics education before entering the doctoral course and focusing his research on digital humanities. He has a bachelor's degree from the Faculty of Literature but has also worked as an IT engineer, hence his interest in computer programming.

A. CHARLES MULLER
A. Charles Muller, PhD, is a professor at the University of Tokyo, in the Graduate School of Humanities and Sociology and its Center for Evolving Humanities. He specializes in Korean Buddhism and East Asian Yogâcāra, and has published numerous books and articles on these topics. He is one of the earliest and most prolific developers of online research resources for the field of Buddhist Studies, being the founder and managing editor of the online Digital Dictionary of Buddhism, the CJKV-English Dictionary (http://www.acmuller.net), the H-Buddhism Scholars Information Network, along with having digitized and published numerous reference works.

MASAHIRO SHIMODA
Masahiro Shimoda, PhD, is a professor in the Department of Indian Philosophy and Buddhist Studies, within the Graduate School of Humanities and Sociology, and also the director of the Digital Humanities Initiative, within the Center for Evolving Humanities, at the University of Tokyo. He specializes in the history of the formation of the Buddhist scriptures, which elucidates the production and the process of passing down the scriptures of traditional Indian Buddhism. As the head of the Center for Evolving Humanities, established in 2013, he teaches the "General Introduction for Digital Humanities" (http://dh.iii.u-tokyo.ac.jp/) as a core module of the center.