Quotes, Paraphrases, and Allusions: Text Reuse in Sanskrit Commentaries and How to Encode It

The Sanskrit literary genre of commentaries has several characteristics that pose a challenge to the structural encoding of its texts. One particularly tricky feature is the skillful and sophisticated reuse of text in such commentaries. This article examines practical examples of encoding these types of passages, drawing on the documents of SARIT 1 (Search and Retrieval of Indic Texts) and the author’s own encoding projects.

due to the author explicitly saying that the current passage was said, thought, or held by someone.
The addition of this apostrophe is a binary sign: either a passage is reused secondarily, or it is not.
It allows no further differentiation.
The presence of the final letter, e (for modus edendi), is supposed to reflect the editor's estimate of the difference between the reused text as it appears in a source and as it appears in the context of its reuse, insofar as one might reasonably assume that differences are due to an intentional change by the author of the text that incorporates the passage from another. Several editors (including this writer) have found it hard to give good reasons for this kind of decision in their editions, and have avoided the use of this category altogether. 9 In any case, this siglum was never intended to characterize differences that could be due to simple mistakes in copying manuscripts, but to mark intentional changes on the part of the author reusing the text. This siglum is again a binary operator, and as such allows only for a basic distinction without further characterization.
Even though this system of characterizations has been criticized in recent years because of a basic impossibility of being certain about which copy of a text an author quoting another text might have had, 10 it cannot be disputed that the distinctions in the conceptual space opened up by this analysis are significant and useful ones for any research concerned with these kinds of texts: C and R mark reused material as either literally accurate or only paraphrased or alluded to; e, together with a source description, makes it clear where the quote is from; the apostrophe (') reflects a basic observation about the passage's context in the text being edited (whether it occurs as an explicit quote or not). And the final, optional e represents the editor's estimate of the quoted material's accuracy as a representation of the source.
In the rest of this article, these categories will be applied to some sample passages using the conceptual framework of the TEI Guidelines. The purpose is to illustrate light-weight markup rules for some general cases of textual reuse in the works of this genre that allow for easy querying of the resulting structured information. Before doing that, however, it will be useful to briefly consider the infrastructure that version 4.1 of the Guidelines (TEI Consortium 2020) offers, and its general applicability to these phenomena.

Markup of Quotations in the TEI P5 Guidelines Version 4.1
The part of the TEI P5 Guidelines version 4.1 most relevant to our inquiry is section 3.3.3: Quotation (TEI Consortium 2020). It occurs within a longer section that "… deals with a variety of textual features, all of which have in common that they are frequently realized in conventional printing practice by the use of such features as underlining, italic fonts, or quotation marks, collectively referred to here as highlighting" (TEI Consortium 2020, 3.3, Highlighting and Quotation: https://tei-c.org/Vault/P5/4.1.0/doc/tei-p5-doc/en/html/CO.html#COHQ). Many of the texts under discussion here show features that are particular to South Asian manuscript culture. So, the categories of "conventional printing practice" (primarily European and American practice, that is) cannot be naively applied to the encoding of these texts: the devices employed in the manuscripts, and also the texts, for emphasizing, distancing oneself from, or quoting words or longer phrases are often quite different from those encountered in printed editions. 11 Before looking at some of these conventions, it will be useful to summarize the main points in section 3.3.3 of the TEI Guidelines.
The elements presented there are as follows, grouped by their nearest model class (all quotes in the following list are from TEI Consortium 2020, 3.3.3): 12
• in the class of model.hiLike (and model.common)
  ⚬ <q>: the most generic marker for quotations;
• in the class of model.attributable
  ⚬ <said>: for quotations of what someone says or thinks;
    ▪ with the boolean attributes @direct and/or @aloud
  ⚬ in the sub-class model.quoteLike
    ▪ <quote>: for any passage that is "attributed by the narrator or author to some agency external to the text"; allows the attribute @source, which points at a source;
    ▪ <cit>, "cited quotation," a container for <quote> in which it is associated with an explicit (in the text) reference to a source; 13
• in the class of model.emphLike
  ⚬ <mentioned>: "marks words or phrases mentioned, not used" 14 in a given context;
and the most general marker, <q>, belongs to model.hiLike. This is a result of the presentation in section 3.3.3, which focuses on the use of quotation marks. Since modern printing does not, on the analysis proposed there, employ them to mark only related features, there need not be any expectation of a common superclass for the elements discussed here.
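A naive way of collecting the elements listed above is to match their names directly. The following is a minimal sketch using Python's standard-library ElementTree. The sample paragraph and its bracketed contents are hypothetical placeholders, and the hard-coded name list is an assumption that ignores the schema (it would miss any other member of the model classes).

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
# Assumption: a hand-written list of the elements named in section 3.3.3.
QUOTE_LIKE = ("q", "said", "quote", "cit", "mentioned")

# Hypothetical sample; bracketed text stands in for real content.
SAMPLE_P = """
<p xmlns="http://www.tei-c.org/ns/1.0">
  <cit><quote>[quoted words]</quote><bibl>[reference]</bibl></cit>
  <said direct="true">[speech]</said>
  <mentioned>ca</mentioned>
  <hi>[highlighted only]</hi>
</p>
"""

def quotation_elements(root):
    """Return all elements whose name is in QUOTE_LIKE, in document order."""
    wanted = {"{%s}%s" % (TEI, name) for name in QUOTE_LIKE}
    return [el for el in root.iter() if el.tag in wanted]

def local_name(el):
    """Strip the namespace part from an ElementTree tag."""
    return el.tag.split("}", 1)[1]

found = quotation_elements(ET.fromstring(SAMPLE_P))
print([local_name(el) for el in found])
```

The <hi> element is deliberately left out of the result, since highlighting alone does not imply quotation.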

The quotation elements as described in the TEI Guidelines all share the @source and @corresp attributes. These two properties are expressive enough to provide the required intertextual links, that is, a link to a work, either in the abstract or as found in a certain edition, and to a passage in that edition. 17 For this reason, they need not be discussed further here. What is not so easily solved, however, is how these various elements map onto the conceptual distinctions underlying Steinkellner's model. Should one just use <q>, the most general form, and be done with it? Or can something be gained from differentiating among the elements for different cases? The next section will consider some examples to answer these questions.

Simple Quotations (and Their Context)
Example 2 shows a quotation in the fullest sense of the word.
Example 2. Simple quote with attribution (the target of tei:quote/@source is not shown).

There are three voices that must be distinguished in this passage:
1. The passage as a whole is spoken by Kamalaśīla, the commentator.
2. He is connecting a passage in the base text, the text he is commenting upon, to an objection by a non-Buddhist opponent. The base text is here not quoted but only pointed at, by saying "Through this," that is, through what is said in the base text.
3. The opponent's text is quoted verbatim, as a subphrase in a statement that rejects it. In this case, we can verify that it is a faithful quote, since the sentence is found in Uddyotakara's work ("Nyāyabhāṣyavārttika" 312.21-22, in Thakur 1997).

The commentary here performs various functions:
1. The construction "Through this … has been rejected." ("etena … pratyuktaṃ bhavati"): connection of the text commented upon and the provided quote.

Since it is generally useful to capture the context of a quote, it is embedded in a <seg> element, which carries little semantic baggage. One could also consider employing a <cit> element instead in this case. But in the absence of a "bibliographic reference to [the] source" (TEI Consortium 2020, 3.3.3) of the quotation, this would not be an appropriate use case for <cit>.

All of these functions can be tied to parts or segments of the sentences under consideration. Ideally, all these functions that the commentary performs should be easy to find by general queries run on the group of texts. A simple search for //quote will catch this type of quote, and its context can be discovered by checking the parent (//quote/parent::*). Possible refinements that could be made by employing @type and/or @ana attributes, and which are not reflected in the structure of the markup, will not be discussed here.
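The two queries just mentioned can be sketched with Python's standard-library ElementTree, which supports only a subset of XPath and has no parent axis, so the parent is looked up through a child-to-parent map. The sample below is a hypothetical stand-in for the passage: the bracketed content of <quote> is a placeholder, and the @source value "#nbhv-312" is an invented identifier.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample; the quoted sentence itself is replaced by a placeholder.
SAMPLE_QUOTE = """
<p xmlns="http://www.tei-c.org/ns/1.0">
  <seg>etena <quote source="#nbhv-312">[opponent's sentence]</quote> pratyuktaṃ bhavati.</seg>
</p>
"""

def quotes_with_context(root):
    """Pair every <quote> (//quote) with its parent (//quote/parent::*)."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [(q, parent_of[q]) for q in root.iterfind(".//tei:quote", NS)]

for quote, context in quotes_with_context(ET.fromstring(SAMPLE_QUOTE)):
    # Print the local name of the context element and its full text.
    print(context.tag.split("}", 1)[1], "|", "".join(context.itertext()))
```

In a processor with full XPath support, the expression //quote/parent::* would do the same without the auxiliary map.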

In Steinkellner's analysis, Uddyotakara's words in this passage would be categorized as Ce, an explicit quotation that is exactly the same as in a source available to us today.

Quotes as References
Later in the same text, Kamalaśīla introduces verse 1061 of the text he is commenting on as shown in example 3.
Example 3. Quotes as references, markup following typography.
<div n="1061"> <div xml:lang="sa"> <div type="base-text">
The passage in example 3 shows a type of quote significantly different from the one we saw before.
The commentator's introduction of the passage from the base text, the text the commentary is written on, contains at least two elements (here still marked graphically by <hi>) 18 that should be categorized as <quote> elements of some sort:
1. The first is indriyair ityādi. This refers to an earlier verse, Tattvasaṅgraha 939 (see Krishnamacharya 1926), which itself is a quote, in the base text, of a passage by an opponent.
2. The second is agobhinnaṃ ca ityādi, which points to the beginning of the verse in the base text, Tattvasaṅgraha 1061, in which the author of the base text answers the opponent's claim that was reproduced in Tattvasaṅgraha 939.
Both of these phrases have as their main function the indication of the passage that the commentator is going to speak about, and they indicate it by repeating a few words from the beginning of the passage. 19 It is characteristic of this type of quotation that its content is usually irrelevant: it is a reference to a particular string of characters, or sequence of sounds, much like a lemma in a note of a critical edition indicates the place to which the note applies. Steinkellner (1988, 117) remarks that these cases should be categorized as "quotations from another text" (Ce), though in his own editions he usually marked these kinds of quotes by graphical means (bold face), probably so as not to make the apparatus overflow. 20
Example 4. Quotes as references, first attempt (translated in example 3; the element with the @xml:id of "ts__939" is not shown).
<div n="1061">
  <div type="base-text">
    <lg xml:id="ts__1061">
      <l>agobhinnaṃ ca yad vastu tad akṣair vyavasīyate /</l>
      <l>...</l>
    </lg>
  </div>
  <div type="commentary">
    <p>
      <seg>yac coktam <quote type="lemma" source="#ts__939">indriyair</quote> ityādi, tad asiddham iti darśayann āha-<quote type="lemma" source="#ts__1061">agobhinnaṃ ce</quote>tyādi.</seg>
A first solution, shown in example 4, is to use the <quote> tag for these repetitions, with @type set to "lemma" (in the sense used in textual criticism, not to be confused with the linguistic value as in the @lemma attribute). With this markup we can separate these <quote> elements from other types quite easily. The reference to the base text is encoded in a @source attribute. A possible drawback in this encoding is that the referring function of the passage in which the quotation appears does not become evident. Ultimately, one would have to establish (in the <encodingDesc>) that the value "lemma" in the @type of a <quote> element means that this element has some features of a reference. 21
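How easily such lemma quotes can be separated from others, and their @source pointers followed, may be illustrated by a short Python sketch over the markup of example 4. The closing tags added after the <seg> are an assumption made only so that the fragment parses, and the function name is hypothetical. As in the example, the element with the @xml:id "ts__939" is not part of the fragment, so that pointer stays unresolved.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Markup of example 4, closed off so that it is well-formed (an assumption).
EXAMPLE_4 = """
<div xmlns="http://www.tei-c.org/ns/1.0" n="1061">
  <div type="base-text">
    <lg xml:id="ts__1061">
      <l>agobhinnaṃ ca yad vastu tad akṣair vyavasīyate /</l>
      <l>...</l>
    </lg>
  </div>
  <div type="commentary">
    <p><seg>yac coktam <quote type="lemma" source="#ts__939">indriyair</quote> ityādi, tad asiddham iti darśayann āha-<quote type="lemma" source="#ts__1061">agobhinnaṃ ce</quote>tyādi.</seg></p>
  </div>
</div>
"""

def resolve_lemmata(root):
    """List (lemma text, @source, resolved?) for each quote[@type='lemma']."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    rows = []
    for q in root.iterfind(".//tei:quote[@type='lemma']", NS):
        pointer = q.get("source", "")
        target = by_id.get(pointer.lstrip("#"))
        rows.append((q.text, pointer, target is not None))
    return rows

for row in resolve_lemmata(ET.fromstring(EXAMPLE_4)):
    print(row)
```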

The next suggested solution, in example 5, tries to make this feature explicit by using <ref> elements. However, this creates problems. Semantically, it is problematic because it is not the content of the quote that <ref> refers to: the commentary uses the phrase "that starting with xyz" to refer to the base text but only "xyz" is an actual quote. It also has the significant drawback that any query intended to catch all quote-like elements will now have to include certain <ref> elements. This would increase the complexity of queries significantly.
Example 5. Quotes as references, second attempt (translated in example 3; the element with the @xml:id of "ts__939" is not shown).

Quotes of Individual Words/Phrases for Elucidation
In a second passage that follows later in the commentary on the same verse, Kamalaśīla takes up the Sanskrit word ca (usually "and" or "furthermore"), and explains how it is to be understood in the verse: its function there is the inclusion of objects that are not explicitly mentioned in the verse, rather than the simple conjunction of this sentence with the previous one. This is not the same kind of referring quotation that we considered above in examples 4-6, because its main purpose is to comment on the significance or meaning of the term, not to indicate a particular point in the base text. A proposal for its markup with <mentioned> and <gloss> is shown in example 7.
Example 7. Quotes for explanation.

The primary function of repeating the word that is to be explained, ca, is not to refer to the text, but to say something about the meaning or content of that term. This explanation can be encoded as a <gloss> element. Both elements, <mentioned> and <gloss>, are loosely tied to the base text, and not to each other, because there are variant forms of this phenomenon where either there is no clearly identifiable <gloss>, or the term which could be <mentioned> is not repeated in the text. As before, the context useful to understanding this occurrence is wrapped in an anonymous <seg> element.
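Because the two elements are tied to the base text rather than to each other, a query has to tolerate either one being absent. The following Python sketch illustrates this under stated assumptions: the sample markup is hypothetical (it is not the article's example 7), the bracketed texts are placeholders, and the use of @source on both elements to point at the verse is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: one segment with term and explanation, one with an
# explanation only (the term is not repeated in the commentary).
SAMPLE_GLOSS = """
<p xmlns="http://www.tei-c.org/ns/1.0">
  <seg><mentioned source="#ts__1061">ca</mentioned> <gloss source="#ts__1061">[explanation of ca]</gloss></seg>
  <seg><gloss source="#ts__1061">[explanation without repeated term]</gloss></seg>
</p>
"""

def elucidations(root):
    """Return (term, explanation) per <seg>; either member may be None."""
    rows = []
    for seg in root.iterfind(".//tei:seg", NS):
        term = seg.find("tei:mentioned", NS)
        gloss = seg.find("tei:gloss", NS)
        if term is None and gloss is None:
            continue  # a plain segment without elucidation
        rows.append((term.text if term is not None else None,
                     gloss.text if gloss is not None else None))
    return rows

for row in elucidations(ET.fromstring(SAMPLE_GLOSS)):
    print(row)
```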

Steinkellner marks text segments of this kind with T' (textus usus secundarii), though he notes that "[f]ormally this kind of text would correspond to the group of 'Ce'-texts'… But in order to distinguish the group of quotation-texts in their hierarchical position more clearly I prefer the reference to the textus in the present definition" (1988,. This hierarchical position amounts to an expression of the value that an editor assigns to a text as a witness for some other text. In the present case, it declares the editor's conviction that the direct commentary is a privileged witness that is second in reliability only to direct witnesses (usually manuscripts) for the text that is being commented upon. The markup solution proposed here (with <mentioned>) does not explicitly express this facet, since it captures only the typed structural correspondence between text passages. However, this facet can be defined by the editor in a systematic fashion through the annotation of the target of the @source attribute: the determination of the "hierarchical position" is something that the editor will usually make explicit in their organization and interpretation of the sources. The encoding proposed here thus has the benefit of keeping two unrelated things separate: the similarity (or, in some cases, identity) of sequences of characters in different texts, and the editor's estimate of their value as witnesses for certain editorial decisions. For example, an explicit quote from the base text in a commentary (corresponding to Ce in Steinkellner's schema) would receive the same markup as an explicit quote in any other text (<quote> with @source). 
If the editor decided that the commentary had greater reliability than the other text, this could be made explicit in any of the many ways that the TEI Guidelines provide for: for example, since a document encoded on the basis of witnesses will probably have a <listWit> element, one could at least add an @ana attribute to the <witness> elements contained therein, and derive the desired hierarchical position from a query of that attribute.
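A minimal Python sketch of such a derivation follows. Everything specific in it is hypothetical: the witness identifiers, the @ana values ("#direct", "#commentary", "#other-text"), and the ranking table are assumptions introduced for illustration, not categories defined by the TEI Guidelines or by Steinkellner.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical witness list; the @ana values are editor-defined labels.
LIST_WIT = """
<listWit xmlns="http://www.tei-c.org/ns/1.0">
  <witness xml:id="comm" ana="#commentary">[direct commentary]</witness>
  <witness xml:id="other" ana="#other-text">[another text quoting the base text]</witness>
  <witness xml:id="ms-a" ana="#direct">[manuscript of the base text]</witness>
</listWit>
"""

# Assumed ranking: lower numbers mean greater reliability.
RANK = {"#direct": 0, "#commentary": 1, "#other-text": 2}

def witnesses_by_reliability(list_wit):
    """Order the xml:id values of <witness> children by their @ana rank."""
    witnesses = list_wit.findall("tei:witness", NS)
    ordered = sorted(witnesses, key=lambda w: RANK.get(w.get("ana"), len(RANK)))
    return [w.get(XML_ID) for w in ordered]

print(witnesses_by_reliability(ET.fromstring(LIST_WIT)))
```

Keeping the ranking in a separate table means that the editor's evaluation can change without touching the markup of the textual correspondences.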

Silent Quotes and Allusions
The last case of text reuse to be discussed is that of "silent reuse." In this variant, phrases, sentences, or even longer passages are reused without explicitly marking them as being due to another agency. This phenomenon is characterized as usus secundarii by Steinkellner. These passages can, in this process, also change to varying degrees, and it is a difficult philological problem to judge the differences between the "reused" and "original" texts. 22 The markup outlined for passages in this category also applies to cases where a passage is only paraphrased or alluded to.

Its first reuse (encoded in example 13) is taken from the "Apohasiddhi" (see McAllister 2020) of Ratnakīrti, who lived in the first half of the eleventh century: he calls his texts "abridgements" or "summaries" (saṃkṣepa) of the much more extensive works of his teacher, Jñānaśrīmitra. The same passage is again reused a few centuries later by Mokṣākaragupta in his "Tarkabhāṣā" (see Iyengar 1952; encoded in example 15). This third manifestation of the passage is heavily abridged.
This case illustrates how text passages were reused by several authors. It also contains a quote-within-a-quote, an explicit quotation of a partial verse from a foundational text for the logico-epistemological school of Buddhism, the "Pramāṇaviniścaya" (see Steinkellner 2007) (in the examples below, a modern partial edition of it is referred to by "#pvin1"; the @target "PVin.xml#v15" refers to a TEI XML edition containing the quote). One might hesitate about how to categorize this quote from the "Pramāṇaviniścaya": it is, on the surface, an explicit quote, but it is unlikely that either of the two authors reusing the passage in which it occurs should be said to have quoted this contained item itself. It is more likely that the verse was transmitted by them in the form found in the passage surrounding it, which was being reused as a whole. This question has to be answered when one wishes to judge the value of the preserved version of the verse for an edition of the "Pramāṇaviniścaya." The proposed markup should enable a researcher to easily collect information that allows them to answer that question.
Two solutions will be proposed here. The first employs only lightweight stand-off markup and might be suitable for cases where the markup of the source documents cannot easily be changed.
The second solution has the benet of simpler querying, but at the cost of deeper interventions in the source documents.

The first solution is to use encoding such as that shown in example 8. This proposal marks up parallel text passages in a way that is minimally invasive on the source documents. It adheres closely to the <linkGrp> solution proposed in section 16.5.2 of the TEI Guidelines, Alignment of Parallel Texts. 24 This solution has at least two advantages: often, the interpretation of textual parallels is a rather subjective matter. The separation of this interpretative layer from the main text requires only <anchor> elements in the document, so that the impact on existing applications should be minimal. The <linkGrp> elements can be easily shared, stored in separate files, and one can have several of them, so that they can accommodate conflicting interpretations by different researchers. A second benefit is that they also work when the quoted material does not align nicely with block-level elements, as is the case in example 8, where the reused material crosses over a paragraph border in the first case of its reuse.
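How a chain of reuse could be read off such stand-off links is sketched below in Python. The sketch rests on several assumptions that are not taken from example 8: the anchor identifiers "#r1" and "#m1" are invented (only "#j1" occurs in the article), the @type value "reuse" is hypothetical, and the convention that the first pointer in @target names the source and the second the reusing passage is an editorial decision, since <link> itself does not define a direction.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical stand-off layer; each pointer targets an <anchor> in a document.
LINK_GRP = """
<linkGrp xmlns="http://www.tei-c.org/ns/1.0" type="reuse">
  <link target="#j1 #r1"/>
  <link target="#r1 #m1"/>
</linkGrp>
"""

def reuse_pairs(link_grp):
    """Read (source, reuse) pairs; assumes exactly two pointers per link."""
    pairs = []
    for link in link_grp.iterfind("tei:link", NS):
        source, reuse = link.get("target").split()
        pairs.append((source, reuse))
    return pairs

def later_reuses(pairs, start):
    """Follow the links transitively from one passage to all its reuses."""
    found, frontier = [], [start]
    while frontier:
        current = frontier.pop()
        for source, reuse in pairs:
            if source == current and reuse not in found:
                found.append(reuse)
                frontier.append(reuse)
    return found

pairs = reuse_pairs(ET.fromstring(LINK_GRP))
print(later_reuses(pairs, "#j1"))
```

Since the links live outside the edited texts, a second researcher could supply a different <linkGrp> and run the same function over it.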

A drawback of this solution is that the search for quotes will become more complicated. But I feel that in this situation this disadvantage is justified to a certain extent: all members of model.attributable are defined, after all, as having been explicitly marked as quotes in the source text. These "silent quotes" are not marked in such a way. Both solutions could be simplified by using CTS URNs as suggested and discussed by Berti, Almas, et al. (2015) and Berti, Blackwell, et al. (2016). Such references would replace the simple internal references (like "#j1" in the examples). For the first solution, we would then not even have to introduce <anchor> elements, since we could link directly into the text. For the second solution, it would at least save us the necessary markup in the document containing the reused text.

Conclusion
All cases of text reuse analyzed so far can be quite clearly expressed within the framework of the TEI Guidelines. So in this sense, no particular changes are necessary to accommodate this facet of Sanskrit literature. The markup proposed here is expressive enough to allow for all of the distinctions that were developed expressly for the study and edition of these texts (Steinkellner 1988). However, it also allows for a cleaner separation of the raw textual evidence and the editor's judgement, by encoding only the textual correspondences. The TEI Guidelines provide a rich set of mechanisms for attaching interpretations to the basic set of structural elements discussed here.
A simple way would be the addition of @ana and/or @type attributes to the appropriate <seg> or <quote> elements. An alignment of a source text and a derived text, even through multiple derivations, is possible and could be opened for further automatic processing after adequate normalization of the <linkGrp> elements.
Example 10. List of passages connected through <link> elements (for documents as in example 8).

  ⚬ On the second solution proposed above (see example 9), the results of //quote would include these cases; one could filter on attribute values (not discussed here).
• e added to Ce or Ce', signifying that the editor believes that the text was changed by the author of T (modus edendi)
  ⚬ This is judged from the comparison of two passages; the editor's decision can be recorded in an @ana attribute to all of the proposed elements, but should not be expressed in the structure of the markup.
• Ci(') citatum in alio (usus secundarii), (silent) quotation of T in another text
  ⚬ This is a simple reversal of the previous relations (i.e., a //quote or similar search on the target text).
• R citatum in alio modo referendi, a text of T referred to or paraphrased
  ⚬ This case belongs to the ones discussed in section 5. An editor might express peculiarities of the reference or paraphrase with an @ana attribute that can be added to any of the given relations. In most cases, the "quotation" expressed through a <link> element will be the obvious place for such a specifier. In the solution that encodes silent quotes as <quote>, an example was given that would match //quote[@type="silent"].
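The attribute-based filtering mentioned in this list can be sketched as follows in Python. The sample document is hypothetical: its bracketed contents are placeholders, the identifier "#src-a" is invented, and treating an untyped <quote> as "explicit" is an assumption of this sketch rather than a rule stated in the article.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample with three kinds of <quote>.
SAMPLE_BODY = """
<body xmlns="http://www.tei-c.org/ns/1.0">
  <p><quote source="#src-a">[explicit quotation]</quote></p>
  <p><quote type="lemma" source="#ts__1061">[lemma]</quote></p>
  <p><quote type="silent" source="#j1">[silently reused passage]</quote></p>
</body>
"""

def quotes_by_type(root):
    """Count <quote> elements per @type; untyped ones count as 'explicit'."""
    return Counter(q.get("type", "explicit")
                   for q in root.iterfind(".//tei:quote", NS))

def silent_quotes(root):
    """Equivalent of //quote[@type="silent"]."""
    return root.findall(".//tei:quote[@type='silent']", NS)

body = ET.fromstring(SAMPLE_BODY)
print(dict(quotes_by_type(body)))
print([q.get("source") for q in silent_quotes(body)])
```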
It does not actually mark the passage as a quotation in the context of the text being edited.
8 This distinction becomes relevant when a passage is reused more than once by subsequent authors, or when it is contained in a passage that is itself a case of text reuse. See the discussion of example 8.
9 Steinkellner (2017, xxv) has recently maintained: "When comparing two texts, it is not always possible to determine the nature and origin of specific noticeable differences, yet the difference between changes due to a text being adapted to a new context and those due to scribal and/or editorial work is usually recognizable." There is, as yet, no consensus on this point.
10 Steinkellner (2017, xxi-xxv) summarizes and responds to the main concerns that have been raised against his suggestions. Since the debate is, by and large, about the certainty that can be had about what these symbols posit about a span of text, the issues raised should be further discussed in publications pertinent to the corresponding field of research.

11 The modern editions of these texts generally do use the categories of modern printing. Insofar as texts are digitized not from manuscripts but from printed editions (as are many of the texts in the SARIT library, accessed October 10, 2020, https://github.com/sarit/SARIT-corpus), the categories are a better, though not perfect, fit. For further observations on the history of book printing in relation to South Asian manuscript culture, see Pollock (2007).
12 See also table 1 for an overview together with the attributes most useful for linking the segment of text to either a source or a parallel instance.
this: "When a word or term is not used functionally but is referred to as the word or term itself, it is either italicized or enclosed in quotation marks." It is for this kind of phenomenon that the <mentioned> tag is intended.
15 Unless indicated otherwise, all XPath and XQuery expressions shown in this paper operate in the default namespace, http://www.tei-c.org/ns/1.0.
16 A more elegant solution would be to derive the elements' names from the corresponding documentation (or ODD file). See example 1 for a slightly less naive approach that returns the local names of the elements in model.attributable and its immediate subclasses. Even this simple example shows that "a search for all quotations" in a TEI document needs to take a schema as its basis, for it includes <floatingText> in its results, which is, however, differentiated from quotes in an important respect: "… the semantics of <quote> suggest that its content derives from a source external to the current text, <floatingText> carries no such implication …" (TEI Consortium 2020, section 4.3.2).
17 At least if a well-construed system is used. A noteworthy example is the Uniform Resource Name (URN) defined for the "Canonical Text Services" (CTS) in the Homer Multitext Project (http://www.homermultitext.org/hmt-doc/cite/texts/ctsoverview.html; see Blackwell and Smith 2013). Berti, Almas, et al. (2015) show that this reference format has proven useful even for organizing fragments, i.e., text passages that have been taken from works that did not survive as a whole. Applying this observation, Berti, Blackwell, et al. (2016) provide a detailed discussion and case study of constructing valid URNs that are expressive enough to encode text reused in several ways, together with its referential aspects. Especially interesting here are their proposed solutions for encoding text that is either reused in a direct quotation, alluded to, or repeated for further explanation. They propose a very general system of aligning texts, mainly through the judicious use of three locators: one for the passage as found in a canonical edition, one for the passage as found in the work reusing it, and one for the new edition of the work as constituted by the collection of the reused passages.