A TEI-based Approach to Standardising Spoken Language Transcription



Introduction
Spoken language transcription is an important component of many types of humanities research. Among its central areas of application are linguistic disciplines like conversation and discourse analysis, dialectology and sociolinguistics, and phonetics and phonology. The methods and techniques employed for transcribing spoken language are at least as diverse as these areas of application. Different transcription conventions have been developed for different languages, research interests, and methodological traditions, and they are put into practice using a variety of computer tools, each of which comes with its own data model and formats. Consequently, there is, to date, no widely dominant method, let alone a real standard, for doing spoken language transcription. However, with the advent of digital research infrastructures, in which corpora from different sources can be combined and processed together, the need for such a standard becomes more and more obvious. Consider, for example, the following scenario: A researcher is interested in doing a cross-linguistic comparison of means of expressing modality. He is going to base his study on transcribed spoken language data from different sources. Table 1 summarises these sources.

[Table 1: spoken language corpora from different sources, with their file formats and transcription conventions; the sources include Nivre et al. 1999]

Undoubtedly, the corpora have a lot in common as far as their designs, research backgrounds, and envisaged uses are concerned. Still, as the table illustrates, not a single one of them is compatible with any of the others, neither in terms of digital file formats nor in terms of the transcription conventions used. In order to carry out his study, the researcher will thus have to familiarise himself with eight different file formats, eight different transcription conventions and, if he is not able or willing to do a lot of data conversion, eight different techniques or tools for querying the different corpora. Obviously, the world of spoken language corpora [1] is a fragmented one.
The aim of this paper is to explore whether an approach based on the Guidelines of the TEI can help to overcome some of this fragmentation. In order for such an effort to be successful (that is, to really reduce the variation), I think that it is necessary to take the following factors into account:
• Since spoken language transcription is a very time-consuming process, it is crucial for transcribers to have their work supported by adequate computer tools. Any standardisation effort should therefore be compatible with the more widely used tool formats. This compatibility should manifest itself in something that can be used in practice, such as a conversion tool for exchanging data between a tool and the standard.
• The reason for variation among transcription conventions and tool formats can be pure idiosyncrasy, but it can also be motivated by real differences in research interests or theoretical approaches. A standardisation effort should carefully distinguish between these two types of variation and suggest unifications only for the former type.
• Not least because the line between the two types of variation cannot always be easily drawn, any standardisation effort should leave room for negotiations between the stakeholders (that is, authors and users of transcription conventions, and developers and users of transcription tools) involved. This paper therefore does not intend to ultimately define a standard but rather to identify and order relevant input to it and, on that basis, suggest a general approach to standardisation the details of which are left to discussion.

Following these basic assumptions, the paper is structured as follows: Sections 2 and 3 look at two fundamentally different, but interrelated, things to standardise. Section 2 is concerned with the macro structure of transcriptions, that is, temporal information and information about classes of transcription and annotation entities (for example, verbal and non-verbal), as defined in tool formats and data models. Section 3 is concerned with the micro structure of transcriptions, that is, names for, representations of, and relations between linguistic transcription entities like words, pauses, and semi-lexical entities. This is what a transcription convention usually defines. Both sections conclude with a suggestion of how to standardise commonalities between the different inputs with the help of the TEI. Section 4 then discusses some aspects of application, that is, ways of using the proposed standard format in practice.

Macro Structure and Tool Formats
Transcription tools support the user in connecting textual descriptions to selected parts of an audio or video recording. I will call the way in which such individual descriptions are organised into a single document the macro structure of a transcription. Transcription macro structures, and, consequently, the file formats used by the tools, usually remain on a relatively abstract, theory-neutral level. They are concerned with abstract categories for data organisation and with the temporal order of textual descriptions and their assignment to speakers, among other things, but they usually do not define any concrete entities derived from a theory of what should be transcribed (such as words and pauses). This latter task is delegated to transcription conventions (see the following section).

Although there are numerous differences in design and implementation of the tools, and although each tool reads and writes its own individual file format, their data models can all be understood as variants of the same base model. The basic entity of that data model is a time-aligned annotation, that is, a triple consisting of a start point, an end point, and a field containing the actual transcription or annotation. [5] Further structure is added by partitioning the set of basic entities into a number of tiers and assigning tiers to a speaker and/or to a type. As […] have shown, this simple structure can be viewed as a common denominator of all tools, and it can be used to establish a basic interoperability between them.
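The base model just described can be made concrete with a minimal Python sketch. It is not the format of any particular tool: the class names, the helper `annotations_at`, the time values, and the utterance given to speaker FB are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """A time-aligned annotation: the (start, end, text) triple."""
    start: float  # start point, here in seconds (illustrative values)
    end: float    # end point, here in seconds
    text: str     # the actual transcription or annotation


@dataclass
class Tier:
    """One partition of the annotations, assigned to a speaker and/or a type."""
    speaker: Optional[str] = None
    tier_type: Optional[str] = None
    annotations: List[Annotation] = field(default_factory=list)


def annotations_at(tiers, time):
    """Return (speaker, text) pairs for every annotation that covers `time`."""
    return [
        (tier.speaker, ann.text)
        for tier in tiers
        for ann in tier.annotations
        if ann.start <= time < ann.end
    ]


# Hypothetical data: two verbal tiers with partly simultaneous speech.
tiers = [
    Tier("DS", "v", [Annotation(0.0, 1.2, "Alors ça "), Annotation(1.2, 2.0, "dépend ")]),
    Tier("FB", "v", [Annotation(1.2, 2.0, "Oui. ")]),
]
print(annotations_at(tiers, 1.5))
```

Because every tool model discussed here can be reduced to such triples grouped into tiers, a query like `annotations_at` would work on data from any of them once it has been converted to this common denominator.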

Beyond the common denominator, the tool models also differ in several details:
• Implicit vs. explicit timeline: In some models (like ANVIL and Praat), start and end points of the basic entities point directly to a time point in the recording. In other models (like EXMARaLDA and ELAN), they point to an external timeline, an ordered set of time points which, in turn, can (but need not) have timestamps pointing into the recording.
• Speaker assignment of tiers: Some models (like EXMARaLDA and ELAN) allow (and sometimes require) tiers to be explicitly assigned to a speaker entity. Other models (like ANVIL and Praat), although they allow tiers to be characterised by a name and other features, do not have an explicit concept for speakers.
• Simple and structured annotations: In some models (like ANVIL and ELAN), the basic entities can have an internal structure, while in others (like EXMARaLDA and Praat), they always consist of simple text strings.
• Single layer and multi-layer: Some models (like FOLKER and Transcriber) provide a single tier for each speaker in which all annotation for that speaker has to be integrated. Other models allow multiple tiers for each speaker onto which annotations of different kinds (such as verbal vs. non-verbal or segmental vs. supra-segmental) can be distributed. In most models of the latter type, tier categories and semantics can be freely defined on the basis of a few abstract tier types (as in ANVIL, ELAN, EXMARaLDA, but see next point), whereas CLAN/CHAT predefines an extensive set of tier categories and a semantics for them.
• Tier types and dependencies: All multi-layer tools provide a system for classifying tiers according to their structure and semantics. The tier types can be associated with certain structural constraints on annotations within the respective tier or in relation to annotations in another tier. This often results in a tier hierarchy where one tier is regarded as primary and other tiers as subordinate to (or dependent on) the primary tier. No two tools use the same system of tier types, but there are some obvious commonalities and interrelations between the systems.

[…] conclude that, "given that the diversity in tool formats is to a great part motivated by the different specializations of the respective tools", a full assimilation of the different data models is neither theoretically desirable nor practically possible. However, the similarities between the data models clearly outweigh the differences. I would therefore like to argue that, at least for the purposes of this paper, it will be sufficient to declare one of the formats as a typical exponent of a class containing all the others, and use this typical exponent as the basis for a transformation to TEI. The fact that EXMARaLDA has conversion filters for importing the formats of all the other tools shows that this assumption is not only true in theory, but can also be put to use in practice. In what follows, I will therefore use EXMARaLDA's data model as a representative of all the other tools.
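To illustrate why a difference such as implicit versus explicit timelines need not stand in the way of a common representation, the following Python sketch re-expresses annotations that point directly into the recording against an explicit timeline. The function name and the identifier scheme (T0, T1, ...) are assumptions; unlike the explicit-timeline models described above, this sketch always attaches a timestamp to every timeline point.

```python
def to_explicit_timeline(annotations):
    """Convert (start_seconds, end_seconds, text) triples to timeline references.

    Returns a pair (timeline, events): `timeline` is an ordered list of
    (id, seconds) points, and `events` holds (start_id, end_id, text) triples.
    """
    # Every distinct start or end time becomes one point on the timeline.
    times = sorted({t for start, end, _ in annotations for t in (start, end)})
    ids = {t: f"T{i}" for i, t in enumerate(times)}
    timeline = [(ids[t], t) for t in times]
    events = [(ids[start], ids[end], text) for start, end, text in annotations]
    return timeline, events


timeline, events = to_explicit_timeline([(0.0, 1.2, "a"), (1.2, 2.0, "b")])
print(timeline)
print(events)
```

The reverse direction is only possible where the timeline points carry timestamps, which is one reason why an explicit-timeline model is the more general choice for a representative format.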

EXMARaLDA's Data Model and Format
Concerning the above parameters, EXMARaLDA's data model has an explicit timeline, allows speaker assignment of tiers, uses only simple annotations, allows multi-layer annotations, and distinguishes three tier types which I will illustrate with the help of the following example. Figure 1 shows a transcription as displayed by the EXMARaLDA Partitur-Editor.

Figure 1: Example transcription as displayed in the EXMARaLDA Partitur-Editor with a waveform representation of the recording (top) and a musical score representation of the transcription (bottom). Annotations (white fields in the musical score) are assigned to tiers ("rows" of the score) and intervals of the timeline ("columns" of the score). The tiers are labelled with abbreviations for the corresponding speakers ("DS" and "FB") and with a category ("sup", "v", etc.).
• Likewise, elements having a close semantic relationship, like the orthographic and phonetic transcriptions in the last two tiers, are not necessarily close to one another in the document.
• The dependency between annotations in tiers of type T and tiers of type A is not explicitly represented in the document structure.
• Since the division of annotations is motivated by the temporal structure of the discourse, the boundaries of individual annotation elements may cut through linguistic entities. This is the case, for example, for the utterance "Alors ça dépend ((cough)) un petit peu.", which is distributed across three <event> elements in order to enable the representation of different simultaneity relations in the discourse.
One resulting disadvantage is that certain XML techniques (like XPath queries) can become inefficient for such documents because the techniques are optimised for processing tree structures, whereas the principal structure of the document is not represented in the document tree. Another disadvantage is that the (manual) insertion of additional markup, such as with the help of a standard XML editor, becomes difficult because the elements of the document do not behave as in a "normal" (i.e. written) text. As a basis for a transformation to a TEI-conformant form, this kind of document organisation is thus not ideal. A first question on the way to a TEI-based standardisation therefore is whether an equivalent XML representation of the data model can be found which does not suffer from the same drawbacks.

A TEI Representation of EXMARaLDA's Data Model
My suggestion is to derive such an equivalent representation on the basis of the concept of a segment chain. With respect to the EXMARaLDA data model, a segment chain can be defined as any maximally long, temporally connected sequence of annotations in a tier of type T. The above example contains three such segment chains, marked with grey boxes in figure 4. These segment chains, which loosely correspond to an entity often called a turn or a speaker contribution, have three important structural properties:
• They are implicitly contained in the data model and can be automatically derived from it.
• They re-combine the character data of linguistic entities (words and utterances) from tiers of type T, which were separated in the data model due to temporal considerations (temporal overlap of annotations), into a superordinate entity.
• Since annotations in tiers of type A will, by definition, not cross the boundaries of such segment chains, each such annotation can be assigned to exactly one segment chain.

Subsuming all annotations in tiers of type A under "their" segment chain and ordering segment chains by their start points, a document can thus be constructed whose document order is globally analogous to the actual sequence of events in the transcribed discourse, whose elements locally behave like normal written text, and in which dependent annotations are grouped together with the annotations they depend on.
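The derivation of segment chains can be sketched in a few lines of Python. The sketch assumes that start and end points are integer indices into the timeline, that annotations within one tier of type T do not overlap, and that two annotations are "temporally connected" when the end point of one is the start point of the next. The function names and the second chain ("Oui.") are illustrative assumptions; the first chain reuses the utterance and timeline points T3 to T6 from the example discussed in this paper.

```python
def segment_chains(annotations):
    """Group the (start, end, text) annotations of one tier of type T into
    maximally long, temporally connected sequences."""
    chains = []
    for ann in sorted(annotations):
        if chains and chains[-1][-1][1] == ann[0]:
            chains[-1].append(ann)   # continues the current chain
        else:
            chains.append([ann])     # temporal gap: a new chain starts
    return chains


def chain_for(dependent, chains):
    """Find the chain whose span contains a dependent (type A) annotation."""
    start, end = dependent[0], dependent[1]
    for chain in chains:
        if chain[0][0] <= start and end <= chain[-1][1]:
            return chain
    return None


tier = [
    (3, 4, "Alors ça "),
    (4, 5, "dépend ((cough)) "),
    (5, 6, "un petit peu. "),
    (8, 9, "Oui. "),                 # hypothetical later contribution
]
chains = segment_chains(tier)
print(["".join(text for _, _, text in chain) for chain in chains])
```

Concatenating the text of a chain restores the utterance that the temporal division had split across several annotations, and `chain_for` reflects the third property listed above: a dependent annotation falls inside exactly one chain.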
Although some of them claim to be "unified systems" (GAT) or even "standards" (GTS), they exist more or less independently of one another. In contrast to the situation with tool formats, there have been few attempts to establish "interoperability" between transcription conventions; real standardisation efforts have, to my knowledge, not been undertaken at all. The present paper is not a place to carry out a full comparative analysis of the systems that would be needed for such a standardisation effort. Instead, I will restrict myself to discussing some commonalities and differences by using examples and working under the assumption that the same method can be transferred to other aspects of the systems. Schmidt (2005a) carries out a more comprehensive and detailed analysis of two of the systems mentioned here (HIAT and GAT).

Commonalities and Differences
Perhaps the most fundamental commonality among the conventions is that they take standard written orthography as their point of departure in motivating and explaining their rules for representing spoken language in the written medium. An important consequence of this is that the entity "word" is present in all the conventions with more or less the same meaning, namely that of a word as defined by standard orthography. Two other basic entities shared by all the conventions are unfilled pauses and audible non-speech events like breathing, laughing or coughing. Furthermore, all of the conventions specify ways to represent uncertainty in transcription (sometimes with the possibility to provide alternatives to an uncertain part) and to represent incomprehensible passages. I will call these five elements the basic building blocks of transcription conventions.

Another class of entities to be found in most systems consists of prosodic characterisations of words or parts thereof. This class can comprise phenomena like (emphatic) stress or lengthening of syllables. Finally, most systems define entities which summarise words and other basic building blocks into larger units analogous (but explicitly not identical) to the sentence in written language.

Taking these commonalities as a starting point, I will illustrate some important differences between the conventions using the set of examples in figure 6, in which a fictitious stretch of speech is transcribed according to five different transcription systems. [8]

Obviously, some variation is due only to symbolic differences among the conventions. Thus, HIAT, GAT and cGAT describe non-verbal incidents ("coughs") in double parentheses, whereas CHAT marks such descriptions with the prefixed symbols &=, and DT1 chooses capital letters between single parentheses and, additionally, has special predefined symbols for certain such incidents (laughing is represented by the symbols @@). Similarly, each system has its own symbol(s) for representing a short, unmeasured pause: the bullet • in HIAT, the symbols (-) in GAT and cGAT, the hash sign # in CHAT, and two full stops (periods) in DT1.
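Purely symbolic variation of this kind lends itself to a simple table-driven normalisation. The following Python sketch maps the short-pause symbols listed above onto one common form. It assumes that the pause symbol always stands as a separate, space-delimited token, and it uses a bare `<pause/>` string as the common form; the function name and the sample strings are illustrative only.

```python
# Short, unmeasured pause symbols as listed in the text above.
SHORT_PAUSE = {
    "HIAT": "•",
    "GAT": "(-)",
    "cGAT": "(-)",
    "CHAT": "#",
    "DT1": "..",
}


def normalise_pauses(tokens, convention):
    """Replace the convention-specific short-pause token by a common form."""
    symbol = SHORT_PAUSE[convention]
    return ["<pause/>" if token == symbol else token for token in tokens]


print(normalise_pauses("yes • please".split(), "HIAT"))
print(normalise_pauses("yes # please".split(), "CHAT"))
```

Both calls yield the same token list, which is the point of harmonising symbolic variation: after normalisation, the source convention no longer matters for processing pauses.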

The conventions also vary in what phenomena are represented in the transcription. Thus, the lengthening of the vowel in the word "please" is indicated in HIAT through a reduplication of the vowel symbol and in GAT through the insertion of a colon (this being another case of symbolic variation), but it is not represented at all in the other three systems. Similarly, transcriber uncertainty with respect to a given word can be marked in HIAT, GAT, cGAT and DT1 (through single parentheses in the first three and through the paired symbols <X and X> in the last), but only GAT and cGAT also provide the possibility to specify one or more alternative transcriptions for an uncertain word (added inside the parentheses after a slash).

While the symbolic and other variation discussed so far remains on the level of basic building blocks, a last type of variation is more complex and concerns the way basic transcription units are organised into larger structures. This type of variation is visible in the punctuation symbols used in figure 6, specifically:
• HIAT divides the stretch of speech into two entities called utterances. Utterances are pragmatic units of speech, identified and classified according to function-based criteria, most importantly their mood. The first utterance is terminated by a full stop (period), indicating that it is in declarative mood, while the second is terminated by an exclamation point, marking its mood as exclamative. A third punctuation symbol, the forward slash behind the word "must", indicates a self-repair but does not act as an utterance terminator. Note that in contrast to all other systems, HIAT uses capitalisation of words at the beginning of utterances.
• GAT divides the same stretch of speech into three entities called intonation phrases. Intonation phrases are prosodic units of speech, identified and classified according to form-based criteria, most importantly their intonation contour. The first and third intonation phrases are terminated by a hyphen, indicating a level final pitch movement. The second intonation phrase is terminated by a semicolon, which stands for a falling final pitch movement.
• CHAT proceeds similarly to HIAT, but has three utterances instead of two. The first is terminated by an ellipsis symbol (three dots), marking it as an interrupted utterance. The other two are marked by a full stop (period) and an exclamation point, making them declarative and emphatic, respectively.
• The corresponding entities in DT1 are called intonation units. The first is terminated by two hyphens (an interrupted intonation unit), the second one by a full stop (period) (a terminative intonation unit), and the third one by a question mark (an "appeal").
• cGAT, finally, does not group basic building blocks into larger entities at all.

If the information codified in transcription conventions is to be standardised, these different kinds of variation between the systems must be taken into account. Ideally, a standard should make sure that pure symbolic variation is harmonised by mapping different surface forms onto a single standard form, and that all other variation is expressed in a manner that conserves the original diversity while still making it possible to process transcriptions from different sources on a common basis. I think that the TEI Guidelines furnish all the necessary elements for such a standardisation; at least the following elements from chapters 3 (Elements Available in All TEI Documents), 4 (Default Text Structure), 8 (Transcriptions of Speech) and 17 (Simple Analytic Mechanisms) will be necessary to adequately represent transcriptions according to any of the above conventions:
• <w> and <c> to mark up individual words and punctuation characters (unless the semantics of a punctuation character is already represented through another mechanism in the markup), possibly with an attribute @type to characterise a word as a repaired form, as an assimilated form, etc. or to note that a character represents a lengthened phoneme
• <pause> with a @dur attribute and <incident> with a <desc> child to represent pauses and non-speech events
• <unclear> elements, possibly with a superordinate <choice> element to represent uncertain transcriptions and alternatives
• <seg> elements with a @function attribute to provide the general name for such units in the respective conventions (such as utterance vs. intonation unit) and a @type attribute to capture the specific characterisation of that unit (such as declarative vs. interrupted)

Using these elements, the <u> elements in the example from figure 5 (which follows the HIAT convention) could be marked up as shown in figure 7.
[…] and <pause> where there is only symbolic variation, but it can differ with respect to elements like <seg> where there is a "real" difference between the systems.

Having defined a proposal for a TEI-based standard, I will now turn to the question of how to use it in practice. Most importantly, this means thinking of ways in which transcribers can efficiently produce standard-conformant transcriptions. Ideally, they will continue to be able to use the tools they are familiar with and to focus on the transcription task itself rather than on issues related to XML and TEI encoding.

These requirements are relatively easy to meet as far as the macro structure of transcriptions is concerned: the format illustrated in figure 5 is isomorphic to EXMARaLDA's tool format. This format, in turn, is compatible to a large extent with all the other tool formats mentioned in Section 2 because of the import and export routines built into EXMARaLDA and several other tools. By virtue of transitivity, making all tools compatible with the format in figure 5 is therefore simply a matter of defining a one-to-one mapping between one tool format and the TEI format. In order to ensure maximal portability, this mapping should be accomplished with an XML-only approach using XSL stylesheet transformations. XSL stylesheets which transform an EXMARaLDA transcription into an equivalent TEI representation and vice versa have been made available on the EXMARaLDA website at http://www.exmaralda.org/tei.html. The stylesheets have also been integrated into the EXMARaLDA editor, where the transformations can be carried out using the tool's import and export functions. For formats from other tools, either a direct mapping could be defined in an analogous manner, or EXMARaLDA could be used as an intermediary representation.

The requirements are harder to meet for the micro structure of transcriptions. Most commonly used tools (FOLKER being an exception) do not provide a way of directly representing micro structure in their file formats. While the markup expressing the micro structure could be added manually in a generic XML editor after a tool's format has been converted to the TEI representation of figure 5, this procedure would be rather inefficient since it requires a second tedious manual processing step after the actual transcription has been completed. A more efficient way is to automatically derive the micro structure markup from the regularities formulated inside the transcription conventions. This is possible if we interpret some of the symbols defined by a convention as an implicit (and non-standardised) markup and formulate an algorithm, a parser, to transform this implicit markup into explicit, TEI-conformant XML markup. Figure 9 exemplifies this process for the HIAT example from figures 5 and 7.

1. Unparsed <u>
<u> <anchor synch="#T3"/>Alors ça <anchor synch="#T4"/>dépend ((cough)) <anchor synch="#T5"/>un petit peu. <anchor synch="#T6"/> </u>

Parsing: Transforming implicit to explicit markup
Alors␣ça␣dépend␣((cough))␣un␣petit␣peu.␣ [9]
<seg function="utterance" type="declarative"> <w>Alors</w><w>ça</w><w>dépend</w> <incident><desc>cough</desc></incident> <w>un</w><w>petit</w><w>peu</w> </seg>

The implicit markup in this case consists of spaces indicating word boundaries, double parentheses indicating non-phonological descriptions, and the full stop (period) indicating and qualifying an utterance boundary. Of course, in order for the parsing algorithm to work reliably, the symbols interpreted as implicit markup must have been rigidly and unambiguously defined in the respective convention. Luckily, all conventions claim to ensure this unambiguousness in their choice of transcription symbols. [10] The parsing algorithm can then, in principle, be implemented in any technology and does not need to take any prescribed form as long as it produces correct output (a well-formed TEI-compliant XML fragment) for correct input (a string following the rules of a given transcription system). [11] EXMARaLDA has built-in parsing algorithms for HIAT, GAT, cGAT and CHAT which are implemented as finite-state transducers in Java, showing that a very simple parsing technique can be sufficient to deal with several of the transcription conventions mentioned above.
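To make the idea of parsing implicit markup tangible, here is a minimal Python sketch for the example above. It is not EXMARaLDA's implementation (which, as stated, uses finite-state transducers in Java) but a regular-expression stand-in covering only the three kinds of implicit markup just named plus the exclamation point; pauses, uncertainty, repairs and the rest of HIAT are deliberately left out. The function name and the error messages are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Only the two utterance terminators discussed in this paper for HIAT.
TERMINATORS = {".": "declarative", "!": "exclamative"}

TOKEN = re.compile(
    r"\(\((?P<incident>[^()]+)\)\)"   # ((cough)) -> non-speech event
    r"|(?P<end>[.!])"                 # utterance terminator
    r"|(?P<word>[^\s().!]+)"          # any other space-delimited run -> word
)


def parse_utterance(text):
    """Turn one HIAT-style utterance string into a TEI-style <seg> element."""
    seg = ET.Element("seg", {"function": "utterance"})
    pos = 0
    for match in TOKEN.finditer(text):
        # Unmatched characters, or material after the terminator, are errors.
        if text[pos:match.start()].strip() or "type" in seg.attrib:
            raise ValueError(f"cannot parse {text!r} at offset {pos}")
        pos = match.end()
        if match.group("incident"):
            incident = ET.SubElement(seg, "incident")
            ET.SubElement(incident, "desc").text = match.group("incident")
        elif match.group("end"):
            seg.set("type", TERMINATORS[match.group("end")])
        else:
            ET.SubElement(seg, "w").text = match.group("word")
    if text[pos:].strip() or "type" not in seg.attrib:
        raise ValueError(f"unterminated or malformed utterance: {text!r}")
    return seg


seg = parse_utterance("Alors ça dépend ((cough)) un petit peu. ")
print(ET.tostring(seg, encoding="unicode"))
```

Input that the sketch does not recognise, such as a single parenthesis, raises an error rather than producing guessed markup, which mirrors the requirement that incorrect input should lead to a parsing error.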

Transforming a tool format to a corresponding TEI format in which both macro and micro structure are represented is thus a two-step process. First, a generic TEI document is produced in which only the macro structure is represented. Second, a parsing algorithm is applied, which adds markup for the micro structure. Figure 10 gives a schematic illustration of the transformation workflow. [12] In order to make this transformation workflow available to users in a maximally accessible way, we have written a Java droplet which takes as input any CHAT, ELAN, EXMARaLDA, FOLKER or Transcriber transcription file and transforms it to a TEI file using a set of parameters (the parsing algorithm to be used among them) specified by the user. Figure 11 shows a screenshot of that application, which will be made freely available as a part of the EXMARaLDA tool package.

In this paper, I have formulated a proposal for standardising spoken language transcription with the help of the TEI Guidelines. The proposal consists of two principal components. First, a TEI-conformant format is defined that is structurally equivalent to the formats written by several widely used transcription tools and which represents the macro structure of the transcription in a form that is well-suited for standard XML processing. Second, implicit markup contained in the character data of such documents is transformed to explicit TEI-conformant markup using a parsing algorithm that embodies the formal regularities of a transcription convention. The resulting document then represents both macro and micro structure of the transcription in a TEI-compliant way. A droplet application enables users to carry out the transformation from tool format to TEI format and the parsing of the TEI format according to a specific transcription convention in a user-friendly way.

The route to standardisation formulated here can be viewed as a synthesis of work in three areas related to spoken language transcription: tool development, TEI encoding, and transcription conventions. All three can be said to have as one of their goals unification or harmonisation of similar practices, but each of them foregrounds a different aspect in that goal.

Tool developers usually aim at defining data models and formats which are both general and flexible enough to be used for different data types and different research interests while at the same time specific enough to allow for efficient processing of the data. As the present paper has shown, the solutions they have developed to meet these requirements are sufficiently interoperable to become the first ingredient of the standardisation effort.

The goal of the TEI is to provide a common tag set for the representation of texts in digital form where spoken language transcriptions are simply viewed as "texts of a special kind". Again, the present paper has shown that the existing solutions, as formulated in the P5 version of the Guidelines, are comprehensive and detailed enough to adequately represent commonalities and differences between transcription formats and conventions. They can thus become the second ingredient of the standardisation effort.

The situation is a little less clear for the third ingredient, the transcription conventions. Here, the present paper has shown, as a proof of concept at least, that existing conventions are sufficiently systematic to become the basis for a parsing algorithm. However, the formalisations required to derive such an algorithm are usually not explicitly defined inside the conventions but have to be inferred from a potentially error-prone interpretation of an informal text. Likewise, the distinction drawn here between symbolic and other variation among transcription conventions, though arguably very important for standardisation, is not a topic that the conventions themselves deal with at any great length. It seems, therefore, that in this area, the idea of formal standardisation has not yet gained as much ground as in the area of tools and the TEI. If the approach suggested here is to become the basis of a full-grown standard, most work will probably remain in standardising transcription conventions.

2. In a way, CHAT is an exception to this because it is the name both of the data format used by the CLAN tool and of a transcription convention. However, the CHAT format and the CHAT convention can be clearly separated conceptually. Thus, it is possible to use the CHAT format with a different transcription convention and to use the CHAT convention with a different format.

3. It is by no means uncommon to use such tools for transcription. However, the resulting data are more or less unstructured texts, and this lack of explicit structure makes them ill-suited for a standardisation effort.

4. Further tools belonging to the same family are: the TASX annotator, tools from the AG toolkit and WinPitch.

5. The data models can therefore all be understood as special types of annotation graphs as defined by Bird & Liberman (2001).

6. Note that the definition given in the TEI Guidelines for the <u> element ("a stretch of speech usually preceded and followed by silence or by a change of speaker") is compatible with the way it is used here to represent a segment chain. The name "utterance", however, may not be too lucky a choice for this element since some transcription conventions use the same name to denote a much more specific entity of speech (see next section).

7. There are of course many possible alternative representations which also conform to the TEI Guidelines. However, as Schmidt (2005b) and others have repeatedly argued, processing of the data is much facilitated by selecting one option out of the many and disallowing all others. For example, the document in Figure 4 might just as well connect a <u> to the timeline by giving it a @start and an @end attribute. The representation chosen here is not in any way superior or inferior to that alternative, but it is still important to minimise variation by explicitly declaring one alternative as the preferred one.

8. The examples use a selection of the conventions' rules only. Proficient users of the respective conventions may disagree on some details of what is transcribed here and how it is transcribed, and the example is certainly not a realistic one. Remember, though, that the aim here is to exemplify some differences between the systems, not to fully and precisely describe them.

9. Implicit markup is printed in bold face here. The symbol ␣ represents a space.

10. E.g. MacWhinney (2000) for CHAT: "Codes, words, and symbols must be used in a consistent manner across transcripts. Ideally, each code should always have a unique meaning independent of the presence of other codes or the particular transcript in which it is located."

11. Since the algorithm relies on the regularities defined in the transcription conventions, any incorrect input (a string violating the convention) should lead to an error in parsing, indicating the non-validity of the input string with respect to the conventions. In the tool described below, such parsing errors will be signalled to the user, and an unparsed TEI version will be produced as output.

12. Solid lines stand for existing conversion routes; dashed lines indicate additional possible conversion routes.

ABSTRACTS
This paper formulates a proposal for standardising spoken language transcription, as practised in conversation analysis, sociolinguistics, dialectology and related fields, with the help of the TEI guidelines. Two areas relevant to standardisation are identified and discussed: first, the macro structure of transcriptions, as embodied in the data models and file formats of transcription tools such as ELAN, Praat or EXMARaLDA; second, the micro structure of transcriptions as embodied in transcription conventions such as CA, HIAT or GAT. A two-step process is described in which first the macro structure is represented in a generic TEI format based on elements defined in the P5 version of the Guidelines. In the second step, character data in this representation is parsed according to the regularities of a transcription convention, resulting in a more fine-grained TEI markup which is also based on P5. It is argued that this two-step process can, on the one hand, map idiosyncratic differences in tool formats and transcription conventions onto a unified representation. On the other hand, differences motivated by different theoretical decisions can be retained in a manner which still allows a common processing of data from different sources.
In order to make the standard usable in practice, a conversion tool, TEI Drop, is presented which uses XSL transformations to carry out the conversion between different tool formats (CHAT, ELAN, EXMARaLDA, FOLKER and Transcriber) and the TEI representation of transcription macro structure (and vice versa) and which also provides methods for parsing the micro structure of transcriptions according to two different transcription conventions (HIAT and cGAT). Using this tool, transcribers can continue to work with software they are familiar with while still producing TEI-conformant transcription files. The paper concludes with a discussion of the work needed in order to establish the proposed standard. It is argued that both tool formats and the TEI guidelines are in a sufficiently mature state to serve as a basis for standardisation. Most work consequently remains in analysing and standardising differences between different transcription conventions.