Encoding Interruptions in Parliamentary Data: From Applause to Interjections and Laughter

Parliamentary data, especially parliamentary discourse, is of interest to researchers from various fields in the humanities and social sciences. The growing number of machine-readable and annotated parliamentary text corpora opens up this field for computer-based quantitative analysis. In this paper I aim to give an overview of how parliamentary interruptions are recorded in official parliamentary records and how they are modeled and encoded in currently available machine-readable parliamentary corpora. Furthermore, I will discuss whether these encodings are suitable for computer-based quantitative analysis of parliamentary interruptions. I will suggest detailed encodings of parliamentary interruptions in TEI as an extension to the Parla-CLARIN recommendations, to enable the extraction of parliamentary interruptions and to facilitate computerized quantitative analysis based on these encodings. As an example, I will use the encoding of interruptions in the Austrian Parliamentary Records.


Even though interruptions are a general feature of parliamentary discourse, their occurrence, type, and frequency depend on the parliamentary tradition and the rules of procedure (see Bevitori 2004, 87).

Recordings of Interruptions in Parliamentary Records
In order to study parliamentary discourse, and in particular parliamentary interruptions, researchers have to rely on the documentation of parliamentary debates, such as transcripts, shorthand records, and audio/video recordings. In most countries, parliamentary debates are recorded formally in parliamentary records and published. Winters (2017) states that "[p]arliamentary records form a unique longitudinal dataset about human behaviour: they can span hundreds of years, with periodical (often daily or weekly) accounts published according to a stable procedure. The nearly verbatim character of much of the transcription, albeit within certain conventions, also renders it closer to the spoken word than other sources with similar chronological coverage." Most parliamentary records, whether transcriptions or stenographic verbatim records, record not only the speeches of members of parliament but also the interruptions. Typically, these interruptions are distinguished from the rest of the text by some typographical convention (e.g., italics, boldface, or indentation).

I will present some examples taken from parliamentary records of different countries in different languages. In the parliamentary records of the Italian Senate (see figure 1), for instance, the interruption is written in italics and enclosed in parentheses. The example shows a nonverbal interruption, namely applause (applausi). In the following example of a German parliamentary record (figure 3), the interruptions are set in parentheses and indented to contrast with the text of the regular speeches. In figure 3 we also see a combination of interruptions: a nonverbal interruption (Beifall, applause) and two verbal interruptions (by MP Michael Grosse-Brömer and MP Sören Bartol).
These examples show that the interruptions recorded in the written records of these five countries range from terse descriptions like "Laughter" in the British Hansard or "Applausi" in the Italian records to more detailed descriptions, as in the Polish, German, and Austrian records, where the wording of verbal interruptions is also recorded.

As an example of a corpus using the performance text module, we will examine the GermaParl corpus. In this corpus the following information is annotated: the speaker's name, the party affiliation, and whether an utterance is a speech or an interjection (see Blätte and Blessing 2018, 812). Individual utterances in speeches are annotated as <sp>, with paragraphs as <p>. Interjections are encoded as <stage type="interjection"> within speeches. However, as can be seen in example 3, the encoders do not distinguish between nonverbal interruptions such as "(Beifall bei
In the manually annotated de-parl corpus, the speeches and verbal interruptions are encoded as utterances with the element <u>. Nonverbal interruptions such as applause are annotated with <incident>, and other comments by the stenographers with <note> (see Truan 2019b, 10). In example 4, utterances and incidents can be identified. However, without access to the original records there is no way to categorize the utterance <u who="#HAUSSMANN"> Sie sind doch in der Regierung!</u> as a verbal interruption, that is, an interjection. No attribute indicates that in the original transcript this <u> was an interruption, typographically marked as indented text within brackets. As a consequence, relying only on the encoded corpus, it is impossible to distinguish between speech and verbal interruptions and therefore, for example, to extract only the verbal interruptions. Furthermore, there is no markup for interrupted speech, that is, speech by the same speaker that continues after the interruption, which in the original is indicated by a dash ("-") at the beginning of the sentence (see figure 7).

[…] <desc>Lachen bei der F.D.P. sowie bei Abgeordneten der CDU/CSU</desc> </incident>habe ich Ihnen sehr sorgfältig zugehört.</u>
<u who="#HAUSSMANN">Ein bißchen billig!</u>
<u who="#FISCHER">Das ist überhaupt nicht billig.</u>
<u who="#HAUSSMANN">Doch! Aber Sie sind nicht kollegial!</u>

The Parla-CLARIN recommendations state that although an "interruption might be … encoded as a <note>… , it is more precisely encoded as a separate utterance" (Erjavec and Pančur 2022). However, the recommendations do not suggest any attributes to indicate that these utterances are verbal interruptions or interjections. Once they are encoded as <u>, they cannot be distinguished from other speech encoded as <u> unless they split an utterance. In that case the guidelines suggest using the @next attribute on the first part of the split utterance and the @prev attribute on the second part.
However, not all interjections or interruptions split a speech; they can also occur at the beginning or the end of a speech, where there is no means of distinguishing them from other utterances.
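The consequence can be made concrete with a small sketch. The Python code below, which uses only the standard library's xml.etree.ElementTree, parses a hypothetical Parla-CLARIN-style fragment (the xml:id values and the bracketed speech text are invented for illustration; the two interjections are taken from example 4) and tries to infer which <u> elements are interruptions. An utterance lying between two parts linked by @next/@prev can be inferred to be an interruption; an unlinked utterance at the end of the speech remains ambiguous.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: FISCHER's speech is split by an interjection and
# linked with @next/@prev; a second interjection follows at the very end.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <u xml:id="u1" who="#FISCHER" next="#u3">[speech, part 1]</u>
  <u xml:id="u2" who="#HAUSSMANN">Sie sind doch in der Regierung!</u>
  <u xml:id="u3" who="#FISCHER" prev="#u1">[speech, part 2]</u>
  <u xml:id="u4" who="#HAUSSMANN">Ein bißchen billig!</u>
</div>"""


def infer_interruptions(root):
    """Return (inferable, ambiguous) lists of <u> ids.

    inferable: utterances lying between a part with @next and its target.
    ambiguous: unlinked utterances outside any such span.
    """
    us = list(root.iter(f"{TEI}u"))
    position = {u.get(XML_ID): i for i, u in enumerate(us)}
    inside = set()
    for i, u in enumerate(us):
        target = u.get("next")
        if target:
            j = position[target.lstrip("#")]
            # Everything between the two linked parts interrupted the speech.
            inside.update(us[k].get(XML_ID) for k in range(i + 1, j))
    ambiguous = [
        u.get(XML_ID)
        for u in us
        if not (u.get("next") or u.get("prev")) and u.get(XML_ID) not in inside
    ]
    return sorted(inside), ambiguous


inferable, ambiguous = infer_interruptions(ET.fromstring(SAMPLE))
print("inferable:", inferable)
print("ambiguous:", ambiguous)
```

Here u2 is recovered only because it happens to sit inside a linked span; u4, although it is just as much an interjection, cannot be told apart from regular speech.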

As for nonverbal interruptions, Parla-CLARIN suggests using <note> for transcribers' comments in general. For more specific purposes, the following elements from the transcription of speech module can be used:
• <vocal> marks any vocalized but not necessarily lexical phenomenon, e.g., laughter, sounds of (dis)agreement from the benches, etc.
• <kinesic> marks any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc.

[…] Parliamentary Records in the form of transcribers' notes. In this section, we will describe how these notes can be encoded in order to facilitate quantitative computational analysis and computer-aided studies of parliamentary interruptions.

As a general rule, we followed the Parla-CLARIN recommendations (Erjavec and Pančur 2022), adding more detailed encodings as necessary, especially through attributes. As we have seen in section 2.1, the Austrian records provide some of the most detailed descriptions of interruptions.
Designing an encoding scheme with the Austrian Parliamentary Records in mind, we will offer an encoding suggestion that is fine-grained enough to deal with their level of detail regarding interruptions. Because it relies upon the Parla-CLARIN recommendations, the same scheme can also be used to encode records containing less detailed descriptions.

To organize these attributes, further described below, we used SKOS vocabularies. We created two concept schemes: one for utterances with the prefix ucat: (which stands for "utterance categories") and one for notes with the prefix ncat: (for "note categories"). The definition of each prefix is included in a <prefixDef> element in the <encodingDesc> and points to the SKOS vocabulary. Each value of @ana is defined in a scope note. In the following sections we will describe the encodings for verbal interruptions and nonverbal interruptions.
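A minimal sketch of how such prefix definitions can be resolved follows. It assumes the standard TEI structure, in which <prefixDef> elements (with @ident, @matchPattern, and @replacementPattern) are grouped in a <listPrefixDef> inside the <encodingDesc>. The vocabulary URLs are placeholders, not the locations of the actual SKOS files, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical header fragment; the example.org URLs are placeholders.
HEADER = """<encodingDesc xmlns="http://www.tei-c.org/ns/1.0">
  <listPrefixDef>
    <prefixDef ident="ucat" matchPattern="(.+)"
               replacementPattern="https://example.org/ucat.skos#$1"/>
    <prefixDef ident="ncat" matchPattern="(.+)"
               replacementPattern="https://example.org/ncat.skos#$1"/>
  </listPrefixDef>
</encodingDesc>"""


def load_prefixes(encoding_desc):
    """Map each prefix (@ident) to its (matchPattern, replacementPattern)."""
    return {
        p.get("ident"): (p.get("matchPattern"), p.get("replacementPattern"))
        for p in encoding_desc.iter(f"{TEI}prefixDef")
    }


def expand(pointer, prefixes):
    """Expand a prefixed pointer such as 'ncat:applause' into a full URI."""
    prefix, sep, local = pointer.partition(":")
    if not sep or prefix not in prefixes:
        return pointer  # unknown prefix: leave the value unchanged
    pattern, replacement = prefixes[prefix]
    match = re.fullmatch(pattern, local)
    if match is None:
        return pointer
    # TEI writes capture-group references as $1, $2, ...
    return re.sub(
        r"\$(\d)", lambda ref: match.group(int(ref.group(1))) or "", replacement
    )


def expand_all(ana, prefixes):
    """@ana may hold several space-separated pointers."""
    return [expand(p, prefixes) for p in ana.split()]


prefixes = load_prefixes(ET.fromstring(HEADER))
print(expand("ncat:applause", prefixes))
print(expand_all("ncat:nonverbal_hands ncat:inscription", prefixes))
```

Resolving the pointers this way lets an analysis script look up the scope note of each category in the SKOS vocabulary instead of hard-coding the meaning of the @ana values.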

Encoding Verbal Interruptions
We encoded verbal interruptions whose wording is recorded as utterances (<u>); verbal interruptions whose wording is not recorded were encoded as notes, like nonverbal interruptions (see section 4.2 and example 6). To distinguish verbal interruptions encoded as utterances from the authorized speeches, which are also encoded as utterances, we used the attribute @ana with the value "ucat:unauthorized" for interjections (see example 5) and the value "ucat:regular" for authorized speeches. Because interjections frequently interrupt speeches, we used the attributes @prev and @next to connect the parts of interrupted speeches, a practice also suggested by the Parla-CLARIN recommendations. We also transformed verbal interruptions by a collective into utterances encoded with <u>. The earlier example (Ruf bei den Grünen: Freudenau gibt's nicht! -Ruf beim BZÖ: Was ist mit der Freudenau?) was split up into two utterances. Usually the @who attribute is associated with a specific individual. Since the interruption at hand is by a collective, no single speaker can be specified. To handle such cases, we introduced <personGrp> elements for the collectives in each file header, to which the @who attribute can point (e.g., "#SPEAKER_COLLECTIVE_Grünen", "#SPEAKER_COLLECTIVE_BZÖ", and analogously for any other party). In our encoding proposal, we encode not only the utterance itself, as Truan (2016, 2019a) did in de-parl, but also the associated description in the parliamentary records. In the example above, the phrases "Ruf bei den Grünen:" and "Ruf beim BZÖ:" denote interjections from the Green party and from the BZÖ party. These descriptions are encoded as <note> with the @type value "comment" and the @ana value "ncat:interjection_speaker_collective". An example of the encoding can be seen in example 5. For interjections whose wording is recorded by the stenographer but which are anonymous, no @who attribute is assigned.
Example 5. Encoding of verbal collective interruptions.
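Because the distinction between speech and interjection is made explicit in @ana, extracting verbal interruptions becomes a simple filter. The Python sketch below assumes a fragment modeled on the encoding just described; the speaker id "#SPEAKER_A", the xml:id values, and the bracketed speech text are hypothetical, and the element order is an approximation rather than a copy of example 5. It collects all utterances categorized as "ucat:unauthorized" and counts them per speaker or collective.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical fragment approximating the encoding described above.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <u xml:id="u1" who="#SPEAKER_A" ana="ucat:regular" next="#u4">[speech, part 1]</u>
  <note type="comment" ana="ncat:interjection_speaker_collective">Ruf bei den Grünen:</note>
  <u xml:id="u2" who="#SPEAKER_COLLECTIVE_Grünen" ana="ucat:unauthorized">Freudenau gibt's nicht!</u>
  <note type="comment" ana="ncat:interjection_speaker_collective">Ruf beim BZÖ:</note>
  <u xml:id="u3" who="#SPEAKER_COLLECTIVE_BZÖ" ana="ucat:unauthorized">Was ist mit der Freudenau?</u>
  <u xml:id="u4" who="#SPEAKER_A" ana="ucat:regular" prev="#u1">[speech, part 2]</u>
</div>"""


def has_category(elem, category):
    """@ana may hold several space-separated pointers."""
    return category in (elem.get("ana") or "").split()


def interjections(root):
    """Return (who, text) for every utterance categorized as an interjection."""
    return [
        (u.get("who"), (u.text or "").strip())
        for u in root.iter(f"{TEI}u")
        if has_category(u, "ucat:unauthorized")
    ]


root = ET.fromstring(SAMPLE)
for who, text in interjections(root):
    print(who, "->", text)

# Interjections per speaker or collective.
print(Counter(who for who, _ in interjections(root)))
```

Anonymous interjections, which carry no @who, would appear with None as the speaker, so they remain countable as a separate group.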

Encoding Nonverbal Interruptions
As described in the Parla-CLARIN recommendations, transcribers' comments are encoded as <note>. We followed these recommendations. Since transcribers' comments may cover more than just interruptions (see section 2.2), and also because other information is commonly encoded as <note>, we used the following values for @type: "speaker", "comment", and "time" (see also Erjavec and Pančur 2022). The notes of @type="comment" were further subcategorized with the attribute @ana, which can have the following values to describe a nonverbal interruption: "ncat:applause", "ncat:bell", or "ncat:laughter" (see example 6). Moreover, descriptions related to verbal or nonverbal interruptions can be encoded as <note> with the @type value "comment" and the following values for @ana: "ncat:direction" to describe whether the interrupter is facing toward a specific person or party group; "ncat:nonverbal_hands" to denote that the interrupter is holding something up, like a sign, document, newspaper article, or picture (this value may be combined with "ncat:inscription" in cases where the sign's content is reported); and, finally, "ncat:nonverbal" for other nonverbal phenomena that are not covered by the values specified above.
Furthermore, as mentioned in section 4.1, verbal interruptions whose wording is not recorded are also encoded as <note> using an @ana value of "ncat:interjection" (see example 7). Here are two examples of encodings for nonverbal interruptions:
Example 6. Example of the encoding of a nonverbal interruption.
[…]
<note ana="ncat:interjection" type="comment"> (weitere Zwischenrufe bei der ÖVP) </note>

It might seem inconsistent that we use both @type and @ana instead of choosing one or the other.
There are two main reasons for this. First, @type is also used in the Parla-CLARIN recommendations and we wanted to be as compatible as possible with them in order to do comparative studies with other corpora that are encoded according to the Parla-CLARIN recommendations. Second, the values for @type, like "speaker", "comment", and "time", are less interpretative and can be inferred from the text, whereas the values we used for @ana are more interpretative. Therefore we use both @type and @ana.
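The two-level scheme can be exercised with a short Python sketch: @type first selects the transcribers' comments, and @ana then sorts them into categories. The fragment and its bracketed note texts are hypothetical (only the "weitere Zwischenrufe" note is taken from the example above), and treating "ncat:applause", "ncat:bell", "ncat:laughter", and "ncat:nonverbal" as the nonverbal-interruption categories is an assumption based on the description in this section, not a published rule.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"

# Assumed grouping of @ana values that describe nonverbal interruptions.
NONVERBAL = {"ncat:applause", "ncat:bell", "ncat:laughter", "ncat:nonverbal"}

# Hypothetical fragment mixing speaker, time, and comment notes.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <note type="speaker">[speaker label]</note>
  <note type="time">[time stamp]</note>
  <note type="comment" ana="ncat:applause">[applause]</note>
  <note type="comment" ana="ncat:laughter">[laughter]</note>
  <note type="comment" ana="ncat:applause">[applause]</note>
  <note type="comment" ana="ncat:interjection">(weitere Zwischenrufe bei der ÖVP)</note>
</div>"""


def comment_categories(root):
    """Count @ana values on notes whose @type is 'comment'."""
    counts = Counter()
    for note in root.iter(f"{TEI}note"):
        if note.get("type") != "comment":
            continue  # step 1: skip speaker labels, time stamps, etc.
        counts.update((note.get("ana") or "").split())  # step 2: categorize
    return counts


counts = comment_categories(ET.fromstring(SAMPLE))
nonverbal = {k: v for k, v in counts.items() if k in NONVERBAL}
print(nonverbal)
print("verbal, wording not recorded:", counts["ncat:interjection"])
```

The @type filter works on any Parla-CLARIN corpus, while the @ana step depends on the more interpretative categories, which mirrors the division of labor between the two attributes described above.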

Conclusions
Parliamentary data is a comprehensive and unique resource for many different fields in the humanities and social sciences. In this paper we have shown that research on parliamentary interruptions is one of the areas that can benefit from the growing availability of machine-readable and annotated parliamentary corpora, especially regarding computer-based quantitative analysis.

In this paper, I have shown that although interruptions are clearly marked in the original parliamentary records, and are easily recognizable using close reading methods, they are,