Materiality of TEI Encoding and Decoding: An Analysis of the Western European Union Archives on Armament Policy

By combining traditional historical enquiry with TEI XML encoding and decoding in a corpus analysis phase

emphasis on the importing of the encoded data into the corpus analysis platform, the different types of analysis performed, and a discussion of results. Section 5 concludes the paper with a review of both the benefits and limitations of the applied methods and the materiality underlying the encoding-decoding mechanism.

Description of the Collection
The corpus (WEU-Diplo) chosen for TEI encoding represents a selection from the Archives nationales de Luxembourg 1 of institutional documents concerning armament production and standardization, and armament control within the WEU, 2 from 1954 to 1982. TEI XML was considered an appropriate format both for building a scholarly online edition and for enabling corpus analysis. The general workflow was conceived with the aim of (partial) reusability, albeit with some project-specific adaptations and readjustment, in order to support a variety of projects and document types in European integration history (primary/secondary sources: text, image, audio, video, and their transcriptions). As a matter of principle, manual and automatic processing steps alternated in sequence, in such a way that, independently of the manual interventions, no information needed to regenerate a given state of the corpus in subsequent automatic phases (e.g., via XSLT) should be lost.
The first criterion for the selection of the documents was their relevance to a specific research question: what were the French and British positions on the major defense and security matters within the WEU? Subsidiary questions concerned the identification of the main defense and security matters discussed within the Council of the WEU and the importance placed on the WEU by the two member states, France and the United Kingdom, in their diplomatic strategy between 1954 and 1982. 3 The choice of this case study was influenced by the fact that these two states played a key role in the birth, development, and organization of the WEU, as evidenced in the primary and secondary sources consulted. The idea was that analyzing these issues could also shed light on the organization's contributions to European defense under the leadership (or lack thereof) of France and the UK.
Other criteria influencing document selection were to obtain a balance among control, standardization, and production, together with a balanced number of documents across the decades, and to take into account that some documents are a logical follow-up of others (e.g., documents mentioning recommendations). Another criterion for selection, of secondary importance, was the actual form of the documents, since the research was also intended to evaluate the accuracy of OCR (optical character recognition) processing for different types of layout (title or content page), paper quality, and legibility of typewritten or handwritten characters and of particular markings such as stamps.
The selected sample contained 127 documents, 60 in English and 67 in French. For the first phase of the project, 55 documents in French were retained (a total of 290 pages) because of their importance for subsequent corpus analysis 4 and publication. The majority of the documents were notes from the Secretary-General or Secrétaire Général (46, of which 37 were encoded because of their relevance to the research question), followed by minutes of meetings of the WEU Council or the Standing Armaments Committee (SAC) or of the working party on production and standardization of armament of the Interim Commission (15, all of which have been encoded). The sample also included 2 memoranda (both encoded) and 2 studies (1 of which has been encoded).
The linguistic aspect was also considered. Given the available time and resources, the French version was prioritized for encoding, the English processing being planned for a later stage. In general, however, the chosen documents existed in both French and English versions and, when French was not indicated as the original language, the French documents were exact translations of the English ones or had the same status, since the documents were produced in both languages. The provider of the translations was the WEU itself (its daily work being undertaken in both languages) and was, therefore, a source of official translations.
Of the 55 French documents selected for encoding, 5 had no English equivalent (3 notes, 1 study, and 1 memorandum); the remainder (all the meeting minutes and remaining categories) were available in both languages. More precisely, there were 16 documents stating that the original was in French (of which 1 had no English equivalent); 12 with the original indicated as being in English; 10 mentioning both French and English as the original; 11 with no indication of the original language, although a comparison between the French and English documents showed a similar structure and content; 4 available only in French and bearing no indication of the original language; and 2 with an ambiguous marker (1 indicating French and English as original in the French version while the corresponding English document mentioned English as the original language, the other bearing no mention of the original in the French version but indicating English as the original in the corresponding English document).

The Paper Archive
As a general rule, documents from the Secretary-General all exist in both French and English. In the nearly 400 folders consulted, there were very few exceptions; only occasionally were documents published in just one language. The French version was printed on blue paper and the English on white paper. Internal documents or notes from the Agency for the Control of Armaments (ACA) were only published in French, on very thin ("tracing") paper. The notes and minutes were formal documents distributed to all the delegations of the Member States.
The research aim was to gather a set of representative documents that expressed the French and British positions within the WEU's different bodies, on different topics, the exploration of the design, production, and control of armament being only one of them. For this purpose, we used the WEU's collection database, held in the Archives nationales de Luxembourg, and consulted several sections and collections including: (1) Interim Period; (2) Brussels Treaty Organisation (BTO); (3) 1954-87 within the Secretariat-General/Council's archives; (4) Armament Bodies: Agency for the Control of Armaments (ACA) and Standing Armaments Committee (SAC). Other collections, such as those relating to the military bodies or WEU operations, were considered too recent to be consulted or still held a classified status. The documents were selected primarily based on their themes, strictly following the thirty-year rule. Each collection comprised several folders, each with a "fiche" indicating the name of the section and collection, the title of the folder, the security classification such as WEU or NATO (NATO classified documents were not available), the period of time covered in the folder, the reference, and either keywords or a small summary of the main questions, although sometimes this was quite general. Once the folders or boxes were located in the WEU collection, they were consulted in situ and a selection of several documents was made according to the general theme of armament. Before the digitization, a closer reading and final selection was performed.

Paper to Electronic Text
The initial documents were typewritten materials from the WEU archives. The transformation into electronic text necessitated the use of document scanning, OCR, manual post-processing error correction, conversion of the resulting styled Microsoft Word files to TEI XML P5 via OxGarage, 5 and further XSLT transformation and enrichment using the oXygen XML Editor and GATE (General Architecture for Text Engineering) for NER (Named Entity Recognition). To prepare the digital documents for publication on the Web, further processing was carried out to facilitate visualization and navigation in the browser at the document and collection level.
Next we identied, for each document, principal metadata and semantic elements necessary for encoding, as well as the form of encoding required for the computational linguistic analysis in order to answer the main research question.The metadata included elements such as (1) the author of the document: for the majority of documents the author is a collective entity (institution), except for rare cases-internal documents-where the author is an individual; (2) the date on which the document was distributed or produced, such as the date of the meeting; (3) the location (generally London, where the Council's Secretariat-General was located, or Paris, home of the ACA and SAC headquarters); (4) the title; (5) the version (whether or not it is a nal version); (6) the language of the document with the mention of the original, when present; (7)   the classication: most of the documents were classied as condential, secret, or even top secret, according to the degree of sensitivity of the information.Likewise, documents of this type also had a copy number (element 8), since they often contained military and strategic information and were only distributed to a limited (sometimes very small) number of people, generally ocials at national or institutional level.The organization had its own system of codes (references) for each document, which varied depending on the institution or the type of document. 6The document reference (element 9) only partially identies a document, since the same code was sometimes used for several documents, particularly for minutes of meetings which were divided into thematic sections and incorporated into dierent folders.The folder code (element 10) is therefore an important means of identifying the theme and even the institution within the WEU. 
7rder to address the research question, the main aim of the TEI XML encoding was to identify the speakers (element 11) in the various documents and the views (element 12) that can be attributed to them, whether directly or indirectly. 8The representatives (ministers, parliamentarians, A semi-automatic NER processing involved the identication of other elements (13) in the texts, intended to be considered as whole units in the analysis, such as dates and the names of places, people, oce positions, organizations, and bodies (Western European Union, the North Atlantic Treaty Organisation, the Standing Armaments Committee, and the Agency for the Control of Armaments), the event to which the document refers, as well as any other events mentioned. 10Finally, a series of "products" associated with armament were also encoded, for example Mirage, short-haul transport aircraft, light tank, Pluton, etc.

Encoding
Although other types of elements were annotated in the corpus (metadata, e.g., title, author, availability date, origin place, and confidentiality status; and structure, e.g., headers, footers, sections, paragraphs, and line breaks), the paper will focus on the content-related encoding, that is, speakers and their discourse, along with the above-mentioned categories of named entities, considered from the perspective of the subsequent decoding phase (analysis and interpretation).
The TEI P5 specifications 11 were applied, with no need for adding new classes, elements, or attributes.

Participants
The identification of the agents responsible for the production of texts represents an important step in the analysis of institutional discourse, irrespective of the level of this analysis […] ERHARD; Belgique: M. A. de STAERCKE …"). 12 For other cases, external knowledge or prior research was needed in order to be able to assign a role to the speaker ("M. Selwyn LLOYD déclare que …" 13 as a UK representative) or to identify the contributors to the discourse (the French representative, Geoffroy Chodron de Courcel, and the ACA representative). 14

Example 1. Generic list of participants. WEU-Diplo: CR/58/8.

• an identifier (unique for the corpus) provided only when it was possible to refer to a particular person.
Since the conversion to TEI XML and the semantic enrichment of the corpus involved both automatic transformation (via XSLT) and manual annotation, we first generated, in a preliminary form for all of the documents, a list of all generic labels for the country/institution representatives (in the <profileDesc> section of the <teiHeader>) (example 1).
Then the list was manually customized according to the particularities of each document and the specific "actors" involved in the production of the text. Example 2 illustrates a case where only three representatives were retained and further details were provided on their identity. In other situations, the generic label was enough (when the identity was not required or not available).

Discourse
In order to be able to analyze the discourse of different participants within the WEU's policy on armament issues, we have applied a "kaleidoscopic" approach to the corpus. More precisely, discrete fragments were identified and manually annotated inside each document, with reference to the speaker and his or her role as an institution or country representative manifested in the text (example 2). Given its flexibility of use (either inside a paragraph or encompassing several paragraphs), the <said> tag was chosen for delimiting the different pieces of discourse corresponding to a particular agent (example 3). The choice also facilitated assembling these pieces of information for analysis in the decoding phase (section 4).
<p><name type="person">M. Faure</name> souligne avec force que <said direct="false" ana="#oral_disc" who="#faure" corresp="#repres_fr">le succès de l'entreprise<lb/>dépend de la volonté politique des gouvernements d'assurer une<lb/>coopération effective. Les propositions de <name type="person">M. von Brentano</name><w>cons<lb rendition="#hyphen_before" break="no"/>tituant</w> un pas important dans cette direction et il s'y rallie.</said></p>

The @corresp and @who attributes were used in order to link the marked-up fragment with its producer defined in the <particDesc> unit. Additional attributes (@ana and @direct) were needed to differentiate situations referring either to transcribed oral discourse (direct or indirect speech) or to what we considered written discourse, such as the text of notes (example 4) or studies usually resulting from internal meetings or discussions among institutional bodies (sometimes including "narrative" prose 15 or arguments not necessarily coming from an oral account), and then circulated for further discussion/approval within the WEU.
<said ana="#written_disc" corresp="#repres_aca"> <p>Un an après, dans des conditions analogues, le <name type="person">Ministre<lb/>LUNS</name> était amené à prendre comme président une position dans<lb/>le même sens.</p>[…]</said>

A particular occurrence of "discourse within a discourse" is illustrated by example 5: a direct citation of the oral intervention of a WEU Council representative, M. Heath, from a previous meeting of the WEU Assembly, within the written account of the ACA. 17

The project also included the identification and annotation in the text of named entities intended for later use (such as indexing and linking to an authorities list) or as a prerequisite of the corpus analysis phase. This identification and annotation allows multiword expressions to be counted as single units of a given type (e.g., organization) in the analysis, rather than as separate words (for instance, Union de l'Europe Occidentale instead of Union, de, l', Europe, Occidentale).

The NER task involved a semi-automatic approach using GATE (French NE, Gazetteer, and Gazetteer List Collector plugins) 19 for the detection of seven classes of entities: persons, places, organizations, events, dates, products, and functions (official positions). Manual corrections were applied, when necessary, in a post-processing phase. Since the GATE XML output format was different from TEI, a dedicated XSLT stylesheet was created for the transformation of the GATE tags (such as <Person>, <Location>, <Organization>, and <Date>) into corresponding TEI tags (<name> with the attribute @type, and <date>, respectively). A few instances of <name type="person"> and <name type="org"> appear in the previous examples. Further transformation was necessary during the importing of the annotated corpus into the software for textual analysis (see section 4.1).
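The tag mapping described above can be sketched in a few lines. The project itself used a dedicated XSLT stylesheet; the Python below is only an illustrative stand-in, and the mapping table, the `place` value for @type, and the sample sentence are assumptions made for this sketch rather than details taken from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from GATE annotation tags to (TEI tag, @type value).
# Only "person" and "org" are attested in the paper; "place" is assumed.
GATE_TO_TEI = {
    "Person": ("name", "person"),
    "Location": ("name", "place"),
    "Organization": ("name", "org"),
    "Date": ("date", None),
}

def gate_to_tei(root):
    """Rename GATE annotation elements in place to their TEI counterparts."""
    for elem in root.iter():
        target = GATE_TO_TEI.get(elem.tag)
        if target is None:
            continue  # not a GATE annotation: leave untouched
        tei_tag, tei_type = target
        elem.tag = tei_tag
        elem.attrib.clear()  # drop GATE-specific attributes
        if tei_type is not None:
            elem.set("type", tei_type)
    return root

# Invented sample sentence, for illustration only.
sample = (
    "<p>Le <Date>7 mai 1955</Date>, <Person>M. Faure</Person> "
    "s'adresse au <Organization>Conseil de l'UEO</Organization>.</p>"
)
tei = ET.tostring(gate_to_tei(ET.fromstring(sample)), encoding="unicode")
print(tei)
```

Keeping the mapping in a single table makes it easy to extend to the other entity classes (events, products, functions) once their GATE tag names are known.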

Decoding
The so-called "decoding" phase, for corpus analysis and interpretation, consisted of importing and processing the TEI XML annotated documents within a specialized platform, TXM (Heiden 2010), 20 that allows the analysis of a large body of texts by means of lexicometrical and statistical methods.
The previous encoding served as a basis for discerning or grouping together the different types of semantic or structural elements needed for analysis.

Importing
Since TXM supports XSLT transformation at the moment of import (XML/w+CSV option), an XSLT stylesheet was created to accommodate particular formats or conversions required by the software. Therefore, it was not necessary to store different versions of the corpus, one for TXM analysis, the other for Web publication.
First, a lowercase conversion 21 was provided for consistency, given the varying ways of capitalizing (e.g., Comité militaire de standardisation, Comité militaire de Standardisation, Comité Militaire de Standardisation). Second, for the named entities to be interpreted as a whole instead of as separate units, a supplementary conversion was needed, all the <name> tags being converted to <w> tags, each denoting a "word" of a given type (e.g., person or organization). The special case of hyphenated words where the hyphen appears at a line break (see example 3) had to be handled in an earlier transformation, before import, so that the whole word and not its parts could be counted in the analysis (in the example, constituant instead of cons and tituant, as the software would treat it without a <w> tag).
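The effect of these import-time conversions on what counts as one unit can be illustrated with a small sketch. It is not the project's XSLT stylesheet: it is a Python approximation that lowercases the text, treats each <name> or <w> element as a single unit, and joins the two halves of a word split by <lb break="no"/>. The regular expression used for ordinary words and the sample fragment are assumptions, and TXM's own tokenizer will differ in detail.

```python
import re
import xml.etree.ElementTree as ET

# Simplified word pattern (an assumption, not TXM's tokenizer).
WORD = re.compile(r"\w+(?:['’-]\w+)*")

def flat_text(elem):
    """Text of an element; <lb/> becomes a space unless it carries break="no"."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "lb" and child.get("break") != "no":
            parts.append(" ")
        parts.append(flat_text(child))
        parts.append(child.tail or "")
    return "".join(parts)

def tokens(elem):
    """Yield lowercase units; each <name> or <w> element counts as one unit."""
    if elem.tag in ("name", "w"):
        yield " ".join(flat_text(elem).split()).lower()
        return
    yield from WORD.findall((elem.text or "").lower())
    for child in elem:
        yield from tokens(child)
        yield from WORD.findall((child.tail or "").lower())

# Invented fragment combining the capitalization and hyphenation cases.
sample = (
    '<p><name type="org">Comité Militaire de Standardisation</name> : '
    'les propositions <w>cons<lb rendition="#hyphen_before" break="no"/>tituant</w> '
    'un pas important.</p>'
)
units = list(tokens(ET.fromstring(sample)))
print(units)
```

With this treatment the organization name is counted once, whatever its capitalization, and constituant is counted as a single word rather than as cons and tituant.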
Part-of-speech tagging via the TreeTagger module integrated into TXM was also applied to the corpus at import in order to allow lemma and part-of-speech statistics and queries.

Analysis
The annotated corpus (only the content inside <text> tags, without metadata) contained 6,512 items (unique words) with 76,558 occurrences in the text. 22

Partitioning
Given the identication and annotation of dierent semantic and structural elements in the encoding phase, TXM allows the creation of partitions (Textométrie 2014, section Construire une partition) by selecting a Structure unit and a corresponding Property (i.e., an XML element and one of its attributes) from the list of structural units and properties recognized by the software for the imported corpus.
For instance, as fragments of discourse spread throughout the documents were assigned to particular countries or institution representatives (section 3.2), a partition was created based on the <said> element and its @corresp attribute.Figure 1 shows the dimensions of the representatives' discourse partition, in number of words (occurrences).One can observe that for the selection of documents, there is a "dominance" of the French (14,338 occurrences) and WEU Council (11,276 occurrences) representatives' discourse, the categories with the lowest size being those corresponding to the French-English delegation (185) and WEU Assembly (72).
Other types of partitions were also created and analyzed: by speaker, based on the <said> element and its @who attribute, with dimensions varying from 41 (Brindeau) to 6,141 occurrences (Parodi); by type of discourse, using <said> and @ana, and counting 22,780 occurrences of oral discourse versus 32,355 occurrences of written dicourse; and by subtype of institutional documents, taking into account the <text> element and its @subtype attribute, with occurrences numbering between 2,684 for the study category and 38,565 for the minutes.
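The idea of a partition built from a structural unit and one of its properties can be approximated outside TXM. The sketch below is a simplified illustration, not TXM's implementation: it groups the text of <said> elements by the value of a chosen attribute and counts whitespace-separated words. The sample fragment is invented, nested <said> elements would be counted twice, and the counts would not match TXM's, which depend on its own tokenization.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def partition_sizes(root, attribute="corresp"):
    """Count words inside <said> elements, grouped by the given attribute."""
    sizes = Counter()
    for said in root.iter("said"):
        key = said.get(attribute)
        if key is None:
            continue  # fragment not assigned to any part
        words = " ".join(said.itertext()).split()
        sizes[key.lstrip("#")] += len(words)
    return sizes

# Invented sample with three annotated fragments.
sample = """<body>
<p>M. Faure souligne que <said who="#faure" corresp="#repres_fr">le succès dépend de la volonté politique</said>.</p>
<p><said who="#lloyd" corresp="#repres_uk">une base bilatérale suffit</said></p>
<p><said who="#parodi" corresp="#repres_fr">la France accepte</said></p>
</body>"""

root = ET.fromstring(sample)
by_representative = partition_sizes(root, "corresp")
by_speaker = partition_sizes(root, "who")
print(by_representative)
print(by_speaker)
```

Switching the attribute from @corresp to @who (or @ana) yields the other partitions mentioned above without changing the markup.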

Specificities
The use of the Specicities feature (Textométrie 2014, section Spécicités) allows a comparison of the vocabularies: what is "specic" (either as "overuse" or "decit") in a part of a partition, as compared with the parent corpus and a certain threshold. 23The feature is based on a probabilistic model (Lafon 1980) used in TXM to compute a log10 specicity score of a word property (e.g., word form, lemma, or part of speech) for a given part.In the analysis of the WEU-Diplo corpus, it was assumed that the specicity score may draw attention to forms "specic" to the discourse of dierent country/institutional representatives as compared with the whole.Figure 2 shows an extract from the specicities table computed for the lemma property and the said_corresp partition, sorted by increasing order of the specicity score corresponding to the respres_aca part.Each line in the table corresponds to a value of the chosen property (lemma or a group of lemmas) displayed in the Units column.The second column indicates the frequencies or number of occurrences of the property values in the corpus (with a total T).The other columns contain the number of occurrences of the property values in a part (cumulated by t) and are followed by a corresponding logarithmic score of specicity that can be positive or negative.The table may be sorted in increasing or decreasing order, according to a given column.In the case of an increasing score (as presented in the gure for repres_aca), the rst property values displayed (e.g., matériel, industrie(l)) 24 indicate a decit in use as compared to the whole corpus and the last ones displayed indicate an overuse, while the values with scores around 0 (inside a certain interval) are considered "trivial" (i.e., the specicity measure may not be pertinent for them).
Before creating a specificities […] The result is not very surprising given the role of the ACA: it was created to control the […] the repres_uk's discourse is not surprising either, considering that the United Kingdom, although interested in the topic, was not primarily concerned with this issue. Likewise, expected results are revealed for the positive specificity scores for fabrication-production and harmonization-normalization and the negative specificity scores for control-inspection-verification in the repres_fr's discourse.
The former are most probably linked to the selection of documents and, in particular, to the French memorandum presenting the armament agency, which focuses on fabrication-production and harmonization-normalization (PWG/A/2). 27 Experiments excluding this document 28 from the corpus analysis have confirmed the hypothesis (with scores for the fabrication and harmonization groups being lowered to the positive banality area). However, the negative specificity score for the control group in repres_fr's discourse persisted 29 after the exclusion of the document from the analysis and may be associated either with the assertion of France's resistance to submitting its stocks to the ACA's controls or with an underrepresentation of the control topic in the selection of documents.

A more general, comparative analysis can be provided for the positive and negative specific forms appearing in the representatives' discourse, as inspired by the synthesis tables proposed by Bergounioux et al. (1981) and Bonnafous (1981). Table 1 synthesizes the results for seven participants, a selection of shared lemmas or groups considered of interest for the study, and a set of specificity scores (in brackets) above and below the positive and negative banality thresholds.

The high positive specicity score for coopération/coopérer 30 in the repres_cons_weu's discourse can be explained by the role of arbiter and conciliator of the WEU Council, which was intended to promote cooperation among its members in all the domains.A closer look at the contexts where this group appears shows recurrent, general patterns like coopération en matière d'armements, coopération des pays européens en matière d'armements, and cooperation européenne en matière d'armements 31 occurring both in the repres_cons_weu and repres_deleg_fr's discourse (which also displays a positive score but with a lower value) or more specic occurrences, such as coopération intergouvernementale en matière de recherche, coopération intergouvernementale en matière d'études 32 (repres_cons_weu), coopération en matière de missiles, or coopération européenne dans le domaine aéronautique 33 (repres_deleg_fr).
The highest positive score in the table (nabel for repres_sac) may be explained by the frequent mentions of the organization during the meetings of the SAC (acting as a link between it and the United Kingdom, not a member of Finabel), 34 as well as by the adherence of Great Britain to this organization referred to in the repres_sac's discourse (see also section 4.2.2.4).
The second highest positive and negative scores (g.e.i.p.: 35 for repres_cons_weu and repres_fr, respectively) are less clear but one can observe that the term tends to co-occur with the name of another organization (c.d.n.a.) 36 in the repres_cons_weu's discourse, while being completely absent (0 occurrences) from the French representatives' discourse.The main reason seems to be the substance of the discussions linked to the competences of dierent organizations about standardization.Further examination is also needed to interpret the decit of the group controlinspection-verication reected by negative specicity scores for repres_uk and repres_fr (0 and 7 occurrences, respectively, out of a total of 85) that can be determined, as already mentioned, by the selection of documents, potentially more centered on the production and standardization of armaments than on their control.

Lexical Profile
Another type of analysis resulting from the encoding was an exploration of the combination of lemma and part-of-speech (POS) tagging and specificity measures, which may be related to the so-called "lexical profile" (Guyard 1981) of a participant in the institutional discourse. It consists of a list of relevant lemmas (with positive specificities above the banality threshold) corresponding to certain parts of speech. Table 2 illustrates this type of profile for two representatives (France, United Kingdom) as individuals and three categories of POS (noun, adjective, and verb), obtained by taking into account specificity scores for the said_who partition. For the adjectives, we can point out an opposition on the axis commun versus bilatéral, multilatéral 37 manifested in contexts such as programme (régional), intérêt, défense, études, fonds commun(e)(s) 38 (Chauvel) versus base, discussion, arrangements, comités directeurs bilatéra(l)(le)(ux) 39 (Lloyd). The first profile is probably less clearly defined, but for the second one, the association of the lemmas provided for all three categories seems to convey a certain sense of action towards cooperation and dialogue.
Similar specificities-based analyses (not described here in detail) were performed for other categories of word properties, such as POS, which indicates a high positive specificity score (17.95) for the conditional verbal form in the repres_fr discourse (said_corresp partition), and for other partitions, such as said_ana or text_subtype, that take into account the type of annotated discourse (oral/written) or the subtype of the document (minutes, note, study, or memorandum).

Queries, Concordances, and Co-occurrences
The analysis of specicities was combined with other methods, both of a quantitative and qualitative nature, for examining the documents, for instance, by querying for specic word properties (lemmas, word forms, POS, or combinations of these elements) and by means of the concordances and co-occurrences features (Textométrie 2014, sections Construire une concordance, Cooccurrences, Lexique et Index).Figure 4 presents the results of a query for "nabel" and the corresponding list of concordances that displays a left and right context, the le, and the representative's discourse containing the word (i.e., repres_sac with the highest positive specicity score for this unit, as shown in table 1).As illustrated in the previous section and in gures 4, 5, and 6, Great Britain and Finabel often co-occur in the discourse.This is probably (1) because of the particular attention of the SAC to informing the British representatives about the activities of the organization and (2) because the adherence of the United Kingdom to Finabel and its consequences is often brought into discussion.
In order to avoid misinterpreting the specicities or to conrm some of the hypotheses suggested by this method, we often needed to combine it with co-occurrences, concordances, and visualization of the document.Therefore, co-occurrences can provide a quantitative perspective on the co-presence of some words, lemmas, or entities in the context of a given target, which in combination with concordances and document visualization may support qualitative analysis and interpretation.
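The two complementary views mentioned above, concordance lines and co-occurrence counts around a target, can be sketched in a few lines. The functions, the window size, and the token list are invented for illustration; TXM's own features additionally rely on statistical scores and on the corpus structure.

```python
from collections import Counter

def concordance(tokens, target, window=4):
    """Return (left context, keyword, right context) for each hit of target."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

def cooccurrences(tokens, target, window=4):
    """Count the units found within `window` positions of each hit of target."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts

# Invented token sequence; the multiword entity is already a single unit.
tokens = (
    "le comité informe la grande-bretagne des activités de finabel "
    "et la grande-bretagne examine son adhésion à finabel"
).split()

kwic = concordance(tokens, "finabel")
near = cooccurrences(tokens, "finabel")
for left, keyword, right in kwic:
    print(f"{left:>45} | {keyword} | {right}")
print(near.most_common(3))
```

Reading the concordance lines next to the counts helps check whether a frequent neighbour reflects a meaningful association or merely a repeated formula.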

Results Discussion
The TEI XML encoding and TXM analysis related to the research questions on arms design, production, and control within the WEU have yielded both predictable and less predictable results, the latter needing further examination. Among the former, we can mention those referring to the SAC and ACA roles. Arms production and control was a major part of the WEU's work, despite its somewhat mixed record in this area. Protocol IV of the Modified Brussels Treaty established the ACA. The SAC was subsequently created on May 7, 1955. Although the United Kingdom never opposed the SAC's activities, it actively attempted to restrict its role, both because of its resistance to any notion of supranationality 42 and because it was convinced that NATO and the organizations related to it were more effective and better positioned to achieve standardization in the field of armaments. 43 The interpretation of the less predictable results is not straightforward, since they may have been determined by an under- or overrepresentation of certain elements in the discourse, based on the selection of documents. The same could be said about the negative specificity score for the control group in repres_fr's discourse, but this finding is also likely to be associated with the assertion of France's resistance to submitting its nuclear stocks to the ACA's controls and the need to avoid making statements on the subject. Since the size of the corpus was relatively small, and not all the information on the documents concerning the selected topic and their types in the WEU archive was available, extrapolations about the TXM probabilistic model and the observed linguistic patterns at a larger scale than the pilot sample should be avoided at this stage.
The TEI XML encoding combined with the TXM analysis tools can also reveal inconsistencies, which may draw attention to the need for further encoding and for testing additional documents. On the other hand, it is also important to take into consideration how well the researcher/user knows the content of the documents, as a lack of context can sometimes lead to misinterpretation.

Conclusion
The goal of the pilot project has been to address research questions mainly related to the French and British positions on the topic of arms design, production, and control within the WEU from 1954 to 1982. In particular, we were interested in combining traditional historical inquiry with TEI XML encoding and decoding in a corpus analysis phase for the identification and interpretation of linguistic patterns in the discourse of different country and institutional representatives on armaments issues.
The small scale of the corpus used in the project, and the fact that it may not be a sufficiently representative sample, mean that the results should not be generalized without extended testing on an additional set of documents, further evaluation of the probabilistic model, and an estimation of the sample as compared to the collection from which it was extracted. From a methodological perspective, however, the TEI XML encoding and decoding experiments have shown that the approach can assist qualitative and quantitative methods for the study of historical and discursive phenomena in a collection of institutional documents and a chosen theme. More precisely, the encoding may be quite helpful for researchers, even if they have no previous knowledge of the content of the documents. That is to say, if meaningful thresholds for analysis are set, one can find out the main topic of the documents, and the tags may help in discovering who the speakers are and the main "orientation" of their speech (i.e., their position) in terms of what is specific to their discourse. However, in order to enhance the reading of the specificities, combination with other methods, such as concordances, co-occurrences, and document visualization, is required.

Going back to the initial idea of considering a new link in the decoding-encoding chain suggested by Burnard, we see the TEI encoding as adding a "material" layer to the original text, which further supports both machine and human interpretation (decoding). In a larger sense, despite the inherent bias and limitations related to the selection and the number of documents used in the study, we have attempted to demonstrate the "materiality" of the TEI encoding and decoding as a basis for hermeneutic inquiry in the quest for producing knowledge via digital "instruments," as Capurro and Ihde have previously stated in their accounts of a digital and material hermeneutics.

3 Eric Rémacle referred to the "instrumentalisation" of WEU by these two countries, which he identified as "successive leader states" within the organization. See Rémacle 2009, 197.
4 Related to the specific research question, that is, the study of linguistic patterns in the discourse of different country/institutional representatives as described in section 4.
6 For example, CR(58)8 = CR (for compte-rendu, or minutes, of a Council meeting), the year in brackets, then the document number; PWG/CR/4 = PWG (we think that PWG stands for Production Working Group), CR (for compte-rendu), and the number of the meeting (the fourth meeting); C(80)40: we have noticed that the letter C was generally used for the final version of the reply to a recommendation or written question, followed by the date in brackets, although the meaning of

ambassadors, and experts) from France and the United Kingdom were systematically identified, and, depending on their relevance to the research, the contributions of the German representatives were also encoded. Some examples of British speakers' names are Christopher Steel, Samuel Hood, and Selwyn Lloyd; French speakers' names included Alexandre Parodi, Jean Chauvel, and Geoffroy Chodron de Courcel. A generic nomenclature was developed to maintain consistency among the various speakers and to deal with cases when the speaker was not named: repres_fr (French representative), repres_uk (United Kingdom representative), repres_frg (German representative), repres_frg_fr_it (contribution on behalf of the German, French, and Italian representatives), repres_deleg_fr (French delegation), repres_deleg_uk (United Kingdom delegation), 9 repres_cons_weu (representative of the WEU Council), repres_assb_weu (representative of the WEU Assembly), repres_aca (representative of the Agency for the Control of Armaments), respres_sac (representative of the Standing Armaments Committee) (see example 1).
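To illustrate how such a speaker nomenclature can be exploited once the text is encoded, the following minimal Python sketch groups word counts by speaker identifier. It is not the project's actual processing chain: the use of a `said` element with a `who` attribute, the sample sentences, and the function name `words_by_speaker` are all assumptions made for illustration; only the identifiers and their labels come from the nomenclature above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A few identifiers from the nomenclature described in the text.
SPEAKER_LABELS = {
    "repres_fr": "French representative",
    "repres_uk": "United Kingdom representative",
    "repres_frg": "German representative",
    "repres_aca": "representative of the Agency for the Control of Armaments",
}

# Hypothetical fragment: the element name, the attribute usage, and the
# sentences are invented for this sketch, not taken from the corpus.
SAMPLE = f"""
<body xmlns="{TEI_NS}">
  <p><said who="#repres_fr">Production should be harmonized.</said></p>
  <p><said who="#repres_uk">Control remains essential.</said></p>
  <p><said who="#repres_fr">We agree.</said></p>
</body>
""".strip()


def words_by_speaker(xml_text):
    """Return {speaker_id: number of whitespace-separated words}."""
    root = ET.fromstring(xml_text)
    counts = defaultdict(int)
    for said in root.iter(f"{{{TEI_NS}}}said"):
        # Strip the leading '#' of the pointer to recover the identifier.
        who = said.get("who", "").lstrip("#")
        text = "".join(said.itertext())
        counts[who] += len(text.split())
    return dict(counts)


if __name__ == "__main__":
    for speaker, n in words_by_speaker(SAMPLE).items():
        print(SPEAKER_LABELS.get(speaker, speaker), n)
```

Keeping the identifiers generic (one per role or delegation rather than per person) means that contributions by unnamed speakers can still be partitioned and compared in the same way as those of named ones.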
According to the diagram, the ACA representative's vocabulary is characterized by a high positive specificity score for the groups control-inspection-verification and limitation-restriction, and by negative specificity scores for the groups fabrication-production and harmonization-normalization.
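The reading of positive and negative scores for merged groups of units can be made concrete with a small Python sketch. It is only an approximation under stated assumptions: it assumes a hypergeometric model with a signed log10 scale, a common choice in textometry, and does not reproduce TXM's own implementation. The function names (`merge_rows`, `specificity`), the part names, and all counts are invented for illustration.

```python
from math import comb, log10


def merge_rows(table, group):
    """Sum the per-part counts of the units in `group` into a single row."""
    parts = {p for unit in group for p in table[unit]}
    return {p: sum(table[u].get(p, 0) for u in group) for p in parts}


def hypergeom_pmf(k, T, t, F):
    """P(X = k) for a unit with F occurrences in a corpus of T tokens,
    observed in a part of t tokens."""
    return comb(F, k) * comb(T - F, t - k) / comb(T, t)


def specificity(f, T, t, F):
    """Signed log10 score: positive if the unit is over-represented in
    the part, negative if it is under-represented (assumed convention)."""
    expected = F * t / T
    lo, hi = max(0, t - (T - F)), min(F, t)
    if f >= expected:
        p = sum(hypergeom_pmf(k, T, t, F) for k in range(f, hi + 1))
        return -log10(p)
    p = sum(hypergeom_pmf(k, T, t, F) for k in range(lo, f + 1))
    return log10(p)


# Invented toy lexical table: unit -> {part: frequency}.
TABLE = {
    "control": {"aca": 12, "fr": 3, "uk": 2},
    "inspection": {"aca": 8, "fr": 1, "uk": 1},
    "production": {"aca": 1, "fr": 9, "uk": 7},
}
PART_SIZES = {"aca": 200, "fr": 400, "uk": 400}  # invented token counts

if __name__ == "__main__":
    T = sum(PART_SIZES.values())
    merged = merge_rows(TABLE, ["control", "inspection"])
    F = sum(merged.values())
    for part, size in PART_SIZES.items():
        print(part, round(specificity(merged[part], T, size, F), 2))
```

In this toy table, the merged control-inspection row is over-represented in the invented `aca` part and therefore receives a positive score there, while a unit such as `production`, which is rare in that part, receives a negative one.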

Thornborrow, Joanna Sarah. 2002. Power Talk: Language and Interaction in Institutional Discourse. Harlow: Longman.
Van Dijk, Teun A. 1993. "Principles of critical discourse analysis." Discourse & Society 4(2): 249-83. Accessed March 6, 2016. http://www.discourses.org/OldArticles/Principles%20of%20critical%20discourse%20analysis.pdf.

NOTES
1 The exploitation of the WEU archives follows the decision C(11)05-Final of May 10, 2011, by the Permanent Council of the Western European Union, which appointed ANLux (the Archives nationales de Luxembourg) as the official depository of the WEU archives and gave the CVCE (http://cvce.eu/), a research and documentary center in European Integration Studies, the task of scientifically exploiting these holdings, including their publication in any form.
2 The WEU was created on October 23, 1954, with the signing of the "Modified Brussels Treaty" by France, Belgium, Luxembourg, the Netherlands, the United Kingdom (all five previously members of the Western Union created in 1948), and the Federal Republic of Germany, which was invited to join the new organization. It became the first European Defence Organisation and its missions covered the settling of the problem of the Saar, the monitoring of German rearmament, and the promotion of the defense of Western Europe. The Treaty of Brussels contained a mutual defense clause in Article V. For more information, please consult http://www.cvce.eu/en/recherche/unitcontent/-/unit/72d9869d-72-493e-a0e3-bedb3e671faa/1c06c877-402b-45d1-a126-792e99cf3fc3.
table, a set of basic operations (merge, delete, export, import) are allowed via the Lexical Table feature (Textométrie 2014, section Table lexicale). The groups of units presented in figures 2 and 3 were created using Lexical Table and the Merge Lines feature, in order to merge units considered to be close from a semantic point of view in the context, like, for instance, harmonisation/harmoniser-norme/normalisation-règle/règlement-standard/