Selected Papers from the 2013 TEI Conference
ReMetCa: A Proposal for Integrating RDBMS and TEI-Verse

This paper describes the technical structure of the project ReMetCa (Repertorio Digital de Métrica Medieval Castellana), the first online repertoire of Medieval Spanish metrics and poetry. ReMetCa is based on the combination of traditional metrical and poetic studies (rhythm and rhyme patterns) with digital humanities technology: TEI-XML integrated into a Relational Database Management System (RDBMS) through an XMLType field, which opens up the possibility of launching simultaneous searches and queries through a searchable, user-friendly interface. The use of the TEI Verse module to tag metrical and poetic structures is especially important for two reasons: first, because it lets us tag different kinds of poems with a variable metadata structure, making it possible to express a very high level of detail for poetry description that cannot be registered in a conventional database; and second, because it enables extensibility and adds more information to enrich the conceptual model, which is under constant development.
Journal of the Text Encoding Initiative, Issue 8, 23/09/2015

ReMetCa is a computer-based metrical repertoire of Medieval Castilian poetry. It gathers all poetic witnesses, from the very beginnings of Spanish lyrics at the end of the twelfth century through the rich and varied poetic manifestations of the fifteenth and sixteenth centuries, the "Cancioneros." When complete, it will include over 10,000 texts and offer a systematic metrical analysis of each poem. ReMetCa is the first digital tool to analyze Medieval Spanish poetry. It enables users to carry out complex searches in its corpus, following the models of other digital resources in the Romance lyrical traditions, such as the Galician-Portuguese, Catalan, Italian, or Provençal. ReMetCa is not merely a metrical repertoire; it combines metrical schemes with text analysis and with data sheets covering the main philological aspects that characterize the poems. One of its most important values is that it is a born-digital project designed to be interoperable with other existing poetry databases and digital repertoires, as conceptualized within the Megarep project, in which [...] Seláf (González-Blanco and Seláf 2014). The Spanish repertoire is conceived as an essential tool to complete the digital European poetic puzzle, enabling users to conduct powerful searches in many fields at the same time.

The Role of ReMetCa in the European History of Poetic Repertoires
ReMetCa belongs to the latest generation in a long tradition of metrical repertoires, which can be divided into three significant periods based on the different technologies used for their construction. The first stage, in which repertoires were published as printed books, started at the end of the nineteenth century. The second period started after the Second World War, when repertoires became computer-assisted. Technological advances have made it possible to create a third generation of repertoires, available online and searchable through the web, in which research time has been considerably reduced by richer search capabilities and the accessibility of online interfaces, especially compared with the complexity of printed indexes and lists of metrical schemes.
The first online digital poetic repertoire was the RPHA: Répertoire de la Poésie hongroise ancienne jusqu'à 1600 (Horváth 1991-); Galician researchers created MedDB: Base de datos da Lírica profana galego-portuguesa (Brea et al. 1994-); Italian researchers digitized BEdT: Bibliografia Elettronica dei Trovatori (Asperti et al.); the Nouveau Naetebus (Seláf), the Oxford Cantigas de Santa María Database (Parkinson), the Analecta Hymnica Digitalia (Rauner), and the Dutch Song Database (Grijp) all appeared later. Most of these repertoires are built on relational databases; some of them are open source, but the majority are built on proprietary software. Some of them were previously published as books and later as CDs, but now almost every project has developed an online version. The technical description of these resources is not the topic of this paper, but it will be studied in depth in a later stage of the project, as further possibilities for interoperability are analyzed.

The State of the Art in Spain
Compared with the European tradition described above, the Spanish panorama looks weak: there is no published poetic repertoire that gathers the metrical patterns of Medieval Castilian poetry (except for the book by Gómez-Bravo [1999], which is restricted to "Cancionero" poetry and therefore covers only the fifteenth century, not the poems written during the thirteenth and fourteenth centuries), and, until now, there was no digital resource available. However, researchers are nowadays more conscious of the importance of metrical studies for the analysis and understanding of Medieval Spanish poetry, as has been recently shown. Moreover, metrical studies have flourished thanks to the creation of specialized journals, such as Rhythmica. Revista española de métrica comparada, available online, and Ars Metrica, born as an electronic journal whose scientific committee is composed of researchers from universities and centers in different countries.
At this point, ReMetCa's aim is to fill the Spanish gap by building a metrical repertoire within this complex puzzle of digital European poetry. The objective of the project in its future stages is to design a system (based on the TEI model, but integrating linked data technologies) that allows interoperability, information exchange, and combined searches across the different repertoires and databases.

The Conceptual Schema of ReMetCa
The description of ReMetCa's structure starts with the definition of a conceptual model based on the domain of medieval metrics. Its global structure is synthesized into a relational database, which will be exportable in the future to full TEI in order to facilitate data interchange and interoperability with other projects and systems. For the moment, TEI is restricted to tagging metrical and poetic phenomena by using the TEI Verse module in only one of the database fields.
Although there are many possible ways to structure such a database, our proposed model takes into account the elements we need to describe and analyze our field of study. For the graphical representation we use a light version of UML (Unified Modeling Language). Entity names and their attributes are shown as boxes, and these are connected by lines which represent the existing relationships among them, together with numbers to show cardinality. The relationships established among entities can be "one-to-many" or "many-to-many."
The first category includes the relationship between obraCompleta (literary work) and refHismetca (classification system). As one refHismetca may apply to many different literary works, the solution given in the E-R model is to define a table for refHismetca with at least two attributes: one to identify the refHismetca, and the other to name it. For the entity obraCompleta (literary work), our database has a field with a foreign key which points to the id field of the genre table (refHismetca).
The second type of relationship, many-to-many, is more involved: a third table must be created to hold each such relationship. This happens, for example, in the relationship between topics and poems, whose third entity consists of as many tuples as there are topics assigned to each poem. Every tuple has at least two attributes, which identify the poema (poem) and the tema (topic). A similar situation arises with the entity bibliografía (bibliography), whose third entity is called referencias.
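The two relationship types described above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions, not the project's actual schema: the entity names obraCompleta, refHismetca, poema, and tema come from the text, while the column names (nombre, titulo, refHismetca_id), the junction table name poema_tema, and all row values are hypothetical placeholders.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- One-to-many: one refHismetca classifies many literary works.
CREATE TABLE refHismetca (
    id     INTEGER PRIMARY KEY,   -- identifies the classification
    nombre TEXT NOT NULL          -- names it (hypothetical column name)
);
CREATE TABLE obraCompleta (
    id             INTEGER PRIMARY KEY,
    titulo         TEXT NOT NULL,
    refHismetca_id INTEGER NOT NULL REFERENCES refHismetca(id)
);

-- Many-to-many: a third table holds one tuple per (poem, topic) pair.
CREATE TABLE poema (id INTEGER PRIMARY KEY, titulo TEXT NOT NULL);
CREATE TABLE tema  (id INTEGER PRIMARY KEY, nombre TEXT NOT NULL);
CREATE TABLE poema_tema (          -- hypothetical name for the third entity
    poema_id INTEGER NOT NULL REFERENCES poema(id),
    tema_id  INTEGER NOT NULL REFERENCES tema(id),
    PRIMARY KEY (poema_id, tema_id)
);
""")

# Placeholder rows (invented, not taken from the ReMetCa corpus).
conn.execute("INSERT INTO refHismetca VALUES (1, 'clasificación A')")
conn.executemany("INSERT INTO obraCompleta VALUES (?, ?, ?)",
                 [(1, "obra 1", 1), (2, "obra 2", 1)])
conn.executemany("INSERT INTO poema VALUES (?, ?)",
                 [(1, "poema 1"), (2, "poema 2")])
conn.executemany("INSERT INTO tema VALUES (?, ?)",
                 [(1, "tema X"), (2, "tema Y")])
conn.executemany("INSERT INTO poema_tema VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

# One classification, many works.
works_per_ref = conn.execute(
    "SELECT COUNT(*) FROM obraCompleta WHERE refHismetca_id = 1"
).fetchone()[0]

# All topics assigned to poem 1, resolved through the third table.
topics_of_poem_1 = [row[0] for row in conn.execute(
    "SELECT t.nombre FROM tema AS t "
    "JOIN poema_tema AS pt ON pt.tema_id = t.id "
    "WHERE pt.poema_id = 1 ORDER BY t.id"
)]
```

The composite primary key on the third table keeps each poem and topic pair unique, which matches the description of one tuple per topic assigned to a poem.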

The structure described above covers the majority of entities in our relational database model. Some problems arise when entering entities with a complex textual description, such as the terms used to describe parts of the literary work. Poema (poem), estrofa (stanza), and verso (line) are the components that define the hierarchical structure of each literary work analyzed. Applying the previously described E-R model would lead to a complex set of relationships among those components, which is very difficult to represent in a database. These relationships are created through composition, which means that the entity poema is composed of one or more entities estrofa, and the entity estrofa consists of one or more entities verso. The relations of multiplicity among the components vary from one work to another, depending on the number of stanzas and lines in each entity. For example, a sonnet shows a multiplicity of 1 for each stanza contained by it (i.e., each stanza is part of only one sonnet), and the stanza has a multiplicity of 4 in relation to the sonnet (the sonnet contains four stanzas). The problem of representing this composition in an E-R model is that the representation is data-centered, and it does not work for poem, line, and stanza, as these components need to be analyzed as textual items. It is necessary to insert metadata into these textual items to show their compositional relation and to add other relevant information to them. The E-R model is inappropriate for this purpose due to its data-centered structure, given that the entities of poem, line, and stanza lie in the middle of its referential domain of study.
The difficulties described above remain in each stage of the project and reappear in the programming phase and when designing the different web components, both in the administrative part of the project (for aspects related to the creation, modification, and deletion of records) and in its querying module. Also, the structure of each of the three components mentioned (poem, stanza, and line) differs significantly depending on the literary work, as there are different kinds of poems, stanzas, and lines. For example, some of the poems are monostrophic, whereas others are composed of several different stanzas. Lines may be rhymed or unrhymed, may be inserted into a strophic structure or not, may form part of a series of compositions, or may constitute a literary composition themselves. From a technical point of view, all of them could be considered "semi-structured data."

The Addition of XML Tagging: The Verse Module of TEI P5
This hierarchical and semi-structured pattern is quite ungainly for an E-R database model, which is not flexible enough to show hierarchical structures. For this reason, an XML-based markup language such as TEI is a well-suited solution for showing all the complex relations and properties. As the ReMetCa model is built on a database whose structure works very well for describing the main fields of our project, TEI is used only for tagging entities and relationships that cannot be represented in the database structure because of the complexity and diversity of the texts included, in terms of metrical definitions and properties. We have selected only the tags of the Verse module of the TEI P5 Guidelines, as this module is well suited to showing the hierarchy and properties needed, which may vary depending on the type of poem. Here is an example:
Example 1: Auto de la huida a Egipto, escena III, vv. 36-44
<lg type="estrofa" subtype="redondilla" asonancia="consonante" met="8,8,8,8" rhyme="cddc">
  <l n="36">Guía al hijo y a la m<rhyme label="c">adre</rhyme>,</l>
  <l n="37">guía al viejo pecad<rhyme label="d">or</rhyme>,</l>
  <l n="38">que se parte sin tem<rhyme label="d">or</rhyme></l>
  <l n="39">a donde manda Dios p<rhyme label="c">adre</rhyme>;</l>
</lg>
<lg type="estrofa" asonancia="consonante" met="8,8" rhyme="+b+a">
  <note>estos dos versos son de vuelta</note>
  <l n="40">y pues al niño bend<rhyme label="b">ito</rhyme></l>
  <l n="41" rhyme="aste">y a nosotros tú sac<rhyme label="a">aste</rhyme>,</l>
  <lg type="estribillo" asonancia="consonante" met="8,8,8" rhyme="*a*b*b">
    <l n="42">Ángel, tú que me mand<rhyme label="a">aste</rhyme></l>
    <l n="43">de Judea ir a Eg<rhyme label="b">ipto</rhyme>,</l>
    <l n="44">guíanos con el chiqu<rhyme label="b">ito</rhyme>.</l>
  </lg>
  <!--lg are nested because "estribillo" belongs to the stanza but it needs to be marked as an independent structure -->
</lg>
We have designed a TEI customization based on the TEI Verse module and represented as an XML Schema (.xsd) which includes the following tags and attributes:
• The stanza, marked with the <lg> tag, is our highest tagging level and bears the following attributes:
⚬ @type describes the function of the stanza in the poem and may have one of three values, "estrofa", "estribillo", or "cabeza", depending on its function: a normal stanza, a chorus, or the head position of the poem (which is usually repeated with variations).
⚬ @subtype contains the name of the stanza form (e.g., "soneto", "cuadernavia", "tirada", "romance", "zéjel"), provided it has a standardized name in the poetic tradition. The classification of the terms is gathered in a controlled vocabulary registered in the guidelines established by ReMetCa. If there is no known conventional name (which happens in many cases), the attribute may not appear.
⚬ @n contains the number of the stanza as per the print edition on which our digital editions are based. Since the transcriptions of the texts we use are not complete, it is necessary to number the lines in order to make references to other editions.
⚬ @met contains the metrical structure, based on the number of syllables in each line and on the number of lines, separated by commas. A stanza of four lines of octosyllables would be encoded as met="8,8,8,8".
⚬ @rhyme contains the rhyming structure of each stanza, represented by letters, as for example in rhyme="abba", to show that the first and fourth lines have the same rhymes and so do the second and third.
⚬ @asonancia is a new attribute which does not exist in the TEI Verse module and has been added to our schema. Although our philosophy is to respect and follow the TEI Guidelines (TEI Consortium 2013), we have felt obliged to introduce three new attributes. This one indicates the two possible values of the rhyming typology: "asonante" (which means that only vowel sounds are repeated at the end of two rhyming lines), and "consonante" (which means that every sound after the stressed syllable is repeated). The second added attribute is @unisonancia, which takes the values of either "unisonante" or "singular" and shows whether the same rhyme scheme is repeated in different stanzas (unisonante: abba, abba) or not (singular: abba, cddc). The third new attribute is @isometrismo, which takes the values of either "isométrico" or "heterométrico" and indicates whether all the descendant stanzas have the same number of syllables ("isométrico") or not ("heterométrico").
• The element "line," <l>, a child of <lg>, is a mixed-content element which contains the child element <rhyme>, marking the rhyming string of the line (the part pronounced after the stressed vowel), and one attribute, @n, which marks the line number, following the same principles as those indicated for the same attribute on <lg>.
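To make the attribute descriptions above more concrete, the following Python sketch reads the first stanza of Example 1 with the standard xml.etree.ElementTree module and cross-checks @met and @rhyme against the <l> and <rhyme> children. It is a minimal illustration, not part of the ReMetCa tooling: the function name describe_stanza and the keys of the returned dictionary are hypothetical, and deriving an "isometric" flag from @met is an assumption about how @isometrismo could be computed. The direct letter comparison only applies to rhyme values without the "+" and "*" markers, whose meaning is not specified in this section.

```python
import xml.etree.ElementTree as ET

# First stanza of Example 1, copied from the encoding shown in the text.
STANZA = """<lg type="estrofa" subtype="redondilla" asonancia="consonante" met="8,8,8,8" rhyme="cddc">
  <l n="36">Guía al hijo y a la m<rhyme label="c">adre</rhyme>,</l>
  <l n="37">guía al viejo pecad<rhyme label="d">or</rhyme>,</l>
  <l n="38">que se parte sin tem<rhyme label="d">or</rhyme></l>
  <l n="39">a donde manda Dios p<rhyme label="c">adre</rhyme>;</l>
</lg>"""


def describe_stanza(xml_text):
    """Summarize one <lg> element and check its attributes for consistency."""
    lg = ET.fromstring(xml_text)
    # @met: comma-separated syllable counts, one per line.
    met = [int(count) for count in lg.get("met").split(",")]
    # Only the <l> elements that are direct children of this <lg>.
    lines = lg.findall("l")
    # Concatenate the @label of each line's <rhyme> child.
    labels = "".join(line.find("rhyme").get("label") for line in lines)
    return {
        "type": lg.get("type"),
        "line_numbers": [line.get("n") for line in lines],
        "met": met,
        "labels": labels,
        "met_matches_lines": len(met) == len(lines),
        "rhyme_matches_labels": labels == lg.get("rhyme"),
        # Assumption: a stanza is isometric when all counts in @met are equal.
        "isometric": len(set(met)) == 1,
    }


info = describe_stanza(STANZA)
```

A check of this kind could be run before a stanza is stored, so that a mismatch between @met and the number of lines is caught at data entry rather than at query time.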

Possibilities for Extending our TEI Schema
As the ReMetCa project was initially built as a relational model, the only TEI component that has been added to its data model is the Verse module. However, in the future there will be more TEI markup additions to reflect the need for gathering more information on the different manuscripts witnessing each poem. Our team is analyzing the possibility of replacing the manuscript table with the element <msDesc>. The reasons for this decision derive from the complexity of describing many different witnesses, each of them with complex relations to many collections, libraries, owners, and copyists. Following the E-R model would force us to multiply the number of tables in order to reflect this complexity. Using a model based on XML would therefore ease our work by offering both exhaustiveness and flexibility, but this advantage is counterbalanced by the scale of the changes that would be required to introduce XQuery functionality into the system, so each decision to change in this way has to be taken gradually and carefully.

The Physical Model
Once the logical model was established, the next task was to implement the selected schema in a database management system, which is known as the "physical model" (Silberschatz 2002: 40). As far as the selection of the system is concerned, native XML databases, such as eXist or BaseX, were discarded since we wanted to preserve the original relational structure of our logical system, and also because we considered it interesting to use an RDBMS model, seeing that most of the online repertoires are built on similar systems. The problem lay in choosing a combined system to integrate both types of information: E-R relational structures and text marked up with XML.
We compared two of the most popular E-R database systems: MySQL and Oracle9i XML. Both of them offered the possibility of adding an XMLType data column, thus allowing hierarchical structures to be introduced inside the relational model through XML nesting. The combination of the relational and XML approaches offers great advantages, such as the possibility of making combined queries with the SQL and XPath languages.
This first trial model of ReMetCa, combining the MySQL RDBMS and TEI XML fields, is already working online and can be visited at http://ww.remetca.uned.es. To access the database, it is necessary to log in to "Área de trabajo" and then to "base de datos," using a username and password that can be requested from the same website. Eventually the search screen will be online, free, and open with no password, but not yet, as we are still working on populating the database and designing the search engine.
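The idea of combining SQL and XPath in a single search can be illustrated with a small Python sketch. It is a stand-in rather than the project's implementation: SQLite has no XML column type, so the TEI fragment is stored in a plain TEXT column and the XPath step is applied in Python with xml.etree.ElementTree, which supports only a limited XPath subset. In a database with native XML support, that step would typically run inside the SQL statement itself. The table layout, the column names (titulo, siglo, tei), the wrapping <div> element, and all row values are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Relational fields plus one column holding the TEI Verse markup as text.
conn.execute(
    "CREATE TABLE poema ("
    " id INTEGER PRIMARY KEY, titulo TEXT, siglo INTEGER, tei TEXT)"
)

# Placeholder rows (invented for illustration, lines left empty).
rows = [
    (1, "poema 1", 15,
     '<div><lg type="estrofa" subtype="redondilla" met="8,8,8,8" rhyme="abba">'
     '<l n="1"/><l n="2"/><l n="3"/><l n="4"/></lg></div>'),
    (2, "poema 2", 15,
     '<div><lg type="estrofa" met="8,8" rhyme="aa">'
     '<l n="1"/><l n="2"/></lg></div>'),
    (3, "poema 3", 13,
     '<div><lg type="estrofa" subtype="redondilla" met="8,8,8,8" rhyme="abba">'
     '<l n="1"/><l n="2"/><l n="3"/><l n="4"/></lg></div>'),
]
conn.executemany("INSERT INTO poema VALUES (?, ?, ?, ?)", rows)


def find_poems(connection, century, stanza_form):
    """Filter by a relational field (SQL), then by the XML content (XPath)."""
    xpath = ".//lg[@subtype='%s']" % stanza_form
    titles = []
    cursor = connection.execute(
        "SELECT titulo, tei FROM poema WHERE siglo = ? ORDER BY id", (century,)
    )
    for titulo, tei in cursor:
        # Keep the poem only if at least one stanza matches the XPath test.
        if ET.fromstring(tei).findall(xpath):
            titles.append(titulo)
    return titles


matches = find_poems(conn, 15, "redondilla")
```

Here the century is a conventional relational field while the stanza form lives only in the markup, so neither SQL nor XPath alone could answer the question.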
During the next stages of the project, more TEI modules and elements will be added in order to complete some of the sections that are not yet satisfactorily described by the E-R model in the database, and also to make it possible to export all the database contents to full TEI.A further stage will consist of working on a linked-data model to allow interoperability among the different European repertoires based on a common ontology.

Figure 2: Logical Model

Figure 5: Index of verses