TEI4LdoD: Textual Encoding and Social Editing in Web 2.0 Environments

In this article we describe how textual encoding is used in our current project of constructing a digital archive of Fernando Pessoa’s Livro do Desassossego, and we present a technical solution for TEI XML encoding that is responsive to dynamic changes over time. With our software architecture proposal for processing TEI markup, the Collaborative Digital Archive of the Book of Disquiet will be able to instantiate the cooperative and social editing functionalities of Web 2.0 environments.

engage with the dynamics of variation in authorial and editorial witnesses. They will be able to examine not only the differences between autograph sources or between manuscripts and print editions, but also the differences among the various print editions.
Besides using TEI XML encoding and programming to recreate the history of the authorial and editorial dynamics, the LdoD Archive also explores the simulative potential of the digital medium as a space for virtualizing the Book of Disquiet in ways that will enable users to experiment with the processes of editing and writing in relation to this work. Expert and non-expert users will be able to collaborate by making their own editions and by adding further textual fragments to the archive. Interventions can take two forms: selecting, ordering, and annotating fragments as part of a user-defined virtual edition; or selecting fragments or parts of fragments as textual seeds for creating variations and extensions based on LdoD as part of a user-defined virtual writing process.
This interactive feature of the archive is further enhanced by search and navigation functions that will allow a strong integration between the initial closed set of scholarly materials and the open virtual editing and writing additions.
14 The participatory aordance of the digital medium has two major aspects: an environment for collaboration and social interaction, on the one hand, and the possibility of marking material changes at the level of code, on the other. Material changes can be marked up in the XML encoding, but also as new data and metadata generated by the users' interaction with the archive, which are stored in the database. We believe that these features of networked computational media can be used to redesign the digital archive as a dynamic environment for dierent kinds of practice, not limited to research and teaching. A scholarly remediation of LdoD according to tested principles of electronic philology would be part of a larger interactive environment where reading and writing practices around the Book of Disquiet could be socialized within the digital medium itself. 5 Aggregation of genetic and editorial witnesses according to criteria dened by readers could also occur within a virtual space that allows users to make critical annotations and write variations based on the fragments. Thus knowledge of textual form and textual transmission would be complemented by experiments with writing processes and bibliographic structures.

Problem: The LdoD Archive Model
In this section we oer a synthesis of the four functions and three dimensions of our model for a virtual LdoD, as described in Portela and Silva (2014). This description has been diagrammed in gure 1. Our virtual model for LdoD establishes a framework of interactions distributed across four functions: reader-function, editor-function, book-function, and author-function. These functions contain a model of the ecosystem of literary inscriptions. With regard to facsimile surrogates and textual transcriptions, seen here as digital objects available for interaction with expert and nonexpert users, we may say that the archive contains three related dimensions: a genetic dimension that allows users to create a narrative of authorial composition; a social dimension that allows users to create a narrative of scholarly editing and textual reception; and a virtual dimension that enables users to explore the possibilities of both writing and editing while interacting with the Book of Disquiet. 16 We should clarify here that "social dimension" is used in this article to refer to the socialization of texts embodied in particular textual and bibliographic forms in the historical archive-a notion derived from social editing theories (McGann 2006). The "virtual dimension" refers to the collaborative additions and interactive transformations of the archive in a web environment.
Thus the "virtual dimension" should be understood as a particular expression of the "social dimension" in this constructed electronic reading, editing, and writing space, which can be freely used by either Pessoa experts or non-experts. The final relevance of each virtual edition will result from its popularity, that is, from how often it is accessed and used in the construction of other virtual editions. "Social editing" in the title of this article subsumes both processes of textual socialization: our digital representation of the historical socialization of Pessoa's LdoD in a set of expert editions ("social dimension"), and our design of a digital platform for furthering textual collaborations ("virtual dimension"). The social/virtual distinction is pertinent from both a design and a theoretical perspective, since we need to distinguish the static and dynamic layers of the archive. On the other hand, we also want to emphasize the social nature of editing processes through a series of built-in affordances in our digital archive.
The reader-function supports a contextualized reading of the LdoD fragments. It enables readers to visualize and compare fragments and variations according to several authorial and editorial witnesses. The book-function enables the construction and reconstruction of LdoD based on textual and metatextual information in the archive. The output of these flexible remakings of the LdoD can be either a given form that already exists in the work's archive, such as the Coelho 1982 or the Pizarro 2010 edition, or a virtual reorganization of the book. The editor-function, in its turn, is meant to provide interpretations of LdoD as a book project based on a pre-existing (but variable and unstable) corpus of documents. This function allows users to produce textual sequences and textual aggregations according to user-defined criteria. It also enables users to add annotations and tags to particular passages.
Finally, the author-function enables the extension of LdoD with new texts based on original fragments from LdoD. Thus, a reader of LdoD is able to write a new text derived from fragments of LdoD, becoming an author in the context of a virtual edition. These newly authored fragments can result both from textual remixes of the archive's content, and from textual additions of user-created content.
We can conclude that what is needed is (1) a scholarly archive where experts can study and compare LdoD's authorial witnesses with their critical editions; and (2) a virtual archive that allows experts and non-experts to experiment with the production of different editions of LdoD, by means of (2a) editing (aggregating, sequencing, annotating, tagging) and (2b) writing (extensions and variations based on Pessoa's texts).

Given the above set of goals, the LdoD Archive has to accommodate scholarly standards and requirements concerning digital archives, for instance the use of TEI as a specification to encode literary texts, as well as the virtual communities and social software features needed to support the social edition of LdoD by other experts and by non-experts. This aspect implies additional constraints on how to coordinate the interactions within the archive. Therefore, there is the need for a dynamic archive where users can edit their own versions of LdoD, and write extensions of the original fragments, while the archive's initial set of experts' interpretations and analyses of LdoD are kept "unchanged" and clearly separated from the socialized virtual editions and writings (Silva and Portela 2013). This constraint will actually shape the formulation of the solution strategy.

Solution Strategy: A Collaborative Archive Framework
To address the separation of the experts' interpretations from the socialized virtual editions and writings, we establish the following principles:
⚬ Expert editions do not refer to virtual editions, in order to keep the experts' interpretations separate from the virtual editions' extensions;
⚬ By default only the expert editions are presented, so as to preserve an "official" experts' archive, which means that users have to explicitly access the virtual editions. This explicit access makes them aware of the existence, and separation, of experts' and socialized virtual interpretations.
In contrast with most digital literary archives, which are built using XSLT technology to transform TEI representations into HTML for visualization, and where only the reader- and book-functions are dynamic, in the LdoD Archive we allow users to create their own editorial interpretations of the LdoD, and to extend LdoD written fragments using a web interface. Therefore, users can dynamically change the archive repository through its virtual dimension. However, we continue to support the traditional scholarly work on the genetic and social dimensions, where the project editors do static offline TEI encoding of the LdoD using their XML editor of choice.
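The two principles above can be illustrated with a minimal sketch. It is not the archive's actual code: the `Edition` and `Archive` classes, the `visible_editions` method, and the acronym `MyVirtualEdition` are hypothetical names introduced only for illustration. Expert editions are always listed, while a virtual edition appears only when the user has explicitly selected it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edition:
    """Hypothetical edition record; is_virtual separates the two layers."""
    acronym: str
    is_virtual: bool = False


@dataclass
class Archive:
    editions: list = field(default_factory=list)

    def visible_editions(self, selected_virtual=()):
        """List expert editions always; list a virtual edition only
        when its acronym was explicitly selected by the user."""
        selected = set(selected_virtual)
        return [
            edition.acronym
            for edition in self.editions
            if not edition.is_virtual or edition.acronym in selected
        ]


archive = Archive([
    Edition("Coelho1982"),
    Edition("Pizarro2010"),
    Edition("MyVirtualEdition", is_virtual=True),  # assumed example name
])

default_view = archive.visible_editions()
explicit_view = archive.visible_editions(["MyVirtualEdition"])
```

Because an `Edition` in this sketch carries no reference to other editions, an expert edition cannot point at a virtual one, which mirrors the first principle.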

TEI Encoding
Because of the dynamic requirements of the LdoD project, and the strategy to create a collaborative archive as described above, the TEI encoding needs to address the three dimensions. The description of the TEI encoding of the genetic and social dimensions is driven by how TEI can be used to express the authorial witnesses and their expert interpretations through the concept of edition, whereas the encoding of the virtual dimension focuses on how the TEI encoding chosen for the genetic and social dimensions can support the creation of a non-predefined number of virtual editions.

In Portela and Silva (2014) we describe a UML (Unified Modeling Language) model for a virtual LdoD that supports the project requirements. In this section we show how this model is encoded in TEI. To encode this model in TEI we use the <teiCorpus> element to contain all the fragments and aggregate in its header the entities that are common, and can be reused, in each of the fragments.

Genetic and Social Dimensions
In the TEI Corpus header, the different editions are encoded inside a <sourceDesc> element as a list of <bibl> elements. The TEI encoding of Pessoa's heteronyms is done within <profileDesc> and <particDesc> elements as a list of <person> elements with attribute @type set to the "heteronyms" value. The four experts' editions and the two heteronyms assigned by experts to the LdoD are declared in this way.
The Source concept is implemented within the <sourceDesc> element and the distinction between Manuscript, Typescript, and PrintedSource is indicated by convention through the structure of the @xml:id attribute value. Note that Manuscript and Typescript are implemented within the <msDesc> element, while PrintedSource is implemented in the <bibl> element. The FragInter concept, in turn, is implemented by the <witness> element, and the distinction between editorial (ExpertEditionInter) and authorial (SourceInter) witnesses is also indicated by convention through the structure of the @xml:id attribute value. A <ref> element is used to associate the witness with its source, for SourceInter witnesses, or with its edition, for ExpertEditionInter witnesses. The editions are declared in the corpus header. The editorial contextual information of the fragment (metatextual information) is encoded within the <witness> element. Finally, the TextPortion instances depicted in figure 3 are represented within the <text> element and refer to their respective FragInter through the @wit attribute of <rdg> elements, which contains the witness identifier declared in the fragment header.
As mentioned above, an apparatus is used to distinguish the different readings of the fragments. The <rdg> element is useful in the context of the editions to represent variations in their readings of the source documents.
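To make the relation between <witness> declarations and <rdg> readings concrete, the following sketch parses a deliberately reduced fragment and groups the readings by witness. It rests on several assumptions: the @xml:id values (`Fr001.WIT.MS.1`, `Fr001.WIT.ED.A`), the <ref> targets, and the placeholder readings are hypothetical and do not reproduce the project's actual naming convention, and the header omits elements that a valid TEI file would require.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Reduced, hypothetical fragment: one authorial and one editorial witness.
FRAGMENT = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <listWit>
          <witness xml:id="Fr001.WIT.MS.1"><ref target="#Fr001.SRC.MS.1"/></witness>
          <witness xml:id="Fr001.WIT.ED.A"><ref target="#ED.A"/></witness>
        </listWit>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>
    <app>
      <rdg wit="#Fr001.WIT.MS.1">first reading</rdg>
      <rdg wit="#Fr001.WIT.ED.A">second reading</rdg>
    </app>
  </p></body></text>
</TEI>
"""


def readings_by_witness(xml_text):
    """Map each declared witness identifier to the readings that cite it."""
    root = ET.fromstring(xml_text)
    readings = {
        witness.get(XML_ID): []
        for witness in root.iterfind(".//tei:witness", TEI_NS)
    }
    for rdg in root.iterfind(".//tei:rdg", TEI_NS):
        # @wit may hold several space-separated pointers of the form "#id".
        for pointer in rdg.get("wit", "").split():
            readings.setdefault(pointer.lstrip("#"), []).append(
                (rdg.text or "").strip()
            )
    return readings


grouped = readings_by_witness(FRAGMENT)
```

Grouping readings per witness in this way is one route to the side-by-side comparison of authorial and editorial transcriptions that the encoding is meant to support.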

This approach allows us to associate interpretation metadata in the context of each witness. Users will be able to compare digital facsimile representations of authorial documents (and topographic transcriptions of those documents) to editorial transcriptions. The latter can also be compared against each other in order to highlight their interpretations of the source.

When a TEI-encoded le for a LdoD fragment is uploaded to the system, it is parsed, and if it does not contain any errors, a new Fragment instance is created associated with a new instance of FragInter for each dierent transcription of the text. In addition to the verication of the TEI syntax, the parser does a semantic verication of the encoded fragment. For instance, it veries the existence of the entities referenced by @xml:id attributes. This supports the encoding work because it allows the early detection of errors.

The semantics of this relationship is defined by the @usePolicy attribute, which can take two values: "import" and "inherit". When the "import" value is used, the virtual interpretation of the fragment is built on top of the source interpretation and can change it: the source interpretation is copied, at virtual interpretation creation time, to the virtual interpretation context and can be freely changed by the user. Any further change made to the source interpretation does not affect the virtual interpretation. When the "inherit" policy is used, the source interpretation is extended in the virtual interpretation and cannot be changed. In this case, if the source interpretation is changed by the source community, the changes are propagated to its virtual interpretations. These policies only apply to the tags and annotations associated with the virtual edition, because, currently, the archive does not allow any changes to the authorial and editorial transcriptions.
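The difference between the two policies can be sketched as copy versus live reference. The classes below are hypothetical and model only tags; they are not the archive's domain model. Under "import" the virtual interpretation works on a snapshot that it may modify, while under "inherit" it reads through to the source and may only add to it.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """Hypothetical source interpretation holding a set of tags."""
    tags: set = field(default_factory=set)


class VirtualInterpretation:
    def __init__(self, source, use_policy):
        if use_policy not in ("import", "inherit"):
            raise ValueError("use_policy must be 'import' or 'inherit'")
        self.use_policy = use_policy
        self._own_tags = set()
        # "import": snapshot taken at creation time; "inherit": live reference.
        self._base = copy.deepcopy(source) if use_policy == "import" else source

    @property
    def tags(self):
        return self._base.tags | self._own_tags

    def add_tag(self, tag):
        self._own_tags.add(tag)

    def remove_tag(self, tag):
        if tag in self._own_tags:
            self._own_tags.discard(tag)
        elif self.use_policy == "import":
            self._base.tags.discard(tag)  # only the private copy changes
        else:
            raise PermissionError("inherited tags cannot be changed")


source = Interpretation({"tag-a"})
imported = VirtualInterpretation(source, "import")
inherited = VirtualInterpretation(source, "inherit")

# A later change by the source community reaches only the inheriting view.
source.tags.add("tag-b")
```

In this sketch, removing `tag-a` from `imported` leaves `source` untouched, whereas the same call on `inherited` raises an error.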
The TEI encoding of the genetic and social interpretations is extended to support the virtual interpretations. Therefore, the virtual editions are declared in the corpus header as a new <listBibl> element containing a <bibl> element for each new virtual edition. However, the distinction between the set of critical editions and the set of virtual editions is done by convention through the @xml:id attribute value. Virtual heteronyms are similarly declared in the corpus header. As regards virtual interpretations, they are encoded in the fragment header using <witness> elements and a convention based on the @xml:id attribute value to distinguish them from the other editorial interpretations, namely the experts' interpretations. Additionally, the editorial contextual information (metatextual information) of the virtual fragment contains, among other information, a reference to the source interpretation and the fragment order in the virtual edition.

Following the proposed TEI encoding we are able to implement the solution principles for the LdoD project identified in section 4. However, some of the distinctions associated with the separation principle, or the references between virtual interpretations and their source interpretations, are set by convention and require automatic tools to be aware of them. On the other hand, TEI does not support the encoding of some information that is required in a Web 2.0 application, namely access control information. For instance, we cannot express in TEI whether a virtual edition is private or public, and which specific users may access the virtual edition. We intend, as one of the final results of the project, to define a TEI customization that accommodates all the identified aspects that cannot be expressed using the TEI core.
This database contains the object model described in Portela and Silva (2014). For the support of Web 2.0 interactions, the :LdoD Application Server component provides a web client-server interface that users, experts and non-experts alike, use to interact with the object-oriented repository. Therefore, the changes occur on the object-oriented representation of the TEI-encoded files. Finally, in order to preserve the TEI interchange qualities of the archive, the :LdoD TEI File Export component allows the regeneration of TEI-encoded files from the object-oriented repository. Note that this component allows the selection of which parts of the repository to generate and the strategy of the encoding. For instance, it is possible to choose only a subset of the interpretations, and decide between different methods for linking critical apparatus to the text.
Since the dynamic write interactions can only occur in the context of a virtual edition, it is possible to export the original TEI-encoded files as they were imported, except for their formatting, which is not stored when files are uploaded and parsed.
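The idea of exporting only a subset of the interpretations can be sketched as a filter over witnesses and their readings. This is a simplified stand-in that works directly on XML with the Python standard library, whereas the export component described above regenerates files from the object-oriented repository; the function name `export_subset` and the identifiers `W1` and `W2` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI)  # serialize TEI as the default namespace


def export_subset(xml_text, keep_witnesses):
    """Serialize the fragment with only the chosen witnesses and readings."""
    keep = set(keep_witnesses)
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == f"{{{TEI}}}witness":
                if child.get(XML_ID) not in keep:
                    parent.remove(child)
            elif child.tag == f"{{{TEI}}}rdg":
                cited = {p.lstrip("#") for p in child.get("wit", "").split()}
                kept = cited & keep
                if kept:
                    # Keep the reading, but cite only the exported witnesses.
                    child.set("wit", " ".join("#" + w for w in sorted(kept)))
                else:
                    parent.remove(child)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <listWit>
    <witness xml:id="W1"/>
    <witness xml:id="W2"/>
  </listWit>
  <app>
    <rdg wit="#W1">reading one</rdg>
    <rdg wit="#W2">reading two</rdg>
    <rdg wit="#W1 #W2">shared reading</rdg>
  </app>
</TEI>"""

exported = export_subset(SAMPLE, ["W1"])
```

As in the archive, the output of this sketch preserves the content but not the original whitespace and formatting of the input.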

When comparing the software architecture in figure 5 with figure 2, we see that the static aspects of the author and editor functions occur through the :TEI Editor Tool component and all the other dynamic aspects occur through the :Browser component.

The LdoD Archive is implemented in the Java programming language, using the Spring MVC framework for server-side interaction, Bootstrap and jQuery for client-side interactions, and the Fénix Framework to support a transactional and persistent domain model.

Conclusion
The specic correlation of static and dynamic goals in the LdoD Digital Archive means that our emphasis falls on open changes that feed back into the archive. The TEI encoding and software design implications of this project make us address both the conceptual aspects of TEI schemas for modeling texts and documents, and the processing problems posed by user-oriented virtualization of Pessoa's writing and bibliographic imagination.

In this article we have presented how to encode LdoD editions and fragments using TEI, and also the software architecture of a Web 2.0 environment through which the encoded fragments are fed. These two approaches create an environment where experts and non-experts can collaborate in four roles: reader, book, editor, and author. The environment separates the different contributions while allowing certain levels of information-sharing that enhance collaboration and the continuous enrichment of the repository.

In this phase we have already implemented a full-fledged prototype that is used on a daily basis by the TEI encoders. Currently we are extending the architecture and prototype with more affordances for the editor and author functions. Meanwhile we intend to start experiments with end users to assess the dynamic aspects of the environment and study how to foster the creation of communities around a virtual LdoD.

4. Our XML encoding of what we refer to as "authorial" and "editorial" witnesses is in itself a new edition of LdoD. The distinction that we make in this article between "authorial" and "editorial" serves the rhetorical purpose of showing two types of relation between our XML encoding and its source texts: in the case of "authorial" witnesses we are referring to our topographic transcription of autograph documents, which will also be accessible as digital image facsimiles; in the case of "editorial" witnesses we are referring to our textual transcription of LdoD texts as they have been transcribed and organized in four major editions published between 1982 and 2013. It can be argued that our XML topographic transcription of the "authorial" witnesses is of course a fifth editorial witness of the LdoD, since it cannot be considered an authorial source. It must always be a particular reading and representation of that source, even if we intend to give readers a less mediated text by placing it in the context of the document facsimiles. It should, however, be noted that, at this level, we are not proposing a specific selection or organization for the fragments other than the semi-arbitrary shelf marks of the National Library. In fact, the fragments included for representation in the archive equal the total sum of the fragments included in those four editions.
Our topographic transcription could be described as a new implicit edition by the research team focused on representing revision layers and describing material details of the source documents.
An explicit edition by the research team, with its own selection and organization rationale, will take place only at the virtual level of the archive.