
Dispatches from Capitol Hill: #3, or XML and TEI are Scary

Here at Early Modern Digital Agendas, we are chugging right along. This week we switched gears rather dramatically: for the last few days of last week, we had focused almost exclusively on Early English Books Online (EEBO), issues of facsimile, the analysis of large corpora, and the building of large collections of early modern materials. And although my title is slightly tongue in cheek, I really do get intimidated by TEI XML. Despite a long association with it and having used it in several contexts, I’ve never had a dedicated project of my own in which I dealt with XML day in, day out.

Today, Alan Galey, Julia Flanders, and Heather Wolfe joined our hardy band for a discussion on practical editing, diplomatic transcriptions, and the creation of XML encodings of early modern texts in accordance with TEI guidelines. As our group is fairly diverse, some individuals were quite well versed in editorial theory, others felt at home with TEI, and still others brought quite definite theoretical engagements with both practices. This in itself was interesting, since it seemed that no single one of us felt completely at ease engaging with early modern texts, textual editing, critical theory, and the technological methodologies of TEI-conformant XML all at once. Together, though, our diverse vantage points prompted some intense debates about everything from the nature of a squiggle (is this a colon or a semicolon?), to how to accurately represent additions and deletions in manuscript, to rather abstract engagements with TEI as the practice of encoding an Ordered Hierarchy of Content Objects (OHCO) as outlined by Allen Renear.

Throughout the day we looked at both the New Variorum Shakespeare (NVS) and the Folger Digital Text project. This was interesting because these two projects have chosen to model the most canonical of authors in dramatically different ways. The NVS, for example, follows the through-line numbering originated by Charlton Hinman; this ensures a certain conceptual interoperability with numerous printed volumes of Shakespeare. As a result, the encoding takes the line as its basic textual unit. Even though the NVS marks up by word (and sometimes by character), the verse line seems to be its basic unit of thought.

The XML for the New Variorum Shakespeare



The Folger Digital Text project, in contrast, takes the word as its basic unit. In a choice that seems rather unorthodox, at least at first thought, the folks at the Folger have assigned an XML id to every word, to certain characters, and to punctuation. This allows the project to perform analysis and formatting at a more granular level than the NVS. The XML is more easily processed by a computer than that of the NVS; on the other hand, it is nearly unreadable by a human.
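To make the trade-off a little more concrete, here is a minimal Python sketch of the two approaches. The two fragments are invented for illustration and are not copied from either project; the element names (`l`, `w`, `c`, `pc`), the id values, and the helper functions `line_text` and `word_index` are all my own assumptions about what line-level and word-level markup might look like.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical line-level encoding: the verse line is the smallest unit.
LINE_LEVEL = '<l n="1">To be, or not to be</l>'

# Hypothetical word-level encoding: every word (and the comma) gets an id.
WORD_LEVEL = (
    '<l n="1">'
    '<w xml:id="w1">To</w><c> </c>'
    '<w xml:id="w2">be</w><pc xml:id="p1">,</pc><c> </c>'
    '<w xml:id="w3">or</w><c> </c>'
    '<w xml:id="w4">not</w><c> </c>'
    '<w xml:id="w5">to</w><c> </c>'
    '<w xml:id="w6">be</w>'
    '</l>'
)


def line_text(xml):
    """Return the plain reading text of a line, whatever its inner markup."""
    return "".join(ET.fromstring(xml).itertext())


def word_index(xml):
    """Map each word's xml:id to its text; empty if words are not marked up."""
    root = ET.fromstring(xml)
    return {el.get(XML_ID): el.text for el in root.iter("w")}


print(line_text(LINE_LEVEL))
print(line_text(WORD_LEVEL))
print(word_index(WORD_LEVEL))
```

Both fragments yield the same reading text, but only the word-level one lets a program point at "the fourth word of this line" by id. The cost is visible in the source itself: the second fragment is far harder to read by eye.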

Folger Digital Texts XML



Although quite similar in the type of text they are encoding (Shakespearean drama), the two projects have adopted widely divergent practices for encoding those texts. As we discussed extensively today, these are explicitly editorial decisions: decisions about transcription (rendering ſ as s or keeping the long form), about logical groupings (line groups or paragraphs, lines or speeches?), about how to portray additions and deletions (should the original character be encoded by default or the final one, how should the relationship of addition to deletion be portrayed, and is that even a choice that has to be made?), and so on. As our conversations tomorrow will no doubt continue to cover, TEI and XML have a role to play in large archival projects in the (digital) humanities. How we go about the comprehensive creation of scholarly digital editions, and exactly how these projects might affect the types of questions we can ask as early modern scholars, remain to be reckoned with.
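One way to see why these are editorial decisions rather than merely technical ones is to sketch how a single encoding can hold both readings at once. The fragment below is invented for illustration. It uses the TEI elements `choice`, `orig`, `reg`, `subst`, `del`, and `add`, but the sample line and the `render` helper are hypothetical, and no real project's stylesheet is being reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical line: the long s is kept alongside a regularized form,
# and a scribal substitution records both the deleted and the added word.
SAMPLE = (
    '<l>The <choice><orig>ſea</orig><reg>sea</reg></choice> is '
    '<subst><del>calm</del><add>rough</add></subst> tonight</l>'
)


def render(xml, skip):
    """Flatten the line to text, dropping the content of any tag in `skip`."""
    def walk(el):
        parts = []
        if el.tag not in skip:
            parts.append(el.text or "")
            for child in el:
                parts.append(walk(child))
        # Text following an element belongs to the line, so always keep it.
        parts.append(el.tail or "")
        return "".join(parts)
    return walk(ET.fromstring(xml))


# Original spelling, text as first written.
original_state = render(SAMPLE, skip={"reg", "add"})
# Regularized spelling, text after the scribe's correction.
final_state = render(SAMPLE, skip={"orig", "del"})

print(original_state)
print(final_state)
```

The encoding itself stays neutral, but every display of it has to pick a `skip` set, and that choice is exactly the kind of editorial commitment we spent the day arguing about.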

 

