“Markup Theory and Practice with the TEI in ENGL 53.06 ‘Women’s Literature and Technologies of Transmission,’” 12-1:30pm, 5/26/15, Baker-Berry Library (Dartmouth)

[This lecture was sponsored by the Dartmouth Center for the Advancement of Learning (DCAL) in collaboration with Dartmouth’s Digital Humanities initiative. Thank you to organizers Laura Braunstein and Scott Millspaugh for the opportunity.]

Opening exercise:

You each have a picture in front of you as well as some markers. Circle or mark the things in that text that you think are the most important structural and thematic aspects of it. Indicate those features using markers or colored pencils on the printed image itself. Discuss with your neighbor. I’m going to show you how this is markup.


[Image: a squirrel clinging to a tree and reaching for a walnut, in a pose that recalls Michelangelo's painting of the creation of Adam on the ceiling of the Sistine Chapel. Photo: Stanislav Duben / Solent News & Photo Agency.]

Today’s introduction is meant to give you a brisk historical, theoretical, and pedagogical tour of markup using the Text Encoding Initiative guidelines, also known as the TEI. I heard some really great discussion during the introductory exercise with the bird and squirrel images I handed out. The point is to ask you to decide what the most important structural and thematic parts of the text are for you. The upshot of markup, I hope to show today, is not just marking up, or encoding, as we say, a document; rather, it is deciding which features of a document you care most about for structural and thematic reasons. What do you need to mark up for your particular project? It is possible to mark up everything, but that’s rarely desirable considering time and labor limitations. Conversely, we can use others’ markup to reveal the structural and thematic components of digital objects that define our larger set of desires as scholars, or teachers, or students at a certain point in time. The things we don’t mark up tend to be those we care less about. As a teaching tool, markup forces us to consider and define what is text or paratext, data or metadata, structural element or theme, and to know what we want to see or search for.

Introductory and intermediate TEI workshops now take place all over the country several times a year; the abundance of opportunities to learn TEI markup in a workshop setting is a relatively recent development in the Digital Humanities. Nowadays, TEI is for the most part taught as a skill set that professors, librarians, and other digital humanists need to have in order to produce edited digital collections of texts according to best practices. However, today I want to demonstrate a different reason to learn and teach the TEI as well as a different context. My approach to the TEI is unique, I think, in that I teach this kind of markup as part of literary publishing history from the hand press period to contemporary digital publishing. I use the general idea of markup to introduce students to thinking about the material and paratextual frameworks of books. I want them to consider that there’s not just one book technology, and not all books work the same way. While we all think we know how paperback books work, do we also understand how different kinds of electronic editions work? Books have an array of parts that function in a multitude of ways depending on the materials they were made with, the conditions of their printing and assembly, and their owners’ desires.

So I’m going to share with you today how I do this for my undergraduates in the course I’m teaching this term called “Women’s Literature and Technologies of Transmission from the Long 19th Century to the Present.” The course itself is not a term-long direct study of markup theory and practice related to literature. However, I teach my students markup theory without calling it markup at first, and as we progress through the term and move from letterpress printing to digital publication, I introduce the history of markup languages and we practice markup in class as an analytical exercise. I will boil a term of material down into 1.5 hours for you this afternoon that will introduce you to markup in general, to the TEI, and how I use it in my teaching.

My presentation has 3 parts: (1) introduction to the concepts of markup, (2) an introduction to the TEI and its primary use cases, and (3) how I integrated TEI in my teaching this semester using Dartmouth’s unique resources including Rauner Special Collections and the Book Arts Workshop.

Part I: Introduction to Markup

Bradley Dilger tells us that “in the age of new media, there is no way to avoid markup. Markup is text. Markup is communication. Markup is writing” (Dilger xi). Markup refers to linguistic expressions or codes within a document that indicate its unique structure and style. These codes are not source or machine codes that execute functions on a computer; rather, markup codes indicate how a person or a machine should understand, organize, and display textual data. What I take Dilger to mean is that when we write with a user-friendly interface, like Microsoft Word, markup happens as we write. One might even say that we are marking up the document, but the interface hides the markup in order to give us a distraction-free and easy-as-pie writing environment. Word processing templates model the desire of a writer to control her use of the paper page; she indents new paragraphs by moving her hand over an inch or underlines a title by drawing a line beneath it. In a word processor or WYSIWYG interface, markup is hidden, like a blank indentation, but pervasive. Your word processor automatically indents for you when you start a paragraph. It sees the paragraph when you begin it and pre-encodes its architecture. If you don’t want a new paragraph, you can hit the backspace or delete key, and where it looks like you simply deleted an indent, a blank space, you actually overwrote a small piece of code that the software does not want you to see.

Though they tend to be invisible in our word processors, markup elements almost always appear in pairs of tags that enclose the text they describe. For example, at the top of this page are “tags” in the “title element” that mark up the title of this slide about markup. If every one of my slides had a title enclosed in the same set of tags, then I could ask a computer to find just those tags and what they contain and list each title in a table of contents. What I strive to help my students notice, and what I want to point out today, is that markup represents what’s important to us structurally and thematically. It is a way of understanding how we shape and categorize our texts, but also how we use them to generate more texts, such as the paratext of a TOC.
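The table-of-contents idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the deck, slide, and body element names and the sample titles are hypothetical stand-ins, not the actual markup behind these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical slide deck: every slide wraps its title in the same <title> tags.
SLIDES_XML = """
<deck>
  <slide><title>Introduction to Markup</title><body>Markup is writing.</body></slide>
  <slide><title>A Brief History of Markup</title><body>COCOA, SGML, HTML, XML.</body></slide>
  <slide><title>What Is the TEI</title><body>A community standard.</body></slide>
</deck>
"""

def table_of_contents(xml_text):
    """Collect the text of every <title> element, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [title.text for title in root.iter("title")]

toc = table_of_contents(SLIDES_XML)
for number, title in enumerate(toc, start=1):
    print(f"{number}. {title}")
```

Because the titles are tagged consistently, the program never has to guess which line of a slide is its title; the encoder has already made that decision.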

A Brief History of Markup

Since this is a markup and pedagogy lecture rolled into one, let’s dive straight into markup. The history of markup has a recent place in our long history of publishing. From the 1960s to the 80s, computing did not have a way to structure files so that they could be read across a number of software or hardware platforms. We can read in early markup our desire to gather files together in standardized collections to create electronic libraries, and this necessitated the sacrifice of unique formatting for interoperability. In these early days when the machine or the software died, its files became unreadable. Files that were precious required reformatting in order to be read by another kind of software on another machine. The costs of time, labor, and technology development were significant and frustrating.


One program called COCOA, for example, accomplished this to an extent during the punch-card computing era. It was developed in the late 1960s at University College London and the Atlas Computer Laboratory. A program written in a deck of FORTRAN cards, COCOA made it possible to gather and make a concordance of a textbase composed of different files of text. Three files were required to operate COCOA: (1) the all-FORTRAN IV COCOA program, (2) a “control” file in 13 cards that declares the character, text, and word-count or concordance requests, and (3) the text itself. (Corcoran)[1]


The legacy of efforts like COCOA can be seen in Standard Generalized Markup Language, or SGML. SGML is a foundational early electronic markup language. It was invented in the mid-1980s in part to accommodate the Department of Defense’s need to archive electronic documents with incompatible formats (Bates 4). SGML’s characteristics include:

  • descriptive markup in a hierarchical structure,
  • customizable tag sets,
  • a Document Type Definition (DTD) that declares each document’s elements,
  • and elements that are human-readable expressions. (DeRose 195-96)

SGML’s perks were that it provided a standard way to state an encoding method for a set of documents, and it was used by the DoD, the IRS, and companies across the publishing, aerospace, and telecommunications industries for internal file standardization. But SGML’s drawbacks were significant: it was expensive to set up and install, and its specifications were very complex and 500 pages long when printed, akin to a Romantic-era gothic novel. Additionally, it was cumbersome for web browsers and not well-suited for smaller institutions or file compatibility across institutions.

In the 1990s, developers designed HTML (HyperText Markup Language) and XML (eXtensible Markup Language) to address SGML’s weaknesses, focusing primarily on its lack of customization (DuCharme 22).


While HTML enables data to display on the Web, it lacks the flexibility to name and structure elements for various output formats. This is precisely what XML is good at. XML is a simplified subset of SGML, and its vocabulary can be extended to define new tag sets; this is what we mean when we say XML is “extensible.” That is, you can make up your very own dictionary of elements for a certain project (Bates 13).

When paired with a style sheet and transformed, XML documents can be published online as HTML or XHTML files with a custom format, as well as text files, PDFs, or ePubs. Many organizations and businesses, from banks to academic institutions, who seek a standard for electronic data archiving and publishing adopt an XML “encoding scheme” that they curate specifically for their needs.

For example, my needs right now include publishing a recipe for peanut butter on a spoon. (This slide comes from Wikimedia Commons, where peanut butter is so sticky that it is one word.) At the top of the recipe you have some confusing declarations that specify the dictionary of elements or tags you can use in this file – these are specifically for the recipe and they exist in an external DTD, or Document Type Definition. Then, you can see the hierarchical sets of tags: recipe, title, ingredientList, and preparation, before closing the recipe tag.
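For readers without the slide in front of them, here is a rough reconstruction of such a recipe file, read with Python. It is a sketch under assumptions: the element names recipe, title, ingredientList, and preparation come from the description above, while the ingredient element, the recipe wording, and the choice to inline the DTD declarations (rather than keep them in an external file) are mine, made so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe file. The DOCTYPE block plays the role of the DTD:
# it lists the elements this kind of document is allowed to contain.
RECIPE_XML = """<!DOCTYPE recipe [
  <!ELEMENT recipe (title, ingredientList, preparation)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredientList (ingredient+)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ELEMENT preparation (#PCDATA)>
]>
<recipe>
  <title>Peanutbutter On A Spoon</title>
  <ingredientList>
    <ingredient>Peanutbutter</ingredient>
  </ingredientList>
  <preparation>Stick a spoon in a jar of peanutbutter, scoop, and pull out a big glob.</preparation>
</recipe>
"""

def outline(element, depth=0):
    """Return one indented line per element, showing how the tags nest."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

# ElementTree reads past the DOCTYPE but does not validate against it.
recipe = ET.fromstring(RECIPE_XML)
print("\n".join(outline(recipe)))
```

The indented outline makes the hierarchy visible: everything lives inside recipe, and each ingredient lives inside ingredientList.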

Part II: What Is the TEI


Now that you understand how XML is structured in opening and closing tags, and how it fulfills SGML’s wishes, you’re ready to meet the TEI. The Text Encoding Initiative, or TEI, is a markup language dialect of XML, if you will, established in 1987 for encoding humanities data in electronic forms that do not depend on specific hardware or software. Humanists use the TEI when we want to preserve, group, or make searchable the unique parts of our digital texts. It currently comprises a large selection of XML tags designed specifically to describe humanist texts. So TEI files appear similar to the peanut butter recipe I just showed you, except the tags or elements are named to describe the parts of plays, essays, poems, letters, and any other bits of these forms and texts that humanists want to connect or study.

A community established the TEI and continues to oversee it. It gathers members and scholarly interest at a steady rate as TEI has become the best-practice standard for publishing electronic edited collections. In the 1990s this community consisted of three main groups: the Association for Computers and the Humanities, the Association for Literary and Linguistic Computing, and the Association for Computational Linguistics. In January 1999, the University of Virginia and the University of Bergen (Norway) proposed creating an official body called the TEI Consortium, which would maintain, develop, and promote the TEI. By 2013, the consortium had 60 official institutional members and 33 individual ones. (Seaman)

On this slide (left), you can see that today the TEI has 1,537 followers on Twitter, a public Facebook page, and a very lively and informative listserv with archives that date back to 1990. If you email the listserv or Tweet a question, you will almost certainly receive a reply from a TEI expert on the list, like James Cummings or Kevin Hawkins, and novice questions are taken quite seriously. I’ve been shy to address the listserv myself and I need not be. It’s a welcoming and friendly community that seeks new participants as well as seasoned (or battered) warriors. And if you subscribe to the Twitter feed, there’s even a TEI element of the day.


Why Encode or Mark Up with the TEI?

As you may already know, you don’t need to TEI encode something in order to mark up certain words or phrases; you could use your own XML markup language, as we did for the peanut butter on a spoon recipe. You also don’t need TEI to count word repetitions, perform topic modeling, or get random samples from a text corpus. Instead, you would use R or Python to pull these things out of a text or a corpus, that is, a group of texts.
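As a minimal sketch of that second route, the following Python counts word repetitions in a plain, unencoded string. The function name and the tokenizing rule (runs of letters and apostrophes) are my own simplifications; the sample text is the Dilger quotation from Part I.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text, split it into words, and count repetitions."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Plain text with no tags at all: counting words needs no markup.
sample = "Markup is text. Markup is communication. Markup is writing."
counts = word_counts(sample)

for word, count in counts.most_common(2):
    print(word, count)
```

Nothing in this approach knows what a title, a line of verse, or a place name is, which is exactly the kind of knowledge that encoding adds.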

I think of TEI encoding as a set of tags that we add editorially to a text that creates a new version of it. This new TEI-encoded version illuminates its organizational and architectural structures as well as the parts that make it unique or interesting to us. The Women Writers Project likens the TEI encoder to “an anthropologist in the tradition of Clifford Geertz, creating a thick, contextualized, interpretative description of the text, or to a critical editor who produces an analytical representation of the text which provides systematic, expert knowledge about it. . . . It is a way of formalizing and externalizing the structures in a text; a way of adding further information to the text that interests us; a meta-text that comments on, interprets, or extends the meaning a text” (WWP intro).

I want to add that these traits make the TEI a useful tool for teaching the difference between a text, or the words on a page, and its form, which is their arrangement in the text block on the plane of the page as well as in 3D space, within a book or a collection of electronic files. The TEI can help students notice paratextual apparatuses that frame and shape different editions and media that deliver a “bag of words” – a novel, a collection of poems, a play – to a reader.


For example, where there’s the text of a poem, like Shakespeare’s Sonnet 17 (slide above), the TEI says: Behold, this is a poem! It shows you that it contains a series of lines, and each line breaks at a particular place—it cannot run amok, line-wrapped or not. Look there, at the top of the page, where you read its title, it is a title! (Though it is just a vague title consisting of a sonnet number that places it in a series.) We might also want to point out in a metrical lesson that: Here, in this tag, is a poetic foot! Higher up, at the level of the object, the TEI Header (not depicted in the slide) can describe the poem’s physical container, such as its page size, small or large, printed on the cheap or at great expense; its binding, luxuriously in leather, cost-effectively not bound at all, or perhaps decadently rebound as part of a large family-owned library. In other words, the TEI also tries to make sure that we are aware that, as a reading culture, we read this group of lines as a poem that originally came from this exact book held at that particular library, and with this precise call number. The TEI does not divorce a digital edition from its source text; rather, the TEI version can become an important source of its own to teach about bibliographical object specificity.
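To make the poem example concrete for readers without the slide, here is a hedged sketch of how the opening of Sonnet 17 might be lightly encoded and then read back by a program. The fragment is my own illustration, not the file shown on the slide: it uses the TEI elements lg (line group), head, and l (line), and it omits the TEI Header that a complete file would carry.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, bound to a prefix for searching.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical light encoding of the poem's first two lines.
SONNET_XML = """
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <lg type="sonnet">
      <head>Sonnet 17</head>
      <l n="1">Who will believe my verse in time to come,</l>
      <l n="2">If it were fill'd with your most high deserts?</l>
    </lg>
  </body>
</text>
"""

root = ET.fromstring(SONNET_XML.strip())
poem = root.find(".//tei:lg", TEI)          # "Behold, this is a poem!"
title = poem.find("tei:head", TEI).text     # its title is marked as a title
lines = [(line_el.get("n"), line_el.text) for line_el in poem.findall("tei:l", TEI)]

print(f"{poem.get('type')}: {title}")
for number, line in lines:
    print(number, line)
```

Each verse line survives as its own unit no matter how a screen wraps it, because the line break is recorded in the encoding rather than in the layout.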

The TEI Guidelines – Reading Our Own Tags

The collection of XML tags that we call the TEI resides in the TEI Guidelines. Here’s how you find them on the TEI website:


Together, these tags define a textual object according to our own rules and desires. Therefore, they also have the power to read back to us our own rules and desires for textual objects and how we want form to meet content.

Since the guidelines are expansive and designed to cover humanist projects that range from plays, to letters, to essays, you rarely need all of them at once. You can use only the selections from the guidelines that apply to your specific project, and you can choose to take advantage of deep encoding tag sets or lighter encoding schemas. Here are a few examples of common use cases for the TEI. They represent a wide range of projects.

The TEI: 2 General Use Cases


We use it to organize, search, and publish a large archive of digital texts in multiple formats. This means that you need to standardize the way a collection of documents is packaged so that you can aggregate them and search across them. These kinds of projects usually have a very sparse set of tags: they include just what’s necessary to retain the structure of the document and to make certain desirable attributes of the documents searchable, such as author, title, publication date, and perhaps whether it’s poetry, prose, or drama.
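Here is a rough sketch of why uniform, sparse tagging makes an archive searchable. The three records are hypothetical and heavily stripped down (they would not pass TEI validation, and a real archive stores far more in each header); the helper names make_record and search_by_author are my own.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

def make_record(title, author, date):
    """Build a tiny TEI-style file as a string (hypothetical sample data)."""
    return f"""
    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <fileDesc>
          <titleStmt><title>{title}</title><author>{author}</author></titleStmt>
          <publicationStmt><date>{date}</date></publicationStmt>
        </fileDesc>
      </teiHeader>
    </TEI>
    """

CORPUS = [
    make_record("Elegiac Sonnets and Other Poems", "Smith, Charlotte", "1795"),
    make_record("A Vindication of the Rights of Woman", "Wollstonecraft, Mary", "1792"),
    make_record("Guide to the Lakes", "Wordsworth, William", "1835"),
]

def search_by_author(corpus, author):
    """Return the titles of every file whose <author> matches exactly."""
    hits = []
    for xml_text in corpus:
        root = ET.fromstring(xml_text.strip())
        if root.findtext(".//tei:author", namespaces=TEI) == author:
            hits.append(root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI))
    return hits

print(search_by_author(CORPUS, "Smith, Charlotte"))
```

The search works only because every file records its author in the same place with the same tag; that consistency is what lets a textbase behave like a database.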

For example, the Text Creation Partnership has 61,315 texts or digital objects TEI encoded, searchable, and accessible in a variety of formats on its site. They are available as webpages, ePubs, XML files (so you can see the TEI encoding), and sometimes also as page images.


We can use the textbase as a database because a uniform tag set is applied across all of the files in the TCP archive. So let’s search for “Smith, Charlotte.” And let’s pull up the source files for the 1795 edition of Elegiac Sonnets and Other Poems so we can look at the TEI encoding, which should be fairly bare bones. Click on the .xml file. Within this collection of poems in the XML file, search for one titled “To a Friend”; this is sonnet XXXIV and one of only two sonnets in this entire collection that precisely fits the Petrarchan sonnet model.[2] If the TCP wanted to enable a study of sonnets or poetic forms in its capacious collection of texts, it would have encoded poems with tags that mark up poetic feet and rhyme schemes. The Petrarchan sonnet consists of a group of 8 lines (an octave) with a rhyme scheme ABBAABBA, followed by a group of 6 lines (a sestet) that rhyme CDECDE. But we don’t see any of that here, as we did with Shakespeare’s sonnet 17. Instead, we know by the absence of tags that the encoders weren’t interested in how Petrarchan or not-Petrarchan these sonnets scan; there is no deep-level encoding of poetic feet or rhyme scheme that is particular to this kind of sonnet. The TCP just encoded the title, which is the <head>, and its lines with the tag <l>, because the TCP’s project is to provide access to and storage of a large collection of edited TEI texts for others to choose from and further edit as they desire. In other words, the TCP’s lightly encoded texts make it easier for another team, let’s say the Petrarchan Sonnet Project (which I just made up), to encode and study the sonnets in the TCP.


While Use Case 1 shows TEI markup used to organize a very large corpus of files, my second use case does the opposite: it shows editors deeply encoding a single text or a small collection of texts by theme, allusions, or ideas to study certain local patterns and implications. The example I want to show you was recently published on the Romantic Circles website: it’s an electronic edition of William Wordsworth’s Guide to the Lakes (1835). According to the introduction, this was the last edition by the author and the most reprinted, and therefore perhaps the most used by travelers or travel planners. Themselves heavy users of the Guide, the editors of this new electronic edition marked up the text to tag every single geographic place that Wordsworth refers to. Each place gets a linked note and some of them also receive an image. The effect is to make it a most useful travel guide, as useful for contemporary travelers and Wordsworth enthusiasts as his Guide to the Lakes would have been for nineteenth-century travelers.

For example, in paragraph 4 you see that “Kirkstall Abbey” is highlighted and linked.

Click on “Kirkstall Abbey” and you find a wonderful note from the editors as well as a photo of the abbey that Paul Westover took on a recent trip.

In July 1807, Wordsworth and Dorothy along with some friends spent a weekend at Kirkstall and explored both the twelfth-century monastery and the other nearby attractions. Photo: Paul Westover. (Mason et al.)

Now, I can click on the XML : TEI link at the top of the page, on the right side, and see how the editorial annotation looks as it is encoded within Wordsworth’s Guide. Scroll down to paragraph 4 or search for “Kirkstall Abbey” and you will discover that the vast majority of things encoded in this edition are places. This digital edition is truly a study in Wordsworth’s as well as the editors’ representations of place in this single text. In fact, because the editors were only interested in heavily annotating places, they did not even have to use a tag specifically designed for places and place names; instead, they could use a generic tag because the only tags in this document identify and describe places with annotations.

Part III: TEI Markup in ENGL 53.06, “Women’s Literature and Technologies of Transmission”

At this point, I’ve provided a swift introduction to the theory of markup languages, their history, and to the TEI as a markup language used by scholars to create two chief categories of projects: larger electronic archives like the TCP, usually with lighter encoding, and smaller digital editions usually with deeper encoding by structure as well as theme like Wordsworth’s Guide to the Lakes (1835) that is richly encoded as a study of all of the places mentioned in this 19th century Guide. I want to reiterate that my goal here today is to show how a course that introduces students to the idea of thinking about literary history along with its forms of publication is a great opportunity to introduce the TEI as part of that tradition. Additionally, we can write and read our own markup back to ourselves to analyze what elements of the content and formal structures of a text we care about, both in the flat plane of the page as well as at the level of the 3D book object. Markup encourages us to think about textual forms and containers in addition to the text itself.

The ability to teach literary history alongside how literature has been historically packaged is now easier than ever thanks to virtual resources such as the Internet Archive, Google Books, and other digitization projects that visualize book pages. Dartmouth also has two important local resources that enable studies of literary history alongside book history and bibliography: Rauner Special Collections and the Book Arts Workshop.

This Spring, I’m teaching a course entitled “Women’s Writing and the Technologies of Transmission from the long 19th-century to the present” (ENGL 53.06). The course takes as its starting point the idea that technology is not gender neutral, and technologies used for writing and publishing have historically posed particular challenges for women writers. I use the idea of markup in my course as a way of helping students notice content as both different from but also linked to form and the technologies that bring a text to a reader. Markup helps us notice the different ways that kinds of books work, from electronic books to first editions of early-nineteenth-century novels, as well as publishers’, authors’, and editors’ methods for telling readers and consumers what about a book defines its purpose as a literary artifact, a story, or a commodity.

Example 1: Orlando Marking Up Itself as a Biography


The easiest method to introduce the idea of markup is to point out where a text calls attention to its own form, and nearly everything I assigned this term does this in one way or another. In these cases, the text can be thought to mark itself up. For example, we just finished reading Virginia Woolf’s novel Orlando (1928), which calls itself a biography over and over again, even on the title page. However, it is a fictional biography, which amounts to a novel. Much of the book disputes what a biographer can and cannot accomplish in terms of describing the essence of its subject, and it lampoons the genre of the biography as a patriarchal method of telling another person’s story with a standardized master narrative. One could say that Woolf marks up her book as a biography in order to get the reader to notice her tag, if you will, and decide for herself the best alternative tag or label for the genre and form of this work.

Example 2: Critical Editions and Agendas for Wollstonecraft’s Vindication

My second example of introducing the idea of markup to my class comes far earlier in our course schedule. Our reading started with Mary Wollstonecraft’s seminal Vindication of the Rights of Woman, first published in 1792. We followed Wollstonecraft with Mary Robinson’s response “A Letter to the Women of England” (1799).

A great happy accident started our course. While searching for a free digital edition of the Vindication for students to use the first week of class, I happened upon a gem: a modernized edition of the Vindication (edited by Jonathan Bennett) that obliterates everything I love about reading Wollstonecraft’s voice and diction. I immediately sent it to my class. It truly is the perfect example with which to begin a term devoted to thinking about the history of women’s writing as textual objects that have to survive print history in one shape or another, and how those shapes alter the meaning of the work. This was a singular opportunity to show the amount of power an editor has over a text and its reception. As opposed to the last example of Woolf marking up her novel Orlando as a biography largely as a joke, here I show how an editor’s markup is an act of interpretation that dramatically changes a work. We had this comparative discussion on the second day of class this term.

Wollstonecraft’s 1792 introduction begins:

After considering the historic page, and viewing the living world with anxious solicitude, the most melancholy emotions of sorrowful indignation have depressed my spirits, and I have sighed when obliged to confess, that either nature has made a great difference between man and man, or that the civilization which has hitherto taken place in the world has been very partial. (intro)

In comparison, Bennett’s modernized edition begins:

After thinking about the sweep of history and viewing the present world with anxious care, I find my spirits depressed by the most melancholy emotions of sorrowful indignation. (4, my emph)

The modern edition is missing a lot. It cuts the first sentence in half, replaces “solicitude” with “care,” and relegates to the second sentence an instantiary point that civilization has been very partial to men, there being no natural difference between “man” and “man,” as Wollstonecraft says. However, I was most keen for my students to recognize that the modern edition replaces the paramount opening phrase, “the historic page,” with the cliché “the sweep of history.” The “historic page” that Wollstonecraft refers to here is of course used figuratively as a metaphor for history in general, but it functions primarily as a sign for a book or text. In other words, the emphasis is not just on history as in “the historic page,” but rather on print pages, as in “the historic page” in the long tradition of publishing on pages, in print. The page is that which is historic. Moreover, this phrase is so important to Wollstonecraft that she uses it once more in Chapter 9 when she associates it with the gentleman’s library, full of “the adventurous march of virtue in the historic page,” in dialogue with the “gaming table” where privileged men “[hang] with dumb suspense on the turn of a die” (UVA).

We just discussed together how the modern edition dilutes and compromises Wollstonecraft’s message regarding the power of books and printing, and how those media have favored a patriarchal historical narrative both in thought as well as with the publishing industry. This is not just a simple matter of how Bennett translates late-eighteenth-century diction and sentence construction into modern prose. Rather, I will show how the editor’s interpretation sweeps away Wollstonecraft’s own historic page as he forges a paratext that circumscribes and shapes this modern edition. This paratext is his markup. Let’s take a look at that.


Bennett prefaces Wollstonecraft’s Vindication with his own editorial agenda, which unfortunately blights her message. He says:

[Brackets] enclose editorial explanations. Small ·dots· enclose material that has been added, but can be read as though it were part of the original text. Occasional •bullets, and also indenting of passages that are not quotations, are meant as aids to grasping the structure of a sentence or a thought. Every four-point ellipsis …. indicates the omission of a brief passage that seems to present more difficulty than it is worth. Longer omissions are reported between brackets in normal-sized type. (1)

This excerpt provides the equivalent of the markup we provide in a TEI Header called the “encoding description” or <encodingDesc>. This element is a piece of metadata, or data about the marked-up digital text, and it appears in the top half of the encoded text. It gives us a “detailed description of whether (or how) the text was normalized during transcription, how the encoder resolved ambiguities in the source, what levels of encoding or analysis were applied, and similar matters” (TEI Guidelines). As markup, the editor’s preface to the modernized edition not only provides an example of the editor being dutifully transparent about his editorial decisions, but it also provides an early warning against the use of this edition for a nuanced analysis of the Vindication. As we know, this essay argues that women should be allowed the same educational privileges as men so that they can, among other things, use their own voices to correct social inequities. The editor tells us in his clear markup preceding the text that he is changing and even omitting Wollstonecraft’s content and voice where passages seem to “present more difficulty than [they] are worth.” In doing this, Bennett obstructs the author’s ability to transmit her text, her own historic page, to her readers for as long as his digital edition lives on the web.
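To show what that equivalence might look like, here is a hypothetical header fragment that restates parts of Bennett’s preface as a TEI encoding description, followed by a few lines of Python that read the declarations back. Bennett’s edition is not TEI encoded, so the fragment is purely my illustration; the element names (encodingDesc, editorialDecl, normalization, interpretation) come from the TEI Guidelines, while the way I sort his statements among them is an assumption.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment: the editor's preface recast as header metadata.
HEADER_XML = """
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <encodingDesc>
    <editorialDecl>
      <normalization><p>Diction and sentence construction modernized.</p></normalization>
      <interpretation><p>Brackets enclose editorial explanations.</p></interpretation>
      <p>A four-point ellipsis marks the omission of a brief passage.</p>
    </editorialDecl>
  </encodingDesc>
</teiHeader>
"""

def editorial_statements(header_xml):
    """List each editorial declaration as a (tag name, statement) pair."""
    root = ET.fromstring(header_xml.strip())
    decl = root.find("tei:encodingDesc/tei:editorialDecl", TEI)
    statements = []
    for child in decl:
        local_name = child.tag.split("}")[1]  # drop the "{namespace}" prefix
        text = " ".join("".join(child.itertext()).split())
        statements.append((local_name, text))
    return statements

statements = editorial_statements(HEADER_XML)
for name, text in statements:
    print(f"{name}: {text}")
```

Read this way, the editor’s interventions become data that a reader, or a program, can inspect before trusting the text that follows.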

[It’s important to note that if you google “Vindication of the Rights of Woman,” the Bennett edition is the third listing on the first page of results, below a Wikipedia entry and the Bartleby.com edition. So Google isn’t helping Wollstonecraft’s cause, either, though that’s another matter.]

Example 3: Rauner, Markup as the Build of a Textual Object

The idea of markup as paratextual also relates to how I invite my students to think about the mechanics of a print or digital object, or the physical container that delivers the text to the reader. Jay Satterfield helped me design a class in Rauner that highlights different bibliographical technologies. In other words, for a class I wanted my students to not think as hard about the words on the page as they did about the pages, print, binding, and covers of 19th century textual artifacts that put the story or the poem in a reader’s hands.


We curated a classroom exhibit in Rauner with five tables, and each table contained a small collection of bibliographical objects grouped by format such that we could highlight differences between objects on a certain table, and more broadly among the themed tables. For example: the 19c. “Novel Marketing” table featured an unadorned 1813 first edition of Jane Austen’s Pride and Prejudice alongside an elaborate illustrated and heavily revised third edition of Mary Shelley’s Frankenstein (1831). The 1831 Frankenstein had an extensive and visually appealing apparatus that was meant to re-release an improved text to readers. We showed students how the third edition of Frankenstein was more of an advertisement and a commercial product than the 1813 edition of Pride and Prejudice, a book that mainly delivers a story and doesn’t try as hard to be a commodity.

Another table held two very different kinds of diaries in order to show the differences between manuscript and print, as well as personal diary and saleable volume. Anna Ticknor’s travel diaries are gorgeous handwritten tomes that look extremely burdensome to carry, but are beautiful to admire and easy to read, despite her nineteenth-century orthography. We situated them next to Anne MacVicar Grant’s printed Memoirs of an American Lady, which is a compact printed volume that has none of the aura of authenticity of Ticknor’s journal. The exhibit also featured broadsides, periodicals, serial novels, and a literary annual, to name a few other forms of print that I wanted my students to encounter and explore. (I must thank Peter Carini for helping me co-run this class, and again I thank Jay Satterfield for helping me design the class.)

On the heels of our visit to Rauner, our class spent two consecutive class periods in the Dartmouth Book Arts Workshop experiencing and thinking about how for nineteenth- and early twentieth-century women writers, setting one’s own type was a great privilege. It was an opportunity to be in charge of how your own writing responded to the material constraints of letterpress printing. The privilege of setting and printing was one that enabled authors like Virginia Woolf to write and self-publish books that unapologetically defied and even mocked standard British literary culture and society. Because she and her husband Leonard ran their own press, called Hogarth Press, Virginia did not have to struggle with editors and publishing houses that would have balked at trying to sell her extremely experimental novels.

I timed these classes to coincide with reading book 5 of Elizabeth Barrett Browning’s epic novel Aurora Leigh (1856). This is a part of the book in which Aurora conveys the gender bias of the bookmaking and publishing industries. She laments how difficult it is to publish as a woman poet, let alone make money from the book sales. She describes the materials and technologies of book production as if they also produce men:

Mere passion will not prove a volume worth
Its gall and rags even. Bubbles round a keel
Mean nought, excepting that the vessel moves.
There’s more than passion goes to make a man
Or book, which is a man too.
[…] I am sad. (V.389-99)

This passage, about making books, refers specifically to the process of making paper for a volume out of rags, bubbles, keel, and vessel. These materials, she says, make books, which are also men (V.399). To prime us to think about her acts of writing in terms of letterpress publishing, as well, she describes men’s opinions in Book 1 as “press and counterpress/Now up, now down, now underfoot, and now/Emergent” (I.802-4). Her poetry mimics the rhythm and vertical movement of the hand press on which her epic novel would have been printed, the use of the printer’s feet to move the plates, and the printed page that emerges wet with ink.

As I set forth in our initial definition, markup refers to linguistic expressions or codes within a document that indicate its unique structure and style. Though they don’t look like tags and are not linguistic, letterpress printing tools also relate to markup. Like a FORTRAN card during the punch-card computing era, the markup in letterpress is physically built into each unique piece of type and the other materials that shape how ink winds up on the page. Unlike TEI-encoded digital poems, the “data” produced by printing poetry in letterpress cannot be transferred across platforms. As a class, we best experienced and understood letterpress “markup” as the constraints built into the materials we learned to use to set type: the size and shape of each lead piece of type as well as the “furniture” or woodblocks that help control blank spaces on the printed page.


The Dartmouth Book Arts Workshop boasts a press like the one that would have been used to print Aurora Leigh in 1857, an H. M. Caslon Iron Hand Press. [However, we used an electric press to print our broadsheet, since the hand press would have taken too long given our time limits.] Students each selected two lines from our reading in Aurora Leigh that they wanted to set on a broadsheet, which we would print as a class. They were given free rein to use any font they liked, and I think everyone selected a different font and type size, so our final product is, shall we say, spirited.

A few moments during typesetting made students feel the constraints of letterpress, its inflexibility and non-interoperability, which distinguish it from writing and printing with computer word processors like Microsoft Word or Pages.


These moments included:

  • When a student ran out of a particular kind of type. Type is kept in really heavy trays with each letter in a little pocket of the tray. It was a unique experience for students to feel as though the number of letters they could use to write something limited them. We’re never limited by keystrokes on a keyboard—that is, until the key breaks.
  • When the length of the line a student wanted to set, with a given size type, was longer than the length of the job stick in which they must set it. When this happened, their choices were either to cut the line (which is an offense against the poet’s verse), or to redo the whole line with smaller type. This means putting away all of the type they so carefully placed, taking down a new heavy tray of type, and selecting and placing all of the letters again. One could feel the cost in labor and time.
Shayn Jiang finishes her broadside letterpress print in class

An unexpected result of this class was that it turned into a bonding experience and a little bit of a rowdy celebration. In this slide, Shayn is smiling as she pulls out the broadside she just printed because her 10 classmates were clapping and cheering her on. Printing felt and sounded like a sporting event. And this too was a moment of reflection for us: we don’t get this same kind of physical satisfaction from producing electronic print. It’s far less laborious and time-consuming, it’s not a new skill, and therefore it feels to us like less of an event than letterpress printing did. [Special thanks to Sarah Smith for helping me design and run these two classes.]


From Letterpress to the TEI in My Course

This brings us up to just a couple weeks ago, when I co-taught a class introduction to markup and TEI with David Seaman, the Associate Librarian for Information Management here in Baker-Berry Library. David is also an annual professor of text encoding at the Rare Book School at the University of Virginia, and I personally was excited to learn how he thought about and taught markup and the TEI.

Between us, David and I delivered an introduction to markup and to the TEI similar to the one I just presented to you this afternoon. I prepared my students for the TEI class by asking them to reflect on how we have been thinking about the author’s, editor’s, publisher’s, and printer’s control of form and content in different instances across our term and reading:

  • in the two versions of Wollstonecraft’s Vindication that we compared as a class
  • in Virginia Woolf’s hyperbolically self-referential mock “markup” of Orlando the novel as a biography
  • in our visit to Rauner, where, for example, we compared the very different form of an unadorned first edition of Pride and Prejudice with the illustrated and heavily revised third edition of Frankenstein (which we read as a class)
  • and most recently in our work setting and printing type in the letterpress studio.

We followed our introduction with an exercise that practiced identifying items on a page for markup and viewing what that would look like in an XML editor. And we’re going to do a modified version of that now.

[At this point in the talk, I passed out transcriptions and manuscript copies of two letters from Jane Simpson, a 19th-century author, to book collector Francis John Stainforth, and we marked up the structural and thematic parts that we wanted to preserve electronically and/or study further. Participants discussed the results and we followed with a 15-minute Q&A.]


I hope I have shown in this short introduction to markup history, theory, and the TEI that we can apply the idea of markup to teaching students how to recognize the forms that shape our texts, whether they are books or PDFs. Markup and the TEI can also help students feel the editorial power they have when they shape a text themselves by determining what parts of the text merit markup depending on their constraints and desires. As scholars and editors, it’s our job to learn how to see markup happening even when software makes it invisible, or when other editors put it to work and change the visage and vital organs of an online edition. It is also incumbent upon us to read our own markup back to ourselves in order to try to gain perspective on what we, as textual scholars, value most and least when using electronic texts to preserve or analyze.

Thank you.

[1] For a succinct and detailed description of COCOA see Paul Corcoran’s “COCOA: A FORTRAN Program for Concordance and Word-count Processing of Natural Language Texts” in Behavior Research Methods & Instrumentation 6.6 (1974): 566. Web. 2 June 2015. See also Susan Hockey’s “The History of Humanities Computing”. University of Illinois. Archived from the original on 18 September 2013. Retrieved 11 June 2015.

[2] The other Petrarchan sonnet in the collection is “To Melancholy” (Sonnet XXXII).

Works Cited (but not linked)

Bates, Chris. XML Theory and Practice. West Sussex, UK: Wiley, 2003. Print.

Corcoran, Paul E. “COCOA: A FORTRAN Program for Concordance and Word-count Processing of Natural Language Texts.” Behavior Research Methods & Instrumentation 6.6 (1974): 566. Web. 11 June 2015.

Dilger, Bradley, and Jeff Rice. Introduction. From A to <A>: Keywords of Markup. Minneapolis: U of Minnesota P, 2010. Print.

DuCharme, Bob. XML: The Annotated Specification. Upper Saddle River, NJ: Prentice Hall, 1999. Print.

Seaman, David. “Introduction to TEI In-Class Lecture and Workshop.” With Kirstyn Leuner. ENGL 53.06 “Women’s Literature and Technologies of Transmission.” Dartmouth College. 15 May 2015.

Opening exercise image credits: (1) National Geographic, (2) Stanislav Duben/ Solent News & Photo Agency

