[I was invited to deliver the inaugural talk in a series hosted by Baker-Berry Library at Dartmouth College. Many thanks to Laura Braunstein for organizing the event and to those who attended. And special thanks to my collaborators at CU Libraries, Deborah Hollis and Holley Long, for their continued support and teamwork.]
Large-scale textual digitization projects can take a long time. As you know, they take even longer when the text cannot be scanned and OCR'd and must be transcribed by hand. Today I want to take advantage of this middle stage of a long-term textual digitization project I manage, called the Stainforth Library of Women's Writing, to swap expertise with you. This digital library is under construction right now, and it's the perfect time to pause, appreciate our data and editorial carnage, share our process and the hunches I'm operating on, and ask audiences like you what you think of it and how we can make it better. My plan for our time today is to introduce my project and tell you a little about the history of how it got off the ground. And then I'm going to give you a chance to pretend my project is yours: we'll compare your approaches to editing and delivering our data with what we have done or hope to do.
Throughout, as we talk about process options and the choices I've made, I hope we can think together about the future of textual digitization projects from a pragmatic point of view, one that takes small budgets, time constraints, growing numbers of untenured faculty, and the job market into account. That is: you know and I know, from experience, how much labor and critical thinking go into producing digital editions of a text or a collection of texts, such as Dartmouth's John McCoy Family papers and the Occom Circle Project. I think it is important to imagine the upper limits of what a digital archive can do for researchers, such as providing 3D representations of digital objects. But I also want to offer an idea for how scholars like me, without a sizable markup team, can generate a corpus of edited data and deliver it for research and teaching in a timely manner.
I will argue today that producing and releasing one's data first, and the more fully marked-up data set in a linked digital archive second, is a much-needed new model of data curation that may be of particular interest to untenured digital humanists, of which I am one. This two-phase release model responds to the constrained institutional ecosystem in which many digital humanists work, with limited budgets, time, and help, while still satisfying requirements for hiring, tenure, and promotion that rest on peer-reviewed essays and books. This model, which I'll be calling the "cookie-dough" model just for this talk, helps us make more useful digital objects with the same amount of data. It also helps us create tangible, traditional scholarly products from our digital curation work sooner, before we finish creating an interface, which can take a very long time. So: first, a project introduction; second, a workshop conversation about process; and then I'll conclude.