Friday, April 4, 2008

Review: IATH Manuscript Transcription Database

This is the second of two reviews of similar transcription projects I wrote in correspondence with Brian Cafferelli, an undergraduate working on the WPI Manuscript Transcription Assistant. In this correspondence, I reviewed systems by their support for collaboration, automation, and analysis.

The IATH Manuscript Transcription Database was a system for producing transcriptions developed for the Salem Witch Trials court records. It allowed full collaboration within an institutional setting. An administrator assigned transcription work to a transcriber, who then pulled that work from the queue and edited and submitted a transcription. Presumably there was some sort of review performed by the admin, or a proofreading step done by comparison with another user's transcription. Near as I can tell from the manual, no dual-screen transcription editor was provided. Rather, transcription files were edited outside the system, and passed back and forth using MTD.

I'm a bit hazy on all this because after reviewing the docs and downloading the source code, I sent an inquiry to IATH about the legal status of the code. It turns out that while the author intended to release the system to the public, this was never formally done. The copyright status of the code is murky, and I view it as tainted IP for my purpose of producing a similar product. I'm afraid I deleted the files from my own system, unread.

For those interested in reading more, here is the announcement of the MTD
on the Humanist list.
The full manual, possibly with archived source code is accessible via the wayback machine. They've pulled this from the IATH site, presumably because of the IP problems.

So the MTD was another pure-production system. Automation and collaboration were fully supported, though the collaboration was designed for a purely institutional setting. Systems of assignment and review are inappropriate for a volunteer-driven system like my own.

My correspondent at IATH did pass along a piece of advice from someone who had worked on the project: "Use CVS instead". What I gather they meant by this was that the system for passing files between distributed transcribers, collating those files, and recording edits is a task that source code repositories already perform very well. This does nothing to replace transcription production tools akin to mine or the TEI editors, but the whole system of editorial oversight and coordination provided by the MTD is a subset of what a source code repository can do. A combination of Subversion and Trac would be a fantastic way to organize a distributed transcription effort, absent a pure-hosted tool.

This post contains a lot more speculation and supposition than usual, and I apologize in advance to my readers and the IATH for anything I've gotten wrong. If anyone associated with the MTD project would like to set me straight, please comment or send me email.

1 comment:

Margo Burns said...

Just as an FYI: The MTD was not used by the editorial team transcribing the Records of the Salem Witch-Hunt (Bernard Rosenthal, ed. Cambridge University Press, 2009). After trying to use the MTD, only to discover bugs and limitations that IATH was not in a position to pay to have addressed by the programmer(s), I ended up teaching myself how to write PHP code and created various custom interfaces and database tables at our work website to handle the specific tasks that we needed to accomplish, in line with the workflow that the editors had established, rather than warp our work to suit the design of the MTD.