TAN Tutorial 1: Preparing a TEI Corpus for the Text Alignment Network
Introduction
In this tutorial, an incremental bootstrap approach to TAN, you will create your first
small corpus of working TAN files. You will be exposed to important principles of TAN
design, learn how to interact with other TAN files, and discover how TAN enhances the
editing process.
The capstone to the tutorial is learning how to configure and run Parabola, a TAN
application that arranges work-versions in interactive parallel display.
Although you can learn a lot simply by reading through the slides, you will learn far
more by joining at least one other student and being led by someone who has developed a
TAN project. Inevitably something will not go the way you expected, and an experienced
hand can save you much time and frustration.
Prerequisites
- At least basic familiarity with XML and TEI, and with the concepts of well-formed and
schema-valid XML.
- A rudimentary knowledge of XPath and either XQuery or XSLT.
- Some familiarity with Unicode and regular expressions.
- A corpus of TEI texts. The corpus may be small and consist of short texts (in fact,
this is recommended), but it must include at least three versions of one work of your
choice. The versions may be in different languages (e.g., translations) or in the same
language (e.g., revisions, variations).
- Curiosity, humility, and patience.
Technical Requirements
- A computer.
- A licensed, local installation of Oxygen XML Editor.
- A file sharing service such as Google Drive or Dropbox (service TBD).
- Internet access.
- Optional: the ability to post XML files to the Internet, e.g., through a personal or
institutional server, or through a service such as GitHub.
Preliminary readings
- TAN Guidelines, chapters 1 and 2
Sessions
Background: Guidelines 4.1, 4.2, 4.5, 8.4
- Learn how TAN handles vocabulary and semantic-based identification.
- Create a TAN-voc file.
- Understand TAN metadata structures and principles.
- Understand validation phases, and how to interpret validation reports.
Background: Guidelines 5 (all), 8.2, 8.3
- Adjust TEI files to be TAN-compliant.
- Coordinate transcriptions with vocabulary.
- Understand how TAN handles space, Unicode, and word division.
- Think about normalization.
- Learn how to handle reference systems.
Background: Guidelines 6.1, 6.2, 8.5
- Develop a TAN-A file for aligning and annotating the corpus of files.
- Expose files to a network.
- Use files on the network.
- Understand how claims (annotations) work.
Background: Guidelines 9.1, 9.2, 9.4
- Configure and use the TAN application Parabola.
- Learn the basics of data types and regular expressions.
- Work with class 2 adjustments.
- Group works and sources via aliases.