TAN Tutorial 1: Preparing a TEI Corpus for the Text Alignment Network

Introduction

This tutorial takes an incremental, bootstrap approach to TAN: you will create your first small corpus of working TAN files. Along the way you will be exposed to important principles of TAN design, learn how to interact with other TAN files, and discover how TAN enhances the editing process.

Although you can learn a lot simply by reading through the slides, you will learn far more by joining at least one other student and being led by someone who has developed a TAN project. Inevitably something will not go the way you expected, and an experienced hand can save you much time and frustration.

Prerequisites

At least basic familiarity with XML and TEI, and with the concepts of well-formed and schema-valid XML.

A rudimentary knowledge of XPath and either XQuery or XSLT.

Some familiarity with Unicode and regular expressions.

A corpus of TEI texts. The corpus may be small and its texts short (this is in fact recommended), but it must include at least three versions of one work of your choice. The versions may be in different languages (e.g., translations) or in the same language (e.g., revisions, variations).

Curiosity, humility, and patience

Technical Requirements

a computer

a licensed, local installation of Oxygen XML Editor

a file sharing service such as Google Drive or Dropbox (service TBD)

internet access

optional: ability to post XML files to the Internet, e.g., through a personal or institutional server, or through a service such as GitHub

Sessions

Session 1: Vocabulary

Background: Guidelines 4.1, 4.2, 4.5, 8.4

Learn how TAN handles vocabulary and semantic-based identification

Create a TAN-voc file

Understand TAN metadata structures and principles

Understand validation phases, and how to interpret validation reports
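The idea of identifying things by vocabulary rather than by bare strings can be sketched in a few lines. The sketch below is written in Python for brevity and is not the TAN-voc format: the class `VocabItem`, the function `unresolved_ids`, and the example.org IRIs are all made up for illustration. It assumes only that each vocabulary item pairs a short local id with a globally unique IRI and a human-readable name, and it shows the kind of check a validator might run.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class VocabItem:
    """Hypothetical vocabulary entry: a global identifier plus a label."""
    iri: str
    name: str


def unresolved_ids(used: Iterable[str], vocabulary: Dict[str, VocabItem]) -> List[str]:
    """Return, sorted, every local id that is used but not defined."""
    return sorted({local_id for local_id in used if local_id not in vocabulary})


# Invented sample data; the IRIs are placeholders, not real identifiers.
vocabulary = {
    "homer": VocabItem("http://example.org/person/homer", "Homer"),
    "iliad": VocabItem("http://example.org/work/iliad", "Iliad"),
}

print(unresolved_ids(["homer", "iliad", "odyssey"], vocabulary))  # ['odyssey']
```

An id that does not resolve to a vocabulary item is the sort of problem you should expect a validation report to flag.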

Session 2: Transcriptions

Background: Guidelines 5 (all), 8.2, 8.3

Adjust TEI files to be TAN-compliant.

Coordinate transcriptions with vocabulary.

Understand how TAN handles space, Unicode, and word division.

Think about normalization.

Learn how to handle reference systems.
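To see why space, Unicode, and normalization matter, consider a minimal sketch. It uses Python rather than XSLT purely for brevity, and the function name `normalize_text` is hypothetical. It shows generic Unicode NFC normalization and whitespace collapsing, not TAN's own rules, which are set out in the Guidelines.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Hypothetical helper: NFC-normalize, then collapse runs of whitespace."""
    composed = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", composed).strip()


decomposed = "cafe\u0301  au\n lait"  # 'e' followed by a combining acute accent
precomposed = "caf\u00e9 au lait"      # a single code point for the accented 'e'

print(decomposed == precomposed)                  # False
print(normalize_text(decomposed) == precomposed)  # True
```

Two transcriptions that look identical on screen can differ at the code-point level, which is why a corpus needs an agreed normalization before its versions are compared.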

Session 3: Alignment and Annotation

Background: Guidelines 6.1, 6.2, 8.5

Develop a TAN-A file that aligns and annotates the corpus

Expose files to a network

Use files on the network

Understand how claims (annotations) work
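The basic idea of aligning versions by a shared reference system can be illustrated as follows. This is only a sketch in Python, not the TAN-A format: the function `align_by_ref`, the version labels, and the placeholder segment texts are invented for the example, and references are sorted as plain strings, which would not suit every reference system.

```python
from typing import Dict, Optional


def align_by_ref(
    versions: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Group segments from several versions under their shared reference labels.

    A version that lacks a given reference gets None for that reference.
    """
    refs = sorted({ref for segments in versions.values() for ref in segments})
    return {
        ref: {name: segments.get(ref) for name, segments in versions.items()}
        for ref in refs
    }


# Invented sample data: three versions of one work, keyed by reference.
versions = {
    "version-a": {"1.1": "segment a-1.1", "1.2": "segment a-1.2"},
    "version-b": {"1.1": "segment b-1.1", "1.2": "segment b-1.2"},
    "version-c": {"1.1": "segment c-1.1"},  # lacks 1.2
}

for ref, row in align_by_ref(versions).items():
    print(ref, row)
```

The gap at 1.2 in the third version is exactly the kind of discrepancy that alignment brings to the surface.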

Session 4: Application

Background: Guidelines 9.1, 9.2, 9.4

Configure and use the TAN application Parabola.

Learn the basics of data types and regular expressions.

Work with class 2 adjustments.

Group works and sources via aliases.
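As a warm-up for the regular expressions covered in this session, here is a small sketch in Python. The reference shape it tests (digits separated by single periods) and the names `REF_PATTERN`, `is_reference`, and `words` are assumptions made for illustration. Note also that Python's `re` module is not the same flavor as the regular expressions used in XPath and XSLT; for example, `re` does not support `\p{L}`.

```python
import re
from typing import List

# Hypothetical reference shape: "1", "1.2", "1.2.3", and so on.
REF_PATTERN = re.compile(r"\d+(\.\d+)*")


def is_reference(label: str) -> bool:
    """True if the whole label consists of digits separated by single periods."""
    return REF_PATTERN.fullmatch(label) is not None


def words(text: str) -> List[str]:
    """Naive word division: every maximal run of word characters."""
    return re.findall(r"\w+", text)


print(is_reference("1.2.3"))              # True
print(is_reference("1..2"))               # False
print(words("Sing, goddess, the wrath"))  # ['Sing', 'goddess', 'the', 'wrath']
```

Word division by `\w+` is deliberately crude; it treats an apostrophe or hyphen as a word boundary, which is one reason word division deserves explicit rules.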
