Chapter 1. Introduction


The Text Alignment Network (TAN) is a framework that allows users, working independently and collaboratively, to share, find, create, edit, and explore digital texts and annotations.

A customized extension of Text Encoding Initiative (TEI) XML, TAN is particularly suited for organizing and aligning texts with multiple versions (copies, translations, paraphrases), and for creating and editing text annotations such as quotations, translation clusters (word-to-word), and linguistic features.

The foundation of TAN is a suite of XML formats, each designed for a specific task. Extensive validation routines maximize the syntactic and semantic interoperability of texts, annotations, and language resources. TAN comes with applications and utilities that open new frontiers in scholarly publishing, research, and teaching.

Why use TAN?

Extensive error checking. Built-in TAN validation rules go well beyond the customary error checking available for other formats. Files linked in the network "talk" to each other, letting users know about changes and updates. More than one hundred types of content-based errors are checked. Through Schematron Quick Fixes, many of these problems can be corrected in a matter of seconds.

Time-saving utilities. Enjoy enhanced editing functions in Oxygen XML Editor's Author mode. Highly customizable TAN utilities help you create, edit, and maintain TEI and TAN files. For example:

Body Builder: write rules to convert plain text or Word docx files into a preferred TAN/TEI structure and markup.
Body Remodeler: incrementally restructure a text to imitate an existing TAN/TEI file. In conjunction with Oxygen Author tools, this utility can save hours of labor in creating a collection of many versions of the same work.
Body Sync: update a TAN/TEI file so its transcription exactly matches that of another TAN/TEI file.
TAN-A-lm Builder: generate lexico-morphological data for a TAN/TEI file.

Pathbreaking applications. Core TAN applications, written in XSLT, provide cutting-edge tools for textual research and analysis. For example:

Diff+: identify, analyze, and visualize text differences between any number of versions of a text.
Parabola: juxtapose in a single interactive HTML page all the versions of a work, along with annotations.
Tangram: identify quotations, paraphrases, and common text between two groups of texts.

Intuitive text referencing. Unlike TEI, HTML, and other markup systems, which rely heavily on arbitrary identifiers that can be difficult to navigate and maintain, TAN points to text portions using familiar reference systems or user-customized tokenization rules.
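
The idea of pointing to text by a familiar reference plus a token position, rather than by an arbitrary identifier, can be illustrated with a minimal Python sketch. This is not TAN markup and not part of the TAN function library: the sample data, the names `TEXT`, `tokenize`, and `resolve`, and the default token pattern are all assumptions made for illustration only.

```python
import re

# Hypothetical miniature text, keyed by a familiar chapter.verse-style
# reference instead of an arbitrary ID. Illustrative data only.
TEXT = {
    "1.1": "In the beginning was the Word",
    "1.2": "The same was in the beginning with God",
}


def tokenize(passage, pattern=r"\w+"):
    """Split a passage into tokens; the regex stands in for a
    user-customized tokenization rule."""
    return re.findall(pattern, passage)


def resolve(ref, token_number=None):
    """Return the passage at `ref`, or its nth token (1-based) if
    `token_number` is given."""
    passage = TEXT[ref]
    if token_number is None:
        return passage
    return tokenize(passage)[token_number - 1]


print(resolve("1.2"))     # whole passage at reference 1.2
print(resolve("1.1", 6))  # sixth token of passage 1.1
```

Because the pointer is a reference and a token count, an annotation written this way stays readable to a human and does not depend on identifiers embedded in any one copy of the text. Changing the `pattern` argument changes what counts as a token, which is why the tokenization rule must be shared along with the pointer.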

Application development. TAN is built upon an extensive and robust XSLT function library, one of the few of its kind. Do you already use Natural Language Toolkit, Classical Language Toolkit, or comparable packages in other programming languages to develop tools for textual and linguistic research? Do you have to process, analyze, and transform texts that are in tree structures? With more than 250 public functions, covering tasks that range from numerics to maps and from checksums to tree manipulation, the TAN function library may cover everything you need while letting you stay within an XML environment. Many TAN functions are useful even outside of TEI or TAN.

Semantic Web. TAN was designed at the outset to ensure that texts and their annotations would be rooted in the practices of the Semantic Web. Unlike many other formats, whose attribute values are almost always only human-readable, most TAN file components are tied to URIs, making them suitable for use in Semantic Web applications.
