Part II. Detailed description

This part of the guidelines provides a detailed description of the design and structure of the formats of the Text Alignment Network. The material follows the organization of the schema files (kept in the schemas subdirectory), so both can be studied in tandem.

Chapter 3, General underpinnings, outlines in non-technical terms the principles and technical foundations of the TAN format.

The next four chapters describe each TAN format, class by class: Chapter 4, Common patterns and structures; Chapter 5, Class-1 TAN files, representations of textual objects (scripta); Chapter 6, Class-2 TAN files, annotations of texts; and Chapter 7, Class-3 TAN Files, Varia. Each chapter starts with theory or scholarly context before expanding on technical points.

The chapters in this part assume that you have already read the previous part (Part I, “General overview”) and that you have already started to create or edit a TAN collection.

Because readers will come from different specialties, all acronyms, abbreviations, and concepts are defined, albeit tersely, with attention to how they affect the use of TAN. Suggestions for further reading are provided for those who want a more thorough introduction to a topic.

Table of Contents

3. General underpinnings

Design principles

Format organization

Assumptions in the creation of TAN data

Core technology

Unicode
eXtensible Markup Language (XML)
Namespaces
The Text Encoding Initiative
Data types
Identifiers and their use (IRIs, URIs, URLs, URNs, UUIDs)
Regular expressions

4. Common patterns and structures

Common patterns

IRI + name pattern
Digital entity metadata pattern
Edit stamp

Overall structure

Identifying TAN files: @id
TAN file versions

Attribute inheritability and priority

Defining words and tokens

Metadata (<head>)

Key Information
Key Declarations
Networked Files
Adjustments
Local vocabulary items and ID assignments: <vocabulary-key>
Responsibility
Change log
Pending work

5. Class-1 TAN files, representations of textual objects (scripta)

Principles and assumptions

General
Domain model
One version, one work, one scriptum, one reference system
Normalizing transcriptions

Class 1 metadata

Class 1 data

Transcriptions using the Text Encoding Initiative (<TEI>)

TAN-TEI
TEI customization
Converting TEI to TAN-TEI

6. Class-2 TAN files, annotations of texts

Common elements

Class 2 metadata (<head>)
Class 2 data (<body>)
Class 2 pointer syntax: referencing texts

General annotations and alignments (<TAN-A>)

Root element and header
Data (<body>)

Token-based annotations and alignments (<TAN-A-tok>)

Root Element and Header
Data (<body>)

Lexico-morphology (<TAN-A-lm>)

Principles and assumptions
Root Element and Header
Data (<body>)

7. Class-3 TAN Files, Varia

Vocabulary (TAN-voc)

Root Element and Head
Data (<body>)

Morphological Concepts and Patterns (TAN-mor)

Principles and Assumptions
Root Element and Header
Data (<body>)

TAN Catalog Files (collection)
