This part of the guidelines provides a detailed description of the design and
structure of the formats of the Text Alignment Network. The material follows the
organization of the schema files (kept in the schemas subdirectory), so both can be
studied in tandem.
Chapter 3, General underpinnings, outlines, in a non-technical way, the principles
and technical foundations of the TAN format.
Chapter 4, Common patterns and structures; Chapter 5, Class-1 TAN files, representations
of textual objects (scripta); Chapter 6, Class-2 TAN files, annotations of texts; and
Chapter 7, Class-3 TAN Files, Varia, describe each TAN format, class by class. Each chapter
starts with theory or scholarly context before expanding on technical points.
The chapters in this part have been written with the assumption that you have already
read the previous part (Part I, “General overview”) and that you have already
started to create or edit a TAN collection.
Because readers will come from different specialties, all acronyms, abbreviations,
and concepts are defined, albeit tersely, with attention to how they affect the
use of TAN. Suggestions for further reading are provided for those who want a more
thorough introduction to a topic.
Table of Contents
- 3. General underpinnings
- Design principles
- Format organization
- Assumptions in the creation of TAN data
- Core technology
- Unicode
- eXtensible Markup Language (XML)
- Namespaces
- The Text Encoding Initiative
- Data types
- Identifiers and their use (IRIs, URIs, URLs, URNs, UUIDs)
- Regular expressions
- 4. Common patterns and structures
- Common patterns
- IRI + name pattern
- Digital entity metadata pattern
- Edit stamp
- Overall structure
- Identifying TAN files: @id
- TAN file versions
- Attribute inheritability and priority
- Defining words and tokens
- Metadata (<head>)
- Key Information
- Key Declarations
- Networked Files
- Adjustments
- Local vocabulary items and ID assignments: <vocabulary-key>
- Responsibility
- Change log
- Pending work
- 5. Class-1 TAN files, representations of textual objects (scripta)
- Principles and assumptions
- General
- Domain model
- One version, one work, one scriptum, one reference system
- Normalizing transcriptions
- Class 1 metadata
- Class 1 data
- Transcriptions using the Text Encoding Initiative (<TEI>)
- TAN-TEI
- TEI customization
- Converting TEI to TAN-TEI
- 6. Class-2 TAN files, annotations of texts
- Common elements
- Class 2 metadata (<head>)
- Class 2 data (<body>)
- Class 2 pointer syntax: referencing texts
- General annotations and alignments (<TAN-A>)
- Root element and header
- Data (<body>)
- Token-based annotations and alignments (<TAN-A-tok>)
- Root Element and Header
- Data (<body>)
- Lexico-morphology (<TAN-A-lm>)
- Principles and assumptions
- Root Element and Header
- Data (<body>)
- 7. Class-3 TAN Files, Varia
- Vocabulary (TAN-voc)
- Root Element and Head
- Data (<body>)
- Morphological Concepts and Patterns (TAN-mor)
- Principles and Assumptions
- Root Element and Header
- Data (<body>)
- TAN Catalog Files (collection)