Part II. Detailed Description

This part of the guidelines provides a detailed description of the formats of the Text Alignment Network. The material follows the structure of the schema files, so the two can be read in tandem.

Chapter 3, General Underpinnings, outlines, in a non-technical way, the principles and technical foundations of the TAN format.

Chapter 4, Patterns and Structures Common to All TAN Encoding Formats; Chapter 5, Class-1 TAN Files, Representations of Textual Objects (Scripta); Chapter 6, Class-2 TAN Files, Annotations of Texts; and Chapter 7, Class-3 TAN Files, Varia, together describe all the TAN formats comprehensively. Each chapter starts with theoretical or scholarly background that gives context for the technical points that follow.

Chapter 8, TAN patterns, elements, and attributes defined, the first of two very long chapters, provides a comprehensive, detailed explanation of the rules for every element and attribute, as well as the patterns into which they fall. This chapter includes a thorough list of relevant validation rules and examples. It has been written using a stylesheet that traverses the official TAN schemas, functions, and examples.

Chapter 9, Official TAN keywords, lists all the vocabulary items that have already been defined as a core part of the format. This chapter is essentially another view of the TAN-key files in the TAN-key folder.

The chapters in this part of the guidelines should be read selectively, not consecutively. They have been written with the assumption that you have already read the previous part (Part I, “General Overview”) and that you have already started to create and edit a TAN collection.

Because readers will come from different specialties, all acronyms, abbreviations, and concepts are defined and explained, albeit tersely. Concepts or technologies are discussed only insofar as they affect the use of TAN; suggestions for further reading are provided for those who want a more thorough introduction to a topic.

Table of Contents

3. General Underpinnings
Design Principles
Format Organization
Assumptions in the Creation of TAN Data
Core Technology
Unicode
eXtensible Markup Language (XML)
Namespaces
The Text Encoding Initiative
Data types
Identifiers and Their Use
Regular Expressions
Interpretation of multiple values
4. Patterns and Structures Common to All TAN Encoding Formats
Common Patterns
IRI + name Pattern
Digital Entity Metadata Pattern
Edit Stamp
Overall Structure
@id and a TAN file's IRI Name
Metadata (<head>)
Rights and Licenses
Keys and Inclusions
Distinguishing <source>s and <see-also>s
Attribute inheritability and priority
Defining Words and Tokens
5. Class-1 TAN Files, Representations of Textual Objects (Scripta)
Principles and Assumptions
General
Domain model
One version, one work, one object, one reference system
Normalizing transcriptions
Transcriptions
Flattened References, and the Leaf Div Uniqueness Rule
Transcriptions Using the Text Encoding Initiative (<TEI>)
6. Class-2 TAN Files, Annotations of Texts
Common Elements
Class 2 Metadata (<head>)
Class 2 Data Patterns (<body>)
@pos and @val
Division-Based Annotations and Alignments (<TAN-A-div>)
Root Element and Header
Data (<body>)
Token-Based Annotations and Alignments (<TAN-A-tok>)
Root Element and Header
Data (<body>)
Lexico-Morphology
Principles and Assumptions
Root Element and Header
Data (<body>)
7. Class-3 TAN Files, Varia
Keyword Vocabulary (TAN-key)
Root Element and Head
Data (<body>)
Morphological Concepts and Patterns (TAN-mor)
Principles and Assumptions
Root Element and Header
Data (<body>)
TAN Catalog Files (collection)
8. TAN patterns, elements, and attributes defined
@adverb
@affects-element
@bitext-relation
@by
@cert
@cert2
@chars
@claimant
@code
@def-ref
@div-type
@ed-when
@ed-who
@flags
@from
@group
@help
@href
@id
@idrefs
@in-progress
@include
@lexicon
@licensor
@m-has-features
@m-has-how-many-features
@m-matches
@morphology
@n
@new
@object
@object-datatype
@object-lexical-constraint
@pattern
@period
@pos
@ref
@relationship
@replacement
@reuse-type
@roles
@root
@shallow
@src
@stable
@subject
@TAN-version
@to
@tok-matches
@type
@units
@val
@verb
@when
@when-accessed
@where
@which
@who
@work
@xml:id
@xml:lang
<algorithm>
<alias>
<align>
<alter>
<ambiguous-letter-numerals-are-roman>
<ana>
<assert>
<bitext-relation>
<body>
<category>
<change>
<checksum>
<claim>
<collection>
<comment>
<definitions>
<desc>
<div>
<div-ref>
<div-type>
<doc>
<equate>
<feature>
<for-lang>
<from>
<group>
<group-type>
<head>
<inclusion>
<IRI>
<item>
<key>
<l>
<lexicon>
<license>
<licensor>
<lm>
<location>
<locus>
<m>
<master-location>
<modal>
<morphology>
<name>
<normalization>
<object>
<organization>
<period>
<person>
<place>
<reassign>
<relationship>
<rename>
<replace>
<report>
<resp>
<reuse-type>
<role>
<rule>
<scriptum>
<see-also>
<skip>
<source>
<subject>
<tail>
<TAN-A-div>
<TAN-A-lm>
<TAN-A-tok>
<TAN-key>
<TAN-mor>
<TAN-T>
<to>
<tok>
<token-definition>
<topic>
<unit>
<value>
<verb>
<version>
<where>
<work>
TAN patterns
~abstract-tok-ref
~action-complex-condition
~action-condition
~action-condition-attributes
~action-simple-condition
~agent-ref
~alignment
~alignment-attributes-non-class-2
~alignment-content-non-class-2
~alignment-inclusion-opt
~alt-equate
~alt-norm
~alt-reassign
~alt-rename
~alt-repl
~alt-skip
~alter-class-2
~alter-class-3
~alter-condition
~alter-core
~alter-element
~alter-non-class-2
~alter-non-class-3
~alter-non-core
~alter-statement
~any-attribute
~any-content
~any-element
~assert
~attr-cert
~attr-cert2
~bitext-relation-attr
~body-attributes-non-core
~body-content-class-1
~body-content-class-2
~body-content-class-3
~body-content-core
~body-content-non-class-1
~body-content-non-class-2
~body-content-non-class-3
~body-content-non-core
~body-group
~body-item
~category
~category-list
~cert-claim
~cert-content
~certainty-stamp
~change-log
~char-ref
~checksum
~claim
~claimant-ref
~code
~comment
~complex-object
~complex-rename
~complex-subject
~complex-text-ref
~complex-textual-reference-set
~condition-m-has-features
~condition-m-has-how-many-features
~condition-m-matches
~condition-pattern
~condition-tok-matches
~definition-class-2
~definition-class-3
~definition-core
~definition-list
~definition-non-class-2
~definition-non-class-3
~definition-non-core
~defn-agent
~defn-alg
~defn-alias
~defn-ambig-numerals
~defn-brel
~defn-claims
~defn-class-1
~defn-div-type
~defn-features
~defn-group-type
~defn-id-ref-opt
~defn-lexi
~defn-mode
~defn-morph
~defn-non-class-1
~defn-org
~defn-pattern-default
~defn-pattern-id
~defn-pattern-language
~defn-pattern-no-id
~defn-period
~defn-pers
~defn-place
~defn-relationship
~defn-reus
~defn-role
~defn-scri
~defn-tok-def
~defn-topic
~defn-unit
~defn-verb
~defn-vers
~defn-work
~div-item-ref
~div-range-ref
~div-ref-range
~div-type-ref
~ed-agent
~ed-stamp
~ed-time
~element-scope
~entity-digital-generic-ref
~entity-digital-tan-other-ref
~entity-digital-tan-self-ref
~entity-nondigital-ref
~entity-tok-def
~error-flag
~feature
~feature-ref
~func-param-flags
~func-param-pattern
~func-replace
~grammar-attr
~group-attributes
~group-ref
~head-prelude
~head-prelude-core
~head-prelude-non-core
~help-opt
~href-opt
~id-option
~inclusion
~inclusion-att
~inclusion-item
~inclusion-list
~increment
~internal-idrefs
~internal-non-xml-id
~internal-xml-id
~IRI-gen
~IRI-gen-ref
~item-picker
~item-pos-ref
~key-item
~key-list
~keyword-ref
~lang-of-content
~lang-outside
~lang-preface
~lexeme
~lexicon-attr
~licensor
~lm-tok-ref
~loc-self
~loc-src
~locus
~metadata-desc
~metadata-human
~modal-ref
~morph
~morphology-rule
~n
~new-name
~new-ref-name
~non-class-2-opt
~nonsource-license
~nontextual-reference
~object
~object-constraint
~object-datatype
~object-element
~object-lexical-constraint
~object-ref
~period-ref
~place-ref
~pointer-to-div-item
~pointer-to-div-range
~progress
~relationship
~report
~resp-item
~resp-list
~reuse-type-attr
~role-ref
~see-also-item
~see-also-list
~seq-picker
~seq-pos-ref
~shallow-option
~simple-rename
~simple-textual-reference
~source-id-opt
~source-item
~source-list
~source-ref
~sources-ref
~subject
~subject-ref
~TAN-A-lm-item
~TAN-body
~TAN-head
~TAN-key-item
~TAN-R-mor-body
~TAN-root
~TAN-tail
~TAN-ver
~target-div-ref
~text-div
~textual-reference
~tok-cert-opt
~tok-mult-selector-attributes
~tok-range-selector
~tok-ref
~tok-ref-group
~tok-ref-item
~tok-ref-range
~tok-single-selector-attributes
~tok-sources-ref-opt
~token-value-ref
~type
~units
~URI-tag
~verb-ref
~when-claim
~work-ref
9. Official TAN keywords
TAN keywords for types of bitext relations (<bitext-relation>)
TAN keywords for types of divisions (<div-type>)
TAN keywords for features (<feature>)
TAN keywords for types of groups (<group-type>)
TAN keywords for types of rights (<license>)
TAN keywords for types of modals (<modal>)
TAN keywords for types of normalizations (<normalization>)
TAN keywords for types of relationships (<relationship>)
TAN keywords for types of bitext reuse (<reuse-type>)
TAN keywords for types of roles (<role>)
TAN keywords for types of token definitions (<token-definition>)
TAN keywords for verbs (<verb>)