Part II. Detailed Description

This part of the guidelines provides a detailed description of the formats of the Text Alignment Network. The material is organized according to the structure that governs the schema files, so both can be read in tandem.

Chapter 3, “General Underpinnings,” outlines, in a non-technical way, the principles and technical foundations of the TAN format.

Chapter 4, “Patterns and Structures Common to All TAN Encoding Formats”; Chapter 5, “Class-1 TAN Files, Representations of Textual Objects (Scripta)”; Chapter 6, “Class-2 TAN Files, Annotations of Texts”; and Chapter 7, “Class-3 TAN Files, Varia,” together describe all the TAN formats comprehensively. Each chapter covers preliminary theoretical or scholarly considerations and discusses how the features of each TAN format are meant to be interpreted as a whole.

Chapter 8, “TAN patterns, elements, and attributes defined,” the first of two very long chapters, explains in comprehensive detail the rules for every element and attribute, as well as the patterns into which they fall. This chapter includes a thorough list of relevant validation rules and examples. It has been written using a stylesheet that traverses the official TAN schemas, functions, and examples.

Chapter 9, “Official TAN keywords,” lists all the vocabulary items that have already been defined as a core part of the format. This chapter is essentially a re-presentation of the files in the TAN-key folder.

The chapters in this part of the guidelines should be read selectively, not consecutively. They have been written with the assumption that you have already read the previous part (Part I, “General Overview”) and that you have already started to create and edit a TAN collection.

Because readers will come from different specialties, all acronyms, abbreviations, and concepts are defined and explained, albeit tersely. Concepts or technologies are discussed only insofar as they affect the use of TAN; suggestions for further reading are provided for those who want a more thorough introduction to a topic.

Table of Contents

3. General Underpinnings
The Big Picture
Assumptions in the Creation of TAN Data
Core Technology
Unicode
eXtensible Markup Language (XML)
Namespaces
The Text Encoding Initiative
Data types
Identifiers and Their Use
Regular Expressions
Interpretation of multiple values
4. Patterns and Structures Common to All TAN Encoding Formats
Common Patterns
IRI + name Pattern
Digital Entity Metadata Pattern
Edit Stamp
Overall Structure (root)
@id and a TAN file's IRI Name
Metadata (<head>)
Rights and Licenses
Inclusions and Keys
Distinguishing <source>s and <see-also>s
Interpretation of inheritable attributes
Defining Words and Tokens
5. Class-1 TAN Files, Representations of Textual Objects (Scripta)
Principles and Assumptions
General
Domain model
One version, one work, one object, one reference system
Normalizing transcriptions
Transcriptions
Flattened References, and the Leaf Div Uniqueness Rule
Transcriptions Using the Text Encoding Initiative (<TEI>)
6. Class-2 TAN Files, Annotations of Texts
Common Elements
Class 2 Validation
Class 2 Metadata (<head>)
Class 2 Data Patterns (<body>)
@pos and @val
Alignments: Principles and Assumptions
Division-Based Alignments (<TAN-A-div>)
Root Element and Header
Data (<body>)
Token-Based Alignments (<TAN-A-tok>)
Root Element and Header
Data (<body>)
Lexico-Morphology
Principles and Assumptions
Root Element and Header
Data (<body>)
7. Class-3 TAN Files, Varia
Keyword Vocabulary (TAN-key)
Root Element and Head
Data (<body>)
Morphological Concepts and Patterns (TAN-mor)
Principles and Assumptions
Root Element and Header
Data (<body>)
Claims and assertions (TAN-c)
Root Element and Header
Data (<body>)
8. TAN patterns, elements, and attributes defined
TAN-core elements and attributes summarized
<agent>
<agentrole>
<alias>
<body>
<change>
<checksum>
<comment>
<declarations>
<desc>
<for-lang>
<group>
<group-type>
<head>
<inclusion>
<IRI>
<key>
<location>
<master-location>
<name>
<relationship>
<rights-excluding-sources>
<rights-source-only>
<role>
<see-also>
<source>
<tail>
<token-definition>
<value>
<version>
<when>
<work>
@affects-element
@cert
@cert2
@ed-when
@ed-who
@flags
@from
@group
@help
@href
@id
@idrefs
@in-progress
@include
@n
@regex
@rights-holder
@roles
@TAN-version
@to
@type
@when
@when-accessed
@which
@who
@xml:id
@xml:lang
TAN-class-1 elements and attributes summarized
<div-type>
<filter>
<normalization>
<replace>
<transliteration>
@replacement
TAN-T elements and attributes summarized
<div>
<TAN-T>
TAN-class-2 elements and attributes summarized
<rename>
<rename-div-ns>
<suppress-div-types>
<tok>
@chars
@cont
@div-type-ref
@new
@old
@pos
@ref
@src
@val
TAN-A-div elements and attributes summarized
<anchor-div-ref>
<div-ref>
<div-type-ref>
<equate-div-types>
<equate-works>
<realign>
<split-leaf-div-at>
<TAN-A-div>
@seg
@work
TAN-A-tok elements and attributes summarized
<align>
<bitext-relation>
<reuse-type>
<TAN-A-tok>
@bitext-relation
@reuse-type
TAN-LM-core elements and attributes summarized
<ana>
<l>
<lexicon>
<lm>
<m>
<morphology>
<TAN-LM>
@def-ref
@lexicon
@morphology
TAN-LM elements and attributes summarized
TAN-LM-lang elements and attributes summarized
TAN-class-3 elements and attributes summarized
TAN-key elements and attributes summarized
<item>
<TAN-key>
TAN-mor elements and attributes summarized
<assert>
<category>
<feature>
<report>
<TAN-mor>
@code
@context
@feature-qty-test
@feature-test
@matches-m
@matches-tok
TAN-c elements and attributes summarized
<TAN-c>
TAN-c-core elements and attributes summarized
<claim>
<claim-basis>
<locus>
<modal>
<object>
<person>
<place>
<scriptum>
<subject>
<topic>
<unit>
<verb>
@adverb
@claim-basis
@claimant
@object
@object-datatype
@object-lexical-constraint
@subject
@units
@verb
@where
TAN patterns
~agent-list
~agent-ref
~agent-role-list
~alignment
~alignment-attributes-non-class-2
~alignment-content-non-class-2
~alignment-inclusion-opt
~anchor-div-ref-item
~any-attribute
~any-content
~any-element
~assert
~attr-cert
~attr-cert2
~bitext-relation-attr
~body-group
~body-group-opt
~category
~category-feature
~category-list
~cert-claim
~cert-content
~cert-opt
~certainty-stamp
~change-list
~char-ref
~checksum
~claim
~claim-div-ref-item
~claimant
~code
~comment
~complex-object
~complex-rationale
~complex-subject
~complex-text-ref
~complex-textual-reference-set
~continuation
~continuation-opt
~decl-alias
~decl-brel
~decl-class-1
~decl-div
~decl-filt
~decl-filt-norm
~decl-filt-repl
~decl-filt-tlit
~decl-filter-content
~decl-group-type
~decl-id-ref-opt
~decl-lexi
~decl-mode
~decl-morph
~decl-non-class-1
~decl-opt
~decl-pattern-default
~decl-pattern-language
~decl-pattern-no-id
~decl-pers
~decl-place
~decl-rename-div-n
~decl-reus
~decl-scri
~decl-supp-div-type
~decl-tok-def
~decl-topic
~decl-unit
~decl-verb
~decl-vers
~decl-work
~declaration-core
~declaration-items
~div-item-ref
~div-range-ref
~div-type-equiv
~div-type-ref
~div-type-ref-cluster
~ed-agent
~ed-stamp
~ed-time
~element-scope
~entity-digital-generic-ref
~entity-digital-tan-other-ref
~entity-digital-tan-self-ref
~entity-nondigital-ref
~entity-tok-def
~error-flag
~feature
~feature-list
~feature-pattern
~feature-pattern-no-code
~feature-qty-test
~feature-test
~filter
~func-param-flags
~func-param-pattern
~func-replace
~grammar-attr
~group-attributes
~group-ref
~help-opt
~href-opt
~id-option
~inclusion
~inclusion-att
~inclusion-item
~inclusion-list
~internal-id
~internal-idrefs
~IRI-gen
~IRI-gen-ref
~item
~item-picker
~item-pos-ref
~key-item
~key-list
~keyword-ref
~lang-of-content
~lang-outside
~lexeme
~lexicon-attr
~loc-self
~loc-src
~locus
~matches-m
~matches-tok
~metadata-desc
~metadata-human
~modal-claim
~morph
~n
~n-val
~name-change
~non-class-2-opt
~nonsource-rights
~nontextual-reference
~object
~object-constraint
~object-datatype
~object-element
~object-lexical-constraint
~other-body-attributes
~period-filter
~place-filter
~pointer-to-div-item
~pointer-to-div-range
~progress
~rationale
~realignment
~reanchor-div-ref-item
~relationship
~report
~reuse-type-attr
~rights-holder
~role-list
~role-ref
~see-also-item
~see-also-list
~seg-ref
~seq-picker
~seq-pos-ref
~set-of-claims
~simple-object
~simple-rationale
~simple-subject
~simple-textual-reference
~source-id-opt
~source-item
~source-list
~source-ref
~source-refs
~source-rights
~split
~subject
~TAN-body
~TAN-body-core
~TAN-c-decl
~TAN-c-decl-core
~TAN-c-item
~TAN-head
~TAN-key-decl
~TAN-key-item
~TAN-LM-item
~TAN-R-mor-body
~TAN-root
~TAN-tail
~TAN-ver
~test-pattern
~text-div
~textual-reference
~tok-attr-core
~tok-cert-opt
~tok-regular
~tok-sequence
~tok-sequence-attr-core
~tok-source-ref-opt
~tok-with-cont-but-no-src
~tok-with-src-and-cont
~tok-without-cont-or-src
~token-value-ref
~type
~units
~URI-tag
~verb
~when-claim
~work-equiv
~work-ref
~work-refs
9. Official TAN keywords
TAN keywords for types of bitext relations (<bitext-relation>)
TAN keywords for types of divisions (<div-type>)
TAN keywords for features (<feature>)
TAN keywords for types of groups (<group-type>)
TAN keywords for types of modals (<modal>)
TAN keywords for types of normalizations (<normalization>)
TAN keywords for types of relationships (<relationship>)
TAN keywords for types of bitext reuse (<reuse-type>)
TAN keywords for types of rights (<rights-excluding-sources>, <rights-source-only>)
TAN keywords for types of roles (<role>)
TAN keywords for types of token definitions (<token-definition>)
TAN keywords for verbs (<verb>)