The Text Alignment Network: Official Guidelines

Revision History
Revision 2018, 2018-01-09

Formats: HTML, PDF, Docbook (master)

Warning

If these guidelines contradict, or appear to contradict, the core TAN files, priority should be given first to the RELAX NG schemas (compact syntax), then to the functions, and only then to these guidelines.


Table of Contents

I. General Overview
1. Introduction
Definition and purpose
Rationale and Purpose
About this version
Participation
2. Starting off with the TAN Format
Creating TAN Transcription and Alignment Data
The Principles of TAN Metadata (<head>)
Creating TAN Metadata (<head>)
Aligning across Projects
II. Detailed Description
3. General Underpinnings
Design Principles
Format Organization
Assumptions in the Creation of TAN Data
Core Technology
Unicode
eXtensible Markup Language (XML)
Namespaces
The Text Encoding Initiative
Data types
Identifiers and Their Use
Regular Expressions
Interpretation of multiple values
4. Patterns and Structures Common to All TAN Encoding Formats
Common Patterns
IRI + name Pattern
Digital Entity Metadata Pattern
Edit Stamp
Overall Structure
@id and a TAN file's IRI Name
Metadata (<head>)
Rights and Licenses
Keys and Inclusions
Distinguishing <source>s and <see-also>s
Attribute inheritability and priority
Defining Words and Tokens
5. Class-1 TAN Files, Representations of Textual Objects (Scripta)
Principles and Assumptions
General
Domain model
One version, one work, one object, one reference system
Normalizing transcriptions
Transcriptions
Flattened References, and the Leaf Div Uniqueness Rule
Transcriptions Using the Text Encoding Initiative (<TEI>)
6. Class-2 TAN Files, Annotations of Texts
Common Elements
Class 2 Metadata (<head>)
Class 2 Data Patterns (<body>)
@pos and @val
Division-Based Annotations and Alignments (<TAN-A-div>)
Root Element and Header
Data (<body>)
Token-Based Annotations and Alignments (<TAN-A-tok>)
Root Element and Header
Data (<body>)
Lexico-Morphology
Principles and Assumptions
Root Element and Header
Data (<body>)
7. Class-3 TAN Files, Varia
Keyword Vocabulary (TAN-key)
Root Element and Head
Data (<body>)
Morphological Concepts and Patterns (TAN-mor)
Principles and Assumptions
Root Element and Header
Data (<body>)
TAN Catalog Files (collection)
8. TAN patterns, elements, and attributes defined
@adverb
@affects-element
@bitext-relation
@by
@cert
@cert2
@chars
@claimant
@code
@def-ref
@div-type
@ed-when
@ed-who
@flags
@from
@group
@help
@href
@id
@idrefs
@in-progress
@include
@lexicon
@licensor
@m-has-features
@m-has-how-many-features
@m-matches
@morphology
@n
@new
@object
@object-datatype
@object-lexical-constraint
@pattern
@period
@pos
@ref
@relationship
@replacement
@reuse-type
@roles
@root
@shallow
@src
@stable
@subject
@TAN-version
@to
@tok-matches
@type
@units
@val
@verb
@when
@when-accessed
@where
@which
@who
@work
@xml:id
@xml:lang
<algorithm>
<alias>
<align>
<alter>
<ambiguous-letter-numerals-are-roman>
<ana>
<assert>
<bitext-relation>
<body>
<category>
<change>
<checksum>
<claim>
<collection>
<comment>
<definitions>
<desc>
<div>
<div-ref>
<div-type>
<doc>
<equate>
<feature>
<for-lang>
<from>
<group>
<group-type>
<head>
<inclusion>
<IRI>
<item>
<key>
<l>
<lexicon>
<license>
<licensor>
<lm>
<location>
<locus>
<m>
<master-location>
<modal>
<morphology>
<name>
<normalization>
<object>
<organization>
<period>
<person>
<place>
<reassign>
<relationship>
<rename>
<replace>
<report>
<resp>
<reuse-type>
<role>
<rule>
<scriptum>
<see-also>
<skip>
<source>
<subject>
<tail>
<TAN-A-div>
<TAN-A-lm>
<TAN-A-tok>
<TAN-key>
<TAN-mor>
<TAN-T>
<to>
<tok>
<token-definition>
<topic>
<unit>
<value>
<verb>
<version>
<where>
<work>
TAN patterns
~abstract-tok-ref
~action-complex-condition
~action-condition
~action-condition-attributes
~action-simple-condition
~agent-ref
~alignment
~alignment-attributes-non-class-2
~alignment-content-non-class-2
~alignment-inclusion-opt
~alt-equate
~alt-norm
~alt-reassign
~alt-rename
~alt-repl
~alt-skip
~alter-class-2
~alter-class-3
~alter-condition
~alter-core
~alter-element
~alter-non-class-2
~alter-non-class-3
~alter-non-core
~alter-statement
~any-attribute
~any-content
~any-element
~assert
~attr-cert
~attr-cert2
~bitext-relation-attr
~body-attributes-non-core
~body-content-class-1
~body-content-class-2
~body-content-class-3
~body-content-core
~body-content-non-class-1
~body-content-non-class-2
~body-content-non-class-3
~body-content-non-core
~body-group
~body-item
~category
~category-list
~cert-claim
~cert-content
~certainty-stamp
~change-log
~char-ref
~checksum
~claim
~claimant-ref
~code
~comment
~complex-object
~complex-rename
~complex-subject
~complex-text-ref
~complex-textual-reference-set
~condition-m-has-features
~condition-m-has-how-many-features
~condition-m-matches
~condition-pattern
~condition-tok-matches
~definition-class-2
~definition-class-3
~definition-core
~definition-list
~definition-non-class-2
~definition-non-class-3
~definition-non-core
~defn-agent
~defn-alg
~defn-alias
~defn-ambig-numerals
~defn-brel
~defn-claims
~defn-class-1
~defn-div-type
~defn-features
~defn-group-type
~defn-id-ref-opt
~defn-lexi
~defn-mode
~defn-morph
~defn-non-class-1
~defn-org
~defn-pattern-default
~defn-pattern-id
~defn-pattern-language
~defn-pattern-no-id
~defn-period
~defn-pers
~defn-place
~defn-relationship
~defn-reus
~defn-role
~defn-scri
~defn-tok-def
~defn-topic
~defn-unit
~defn-verb
~defn-vers
~defn-work
~div-item-ref
~div-range-ref
~div-ref-range
~div-type-ref
~ed-agent
~ed-stamp
~ed-time
~element-scope
~entity-digital-generic-ref
~entity-digital-tan-other-ref
~entity-digital-tan-self-ref
~entity-nondigital-ref
~entity-tok-def
~error-flag
~feature
~feature-ref
~func-param-flags
~func-param-pattern
~func-replace
~grammar-attr
~group-attributes
~group-ref
~head-prelude
~head-prelude-core
~head-prelude-non-core
~help-opt
~href-opt
~id-option
~inclusion
~inclusion-att
~inclusion-item
~inclusion-list
~increment
~internal-idrefs
~internal-non-xml-id
~internal-xml-id
~IRI-gen
~IRI-gen-ref
~item-picker
~item-pos-ref
~key-item
~key-list
~keyword-ref
~lang-of-content
~lang-outside
~lang-preface
~lexeme
~lexicon-attr
~licensor
~lm-tok-ref
~loc-self
~loc-src
~locus
~metadata-desc
~metadata-human
~modal-ref
~morph
~morphology-rule
~n
~new-name
~new-ref-name
~non-class-2-opt
~nonsource-license
~nontextual-reference
~object
~object-constraint
~object-datatype
~object-element
~object-lexical-constraint
~object-ref
~period-ref
~place-ref
~pointer-to-div-item
~pointer-to-div-range
~progress
~relationship
~report
~resp-item
~resp-list
~reuse-type-attr
~role-ref
~see-also-item
~see-also-list
~seq-picker
~seq-pos-ref
~shallow-option
~simple-rename
~simple-textual-reference
~source-id-opt
~source-item
~source-list
~source-ref
~sources-ref
~subject
~subject-ref
~TAN-A-lm-item
~TAN-body
~TAN-head
~TAN-key-item
~TAN-R-mor-body
~TAN-root
~TAN-tail
~TAN-ver
~target-div-ref
~text-div
~textual-reference
~tok-cert-opt
~tok-mult-selector-attributes
~tok-range-selector
~tok-ref
~tok-ref-group
~tok-ref-item
~tok-ref-range
~tok-single-selector-attributes
~tok-sources-ref-opt
~token-value-ref
~type
~units
~URI-tag
~verb-ref
~when-claim
~work-ref
9. Official TAN keywords
TAN keywords for types of bitext relations (<bitext-relation>)
TAN keywords for types of divisions (<div-type>)
TAN keywords for features (<feature>)
TAN keywords for types of groups (<group-type>)
TAN keywords for types of rights (<license>)
TAN keywords for types of modals (<modal>)
TAN keywords for types of normalizations (<normalization>)
TAN keywords for types of relationships (<relationship>)
TAN keywords for types of bitext reuse (<reuse-type>)
TAN keywords for types of roles (<role>)
TAN keywords for types of token definitions (<token-definition>)
TAN keywords for verbs (<verb>)
III. Working with the Text Alignment Network
10. Best Practices in Working with TAN Files
Local Setup
Creating and populating TAN files
Sharing TAN files
Doing things with TAN files
11. TAN variables, keys, functions, and templates
TAN-core global variables, keys, and functions summarized
variables
keys
functions
TAN-core-errors global variables, keys, and functions summarized
variables
functions
TAN-core-resolve global variables, keys, and functions summarized
functions
TAN-core-expand global variables, keys, and functions summarized
functions
TAN-core-string global variables, keys, and functions summarized
variables
functions
TAN-class-1 global variables, keys, and functions summarized
variables
keys
functions
TAN-class-2 global variables, keys, and functions summarized
variables
keys
TAN-A-div global variables, keys, and functions summarized
functions
TAN-A-lm global variables, keys, and functions summarized
variables
TAN-class-3 global variables, keys, and functions summarized
functions
TAN-extra global variables, keys, and functions summarized
variables
functions
TAN-function global variables, keys, and functions summarized
variables
functions
regex-ext-tan global variables, keys, and functions summarized
variables
functions
templates
TAN-schema global variables, keys, and functions summarized
variables
functions
Mode templates
ŧ #all
ŧ add-square-brackets
ŧ analyze-string-length-pass-1
ŧ analyze-string-length-pass-2
ŧ catalog-expansion-terse
ŧ check-referred-doc
ŧ class-1-expansion-verbose
ŧ class-2-expansion-normal
ŧ class-2-expansion-terse
ŧ class-2-expansion-terse-pass-2
ŧ class-2-expansion-verbose
ŧ copy-of-except
ŧ core-expansion-normal
ŧ core-expansion-terse
ŧ core-expansion-terse-alias
ŧ core-expansion-terse-attributes
ŧ core-expansion-verbose
ŧ core-resolution-arabic-numerals
ŧ dependencies-tokenized-selectively
ŧ dependency-expansion-normal
ŧ dependency-expansion-terse
ŧ dependency-expansion-terse-no-alter
ŧ dependency-expansion-verbose
ŧ diff-to-collation
ŧ divs-excluding-what-qs
ŧ element-to-error
ŧ evaluate-conditions
ŧ expand-tan-key-dependencies
ŧ first-stamp
ŧ fragment-to-text
ŧ infuse-tokenized-div
ŧ infuse-tokenized-text
ŧ merge-divs
ŧ merge-expanded-docs-prep
ŧ no-misfit-divs-or-anchors
ŧ normalize-tei-space
ŧ normalize-xml-fragment-space
ŧ only-misfit-divs
ŧ only-misfit-divs-and-anchors
ŧ pluck
ŧ prepend-id-or-idrefs
ŧ reconstruct-div-hierarchy
ŧ reset-hierarchy
ŧ resolve-attr-include
ŧ resolve-href
ŧ resolve-keyword
ŧ snap-to-word-pass-1
ŧ string-to-numerals
ŧ strip-all-attributes-except
ŧ strip-duplicate-children-by-attribute-value
ŧ strip-duplicates
ŧ strip-specific-attributes
ŧ strip-text
ŧ text-join
ŧ text-only
ŧ tokenize-div
12. Errors
error[adv01]
error[adv02]
error[adv03]
error[cat01]
error[cat02]
error[cat03]
warning[cat04]
warning[cat05]
warning[cat06]
warning[cat07]
error[chr01]
error[cl101]
error[cl102]
error[cl103]
error[cl104]
error[cl106]
warning[cl107]
error[cl108]
error[cl109]
error[cl110]
error[cl111]
error[cl112]
error[cl113]
warning[cl115]
warning[cl116]
error[cl117]
fatal[cl201]
error[cl202]
error[cl203]
warning[cl211]
warning[cl212]
error[cl213]
warning[cl214]
error[cl215]
error[cl216]
error[cl217]
error[clm01]
error[clm02]
error[clm03]
error[clm04]
error[clm05]
error[clm06]
error[clm07]
error[dty01]
error[inc02]
error[inc03]
fatal[inc04]
error[loc01]
error[loc02]
error[loc03]
error[rea01]
warning[rea02]
warning[rea03]
error[ref01]
warning[ref02]
error[see01]
error[see03]
error[see04]
error[seq01]
error[seq02]
error[seq03]
error[seq04]
error[seq05]
error[tan01]
error[tan02]
error[tan03]
error[tan04]
error[tan05]
error[tan06]
error[tan07]
error[tan08]
error[tan09]
error[tan10]
error[tan11]
error[tan13]
error[tan14]
error[tan15]
error[tan16]
error[tan17]
warning[tan18]
error[tan19]
error[tan20]
error[tei01]
error[tei02]
error[tei03]
warning[tei04]
error[tei05]
error[tky01]
error[tky02]
error[tky03]
error[tky04]
error[tlm01]
error[tlm02]
error[tlm03]
error[tlm04]
error[tmo02]
error[tok01]
error[tok02]
error[whe01]
error[whe02]
error[whe03]
error[whi01]
error[whi02]
error[whi03]
fatal[whi04]
warning[wrn01]
warning[wrn02]
warning[wrn03]
warning[wrn04]
warning[wrn05]
warning[wrn06]

List of Figures

3.1. Venn diagram.jpeg

List of Tables

2.1. Ring around the Rosie
3.1. Unicode characters
3.2. Special characters in regular expressions
3.3. Examples of Regular Expressions
4.1. Root TAN elements
5.1. Synopsis of TAN-TEI customization
9.1. TAN keywords for types of bitext relations
9.2. TAN keywords for types of divisions
9.3. TAN keywords for features
9.4. TAN keywords for types of groups
9.5. TAN keywords for types of rights
9.6. TAN keywords for types of modals
9.7. TAN keywords for types of normalizations
9.8. TAN keywords for types of relationships
9.9. TAN keywords for types of bitext reuse
9.10. TAN keywords for types of roles
9.11. TAN keywords for types of token definitions
9.12. TAN keywords for verbs
10.1. Global variables for referred files

List of Examples

3.1. TAN IRI names
8.1. @adverb
8.2. @affects-element
8.3. @affects-element
8.4. @affects-element
8.5. @affects-element
8.6. @bitext-relation
8.7. @bitext-relation
8.8. @bitext-relation
8.9. @by
8.10. @cert
8.11. @cert
8.12. @claimant
8.13. @claimant
8.14. @claimant
8.15. @div-type
8.16. @div-type
8.17. @div-type
8.18. @div-type
8.19. @ed-when
8.20. @ed-when
8.21. @ed-when
8.22. @ed-who
8.23. @ed-who
8.24. @ed-who
8.25. @flags
8.26. @group
8.27. @href
8.28. @href
8.29. @id
8.30. @id
8.31. @id
8.32. @id
8.33. @idrefs
8.34. @in-progress
8.35. @in-progress
8.36. @in-progress
8.37. @in-progress
8.38. @include
8.39. @lexicon
8.40. @lexicon
8.41. @m-has-features
8.42. @m-has-how-many-features
8.43. @m-matches
8.44. @morphology
8.45. @morphology
8.46. @n
8.47. @new
8.48. @object
8.49. @object-datatype
8.50. @pattern
8.51. @pattern
8.52. @pattern
8.53. @pattern
8.54. @pos
8.55. @ref
8.56. @relationship
8.57. @relationship
8.58. @reuse-type
8.59. @reuse-type
8.60. @reuse-type
8.61. @roles
8.62. @roles
8.63. @root
8.64. @src
8.65. @stable
8.66. @stable
8.67. @TAN-version
8.68. @TAN-version
8.69. @TAN-version
8.70. @TAN-version
8.71. @tok-matches
8.72. @type
8.73. @val
8.74. @verb
8.75. @when
8.76. @when
8.77. @when-accessed
8.78. @when-accessed
8.79. @which
8.80. @who
8.81. @work
8.82. @work
8.83. @xml:id
8.84. @xml:lang
8.85. @xml:lang
8.86. <algorithm>
8.87. <algorithm>
8.88. <algorithm>
8.89. <algorithm>
8.90. <alias>
8.91. <align>
8.92. <alter>
8.93. <alter>
8.94. <alter>
8.95. <alter>
8.96. <ambiguous-letter-numerals-are-roman>
8.97. <ambiguous-letter-numerals-are-roman>
8.98. <ana>
8.99. <assert>
8.100. <bitext-relation>
8.101. <bitext-relation>
8.102. <bitext-relation>
8.103. <body>
8.104. <body>
8.105. <body>
8.106. <body>
8.107. <change>
8.108. <change>
8.109. <checksum>
8.110. <comment>
8.111. <comment>
8.112. <comment>
8.113. <comment>
8.114. <definitions>
8.115. <definitions>
8.116. <definitions>
8.117. <definitions>
8.118. <desc>
8.119. <desc>
8.120. <div>
8.121. <div-type>
8.122. <div-type>
8.123. <equate>
8.124. <feature>
8.125. <for-lang>
8.126. <for-lang>
8.127. <group>
8.128. <group-type>
8.129. <group-type>
8.130. <head>
8.131. <head>
8.132. <head>
8.133. <head>
8.134. <inclusion>
8.135. <IRI>
8.136. <item>
8.137. <key>
8.138. <l>
8.139. <lexicon>
8.140. <lexicon>
8.141. <license>
8.142. <license>
8.143. <license>
8.144. <license>
8.145. <licensor>
8.146. <licensor>
8.147. <licensor>
8.148. <licensor>
8.149. <lm>
8.150. <location>
8.151. <location>
8.152. <locus>
8.153. <master-location>
8.154. <master-location>
8.155. <master-location>
8.156. <master-location>
8.157. <modal>
8.158. <morphology>
8.159. <morphology>
8.160. <name>
8.161. <normalization>
8.162. <normalization>
8.163. <normalization>
8.164. <normalization>
8.165. <object>
8.166. <person>
8.167. <person>
8.168. <person>
8.169. <person>
8.170. <reassign>
8.171. <relationship>
8.172. <relationship>
8.173. <rename>
8.174. <report>
8.175. <resp>
8.176. <resp>
8.177. <reuse-type>
8.178. <reuse-type>
8.179. <reuse-type>
8.180. <role>
8.181. <role>
8.182. <rule>
8.183. <scriptum>
8.184. <see-also>
8.185. <see-also>
8.186. <skip>
8.187. <skip>
8.188. <skip>
8.189. <skip>
8.190. <source>
8.191. <source>
8.192. <source>
8.193. <source>
8.194. <subject>
8.195. <subject>
8.196. <TAN-A-div>
8.197. <TAN-A-div>
8.198. <TAN-A-div>
8.199. <TAN-A-lm>
8.200. <TAN-A-lm>
8.201. <TAN-A-tok>
8.202. <TAN-A-tok>
8.203. <TAN-A-tok>
8.204. <TAN-key>
8.205. <TAN-key>
8.206. <TAN-key>
8.207. <TAN-key>
8.208. <TAN-mor>
8.209. <TAN-T>
8.210. <TAN-T>
8.211. <TAN-T>
8.212. <TAN-T>
8.213. <to>
8.214. <tok>
8.215. <token-definition>
8.216. <token-definition>
8.217. <token-definition>
8.218. <token-definition>
8.219. <topic>
8.220. <value>
8.221. <verb>
8.222. <version>
8.223. <work>
8.224. <work>
8.225. <work>
8.226. <work>