Chapter 7. Class-3 TAN Files, Varia


This chapter provides general background to class-3 TAN files, which are devoted to formats that do not fit the other two classes. For detailed discussion of specific elements and attributes, see Chapter 12, TAN patterns, elements, and attributes defined.

Vocabulary (`TAN-voc`)

All too often, a project has a set of vocabulary it draws from time and again. Repeating the IRI + name pattern (see the section called “IRI + name pattern”) for every such item in every file can be both tedious and treacherous. If a project with hundreds of TAN files decides to change or augment its vocabulary, it could take a long time to find and make all the changes, everywhere and consistently.

The TAN-voc format addresses that problem. It allows a project to define, edit, and augment the IRI + name patterns for recurrent vocabulary in one place. TAN includes several standard TAN-voc files under the subdirectory vocabularies, supporting commonly used concepts such as token definitions, div types, licenses, and many more. For a complete list of predefined TAN keywords, see Chapter 11, Official TAN vocabularies.

It is quite common for a person or team to build vocabulary items gradually while developing a corpus, which means that TAN-voc files tend to change and grow. You can organize your vocabulary in whatever manner makes sense. You might create one large TAN-voc file that has all your project's vocabulary. Or you might break out the vocabulary, one file per type. Each approach has strengths and weaknesses. If you break your vocabulary into many files, you should designate one of them as your main point of import, and include the other TAN-voc files via <inclusion>s (along with <group include="[IDREFS]"/> or <item include="[IDREFS]"/>, pointing to the IDrefs of the included TAN-voc files). Doing so saves you from having to insert numerous <vocabulary>s in your other TAN files.
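The multi-file arrangement described above might be sketched as follows. This is a minimal, non-validating illustration and not a template. The file name persons.TAN-voc.xml, the tag URIs, the identifier voc-persons, and the exact children shown inside <inclusion> are all hypothetical or assumed, and required head content is omitted. Consult Chapter 12 for the authoritative definitions.

```xml
<!-- Hypothetical main TAN-voc file that pulls in a second TAN-voc file. -->
<!-- The namespace and all identifiers are assumptions made for illustration. -->
<TAN-voc xmlns="tag:textalign.net,2015:ns" id="tag:example.org,2024:voc-main">
  <head>
    <!-- Other required head elements are omitted from this sketch. -->
    <inclusion xml:id="voc-persons">
      <IRI>tag:example.org,2024:voc-persons</IRI>
      <name>Persons vocabulary (hypothetical)</name>
      <location href="persons.TAN-voc.xml"/>
    </inclusion>
  </head>
  <body>
    <!-- Stands in for the items of the included file, referenced by IDref. -->
    <item include="voc-persons"/>
  </body>
</TAN-voc>
```

With an arrangement along these lines, other TAN files in the project would need only a single <vocabulary> pointing to the main file.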

For more details on how this format relates to other TAN formats, see the section called “Networked Files”.

Root Element and Head

A TAN-voc file has <TAN-voc> as the root element.

The <vocabulary-key> of a TAN-voc file takes, in addition to core vocabulary items, any number of <group-type>s.

A TAN-voc file may draw directly from the vocabulary in its body, as if it were referring to itself via <vocabulary>.

Data (`<body>`)

The <body> of a TAN-voc file consists simply of <item>s or <verb>s, perhaps gathered into groups via <group> or @group. These groups have, at present, no effect upon other TAN files that use them, but they have been valuable in certain applications. For example, the standard TAN-voc file for <div-type> (vocabularies/div-types.TAN-voc.xml) groups textual division types into a rudimentary typology that allows applications to be designed to decide programmatically whether a particular division should be treated as a block or inline element, or whether it should be indented.

@affects-attribute and @affects-element, both weakly inheritable, define the scope of the vocabulary items, i.e., the attributes or elements for which the items may legitimately be used. A vocabulary item is eligible only for the attributes or elements so specified.

Nearly all <item>s in a TAN-voc file contain the IRI + name pattern or a derived pattern. The only exceptions are <item>s pertaining to token definitions, which instead of <IRI>s take <token-definition>s. See the section called “Defining words and tokens”.
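As a rough illustration of the points above, the fragment below shows two items that follow the IRI + name pattern, with scope set by @affects-element. It assumes that a value placed on <body> is inherited by descendant <item>s unless an item supplies its own value, which is how weak inheritance is read here. The IRIs and names are invented, and the namespace is an assumption. See vocabularies/div-types.TAN-voc.xml for real entries.

```xml
<!-- Hypothetical body of a TAN-voc file; IRIs and names are invented. -->
<body xmlns="tag:textalign.net,2015:ns" affects-element="div-type">
  <group>
    <!-- Inherits affects-element="div-type" from body (assumed behavior). -->
    <item>
      <IRI>tag:example.org,2024:div-type:stanza</IRI>
      <name>stanza</name>
    </item>
  </group>
  <!-- Supplies its own scope, overriding the inherited value. -->
  <item affects-element="license">
    <IRI>tag:example.org,2024:license:house</IRI>
    <name>house license</name>
  </item>
</body>
```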

<verb> includes, in addition to the IRI + name pattern, the option to have <constraints> added. Those constraints define what components are permitted in any <claim> that uses the <verb>. At this time, verb constraints are an experimental feature. Only those constraints that mirror the standard TAN vocabulary for verbs (vocabularies/verbs.TAN-voc.xml) will be supported during validation. Study that file for examples of how to build a <verb>. On the use of verbs in a TAN-A file, see the section called “Data (<body>)” in the discussion of that format.
