Chapter 7. Class-3 TAN Files, Varia

This chapter provides general background to class-3 TAN files, which are devoted to formats that do not fit the other two classes. For detailed discussion of specific elements and attributes, see Chapter 9, TAN patterns, elements, and attributes defined.

Vocabulary (`TAN-voc`)

All too often, a project has a set of vocabulary it draws from time and again. Repeating the IRI + name pattern (see the section called “IRI + name Pattern”) in every file can be both tedious and treacherous. If a project with hundreds of TAN files decides to change or augment its vocabulary, it could take a long time to find and make all the changes everywhere, consistently.

The TAN-voc format addresses that problem. It allows a project to define, edit, and augment the IRI + name patterns for recurrent vocabulary in one place. TAN supplies several standard TAN-voc files under the subdirectory vocabularies, supporting commonly used concepts such as token definitions, div types, licenses, and many more. For a complete list of predefined TAN keywords, see Chapter 10, Official TAN vocabularies.

It is quite common for a person or team to build vocabulary items in the course of developing a corpus, which means that TAN-voc files tend to change as the project progresses. You can organize your vocabulary in whatever manner makes sense. You might create one large TAN-voc file for all vocabulary, or one file per type of vocabulary, each independent of the others. Each approach has strengths and weaknesses. The latter, one TAN-voc file per type of vocabulary, can create quite a bit of extra work, because every TAN file that draws from the vocabulary must insert one <vocabulary> for each relevant TAN-voc file. The best approach we have found is to have one relatively small master TAN-voc file, which includes other TAN-voc files via <inclusion>s (along with <group include="[IDREFS]"/> or <item include="[IDREFS]"/>, pointing to the IDrefs of the included TAN-voc files).

For more details on how this format relates to other TAN formats, see the section called “Networked Files”.

Root Element and Head

A TAN-voc file has <TAN-voc> as the root element.

The <vocabulary-key> of a TAN-voc file takes, in addition to core vocabulary items, any number of <group-type>s.

A TAN-voc file may draw directly from the vocabulary in its body, as if it were referring to itself via <vocabulary>.

Data (`<body>`)

The <body> of a TAN-voc file consists simply of <item>s or <verb>s, perhaps gathered into groups via <group> or @group. These groups have, at present, no effect upon other TAN files that use them, but they have been valuable in certain applications. For example, the standard TAN-voc file for <div-type> (vocabularies/div-types.TAN-voc.xml) groups textual division types into a rudimentary typology that allows applications to decide programmatically whether a particular division should be treated as a block or inline element, or whether it should be indented.
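As a rough illustration of how an application might use such groupings, the following sketch reads a simplified fragment and decides how each item should be displayed. Everything in it is an assumption made for the example: the fragment omits namespaces and is not schema-valid TAN, the group labels "block" and "inline" are hypothetical, @group is treated as a space-separated list, and falling back to inline is an arbitrary choice. Groups built with <group> wrappers are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment for illustration only (not schema-valid TAN).
# The group labels "block" and "inline" are hypothetical.
SAMPLE = """
<body>
  <item group="block"><name>chapter</name></item>
  <item group="inline"><name>phrase</name></item>
</body>
"""

def display_mode(item):
    """Return 'block' if the item belongs to the hypothetical 'block' group."""
    groups = item.get("group", "").split()  # assumed: space-separated labels
    return "block" if "block" in groups else "inline"  # assumed default: inline

body = ET.fromstring(SAMPLE)
modes = {item.findtext("name"): display_mode(item) for item in body.iter("item")}
print(modes)
```

A real application would read the group labels from the TAN-voc file it actually uses rather than hard-coding them.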

The attributes @affects-attribute and @affects-element, both weakly inheritable, define the scope of the vocabulary items, i.e., which attributes or elements the items may legitimately be used for. A vocabulary item is eligible only for the attributes or elements so specified.
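The sketch below shows one way to compute the scope of each item, assuming that weak inheritance means the nearest declaration on the item itself or on an ancestor applies. That reading, the simplified namespace-free fragment, and the sample names are assumptions for illustration, not a statement of how TAN validation is implemented.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment for illustration only (not schema-valid TAN).
SAMPLE = """
<body affects-element="div-type">
  <group>
    <item><name>chapter</name></item>
    <item affects-element="license"><name>example license</name></item>
  </group>
  <item><name>section</name></item>
</body>
"""

def resolve_scope(element, inherited=None):
    """Yield (item name, scope) pairs; the nearest @affects-element wins (assumed)."""
    scope = element.get("affects-element", inherited)
    if element.tag == "item":
        yield element.findtext("name"), scope
    for child in element:
        # Pass the current scope down so descendants inherit it unless they override it.
        yield from resolve_scope(child, scope)

scopes = dict(resolve_scope(ET.fromstring(SAMPLE)))
print(scopes)
```

Here the second item overrides the value declared on the body, while the other two inherit it.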

Nearly all <item>s in a TAN-voc file contain the IRI + name pattern. The only exceptions are <item>s pertaining to token definitions, which instead of <IRI>s take <token-definition>s. See the section called “Defining Words and Tokens”.
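To make the IRI + name pattern concrete, here is a minimal sketch that builds a name-to-IRI lookup from a simplified fragment and skips items that carry no <IRI>, such as token definitions. The fragment omits namespaces and is not schema-valid TAN; the IRIs, the names, and the assumption that an item may carry more than one <IRI> or <name> are all illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment for illustration only (not schema-valid TAN).
# The IRI and names below are hypothetical.
SAMPLE = """
<body>
  <item>
    <IRI>tag:example.com,2024:div-type:chapter</IRI>
    <name>chapter</name>
    <name>chap</name>
  </item>
  <item>
    <token-definition/>
    <name>hypothetical tokenizer</name>
  </item>
</body>
"""

def index_names(body):
    """Map every name to the IRIs of its item; items without an <IRI> are skipped."""
    index = {}
    for item in body.iter("item"):
        iris = [iri.text.strip() for iri in item.findall("IRI")]
        if not iris:
            continue  # e.g. an item holding a <token-definition> instead of an <IRI>
        for name in item.findall("name"):
            index[name.text.strip()] = iris
    return index

index = index_names(ET.fromstring(SAMPLE))
print(index["chap"])
```

Because both names resolve to the same IRI, files that use either name refer to the same concept.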

A <verb> may take, in addition to the IRI + name pattern, optional <constraints>. Those constraints define what components are permitted in any <claim> that uses the <verb>. At this time, verb constraints are at an early phase of development. Only those constraints that mirror the standard TAN vocabulary for verbs (vocabularies/verbs.TAN-voc.xml) are supported during validation. Study that file for examples of how to build a <verb>. See the section called “Data (<body>)” on the use of verbs in a TAN-A file.
