This chapter provides general background to class-3 TAN files, which are devoted to formats that do not fit the other two classes. For detailed discussion of specific elements and attributes, see Chapter 9, TAN patterns, elements, and attributes defined.
All too often, a project has a set of vocabulary it draws from time and again. To repeat the IRI + name pattern (see the section called “IRI + name Pattern”) can be both tedious and treacherous. If a project with hundreds of TAN files decides to change or augment its vocabulary, it could take a long time to find and make all the changes, everywhere and consistently.
The TAN-voc format addresses that problem. It is intended to allow a project to define, edit, and augment the IRI + name patterns for recurrent vocabulary. TAN supplies several standard TAN-voc files under the subdirectory vocabularies, supporting commonly used concepts such as token definitions, div types, licenses, and many more. For a complete list of predefined TAN keywords, see Chapter 10, Official TAN vocabularies.
It is quite common for a person or team to build vocabulary items in the course of developing a corpus, which means that TAN-voc files tend to change as the project progresses. You can organize your vocabulary in whatever manner makes sense. You might create one large TAN-voc file for all vocabulary, or one file per type of vocabulary, each independent of the other. Each approach has strengths and weaknesses. The latter, one TAN-voc file per type of vocabulary, can create quite a bit of extra work: every TAN file that draws from the vocabulary must insert one <vocabulary> for each relevant TAN-voc file. The best approach we have found is to have one relatively small master TAN-voc file, which includes other TAN-voc files via <inclusion>s (along with <group include="[IDREFS]"/> or <item include="[IDREFS]"/>, pointing to the IDrefs of the included TAN-voc files).
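The master-file arrangement described above might look roughly like the following. This is a minimal sketch, not a validated file: the file names, ids, IRIs, names, and dates are all hypothetical, the placement of <inclusion> in the <head> and the use of <location> inside it are assumptions, and required metadata has been left out. Check the result against the TAN schemas before relying on it.

```xml
<!-- Hypothetical master file, e.g. master.TAN-voc.xml -->
<TAN-voc xmlns="tag:textalign.net,2015:ns"
    id="tag:example.com,2024:tan-voc:master">
    <head>
        <!-- Other head elements omitted for brevity -->
        <!-- One <inclusion> per included TAN-voc file (hypothetical ids) -->
        <inclusion xml:id="voc-persons">
            <IRI>tag:example.com,2024:tan-voc:persons</IRI>
            <name>Project vocabulary for persons</name>
            <location href="persons.TAN-voc.xml" accessed-when="2024-01-01"/>
        </inclusion>
        <inclusion xml:id="voc-works">
            <IRI>tag:example.com,2024:tan-voc:works</IRI>
            <name>Project vocabulary for works</name>
            <location href="works.TAN-voc.xml" accessed-when="2024-01-01"/>
        </inclusion>
    </head>
    <body>
        <!-- @include points to the ids of the inclusions above -->
        <item include="voc-persons voc-works"/>
    </body>
</TAN-voc>
```

With this layout, other TAN files in the project would need only a single <vocabulary> pointing to the master file, and new vocabulary files can be added by editing the master file alone.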
For more details on how this format relates to other TAN formats, see the section called “Networked Files”.
A TAN-voc file has <TAN-voc> as the root element.
The <vocabulary-key> of a TAN-voc file takes, in addition to core vocabulary items, any number of <group-type>s.
A TAN-voc file may draw directly from the vocabulary in its body, as if it were referring to itself via <vocabulary>.
Data (<body>)
The <body> of a TAN-voc file consists simply of <item>s or <verb>s, perhaps gathered into groups via <group> or @group. These groups have, at present, no effect upon other TAN files that use them, but they have been valuable in certain applications. For example, the standard TAN-voc file for <div-type> (vocabularies/div-types.TAN-voc.xml) groups textual division types into a rudimentary typology that allows applications to decide programmatically whether a particular division should be treated as a block or inline element, or whether it should be indented.
The @affects-attribute and @affects-element attributes, both weakly inheritable, define the scope of the vocabulary items, i.e., which elements or attributes the items may legitimately be used for. A vocabulary item is eligible only for the attributes or elements so specified.
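As an illustration of scoping, the fragment below sets @affects-element once on the <body> and overrides it on a single <item>. It is a sketch under stated assumptions: the IRIs and names are hypothetical, and it assumes that weak inheritance means a value set on a descendant takes precedence over the value inherited from an ancestor.

```xml
<!-- Sketch only: IRIs and names are hypothetical -->
<body xmlns="tag:textalign.net,2015:ns" affects-element="div-type">
    <!-- No scope of its own, so it inherits affects-element="div-type" -->
    <item>
        <IRI>tag:example.com,2024:div-type:chapter</IRI>
        <name>chapter</name>
    </item>
    <!-- Sets its own scope, which (by assumption) replaces the inherited one -->
    <item affects-element="license">
        <IRI>tag:example.com,2024:license:project-default</IRI>
        <name>project default license</name>
    </item>
</body>
```

Under that reading, "chapter" would be available only where a div type is expected, and "project default license" only where a license is expected.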
Nearly all <item>s in a TAN-voc file contain the IRI + name pattern. The only exceptions are <item>s pertaining to token definitions, which take <token-definition>s instead of <IRI>s. See the section called “Defining Words and Tokens”.
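The contrast between the two kinds of <item> can be sketched as follows. The IRI, the names, and the regular expression are hypothetical, and the @pattern attribute and the order of child elements are assumptions to be verified against the schema and the standard token-definition vocabulary file.

```xml
<!-- Sketch only: values are hypothetical -->
<body xmlns="tag:textalign.net,2015:ns">
    <!-- The usual case: IRI + name pattern -->
    <item affects-element="div-type">
        <IRI>tag:example.com,2024:div-type:stanza</IRI>
        <name>stanza</name>
    </item>
    <!-- The exception: a token definition stands where the IRI would be -->
    <item affects-element="token-definition">
        <token-definition pattern="\w+"/>
        <name>word characters only</name>
    </item>
</body>
```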
A <verb> includes, in addition to the IRI + name pattern, the option of adding <constraints>. Those constraints define what components are permitted in any <claim> that uses the <verb>. Verb constraints are currently at an early phase of development. Only those constraints that mirror the standard TAN vocabulary for verbs, vocabularies/verbs.TAN-voc.xml, will be supported during validation. Study that file for examples of how to build a <verb>. See the section called “Data (<body>)” on the use of verbs in a TAN-A file.