This chapter provides general background to class-3 TAN files, which are devoted to formats that do not fit the other two classes. For detailed discussion of specific elements and attributes, see Chapter 12, TAN patterns, elements, and attributes defined.
All too often, a project draws on the same set of vocabulary time and again. Repeating the IRI + name pattern (see the section called “IRI + name pattern”) in every file can be both tedious and treacherous. If a project with hundreds of TAN files decides to change or augment its vocabulary, it could take a long time to find and make all the changes, everywhere and consistently.
The TAN-voc format addresses that problem. It is intended to allow a project to
define, edit, and augment the IRI + name patterns for recurrent vocabulary. TAN
includes several standard TAN-voc files under the subdirectory vocabularies,
supporting commonly used concepts such as token definitions, div types, licenses,
and many more. For a complete list of predefined TAN keywords, see Chapter 11,
Official TAN vocabularies.
It is quite common for a person or team to build vocabulary items gradually while
developing a corpus, which means that TAN-voc files tend to change and grow. You can
organize your vocabulary in whatever manner makes sense. You might create one large
TAN-voc file that has all your project's vocabulary. Or you might break out the
vocabulary, one file per type. Each approach has strengths and weaknesses. If you
break your vocabulary into many files, you should designate one of them as your main
point of import, and include the other TAN-voc files via <inclusion>s (along with
<group include="[IDREFS]"/> or <item include="[IDREFS]"/>, pointing to the IDrefs
of the included TAN-voc files). Doing so saves you from having to insert numerous
<vocabulary>s in your other TAN files.
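As an illustration of the multi-file arrangement just described, the following is a minimal sketch of a main TAN-voc file that pulls in a second one. It has not been validated against the TAN schemas. Only <inclusion>, <item>, and @include come from the text above; the namespace, the ids, the IRIs, the file name, the date, and the children shown inside <inclusion> are assumptions made for the sake of the example, and other required header content is left out.

```xml
<!-- Hypothetical main TAN-voc file: the single point of import for a project. -->
<!-- The namespace and the form of @id are assumed, not taken from this chapter. -->
<TAN-voc xmlns="tag:textalign.net,2015:ns" id="tag:example.com,2014:tan-voc/main">
   <head>
      <!-- Other header content required by the schema is omitted from this sketch. -->
      <!-- Assumed shape: an inclusion has an id and points to another TAN-voc file. -->
      <inclusion xml:id="my-div-types">
         <IRI>tag:example.com,2014:tan-voc/div-types</IRI>
         <name>Project div types (hypothetical)</name>
         <location href="div-types.TAN-voc.xml" accessed-when="2024-01-01"/>
      </inclusion>
   </head>
   <body>
      <!-- Pull in the included file's items by referring to the inclusion's id. -->
      <item include="my-div-types"/>
   </body>
</TAN-voc>
```

With an arrangement like this, the project's other TAN files would need only one <vocabulary> pointing to the main file.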
For more details on how this format relates to other TAN formats, see the section called “Networked Files”.
A TAN-voc file has <TAN-voc> as the root element.
The <vocabulary-key> of a TAN-voc file takes, in addition to core vocabulary items,
any number of <group-type>s.
A TAN-voc file may draw directly from the vocabulary in its body, as if it were
referring to itself via <vocabulary>.
The <body> of a TAN-voc file consists simply of <item>s or <verb>s, perhaps gathered
into groups via <group> or @group. These groups have, at present, no effect upon
other TAN files that use them, but they have been valuable in certain applications.
For example, the standard TAN-voc file for <div-type>
(vocabularies/div-types.TAN-voc.xml) groups textual division types into a
rudimentary typology. That typology lets an application decide programmatically
whether a particular division should be treated as a block or inline element, or
whether it should be indented.
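A body along these lines might look like the following sketch. The IRIs, names, and group labels are hypothetical, and the use of @group as a plain label on each <item> is an assumption (such labels would presumably need matching <group-type>s in the <vocabulary-key>). Consult vocabularies/div-types.TAN-voc.xml for the authoritative form.

```xml
<body>
   <!-- Two hypothetical division types, each tagged with a hypothetical group label. -->
   <item affects-element="div-type" group="block">
      <IRI>tag:example.com,2014:div-type/chapter</IRI>
      <name>chapter</name>
   </item>
   <item affects-element="div-type" group="inline">
      <IRI>tag:example.com,2014:div-type/sentence</IRI>
      <name>sentence</name>
   </item>
</body>
```

An application reading such a file could render every division whose type belongs to the hypothetical "block" group as a block, without hard-coding individual type names.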
The attributes @affects-attribute and @affects-element, both weakly inheritable,
define the scope of the vocabulary items, i.e., the attributes or elements for
which the items may legitimately be used. A vocabulary item is eligible only for
the attributes or elements specified.
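The sketch below shows one way the scoping attributes might be combined. It assumes that "weakly inheritable" means a value set on an ancestor applies to its descendants unless a descendant supplies its own value, and that the attribute may be placed on <group>. Both are assumptions, as are the IRIs and names.

```xml
<body>
   <group affects-element="div-type">
      <!-- No scoping attribute of its own: assumed to inherit affects-element="div-type". -->
      <item>
         <IRI>tag:example.com,2014:div-type/paragraph</IRI>
         <name>paragraph</name>
      </item>
      <!-- Own value: assumed to replace the inherited scope for this item only. -->
      <item affects-element="license">
         <IRI>tag:example.com,2014:license/project</IRI>
         <name>project license</name>
      </item>
   </group>
</body>
```

Under these assumptions, "paragraph" could be used only where a <div-type> is expected, and "project license" only where a <license> is expected.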
Nearly all <item>s in a TAN-voc file contain the IRI + name pattern or a derived
pattern. The only exceptions are <item>s pertaining to token definitions, which
instead of <IRI>s take <token-definition>s. See the section called “Defining words
and tokens”.
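To make the contrast concrete, here is a hedged sketch of the two kinds of <item>. The IRI, the names, the @affects-element values, and the @pattern attribute with its regular expression are illustrative assumptions. The section on defining words and tokens remains the authority on what a <token-definition> may contain.

```xml
<body>
   <!-- Ordinary item: the IRI + name pattern. -->
   <item affects-element="div-type">
      <IRI>tag:example.com,2014:div-type/line</IRI>
      <name>line</name>
   </item>
   <!-- Token-definition item: a <token-definition> stands where the <IRI> would be. -->
   <item affects-element="token-definition">
      <token-definition pattern="\w+"/>
      <name>word characters only</name>
   </item>
</body>
```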
A <verb> includes, in addition to the IRI + name pattern, optional <constraints>.
Those constraints define what components are permitted in any <claim> that uses the
<verb>. At this time, verb constraints are an experimental feature. Only those
constraints that mirror the standard TAN vocabulary for verbs,
vocabularies/verbs.TAN-voc.xml, will be supported during validation. Study that
file for examples of how to build a <verb>. See the section called “Data (<body>)”
on the use of verbs in a TAN-A file.