In many cases, developers will want to work with TAN files, either as input or as output. But TAN files have a number of distinctive constructions: two different methods of inclusion (see the section called “Networked Files”), space-normalization rules (see the section called “Space characters and normalization”), numeration systems (see the section called “One reference system”), tokenization systems (see the section called “Defining words and tokens”), and pointing systems (see the section called “Class 2 pointer syntax: referencing texts”). You can work directly with raw TAN files, but you run the risk of misinterpreting them.
Every TAN file is definitively interpreted through the TAN functions that undergird the Schematron validation process (see the section called “TAN validation”). That process is a core part of the standard TAN utilities and applications, and it determines the nature of some of the more important global variables.
Every TAN file is subject to two major transformations, both for validation and for applications.
The first transformation resolves the file. The goal is to get the file into a state where it can be understood on its own terms. A resolved TAN file contains all its relevant vocabulary and components. It can be evaluated without having to consult the files referred to by <vocabulary> or <inclusion> dependencies. (See the section called “Networked Files” for background on TAN's approach to inclusion.) This process also does some basic file-specific normalization; it will:
Prepare the file. This includes stamping the root element with a base URI (the path location of the file), evaluating <alias>, and inserting into every element a @q that contains an identifier unique to the element. This identifier is used by the Schematron file to match an element with any error messages in the corresponding element in the XSLT output.
Insert required components from <vocabulary>s or <inclusion>s using the following method:
Relevant external vocabulary items are inserted into the <head>, either as descendants of the appropriate <vocabulary> or, if derived from TAN standard vocabulary, as new <tan-vocabulary> elements immediately following the <vocabulary-key>. All vocabulary items are imprinted with an <id> corresponding to an @xml:id from any corresponding entry from <vocabulary-key>, to facilitate rapid retrieval of vocabulary. Any vocabulary <name> that is not normalized is duplicated with a name-normalized copy (signaled by @norm): lower-case, hyphens and underscores changed to spaces, and space-normalized.
Any element with an @include is replaced by the elements of the same name found in the target inclusion document (constructed recursively if need be). In addition, <inclusion> (in the head) is populated with any vocabulary items required to resolve the newly included material (recursively, if need be). This last point is important, because all idrefs must be interpreted in light of the original context. Included idrefs are made available to the host document, so when you use <inclusion> you must ensure there are no id conflicts.
Normalize all numbers in original components (i.e., excluding included elements or vocabulary items) as Arabic numerals.
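The name normalization described above (lower-casing, converting hyphens and underscores to spaces, and space-normalizing) can be sketched in a few lines. This is an illustration only, not the TAN function library's code: the function names are hypothetical, and Python's whitespace splitting only approximates TAN's own space-normalization rules.

```python
import re


def normalize_name(name: str) -> str:
    """Lower-case, turn hyphens and underscores into spaces, then space-normalize."""
    lowered = name.lower()
    spaced = re.sub(r"[-_]", " ", lowered)
    # split() with no argument drops leading/trailing whitespace and collapses
    # internal runs; this approximates, but is not identical to, XPath's
    # normalize-space() or TAN's own rules.
    return " ".join(spaced.split())


def names_with_normalized_copies(names):
    """Yield (text, is_norm_copy) pairs.

    Every original name is kept. A normalized copy is added only when the
    original is not already in normalized form, mirroring the duplication
    that the text says is signaled by @norm.
    """
    for name in names:
        yield name, False
        normalized = normalize_name(name)
        if normalized != name:
            yield normalized, True
```

A name such as "New-Testament" would therefore appear twice, once as written and once as "new testament", while an already normalized name such as "bible" would appear only once.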
Files are resolved recursively. That is, no <vocabulary> or <inclusion> components are incorporated or processed until the files pointed to are themselves first resolved.
Numerals fall at the end of the process because they might need to be resolved in light of resolved vocabulary and inclusions.
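The ordering constraints above amount to a depth-first, dependencies-first traversal. The following sketch shows only that ordering. It assumes a hypothetical `dependencies` mapping from each file to the files its <vocabulary> and <inclusion> elements point to, and its handling of circular references is an assumption of the sketch, not a statement about how TAN behaves.

```python
def resolve(path, dependencies, resolved=None, in_progress=None):
    """Return the order in which files finish resolving, dependencies first.

    `dependencies` is a hypothetical dict: file -> list of files it points to.
    """
    resolved = [] if resolved is None else resolved
    in_progress = set() if in_progress is None else in_progress
    if path in resolved:
        return resolved  # already resolved via another route
    if path in in_progress:
        # Assumption of this sketch: treat a cycle as an error.
        raise ValueError(f"circular dependency at {path}")
    in_progress.add(path)
    # Resolve every file pointed to before touching the host file.
    for target in dependencies.get(path, []):
        resolve(target, dependencies, resolved, in_progress)
    in_progress.discard(path)
    # Only at this point would the host incorporate vocabulary and inclusions,
    # and only after that would its numerals be normalized.
    resolved.append(path)
    return resolved
```

With a host file that includes one file and shares a vocabulary file with it, the vocabulary file finishes first, then the inclusion, then the host.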
The description above is necessarily generalized. For details consult the function library, particularly the functions/resolution directory. In cases of conflict between the code and the description above, the code should be given priority.
The second transformation expands the resolved file. You must resolve a TAN file before you try to expand it. The goal behind expansion is to unpack the components of a resolved document and identify any errors along the way (see the master list of errors). There are three levels of expansion, corresponding to the three levels of Schematron validation: terse, normal, and verbose.
In terse expansion, for each value of an attribute, an element with the attribute's name is placed within the parent (e.g., @type="a b" produces <type>a</type> and <type>b</type>). If the value is an idref, and it points to an alias, a copy is made for the idref of each target vocabulary item. If an idref does not point to a vocabulary item of the expected type, an error message is also copied in the parent. Any values that are ranges are expanded, if need be. Select networked files are checked for basic validity. Class-2 files undergo extra rounds of processing during terse validation: sources are adjusted if need be, and then checked against references in the host class-2 file. (See the section called “Class 2 pointer syntax: referencing texts”.) In terse expansion, all pointing mechanisms are checked. Because of this basic requirement, some terse expansion can take a long time on lengthy files, or ones with complex <adjustments>.
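The first step of terse expansion, turning each attribute value into a child element named after the attribute, can be sketched as follows. This covers only that one step and is not the library's XSLT: the function name is hypothetical, values are assumed to be space-separated, and the choice to append the new children after existing ones and to leave the attribute in place are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET


def expand_attributes(root):
    """For every element, add one child per space-separated attribute value."""
    # Snapshot the elements first, so the children created below are not
    # themselves visited while the tree is being modified.
    for element in list(root.iter()):
        for name, value in list(element.attrib.items()):
            for token in value.split():
                # The child takes the attribute's name; its text is one value.
                ET.SubElement(element, name).text = token
    return root


div = expand_attributes(ET.fromstring('<div type="a b"/>'))
children = [(child.tag, child.text) for child in div]  # two <type> children
```

Idref resolution, range expansion, and the checks on networked files are deliberately left out.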
Normal expansion builds on terse expansion by interrogating networked files more closely. Any errors that were detected during the terse stage but suppressed to avoid clutter are now reported.
Verbose expansion generally attends to procedures that are complex, or are not essential parts of a validation report. For example, a <model> of a class-1 file will be checked, to find references that one has but the other lacks. A class-1 <redivision> will be analyzed, to make sure that the two transcriptions are identical. A catalog file in the same directory will be checked, to see if it has faulty entries.
Many errors lend themselves to solutions that can be recommended by the TAN function library. Some solutions are returned to the Schematron validation method as Schematron Quick Fixes (SQFs). XML editors that are equipped to handle SQFs (e.g., Oxygen XML Editor) can then prompt users to quickly fix an errant section. For example, if text has not been NFC Unicode-normalized, an SQF will allow a user to make the change in two clicks. Thus, TAN validation does not merely tell you what the problems are; it tries to help fix them.
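The NFC example can be illustrated with Python's standard unicodedata module. This shows the kind of check and suggested replacement involved; it is not how the TAN library or an SQF is implemented, and the function name is hypothetical.

```python
import unicodedata


def nfc_fix(text: str):
    """Return the NFC form of text if it differs from the input, else None."""
    normalized = unicodedata.normalize("NFC", text)
    # None means the text is already NFC-normalized and nothing needs fixing.
    return None if normalized == text else normalized
```

For instance, an "e" followed by a combining acute accent (U+0065 U+0301) is not in NFC form; the suggested fix is the single precomposed character U+00E9.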
The term "expansion" describes the process but possibly not the output. If the global parameter $tan:validation-mode-on is true, then in the course of expanding the file the TAN templates will abandon any parts that are no longer needed. The output is normally much smaller than the input file, restricted as it is to the root element, which merely wraps errors, warnings, or fixes. So although during validation the file is really being expanded, at the end only a small portion of the expanded file is returned to the Schematron processor, to expedite validation. But if $tan:validation-mode-on is false (the default value), the entire expanded file and its dependencies are returned. Such output can be very useful in applications.
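The effect of the validation-mode switch on the output can be pictured with a small sketch. The element names `error`, `warning`, and `fix` are placeholders chosen for illustration and may not match what the TAN library actually emits; the function name and flag are likewise hypothetical stand-ins for the behavior governed by $tan:validation-mode-on.

```python
import copy
import xml.etree.ElementTree as ET

REPORT_TAGS = {"error", "warning", "fix"}  # hypothetical element names


def _collect(node, out):
    """Copy the outermost report elements below node into out."""
    for child in node:
        if child.tag in REPORT_TAGS:
            out.append(copy.deepcopy(child))
        else:
            _collect(child, out)


def finish_expansion(expanded_root, validation_mode_on=False):
    """Return the whole expanded tree, or only a root wrapping the reports."""
    if not validation_mode_on:
        # Default: the full expansion is returned for use in applications.
        return expanded_root
    # Validation mode: keep the root element and only what a validator needs.
    report = ET.Element(expanded_root.tag, dict(expanded_root.attrib))
    _collect(expanded_root, report)
    return report
```

The design point is that the expensive work happens either way; the flag only decides how much of the result is handed back.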
The preceding description about expansion is necessarily generalized. For details consult the function library, especially functions/expansion.