TAN files are suited for dozens of types of applications, many of which at this
point are only imagined or being written. The subdirectory applications
contains folders named after actions you might want to perform on a TAN file,
and the XSLT stylesheets in them give only a taste of what is possible.
Because the applications in that directory are still under development, this section
is devoted not to the specifics of those applications but to the theoretical
background behind practical applications of TAN files. It is aimed particularly at
those readers who are comfortable working with XSLT or related XML technologies, and
want to do something important and useful with their TAN files.
The Schematron validation process was designed with a view to the next steps in
practical applications. The extensive function library upon which validation is based
provides a foundation for a variety of applications. When developing an application,
the first order of business is normally to find an entry point to the TAN function
library in the functions subdirectory. In that directory, each XSLT file is named
after one of the TAN formats. Point via <xsl:include> to the file that most
resembles your main input.
Note | |
---|---|
You could also try to fetch the TAN library via
|
If you point to ../functions/TAN-A-functions.xsl, you will have most of the functions and templates used for both class-1 and class-2 files. It tends to be a good default entry point if you are uncertain which master function file to use.
It is also common to include ../functions/TAN-extra-functions.xsl, which is the entry point for all TAN functions, global variables, and templates that do not play a role in Schematron validation. Those extra functions include many global variables that are excluded from the core TAN library, so as not to encumber Schematron validation.
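As a rough starting point, a stylesheet that uses these two entry points might look like the sketch below. The relative paths assume the stylesheet sits in a folder beside the functions subdirectory. The namespace URI bound to the tan prefix and the version number are assumptions that should be checked against the library files themselves.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tan="tag:textalign.net,2015:ns"
   exclude-result-prefixes="#all">

   <!-- Assumption: the tan namespace URI above matches the one used by the library. -->

   <!-- Default entry point: most functions and templates for class-1 and class-2 files -->
   <xsl:include href="../functions/TAN-A-functions.xsl"/>

   <!-- Optional: functions, global variables, and templates not used in validation -->
   <xsl:include href="../functions/TAN-extra-functions.xsl"/>

   <xsl:template match="/">
      <!-- Placeholder for application logic: copies the catalyzing input unchanged -->
      <xsl:copy-of select="."/>
   </xsl:template>

</xsl:stylesheet>
```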
You should also pay attention to the files in the subdirectory parameters. Some of
the global parameters there can be used profitably to change the way an application
runs.
All XSLT transformations require at least four components:
1. an input XML file
2. an XSLT file
3. a URL for the output
4. an XSLT engine (e.g., Saxon HE) to process #1 against #2 and send the output to #3.
Although #1 is the principal or catalyzing input, it need not be the main input.
Sometimes an XSLT application is written with an eye toward non-XML as the main
input. In such cases, it is impossible for the main input to be the catalyzing input.
Furthermore, although there is only one principal output document, an application may
need to create many other output documents. Those are normally created through
<xsl:result-document>. So in any XSLT operation, there are really
two possible types of input and two types of output. We use the terms
catalyzing input for #1 and secondary
input for input that is added during the process. We use the term
primary output for #3 and secondary
output for any other output created along the way. The terms
primary and secondary refer only to
their position in the process, not their importance. Indeed, there are XSLT
applications where the secondary input and secondary output are far more important
than the catalyzing input or primary output. In its documentation, an XSLT file
should indicate whether the main input is the catalyzing input,
the secondary input, or both, and whether the main output is the
primary output, the secondary output, or both.
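The four kinds of input and output can be seen together in a small, generic stylesheet. This sketch uses only standard XSLT 2.0 and nothing from the TAN library. The parameter name, the file names, and the output element names are hypothetical, and the catalyzing input is assumed to have been loaded from a file so that it has a base URI.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Hypothetical parameter: secondary input, relative to the catalyzing input -->
   <xsl:param name="secondary-input-href" select="'notes.xml'"/>

   <!-- The document node matched here is the catalyzing input (#1) -->
   <xsl:template match="/">
      <!-- Secondary input: fetched while the transformation is running -->
      <xsl:variable name="secondary-input-uri"
         select="resolve-uri($secondary-input-href, base-uri(.))"/>
      <xsl:variable name="secondary-input"
         select="if (doc-available($secondary-input-uri))
                 then doc($secondary-input-uri) else ()"/>

      <!-- Primary output: sent to the URL given to the XSLT engine (#3) -->
      <summary>
         <catalyzing-root><xsl:value-of select="name(*)"/></catalyzing-root>
         <secondary-found><xsl:value-of select="exists($secondary-input)"/></secondary-found>
      </summary>

      <!-- Secondary output: written only if the secondary input was found;
           href is resolved relative to the primary output -->
      <xsl:if test="exists($secondary-input)">
         <xsl:result-document href="secondary-output.xml">
            <xsl:copy-of select="$secondary-input"/>
         </xsl:result-document>
      </xsl:if>
   </xsl:template>

</xsl:stylesheet>
```

Here the main input could be either document: if the application exists to process notes.xml, the catalyzing input is little more than a trigger.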
When developing an application where the main input is a TAN file, it is often
best to start with it in its resolved or expanded state. (See the section called “The TAN Validation Process” on resolving and expanding TAN files.) If
that TAN file is the catalyzing input, use the global variables $self-resolved
and $self-expanded. If it is secondary input, use tan:resolve-doc() and
tan:expand-doc().
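For the catalyzing case, a quick way to see what the two global variables hold is to list the root element of each document in them. This is only a sketch: it assumes the variables are declared without a namespace prefix, as written above, and that the tan namespace URI shown is the one the library uses. The output element names are hypothetical.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tan="tag:textalign.net,2015:ns"
   exclude-result-prefixes="#all">

   <xsl:include href="../functions/TAN-A-functions.xsl"/>

   <xsl:template match="/">
      <inventory>
         <!-- Root element name(s) of the resolved form of the catalyzing input -->
         <resolved root="{$self-resolved/*/name()}"/>
         <!-- One entry per document in the expanded form -->
         <xsl:for-each select="$self-expanded">
            <expanded position="{position()}" root="{name(*)}"/>
         </xsl:for-each>
      </inventory>
   </xsl:template>

</xsl:stylesheet>
```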
For a class-2 file, $self-expanded or the output of tan:expand-doc() is a sequence of
documents, starting with an expansion of the class-2 file itself, followed by
expansions of its dependencies (TAN-T or TAN-mor). Its expanded class-1 sources will
be tokenized where required, and marked with anchors for each reference in the
class-2 file. If a token straddles leaf <div>s, the token will be reconstituted by
moving the tail
of the token up. These expanded sources are excellent candidates for other types of
transformation. For example, HTML pages can be created to integrate class-2
annotations and their class-1 sources, in a variety of ways.
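A bare-bones HTML rendering of the expanded sources might begin like the sketch below. It assumes that the catalyzing input is a class-2 file, that expanded TAN-T documents can be selected with $self-expanded[tan:TAN-T], and that a leaf <div> is one with no child <div>. The HTML layout is purely illustrative and ignores the anchors and annotations a real application would want to display.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tan="tag:textalign.net,2015:ns"
   exclude-result-prefixes="#all">

   <xsl:include href="../functions/TAN-A-functions.xsl"/>
   <xsl:output method="html" indent="yes"/>

   <xsl:template match="/">
      <html>
         <head><title>Expanded sources</title></head>
         <body>
            <!-- One block per expanded class-1 source -->
            <xsl:for-each select="$self-expanded[tan:TAN-T]">
               <div class="source">
                  <h2>Source <xsl:value-of select="position()"/></h2>
                  <!-- One paragraph per leaf div, text only -->
                  <xsl:for-each select=".//tan:div[not(tan:div)]">
                     <p><xsl:value-of select="normalize-space(.)"/></p>
                  </xsl:for-each>
               </div>
            </xsl:for-each>
         </body>
      </html>
   </xsl:template>

</xsl:stylesheet>
```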
At the verbose level, an expanded TAN-A file will conclude its $self-expanded
sequence with one or more documents with a root element <TAN-T_merge>, one file per
detected work. A TAN-T_merge file has one <head> per class-1 source that has been
merged, and the <body> contains a master set of <div>s that merge all the other
sources' <div>s that share the same reference, after all <adjustments> have been
made. Each leaf <div> in each source appears in the appropriate place, but as a
child of a common <div> that encompasses all other leaf <div>s with the same
reference. For each version's leaf <div>, @type is changed to #version, and other
markers signify which source it corresponds to. A TAN-T_merge file is a good basis
for building parallel displays or statistical analyses. These merge files can also
be created on an ad hoc basis through the function tan:merge-expanded-docs(),
applied to individual class-1 files after expansion.
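As an illustration of a simple statistical analysis, the following sketch counts how many versions are grouped under each common <div> in every merge document. It relies only on the structure described above (one <head> per merged source, version divs marked with @type of #version), and it assumes that <TAN-T_merge> is in the tan namespace and that validation or expansion ran at the verbose level. The output element names are hypothetical.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tan="tag:textalign.net,2015:ns"
   exclude-result-prefixes="#all">

   <xsl:include href="../functions/TAN-A-functions.xsl"/>

   <xsl:template match="/">
      <merge-report>
         <!-- One merge document per detected work -->
         <xsl:for-each select="$self-expanded[tan:TAN-T_merge]">
            <work sources="{count(tan:TAN-T_merge/tan:head)}">
               <!-- Each common div that directly holds at least one version div -->
               <xsl:for-each select=".//tan:div[tan:div/@type = '#version']">
                  <common-div versions="{count(tan:div[@type = '#version'])}"/>
               </xsl:for-each>
            </work>
         </xsl:for-each>
      </merge-report>
   </xsl:template>

</xsl:stylesheet>
```

A common div whose versions count is lower than the sources count marks a reference that at least one source lacks.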
If you are fetching other TAN files as secondary input, and you want to work with
them, use tan:resolve-doc() and tan:expand-doc(), which will put the files in their
resolved and expanded states. You must resolve a TAN file
before you try to expand it.
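The order of operations for secondary input can be sketched as follows. The one-argument calls to tan:resolve-doc() and tan:expand-doc() are an assumption about the function signatures, which should be confirmed in the function library. The parameter name and file name are hypothetical.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tan="tag:textalign.net,2015:ns"
   exclude-result-prefixes="#all">

   <xsl:include href="../functions/TAN-A-functions.xsl"/>

   <!-- Hypothetical parameter: a TAN file to fetch as secondary input -->
   <xsl:param name="other-tan-href" select="'other-tan-file.xml'"/>

   <xsl:template match="/">
      <!-- Step 1: fetch the raw file, relative to the catalyzing input -->
      <xsl:variable name="other-raw"
         select="doc(resolve-uri($other-tan-href, base-uri(.)))"/>
      <!-- Step 2: resolve it (assumed signature: a single document node) -->
      <xsl:variable name="other-resolved" select="tan:resolve-doc($other-raw)"/>
      <!-- Step 3: expand the resolved form, never the raw one -->
      <xsl:variable name="other-expanded" select="tan:expand-doc($other-resolved)"/>

      <!-- Report how many documents the expansion returned -->
      <expansion documents="{count($other-expanded)}"/>
   </xsl:template>

</xsl:stylesheet>
```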
If you wish to create a TAN file as output (whether primary or secondary), it is
advised that you prepare ahead of time a skeleton TAN file, introduce that skeleton
as secondary input, infuse it with the new content, and let it become the primary or
secondary output. Because the application you are using to create a TAN file is
responsible for creating that file, and because responsibility for TAN files should
be documented, the algorithm used to create that new TAN file should be declared in
the <vocabulary-key> and credited with a <resp>, and a <change> should be entered in
the change log. Users of the file will be warned, during Schematron
validation, that the last change was made by an algorithm.
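The skeleton approach might be sketched as an identity transform over the skeleton that also extends the change log. Several things here are assumptions rather than documented behavior: the parameter names, the attribute names who and when on <change>, the tan namespace URI, and the idea that a new entry belongs directly after the last existing <change>. The sketch does not add new content to the body, and it does not declare the algorithm in the <vocabulary-key> or credit it with a <resp>; an application would need its own templates for those.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tan="tag:textalign.net,2015:ns"
   exclude-result-prefixes="#all">

   <!-- Hypothetical parameters: the skeleton file and the id of the algorithm -->
   <xsl:param name="skeleton-href" select="'skeleton.xml'"/>
   <xsl:param name="algorithm-id" select="'my-algorithm'"/>

   <!-- The skeleton is secondary input; the result becomes the primary output -->
   <xsl:template match="/">
      <xsl:apply-templates mode="infuse"
         select="doc(resolve-uri($skeleton-href, base-uri(.)))"/>
   </xsl:template>

   <!-- Identity template: copy everything in the skeleton by default -->
   <xsl:template match="@* | node()" mode="infuse">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" mode="infuse"/>
      </xsl:copy>
   </xsl:template>

   <!-- Keep the last existing change, then add one for the algorithm.
        A skeleton with no change element gets no new entry. -->
   <xsl:template match="tan:change[not(following-sibling::tan:change)]" mode="infuse">
      <xsl:copy-of select="."/>
      <xsl:copy>
         <!-- Assumed attribute names; check them against the TAN schemas -->
         <xsl:attribute name="who" select="$algorithm-id"/>
         <xsl:attribute name="when" select="current-date()"/>
         <xsl:text>File generated from skeleton by algorithm.</xsl:text>
      </xsl:copy>
   </xsl:template>

</xsl:stylesheet>
```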
If you are working with a TAN file as catalyzing input, you may want to take advantage of some other global variables derived from its key files (see the section called “Networked Files”):
Table 8.1. Global variables for networked files
Element | Raw (first document available) | Resolved | Expanded |
---|---|---|---|
<inclusion> | — | $inclusions-resolved | — |
<vocabulary> | — | $vocabularies-resolved | — |
<source> | — | $sources-resolved | $self-expanded[tan:TAN-T] |
<see-also> | $see-alsos-1st-da | $see-alsos-resolved | — |
The column labeled "raw" lists variables that hold the first documents available,
without alteration. Variables in the next column hold the resolved form, following
the same process described above for $self-resolved. The resolved forms of
<inclusion> and <vocabulary> are sufficient for validation, so they do not have
expanded versions. Expanded sources are always bundled with their class-2 file's
$self-expanded.
For relatively simple applications, a resolved file is sufficient. But even then,
there will be places where you will want to fetch the vocabulary bound to a
particular attribute or element. One of the more important functions to familiarize
yourself with is tan:vocabulary(), which can be used to get the IRI + name pattern
of a specific node, or to get all the vocabulary available for a given type.
Some developers will find even tan:vocabulary() a hassle to use. Consider setting
the global parameter $distribute-vocabulary (default false) to true. When it is
true, every IDref will be imprinted with the corresponding IRI + name pattern for
the vocabulary item it refers to. Exercise this option with care: such repetition
will result in a document considerably larger than the original.
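One way to set the parameter for a single application, without editing the files in the parameters subdirectory, is to rely on XSLT import precedence. The sketch below assumes that $distribute-vocabulary is an ordinary global <xsl:param> in no namespace, declared somewhere within the imported library. It uses <xsl:import> instead of <xsl:include>, because a second declaration of the same parameter at the same import precedence would be an error. Whether the imprinted patterns appear in the resolved or the expanded form is also an assumption to verify.

```xml
<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tan="tag:textalign.net,2015:ns"
   exclude-result-prefixes="#all">

   <!-- Imported declarations have lower precedence than those in this file -->
   <xsl:import href="../functions/TAN-A-functions.xsl"/>

   <!-- Overrides the library default (false); assumes an unprefixed global parameter -->
   <xsl:param name="distribute-vocabulary" select="true()"/>

   <xsl:template match="/">
      <!-- Output the resolved catalyzing input to inspect the imprinted vocabulary -->
      <xsl:copy-of select="$self-resolved"/>
   </xsl:template>

</xsl:stylesheet>
```

Most XSLT engines also allow a global parameter to be supplied at invocation time, which avoids changing the stylesheet at all.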