Doing things with TAN files

The TAN format is not an end in itself. Indeed, there is no point to any file format unless you can do things with it. TAN was designed to allow users to do unusual and interesting things. The /do things directory, a major subdirectory of the project, is populated with folders named for actions you might want to perform on a TAN file; each folder contains XSLT stylesheets for that area of activity.

Those stylesheets are the front end of a long process that begins with TAN validation. Whenever you validate a TAN file, the Schematron validation file (the companion to the RELAX-NG validation file) is invoked. But that Schematron file is small; the majority of the work is done by a very large library of XSLT stylesheets that resolve and expand the document, marking its errors along the way.

That extensive library of XSLT we call here the function library (we use both words to distinguish the collection from individual, generic functions). The function library provides definitive interpretations of the TAN format, marking parts that are in error. The function library is also an important step toward creating your own tools or stylesheets, anticipating, as it does, many things you might want to do with a TAN file. Certain considerations that have been put into the design of the function library are worth noting.

First, the function library has a structure similar to that of the RELAX-NG schemas. That is, the primary access point is through one of the XSLT files named after a primary TAN format. You may also wish to include (or import) the extra functions, http://textalign.net/release/TAN-2018/functions/TAN-extra-functions.xsl.

Second, during Schematron validation it is quite common for the processor to calculate all global variables, even those that are unused. To avoid needless computation, the function library therefore defines only those global variables that are central to the validation process.

The most complex and important global variables are the two principal transformations to the TAN file itself, $self-resolved and $self-expanded.

$self-resolved is the result of putting the TAN file through some key steps, including (1) stamping the original URI of the file as @base-uri in the root element, (2) converting all numeration systems to Arabic numerals, (3) replacing all elements that have @include with resolved forms of the element, (4) replacing elements that have @which with their resolved IRI + name form, and (5) stamping elements with @q, holding a number that represents the nth place of that element relative to its original siblings (included elements are given the @q of their host element). If any errors arise, the relevant information is placed in the resolved file as an <error> or <warning>, based upon the master list of errors. @q, @base-uri, and other newly introduced attributes and elements are not defined by the TAN schema.
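Steps (1), (2), and (5) can be pictured with a small sketch. It is written in Python rather than XSLT purely for illustration and is not how the function library is implemented. The function names, the use of @n as the attribute holding a numeral, and the loose Roman-numeral parser are all assumptions; steps (3) and (4) and error reporting are left out.

```python
import xml.etree.ElementTree as ET

# Values of Roman numeral digits; letters outside this map mean "not a numeral".
ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(text):
    """Return the integer value of a Roman numeral, or None if text is not one.

    Loose sketch: malformed numerals such as "iiii" are not rejected.
    """
    digits = [ROMAN.get(ch) for ch in text.lower()]
    if not digits or None in digits:
        return None
    total = 0
    for i, value in enumerate(digits):
        # Subtractive notation: a smaller digit before a larger one is subtracted.
        if i + 1 < len(digits) and value < digits[i + 1]:
            total -= value
        else:
            total += value
    return total

def resolve(root, base_uri):
    """Hypothetical, partial analogue of resolution, applied in place."""
    # Step (1): stamp the original URI on the root element.
    root.set("base-uri", base_uri)
    # Step (5): stamp each child with its position among its siblings.
    for parent in root.iter():
        for position, child in enumerate(parent, start=1):
            child.set("q", str(position))
    # Step (2): rewrite Roman numerals in @n (assumed attribute) as Arabic numerals.
    for element in root.iter():
        n = element.get("n")
        if n is not None:
            value = roman_to_int(n)
            if value is not None:
                element.set("n", str(value))
    return root
```

A real resolver must also decide whether a value such as "c" is a Roman numeral or an alphabetic label; the sketch simply treats every string made of Roman digits as a numeral.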

$self-expanded is the result of putting the file through a series of expansions. As noted earlier, there are three levels of Schematron validation (terse, normal, and verbose), and there are three corresponding levels of $self-expanded. Expansion is intended chiefly to support validation, and so it checks for errors. It does so by normalizing the text, converting each attribute to one or more elements (one per value), checking id references, and performing a number of other operations.
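The attribute-to-element conversion mentioned above can be sketched as follows, again in Python for illustration only. The function name is hypothetical. The sketch assumes that multiple values in an attribute are separated by whitespace and that the original attributes are dropped once converted; the function library may do either differently.

```python
import xml.etree.ElementTree as ET

def expand_attributes(element):
    """Return a copy of element in which every attribute has become
    child elements, one per whitespace-separated value."""
    expanded = ET.Element(element.tag)
    expanded.text = element.text
    expanded.tail = element.tail
    # One new child element per attribute value, named after the attribute.
    for name, value in element.attrib.items():
        for token in value.split():
            child = ET.SubElement(expanded, name)
            child.text = token
    # Original children follow, expanded recursively in the same way.
    for child in element:
        expanded.append(expand_attributes(child))
    return expanded
```

With every value in its own element, a later step such as an id-reference check can handle each value on its own, without tokenizing attribute strings again.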

For a class 2 file, $self-expanded includes not only an expansion of the file itself, but also an expansion of its dependencies (TAN-T or TAN-mor). When taken to the verbose level, a TAN-A-div file will include in its $self-expanded special documents with a root element <TAN-T-merge>. Each work has one TAN-T-merge document, a collation of all the relevant sources into a single reference structure.
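To give a rough idea of what collating sources into a single reference structure means, here is a deliberately simplified Python sketch. It assumes flat sources whose <div> elements carry their reference in @n. The <version> element, the @src attribute, and the function name are hypothetical and are not taken from the TAN schema or the function library.

```python
import xml.etree.ElementTree as ET

def merge_sources(sources):
    """Collate several flat sources into one reference structure.

    sources maps a source id to a root element containing <div n="..."> elements.
    """
    merged = ET.Element("TAN-T-merge")
    slots = {}  # reference (value of @n) -> the merged <div> for that reference
    for source_id, root in sources.items():
        for div in root.iter("div"):
            ref = div.get("n")
            if ref is None:
                continue  # a div without a reference cannot be aligned
            if ref not in slots:
                slots[ref] = ET.SubElement(merged, "div", {"n": ref})
            # Hypothetical wrapper recording which source the text came from.
            version = ET.SubElement(slots[ref], "version", {"src": source_id})
            version.text = (div.text or "").strip()
    return merged
```

Each reference appears once in the merged tree, with the text from every source that attests it grouped underneath.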

All these expansions provide an excellent starting point for conversion into other formats.

The next most important global variables deal with referred files:

Table 10.1. Global variables for referred files


The column labeled "raw" lists variables that hold the first documents available, without alteration. Variables in the next column hold the resolved form, following the same process described above for $self-resolved. The resolved forms of <inclusion> and <key> are sufficient for validation, so they do not have expanded versions. Expanded sources are always found after the first document in $self-expanded.

These global variables have been described above very generally. To understand better how their values are calculated, please consult the function library.

The other components of the function library—the functions, keys, and templates—cannot be described conveniently or succinctly here. But they are critical parts of building successful stylesheets that transform TAN files. The next chapter provides a comprehensive, detailed view of how they work.