Doing things with TAN files

Chapter 8. Working with TAN Files

TAN files are suited to dozens of types of applications, many of which are, at this point, only imagined or still being written. The subdirectory applications contains folders named after actions you might want to perform on a TAN file; the XSLT stylesheets in those folders give only a taste of what is possible. Because those applications are still under development, this section is devoted not to their specifics but to the theoretical background behind practical applications of TAN files. It is aimed particularly at readers who are comfortable with XSLT or related XML technologies and who want to do something important and useful with their TAN files.

The Schematron validation process was designed with practical applications in mind, and the extensive function library on which validation is based provides a foundation for a variety of them. When developing an application, the first order of business is normally to choose an entry point to the TAN function library, in the subdirectory functions. In that directory, each XSLT file is named after one of the TAN formats. Use <xsl:include> to point to the file whose format most resembles your main input.

	Note
	You could also try to fetch the TAN library via `<xsl:import>`, but results may be erratic, particularly if you have not put the import declaration in the right order, or if templates in your master stylesheet override templates in the TAN library. `<xsl:include>` is always the more reliable option.

If you point to ../functions/TAN-A-functions.xsl, you will have most of the functions and templates used for both class-1 and class-2 files. It tends to be a good default entry point if you are uncertain which master function file to use.

It is also common to include ../functions/TAN-extra-functions.xsl, the entry point for all TAN functions, global variables, and templates that play no role in Schematron validation. That file defines many global variables that are deliberately excluded from the core TAN library so as not to encumber Schematron validation.
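A minimal master stylesheet that follows this advice might look like the sketch below. It assumes XSLT 3.0, the TAN namespace URI tag:textalign.net,2015:ns, and a hypothetical stylesheet saved in a folder that is a sibling of functions, so that the relative paths given above resolve.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical master stylesheet for a new TAN application.
     Assumes this file sits in a folder that is a sibling of functions. -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tan="tag:textalign.net,2015:ns"
    exclude-result-prefixes="#all">

    <!-- Core entry point: functions and templates for class-1 and class-2 files -->
    <xsl:include href="../functions/TAN-A-functions.xsl"/>

    <!-- Functions, global variables, and templates not needed for Schematron validation.
         If your copy of the library already pulls one of these files in from the other,
         keep only one include: including a module twice duplicates its declarations. -->
    <xsl:include href="../functions/TAN-extra-functions.xsl"/>

    <!-- Your own templates go here. Keeping them in modes of your own lessens
         the risk of clashing with templates defined in the TAN library. -->

</xsl:stylesheet>
```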

You should also pay attention to the files in the subdirectory parameters. Some of the global parameters there can be used profitably to change the way an application runs.

All XSLT transformations require at least four components:

1. an input XML file
2. an XSLT file
3. a URL for the output
4. an XSLT engine (e.g., Saxon HE) to process #1 against #2 and send the output to #3

Although #1 is the principal, or catalyzing, input, it need not be the main input. Sometimes an XSLT application is written with non-XML data as its main input; in such cases the main input cannot be the catalyzing input. Furthermore, although there is only one principal output document, an application may need to create many other output documents, normally through <xsl:result-document>. So in any XSLT operation there are really two types of input and two types of output. We use the term catalyzing input for #1 and secondary input for any input added during the process. We use the term primary output for the output sent to #3 and secondary output for any other output created along the way. The terms primary and secondary refer only to position in the process, not to importance. Indeed, in some XSLT applications the secondary input and secondary output are far more important than the catalyzing input or the primary output. An XSLT file's documentation should indicate whether the main input is the catalyzing input, the secondary input, or both, and whether the main output is the primary output, the secondary output, or both.
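The four roles can be seen in a small, self-contained XSLT 3.0 stylesheet that does not depend on the TAN library. It is a sketch only: the parameter secondary-input-uri and the file names extra.xml and report.xml are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all">

    <!-- Hypothetical document fetched during the process
         (a relative URI is resolved against this stylesheet's location) -->
    <xsl:param name="secondary-input-uri" as="xs:string" select="'extra.xml'"/>

    <!-- Secondary input: read with doc() while the transformation runs -->
    <xsl:variable name="secondary-input" as="document-node()?"
        select="if (doc-available($secondary-input-uri)) then doc($secondary-input-uri) else ()"/>

    <!-- The catalyzing input (#1) is the context item of this template -->
    <xsl:template match="/">
        <!-- Secondary output: an extra document written with xsl:result-document -->
        <xsl:result-document href="report.xml">
            <report elements-in-secondary-input="{count($secondary-input//*)}"/>
        </xsl:result-document>
        <!-- Primary output: the result of the initial template, sent to the output URL (#3) -->
        <summary elements-in-catalyzing-input="{count(//*)}"/>
    </xsl:template>
</xsl:stylesheet>
```

With Saxon HE, for instance, the catalyzing input, the stylesheet, and the output URL are supplied on the command line with -s:, -xsl:, and -o:; report.xml is then written relative to that output URL.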

When developing an application where the main input is a TAN file, it is often best to start with it in its resolved or expanded state. (See the section called “The TAN Validation Process” on resolving and expanding TAN files.) If that TAN file is the catalyzing input, use the global variables $self-resolved and $self-expanded. If it is secondary input, use tan:resolve-doc() and tan:expand-doc().
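The sketch below puts both cases side by side. It assumes that $self-resolved and $self-expanded become available once TAN-A-functions.xsl is included, and that tan:resolve-doc() and tan:expand-doc() each accept a single document; the parameter other-tan-uri and the file other-file.xml are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tan="tag:textalign.net,2015:ns"
    exclude-result-prefixes="#all">

    <xsl:include href="../functions/TAN-A-functions.xsl"/>

    <!-- Hypothetical TAN file fetched as secondary input -->
    <xsl:param name="other-tan-uri" as="xs:string" select="'other-file.xml'"/>

    <xsl:template match="/">
        <!-- Catalyzing input: use the library's precomputed global variables -->
        <xsl:variable name="main-resolved" select="$self-resolved"/>
        <xsl:variable name="main-expanded" select="$self-expanded"/>
        <!-- Secondary input: resolve first, then expand the resolved result -->
        <xsl:variable name="other-resolved" select="tan:resolve-doc(doc($other-tan-uri))"/>
        <xsl:variable name="other-expanded" select="tan:expand-doc($other-resolved)"/>
        <states>
            <catalyzing resolved-root="{name($main-resolved[1]/*)}"
                expanded-docs="{count($main-expanded)}"/>
            <secondary resolved-root="{name($other-resolved[1]/*)}"
                expanded-docs="{count($other-expanded)}"/>
        </states>
    </xsl:template>
</xsl:stylesheet>
```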

For a class-2 file, $self-expanded or the output of tan:expand-doc() is a sequence of documents, starting with an expansion of the class-2 file itself, followed by expansions of its dependencies (TAN-T or TAN-mor). Its expanded class-1 sources will be tokenized where required, and marked with anchors for each reference in the class-2 file. If a token straddles leaf <div>s, the token will be reconstituted by moving the tail of the token up. These expanded sources are excellent candidates for other types of transformation. For example, HTML pages can be created to integrate class-2 annotations and their class-1 sources, in a variety of ways.
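As an illustration, the following sketch separates the expanded class-2 file from its expanded class-1 sources and reports how many leaf <div>s each source contains. It assumes a class-2 file as catalyzing input and uses the filter [tan:TAN-T] shown in Table 8.1; the overview and source elements are hypothetical, and a real application might build HTML at this point instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tan="tag:textalign.net,2015:ns"
    exclude-result-prefixes="#all">

    <xsl:include href="../functions/TAN-A-functions.xsl"/>

    <xsl:template match="/">
        <!-- First document: the expanded class-2 file itself -->
        <xsl:variable name="class-2-expanded" select="$self-expanded[1]"/>
        <!-- Remaining documents: expanded dependencies; keep only TAN-T sources -->
        <xsl:variable name="sources-expanded" select="tail($self-expanded)[tan:TAN-T]"/>
        <overview class-2-root="{name($class-2-expanded/*)}">
            <xsl:for-each select="$sources-expanded">
                <!-- A leaf div is a div with no child divs -->
                <source n="{position()}" leaf-divs="{count(.//tan:div[not(tan:div)])}"/>
            </xsl:for-each>
        </overview>
    </xsl:template>
</xsl:stylesheet>
```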

At the verbose level, an expanded TAN-A file concludes its $self-expanded sequence with one or more documents with the root element <TAN-T_merge>, one per detected work. A TAN-T_merge file has one <head> per class-1 source that has been merged, and its <body> contains a master set of <div>s that merge all the sources' <div>s sharing the same reference, after all <adjustments> have been made. Each leaf <div> in each source appears in the appropriate place, as a child of a common <div> that encompasses all other leaf <div>s with the same reference. For each version's leaf <div>, @type is changed to #version, and other markers signify which source it corresponds to. A TAN-T_merge file is a good basis for building parallel displays or statistical analyses. Merge files can also be created ad hoc with the function tan:merge-expanded-docs(), applied to individual class-1 files after they have been expanded.
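An ad hoc merge might be sketched as follows. It assumes that tan:merge-expanded-docs() takes the sequence of expanded class-1 documents as its single argument; the parameter class-1-uris and the files version-a.xml and version-b.xml are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tan="tag:textalign.net,2015:ns"
    exclude-result-prefixes="#all">

    <xsl:include href="../functions/TAN-A-functions.xsl"/>

    <!-- Hypothetical class-1 files (versions of the same work) to merge -->
    <xsl:param name="class-1-uris" as="xs:string*" select="('version-a.xml', 'version-b.xml')"/>

    <xsl:template match="/">
        <!-- Resolve, then expand, each class-1 file before merging -->
        <xsl:variable name="class-1-expanded"
            select="for $uri in $class-1-uris return tan:expand-doc(tan:resolve-doc(doc($uri)))"/>
        <xsl:variable name="merged" select="tan:merge-expanded-docs($class-1-expanded)"/>
        <!-- In the merge, each version's leaf div carries @type="#version" -->
        <merge-summary sources="{count($class-1-expanded)}"
            version-divs="{count($merged//tan:div[@type = '#version'])}">
            <xsl:copy-of select="$merged"/>
        </merge-summary>
    </xsl:template>
</xsl:stylesheet>
```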

If you fetch other TAN files as secondary input and want to work with them, use tan:resolve-doc() and tan:expand-doc() to put them in their resolved and expanded states. You must resolve a TAN file before you try to expand it.

If you wish to create a TAN file as output (whether primary or secondary), it is advisable to prepare a skeleton TAN file ahead of time, introduce that skeleton as secondary input, infuse it with the new content, and let it become the primary or secondary output. The application is responsible for creating the new TAN file, and responsibility in TAN files should be documented, so the algorithm that creates the file should be declared in the <vocabulary-key> and credited with a <resp>, and a <change> should be entered in the change log. During Schematron validation, users of the file will be warned that the last change was made by an algorithm.
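The skeleton approach might be sketched as below, without the TAN library. Everything named here is hypothetical: the parameters skeleton-uri and algorithm-id, the files skeleton.xml and new-tan-file.xml, and the div type section. The sketch also assumes that the skeleton already declares the algorithm in its <vocabulary-key>, credits it with a <resp>, and contains at least one <change>, and that <change> takes @who and @when; check these against the TAN schemas.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tan="tag:textalign.net,2015:ns"
    exclude-result-prefixes="#all">

    <!-- Hypothetical skeleton TAN file prepared ahead of time (secondary input) -->
    <xsl:param name="skeleton-uri" as="xs:string" select="'skeleton.xml'"/>
    <!-- Hypothetical id of the algorithm declared in the skeleton's vocabulary -->
    <xsl:param name="algorithm-id" as="xs:string" select="'my-algorithm'"/>

    <!-- Copy the skeleton unchanged except where a template below intervenes -->
    <xsl:mode name="infuse" on-no-match="shallow-copy"/>

    <xsl:template match="/">
        <!-- Hypothetical new content: one leaf div holding the catalyzing input's text.
             The div type must be one the skeleton's vocabulary allows. -->
        <xsl:variable name="new-content" as="element()*">
            <div xmlns="tag:textalign.net,2015:ns" type="section" n="1">
                <xsl:value-of select="normalize-space(.)"/>
            </div>
        </xsl:variable>
        <!-- Secondary output; remove xsl:result-document to make it the primary output -->
        <xsl:result-document href="new-tan-file.xml">
            <xsl:apply-templates select="doc($skeleton-uri)" mode="infuse">
                <xsl:with-param name="new-content" select="$new-content" tunnel="yes"/>
            </xsl:apply-templates>
        </xsl:result-document>
    </xsl:template>

    <!-- Fill the skeleton's body with the new content -->
    <xsl:template match="tan:body" mode="infuse">
        <xsl:param name="new-content" as="element()*" tunnel="yes"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:copy-of select="$new-content"/>
        </xsl:copy>
    </xsl:template>

    <!-- Log the change after the skeleton's last <change> (attribute names assumed) -->
    <xsl:template match="tan:change[not(following-sibling::tan:change)]" mode="infuse">
        <xsl:copy-of select="."/>
        <change xmlns="tag:textalign.net,2015:ns" who="{$algorithm-id}"
            when="{format-date(current-date(), '[Y0001]-[M01]-[D01]')}">Body generated from the catalyzing input</change>
    </xsl:template>
</xsl:stylesheet>
```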

If you are working with a TAN file as catalyzing input, you may want to take advantage of some other global variables derived from its key files (see the section called “Networked Files”):

Table 8.1. Global variables for networked files

Element	Raw (first document available)	Resolved	Expanded
`<inclusion>`	—	`$inclusions-resolved`	—
`<vocabulary>`	—	`$vocabularies-resolved`	—
`<source>`	—	`$sources-resolved`	`$self-expanded[tan:TAN-T]`
`<see-also>`	`$see-alsos-1st-da`	`$see-alsos-resolved`	—

The column labeled "Raw" lists variables that hold the first documents available, without alteration. Variables in the next column hold the resolved forms, produced by the same process described above for $self-resolved. The resolved forms of <inclusion> and <vocabulary> are sufficient for validation, so they have no expanded versions. Expanded sources are always bundled into the class-2 file's $self-expanded.
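A quick way to get acquainted with these variables is to report what each one holds for a given catalyzing input. The sketch assumes that every variable in Table 8.1 is reachable through TAN-A-functions.xsl (if one is defined only in TAN-extra-functions.xsl, include that file as well); the report elements are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tan="tag:textalign.net,2015:ns"
    exclude-result-prefixes="#all">

    <xsl:include href="../functions/TAN-A-functions.xsl"/>

    <!-- Count the networked documents held by each global variable -->
    <xsl:template match="/">
        <network-report>
            <inclusions resolved="{count($inclusions-resolved)}"/>
            <vocabularies resolved="{count($vocabularies-resolved)}"/>
            <!-- Expanded sources are found inside $self-expanded -->
            <sources resolved="{count($sources-resolved)}"
                expanded="{count($self-expanded[tan:TAN-T])}"/>
            <!-- <see-also> has both a raw and a resolved form -->
            <see-alsos raw="{count($see-alsos-1st-da)}"
                resolved="{count($see-alsos-resolved)}"/>
        </network-report>
    </xsl:template>
</xsl:stylesheet>
```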

For relatively simple applications, a resolved file is sufficient. But even then, there will be places where you will want to fetch the vocabulary bound to a particular attribute or element. One of the more important functions to familiarize yourself with is tan:vocabulary(), which can be used to get the IRI + name pattern of a specific node, or to get all the vocabulary available for a given type.
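A lookup might be sketched as follows. The call form is an assumption, not a documented signature: it supposes a two-argument tan:vocabulary() taking element name(s) and value(s), and that the IDrefs in the first @who point to <person>s. Confirm the signature in the function library before relying on it; the who-lookup element is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tan="tag:textalign.net,2015:ns"
    exclude-result-prefixes="#all">

    <xsl:include href="../functions/TAN-A-functions.xsl"/>

    <xsl:template match="/">
        <!-- The first @who in the resolved catalyzing input; it may hold several IDrefs -->
        <xsl:variable name="first-who" select="($self-resolved//@who)[1]"/>
        <xsl:variable name="idrefs" select="tokenize(normalize-space($first-who), ' ')"/>
        <who-lookup idrefs="{$idrefs}">
            <!-- ASSUMPTION: tan:vocabulary(element names, values); check the library -->
            <xsl:copy-of select="tan:vocabulary('person', $idrefs)"/>
        </who-lookup>
    </xsl:template>
</xsl:stylesheet>
```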

Some developers will find even tan:vocabulary() a hassle to use. If so, consider setting the global parameter $distribute-vocabulary (default false) to true. Whenever an IDref appears, it will then be imprinted with the IRI + name pattern of the vocabulary item it refers to. Exercise this option with care: the repetition will make the document considerably larger than the original.
