Once you have determined the master XSLT stylesheet for the application, you may want to configure it by adjusting the values given to the global parameters. You have several possible strategies:
Work with a configuration file. If you are comfortable writing some simple XSLT code, you might create a small XSLT file that begins with an <xsl:import> whose @href value points to the original stylesheet. Then copy from the master XSLT stylesheet only those <xsl:param>s that you want to change. This method is quick to set up and easy to use, but it also means that you do not have immediate access to the documentation that accompanies each parameter in the master stylesheet.
Overwrite the values in the master XSLT stylesheet directly. This method is quick, but it also means that you might not easily restore the original settings, unless you make a backup copy. Also, if you are using configuration files, their default values will change. That could be good or bad, depending upon your setup.
Work from a copy of the master XSLT file. This method allows you to customize the entire application, and to consult the original settings in the master file as needed. As with configuration files (see above), you can make new copies as new situations emerge. You should make certain that any working copies are in the same subdirectory as the original, to keep links intact.
Manage transformations from Oxygen. Oxygen XML Editor has a powerful feature, Configure Transformation Scenarios, which allows you to create custom configurations for an XSLT application. Oxygen has good documentation on how to use this flexible feature, which can be combined with any of the preceding three options. Oxygen allows you not only to configure the parameters but to manage input and output. One drawback is that you are presented with all the global parameters that can be found, whether or not they are really relevant. Documentation associated with a particular parameter may be missing or truncated. You should use this feature in conjunction with any documentation that comes with the XSLT application.
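The configuration-file strategy described above might be sketched as follows. The file names my-config.xsl and my-application.xsl are hypothetical, and the parameter is chosen only for illustration; substitute the master stylesheet and parameters of the application you are actually configuring.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- my-config.xsl: a hypothetical configuration file. The file names and the
     choice of parameter are assumptions made for this sketch. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all" version="3.0">

    <!-- Point to the master stylesheet, assumed here to be in the same directory. -->
    <xsl:import href="my-application.xsl"/>

    <!-- Copied from the master stylesheet and given a new value. Because this
         file imports the master, this declaration has higher import precedence
         and so overrides the master's value. -->
    <xsl:param name="diff-threshold-of-interest" as="xs:decimal" select="0.5"/>

</xsl:stylesheet>
```

To use such a file, you would supply it to the XSLT processor in place of the master stylesheet. Any parameter not redeclared keeps the value set in the master.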
Whatever method you adopt for configuration, first find the relevant global parameters. Once you have them, you should always ensure you understand what type of data is expected, and in what quantity.
Data types. XSLT is a strongly typed programming language. The data bound to a variable or parameter is always at least implicitly typed. Many variables or parameters specify exactly what kind of data is expected. Those that do not are assigned some default type by the XSLT processor. Most data types you encounter will be of two sorts: atomic types and nodes. Examples of atomic types are integers, booleans, strings, and dates. Examples of nodes are elements, attributes, comments, and processing instructions. There are other types, but we will focus here on the most common.
Quantities. In XSLT, there are four quantity categories: (1) zero or one; (2) exactly one; (3) zero or more; (4) one or more. Each of these is specified by adding a quantifier to a data-type declaration: ?, nothing, *, and +, respectively.
Table 9.1. Quantifiers and data types
Quantity | Symbol | Atomic type example | Node type example |
---|---|---|---|
zero or one | ? | xs:string? | element()? |
exactly one | none | xs:boolean | document-node() |
zero or more | * | xs:dateTime* | attribute()* |
one or more | + | xs:integer+ | comment()+ |
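As a rough illustration of the table, the following sketch declares one parameter for each quantity category. All four parameter names are hypothetical. The third parameter assumes that the transformation is run against a primary input document, since its value is selected from that document.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all" version="3.0">

    <!-- zero or one: the empty sequence, (), is an acceptable value -->
    <xsl:param name="optional-label" as="xs:string?" select="()"/>

    <!-- exactly one: a single boolean is required -->
    <xsl:param name="verbose" as="xs:boolean" select="true()"/>

    <!-- zero or more: every attribute of the root element; there may be none -->
    <xsl:param name="root-attributes" as="attribute()*" select="/*/@*"/>

    <!-- one or more: a range expression that yields 1, 2, 3 -->
    <xsl:param name="depths" as="xs:integer+" select="1 to 3"/>

</xsl:stylesheet>
```

If a supplied value does not fit the declared quantity, for example an empty sequence where + is required, the processor reports a type error rather than continuing silently.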
Below are some of the more common data types you will find in global parameters, along with several examples going from simple values up to more complex assignments based upon XPath expressions or XSLT constructions. For more background, see the section called “XPath language”. Focus is placed upon data types and quantities expected in select TAN applications and utilities.
Strings. A string is a concatenated sequence of characters. Even when the value consists only of Arabic numerals, a string will be read and interpreted as text, not as an integer.
In the following example, the value is marked as a string by the single quotation marks within the double quotation marks. The double quotation marks delimit the value of the attribute, and the single quotation marks specify that the value is a string. Without the single quotation marks, the value would be interpreted as an XPath expression; a single word such as day would be read as the name of a child element of the context node.
<xsl:param name="text-a-to-compare" as="xs:string?" select="'Every day'"/>
When more than one string is expected, the strings should be separated by a comma. It is also common to surround the series with parentheses, for visual clarity. This example assigns to the parameter a sequence of two strings.
<xsl:param name="text-a-to-compare" as="xs:string+" select="('day', 'night')"/>
In the next example, @select is replaced by the text node within the parameter. This technique can be useful if the value expected will be space-normalized, you want to wrap text, and you do not need to create multiple strings.
<xsl:param name="text-a-to-compare" as="xs:string?">Every day</xsl:param>
The next example takes the primary input XML and converts it to a string. Such conversion is called casting. Keep in mind that the context node of any global parameter is the primary input XML document.
<xsl:param name="text-a-to-compare" as="xs:string" select="string(/)"/>
Perhaps you need to supply a path to some input. The following example traverses the tree to a particular @href within the primary input. The string value in that attribute will be treated like a URL, and it will be resolved relative to the base URI of the primary input.
<xsl:param name="path-to-source" as="xs:string" select="resolve-uri(/*/tan:head/tan:predecessor/tan:location/@href, base-uri(/))"/>
If a parameter allows multiple values, and you need to change those values frequently, you might want to bind options to global parameters or global variables of your own creation...
<xsl:variable name="dir-1-path" as="xs:string" select="'../../novels/book-a'"/>
<xsl:variable name="dir-2-path" as="xs:string" select="'test/comparanda'"/>
<xsl:variable name="dir-3-path" as="xs:string" select="'test/logs'"/>
<xsl:variable name="dir-4-path" as="xs:string" select="'../brown/texts'"/>
...then update the master global parameter on a case-by-case basis.
<xsl:param name="secondary-input-relative-uri-directories" as="xs:string+" select="$dir-1-path, $dir-4-path"/>
The preceding example allows you to quickly change from one set of data to another.
Booleans. A boolean is a true/false value. If a parameter expects a boolean, you should use some XPath expression that evaluates to a boolean, even if it is a simple one, such as true() or false(). If you need to express the value as a string, it should be either "true", "false", "0", or "1", and it should be cast explicitly with xs:boolean(), because a plain string literal is not accepted where the declared type is xs:boolean.
<xsl:param name="ignore-comments" as="xs:boolean" select="false()"/>
<xsl:param name="preoptimize-string-order" as="xs:boolean" select="xs:boolean('true')"/>
Integers. To supply an integer, you need only use numerals, perhaps preceded by a hyphen if it is negative. You should not use quotation marks, or the parameter's child text node. There will be no confusion of the integer with an XPath step, because no element's name may begin with a digit.
<xsl:param name="start-at-depth" as="xs:integer" select="1"/>
<xsl:param name="ngram-auras" as="xs:integer+" select="(2, 1)"/>
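Decimals. Decimals are much like integers, but require decimal points. If the decimal is between 1.0 and -1.0, it is clearest to put a zero before the decimal point, e.g., -0.99. XPath also accepts a literal without the leading zero, such as .99, but it is easier to misread.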
<xsl:param name="diff-threshold-of-interest" as="xs:decimal" select="0.2"/>
Elements. If a global parameter expects elements as input, you must construct them inline, or provide an XPath expression that directs the processor to the elements in question. The following example shows how to construct a parameter that might be fed into tan:batch-replace().
<xsl:param name="additional-batch-replacements" as="element()*">
    <replace pattern="(\d\d)/(\d\d)/(\d\d\d\d)" replacement="$3-$1-$2" message="Converted U.S.-style date to ISO-style"/>
</xsl:param>
The parameter used in the previous example might need to be given numerous elements. In those cases it might be convenient to put them in a separate XML file and point to it with an XPath expression. Note that the declared type must allow more than one element (* or +); a bare element() accepts exactly one.
<xsl:param name="additional-batch-replacements" as="element()*" select="doc('batch-replacements.xml')/*/tan:replace"/>
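A separate file of the kind that doc('batch-replacements.xml')/*/tan:replace points to might look like the following sketch. The root element name and the second replacement are invented for illustration. The namespace URI is an assumption and must be whatever URI the tan: prefix is bound to in the stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- batch-replacements.xml: a hypothetical file. The root element name is
     arbitrary, because the XPath expression selects /*/tan:replace. The
     namespace URI is assumed to match the one bound to tan: in the stylesheet. -->
<replacements xmlns="tag:textalign.net,2015:ns">
    <replace pattern="(\d\d)/(\d\d)/(\d\d\d\d)" replacement="$3-$1-$2"
        message="Converted U.S.-style date to ISO-style"/>
    <!-- A second, invented entry, to show that the file may hold many. -->
    <replace pattern="\s+" replacement=" " message="Normalized space"/>
</replacements>
```

A relative URI passed to doc() is resolved against the base URI of the stylesheet that contains the expression, so in this sketch the file would sit in the same directory as that stylesheet.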
Running an XSLT application can be done in several ways. As noted above, at the heart of the process is the XSLT processor. The goal is to find the means to feed the primary input and the master stylesheet into the processor, and to tell the processor where to place the output.
From the command line. Processors such as Saxon allow you to initiate the process from the command line.
Windows:
Press the Windows key;
Type "cmd" and click "Command Prompt";
Type the letter of the drive where you plan to run the process, followed by a colon, e.g., e:
Using the command cd, navigate to the directory where your files are, e.g., cd myfiles.
Macintosh:
Open the Terminal app;
Using the command cd, navigate to the directory where your files are, e.g., cd ~/myfiles.
From there, follow the instructions provided by the vendor of the XSLT processor. Saxon provides instructions for its product at https://www.saxonica.com/documentation10/index.html#!using-xsl/commandline. A simple command-line instruction might look like the following:
java -cp "E:/xslt processors/saxon-he-10.0.jar" net.sf.saxon.Transform -s:init.xml -xsl:app.xsl -o:primary-output.xml
From Oxygen XML Editor. Oxygen provides numerous ways to initiate the XSLT process, including the following:
XSLT Debugger Perspective. This editing mode changes the appearance of Oxygen, putting eligible primary input files on the left, XSLT files in the middle, and an output pane on the right. You can choose the processor you prefer, and pick your primary input and master stylesheet. Running the application provides interactive output, with many diagnostic tools, letting you learn how the output came about.
Transformation Scenarios. You can choose Configure Transformation Scenarios and create a highly customized set of conditions for running an XSLT application.
These methods, and other more sophisticated approaches, are described by the vendor in their documentation, https://www.oxygenxml.com/.
All TAN utilities and applications share the same basic architecture. Once you have figured out how to use one TAN application, you are well on your way to being able to use the others as well. Each TAN utility and application has its own purpose, which means that its expected input and output will differ quite a bit from the others. Nevertheless, all TAN utilities and applications share a common set of features, to assist users.
All TAN utilities are in the utilities directory of the TAN files; the applications are in the applications directory. Within those directories, there is one subdirectory per utility or application. And within that subdirectory, there are only two XSLT files, accompanied perhaps by further subdirectories. One of the XSLT files has "configuration" in the name, and it allows you to customize a particular application or utility for your projects. The other XSLT file is the master stylesheet for the utility or application in question, and it has the same name as its parent directory. Subdirectories contain the heart of the code, and other important dependencies.
The file structure is designed to make quite clear the main point of entry. Having a directory with so few files should hopefully inspire you to fill it up with copies designed for specific situations.
All master stylesheets for TAN utilities and applications share a common structure. They are designed to be as user-friendly as possible, and to focus exclusively on configuration settings that the user may want to change.
Preamble. Every master stylesheet begins with a long series of comments, indicating the name of the application, its version (an ISO date), and a brief description of what it does. The preamble includes a statement of the intended primary input, secondary input, primary output, and secondary output. Cautionary notes may be included. If the utility or application has areas that are known to need development, these will be listed.
Global parameters. After the preamble, a series of global parameters is presented. Each one is preceded by a comment that explains the expected value. The parameters may be organized in blocks according to stages or topics. Some of the parameters may be localized versions of the standard TAN global parameters declared by files in the main parameters directory. The values in the master stylesheet of the application will take precedence over the default values.
Import statement. At the end of the master stylesheet is an <xsl:import> statement, pointing to the core stylesheet. That instruction may also be followed by other comments and declarations that users should not change.
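The three parts just described might be arranged as in the following skeleton. It is a sketch and is not copied from any actual TAN application: the application name, date, descriptions, and the name of the core file are all hypothetical. Version 3.0 is assumed, since XSLT 1.0 and 2.0 require <xsl:import> to precede all other declarations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all" version="3.0">

    <!-- PREAMBLE (all wording below is hypothetical) -->
    <!-- Name: My Application -->
    <!-- Version: 2021-01-01 -->
    <!-- Description: compares two texts and reports the differences -->
    <!-- Primary input: any XML file -->
    <!-- Secondary input: none -->
    <!-- Primary output: an XML report -->
    <!-- Secondary output: none -->

    <!-- GLOBAL PARAMETERS -->

    <!-- Should comments in the input be ignored? Expected: a single boolean. -->
    <xsl:param name="ignore-comments" as="xs:boolean" select="false()"/>

    <!-- At what depth should processing start? Expected: a single integer. -->
    <xsl:param name="start-at-depth" as="xs:integer" select="1"/>

    <!-- IMPORT STATEMENT: points to the core stylesheet (hypothetical name) -->
    <xsl:import href="incl/my-application-core.xsl"/>

</xsl:stylesheet>
```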
Every master stylesheet points via its import statement to a single XSLT file in the incl subdirectory. That XSLT file is the core stylesheet. As an everyday user of the application, you will find this core stylesheet to be of little or no importance. But anyone doing any kind of customization or development should be aware of how it works, and this description is aimed at those developers.
Each core stylesheet follows a common structure. It begins with <xsl:include> instructions that point to the TAN function library, and perhaps other important components.
Next comes metadata about the application: its name, its IRI, a change message to be reported, and a variety of descriptions of the application and its expected input and output. A change log and a list of features to work on may be included. The dates within those parameters dictate the version of the application. All this metadata is used in several ways: to populate the comments of the master stylesheet, to populate the contents of these guidelines, and perhaps to supplement the output. The master copy of this data is here, in the core stylesheet. The development branch of the TAN project includes a maintenance directory. Within it is a Schematron file that makes sure that the master and core stylesheets of any given utility or application are synchronized.
After the metadata come the XSLT declarations that drive the process. The output for most TAN utilities and applications requires multiple ordered stages. A given stage might have a strong declarative element, but the stages themselves are set carefully in a sequence, signposted by global variables that incrementally build the primary or secondary output.
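The idea of stages signposted by global variables can be sketched as follows. This is a generic XSLT 3.0 illustration, not code from any TAN core stylesheet; the mode names, variable names, and the work done at each stage are all invented. It assumes the transformation is run against a primary input document.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all" version="3.0">

    <!-- In each stage, anything not explicitly matched is copied unchanged. -->
    <xsl:mode name="stage-1" on-no-match="shallow-copy"/>
    <xsl:mode name="stage-2" on-no-match="shallow-copy"/>

    <!-- Stage 1: start from the primary input and drop its comments. -->
    <xsl:variable name="stage-1-result" as="document-node()">
        <xsl:apply-templates select="/" mode="stage-1"/>
    </xsl:variable>
    <xsl:template match="comment()" mode="stage-1"/>

    <!-- Stage 2: build on the result of stage 1, normalizing space in text. -->
    <xsl:variable name="stage-2-result" as="document-node()">
        <xsl:apply-templates select="$stage-1-result" mode="stage-2"/>
    </xsl:variable>
    <xsl:template match="text()" mode="stage-2">
        <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>

    <!-- The last variable in the chain supplies the primary output. -->
    <xsl:template match="/">
        <xsl:sequence select="$stage-2-result"/>
    </xsl:template>

</xsl:stylesheet>
```

Because each variable depends only on the one before it, a developer can inspect the result of any single stage without running the ones that follow.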
At the end of the core stylesheet are two unnamed templates. Each one matches the document node of the primary input XML file, and so one of the two will always be the initial template. The first of these templates is for diagnostics and is controlled by a static parameter that allows a developer to turn it on or off. It normally reports back the values of the global variables, in process order. If that first template is turned off, then the second one takes over: it drives the messaging system, delivers the primary output tree (bound to some global variable), and initiates any processes necessary for the <xsl:result-document> instructions required to generate secondary output.
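One way such a pair of templates could be arranged is sketched below. It is an assumption about the mechanism, not a copy of TAN code: the parameter name, variable name, element names, and output URI are all hypothetical, and the use of @use-when together with @priority is just one way to let a static parameter switch a template on or off.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all" version="3.0">

    <!-- Hypothetical static parameter: set to true() to get diagnostics. -->
    <xsl:param name="diagnostics-on" as="xs:boolean" select="false()" static="yes"/>

    <!-- Hypothetical global variable holding the primary output tree. -->
    <xsl:variable name="primary-output" as="document-node()">
        <xsl:document>
            <xsl:copy-of select="/*"/>
        </xsl:document>
    </xsl:variable>

    <!-- Template 1: diagnostics. It is compiled only when the static parameter
         is true; its higher priority then makes it the starting template. -->
    <xsl:template match="/" priority="1" use-when="$diagnostics-on">
        <diagnostics>
            <primary-output>
                <xsl:copy-of select="$primary-output"/>
            </primary-output>
        </diagnostics>
    </xsl:template>

    <!-- Template 2: the normal run, used whenever template 1 is switched off. -->
    <xsl:template match="/">
        <xsl:message select="'Starting normal run'"/>
        <xsl:sequence select="$primary-output"/>
        <!-- Secondary output; the relative URI is resolved against the
             location of the primary output. -->
        <xsl:result-document href="secondary-output.xml">
            <log>
                <xsl:value-of select="current-dateTime()"/>
            </log>
        </xsl:result-document>
    </xsl:template>

</xsl:stylesheet>
```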
Any primary or secondary output that results in a TAN file must be credited to or blamed upon the application or utility. The metadata for the application will be added to the output TAN file's vocabulary, and an appropriate entry will be added to the change log.