The function library behind TAN is powerful enough to be useful in non-TAN applications. Below is a list of functions that have proved especially helpful. Some of them are not central to validation, so they must be retrieved through ../functions/TAN-extra-functions.xsl. For a complete list of all functions, see Chapter 11, TAN variables, keys, functions, and templates.
tan:batch-replace()
: runs a sequence of regular expression replacements on any string. The sequence is prepared by constructing a series of <replace pattern="" replacement="" [flags=""]> elements whose attributes follow the rules of tan:replace() or fn:replace().
tan:chop-string()
: changes a string into a sequence of characters, as defined in TAN (i.e., combining characters are always kept with the base character). It is roughly equivalent to the XPath expression for $i in fn:string-to-codepoints(.) return fn:codepoints-to-string($i), except that the XPath expression returns every codepoint separately, including combining characters.
tan:collate()
: like tan:diff(), but applied to any number of strings. The results are treated much like a collation of manuscript readings, with the output XML fragment tethered to sigla corresponding to the input strings. The function can be used to optimize the order of the input strings, and to compute the pairwise similarity of the strings.
tan:copy-indentation()
: applies the white-space indentation of an element to any other XML fragment. Useful when you want to insert items in an XML file and preserve or imitate its indentation.
tan:diff()
: compares any two strings for differences. Includes an option to mark the changes letter-for-letter, or merely word-for-word (easier to read in some contexts). This function, which was written under the assumption that the input strings would have some resemblance, has been used successfully on pairs of strings as long as 5 million characters.
tan:duplicate-items()
: like tan:duplicate-values(), but applied to any item. If an item is a node, it counts as a duplicate when it is deeply equal to any other node.
tan:duplicate-values()
: finds distinct items in a sequence whose values are repeated in the sequence. This function complements fn:distinct-values().
tan:fill()
: repeats a string a given number of times. Helpful for formatting plain-text output.
tan:get-chars-by-name()
: retrieves Unicode characters based upon words in their name.
tan:glob-to-regex()
: changes a glob-like expression (normally used for filenames) into a regular expression (e.g., *.* becomes .*\..*).
tan:lang-code()
: retrieves an ISO 639-3 code for a language of a given name.
tan:lang-name()
: finds the name of a language, given its ISO 639-3 code.
tan:median()
: retrieves the median from a sequence of numbers.
tan:most-common-item()
: from a sequence of items, returns the one that occurs most frequently.
tan:most-common-item-count()
: returns the number of times the most common item appears in a sequence.
tan:no-outliers()
: removes outliers from a sequence of numbers.
tan:outliers()
: returns only outliers from a sequence of numbers.
tan:search-morpheus()
: retrieves lexico-morphological data for Greek and Latin from the Morpheus service.
tan:search-wikipedia()
: retrieves a set number of records from Wikipedia.
tan:shallow-copy()
: returns a copy of a node to a set depth. Useful for messages, to provide feedback on a particular element and its attributes, without any descendants (which would make the message hard to read).
tan:uri-relative-to()
: converts an absolute URI to a relative one, based on some context URI.
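To make a few of the descriptions above more concrete, here is a minimal Python sketch of the ideas behind tan:batch-replace(), tan:chop-string(), tan:glob-to-regex(), and tan:duplicate-values(). It is not the TAN implementation, which is written in XSLT: the Python function names, parameters, and return types are hypothetical, the replacement syntax is Python's (backreferences such as \1 rather than $1), combining characters are detected only approximately through their canonical combining class, and the glob translation handles only * and ?.

```python
import re
import unicodedata


def batch_replace(text, replacements):
    """Apply (pattern, replacement, flags) triples to text, in order.

    Hypothetical analogue of a series of <replace> elements.
    """
    for pattern, replacement, flags in replacements:
        text = re.sub(pattern, replacement, text, flags=flags)
    return text


def chop_string(text):
    """Split text into characters, keeping combining marks with their base.

    Assumption: a character is "combining" if its canonical combining
    class is nonzero; TAN's own definition may differ.
    """
    chunks = []
    for ch in text:
        if chunks and unicodedata.combining(ch):
            chunks[-1] += ch  # attach the mark to the preceding character
        else:
            chunks.append(ch)
    return chunks


def glob_to_regex(glob):
    """Translate the * and ? wildcards; escape every other character."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def duplicate_values(items):
    """Return each repeated value once, in order of first repetition."""
    seen, dupes = set(), []
    for item in items:
        if item in seen and item not in dupes:
            dupes.append(item)
        seen.add(item)
    return dupes
```

For instance, chop_string("e\u0301a") keeps the acute accent with its base letter and returns two items, whereas splitting by codepoint would return three.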
Some numeral functions might prove useful:
Letter numerals ↔ integers: tan:aaa-to-int(), tan:int-to-aaa()
Roman numerals → integers: tan:rom-to-int() (reverse not available)
Greek numerals ↔ integers: tan:grc-to-int(), tan:int-to-grc()
Syriac numerals → integers: tan:syr-to-int() (reverse not available)
Hexadecimal ↔ decimal: tan:hex-to-dec(), tan:dec-to-hex()
String range ↔ integers: tan:expand-numerical-sequence(), tan:integers-to-sequence()
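The kinds of conversion listed above can be sketched in Python as follows. This is an illustration of the general technique only, not a port of the TAN functions: the Python names are hypothetical, the Roman numeral parser assumes well-formed input and does no validation, and the string range syntax (comma-separated integers and hyphenated ranges such as "1-3, 7") is an assumption about what such a string might look like, not a statement of what TAN accepts.

```python
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def rom_to_int(numeral):
    """Convert a well-formed Roman numeral to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (iv = 4, xc = 90).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def hex_to_dec(hex_string):
    """Convert a hexadecimal string to an integer."""
    return int(hex_string, 16)


def dec_to_hex(number):
    """Convert a non-negative integer to an uppercase hexadecimal string."""
    return format(number, "X")


def expand_numerical_sequence(text):
    """Expand a string such as '1-3, 7' into a list of integers.

    Assumed syntax: non-negative integers and hyphenated ranges,
    separated by commas.
    """
    result = []
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(p) for p in part.split("-"))
            result.extend(range(start, end + 1))
        elif part:
            result.append(int(part))
    return result


def integers_to_sequence(numbers):
    """Collapse integers into a compact string of values and ranges."""
    runs = []
    for n in sorted(set(numbers)):
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n  # extend the current run
        else:
            runs.append([n, n])  # start a new run
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)
```

In this sketch the last two functions are inverses for sorted input without duplicates: expanding "1-3, 7" and collapsing the result gives back the same string.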