Some function definitions differ from one TAN format to another.
TAN-core-3-0-functions.xsl
3-parameter version of the fuller one, below
Input: a sequence of strings to be collated; a sequence of strings that label each string; a boolean indicating whether the sequence of input strings should be optimized; a boolean indicating whether the results of tan:diff() should be processed and weighed; a boolean indicating whether the collation should be cleaned up.
Output: a <collation> with (1) one <witness> per string (and if the last parameter is true, then a sequence of children <commonality>s, signifying how close that string is with every other), and (2) a sequence of <c>s and <u>s, each with a <txt> and one or more <wit ref="" pos=""/>, indicating which string witness attests to the [c]ommon or [u]nique reading, and what position in that string the particular text fragment starts at.
If there are not enough labels (2nd parameter) for the input strings, the numerical position of the input string will be used as the string label / witness id.
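The label fallback can be sketched in Python as follows. This is only an illustration of the rule stated above, not the library's code; the function name `witness_ids` is hypothetical, and the 1-based position is an assumption based on XPath's 1-based sequence positions.

```python
def witness_ids(strings, labels):
    """Return one witness id per input string.

    Uses the supplied label where one exists; otherwise falls back to the
    string's numerical position (assumed 1-based, as in XPath).
    """
    ids = []
    for position, _ in enumerate(strings, start=1):
        if position <= len(labels):
            ids.append(labels[position - 1])
        else:
            # Not enough labels: use the position as the witness id.
            ids.append(str(position))
    return ids
```

For three input strings and the single label `"A"`, this sketch yields the ids `A`, `2` and `3`.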
If the third parameter is true, then tan:diff() will be performed against each pair of strings. Each diff output will be weighed by closeness of the two texts, and sorted accordingly. The results of this operation will be stored in collation/witness/commonality. This requires one diff per pair of strings, that is n(n-1)/2 operations for n strings, so it should be efficient for a few input strings, but will take progressively longer as the number and size of the input strings grow. Preoptimizing strings will likely produce greater congruence in the <u> fragments.
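The pairwise step can be sketched in Python as follows. The sketch assumes Python's `difflib.SequenceMatcher` as a stand-in for tan:diff() and its `ratio()` as a stand-in for the closeness weighting; the actual weighting used by the library is not described here. The function name `pairwise_commonality` is hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_commonality(strings):
    """Compare every pair of strings once and rank the pairs by closeness.

    Returns a list of ((i, j), score) tuples, closest pair first, where i and j
    are 1-based positions of the strings and score is between 0.0 and 1.0.
    """
    scores = {}
    for (i, a), (j, b) in combinations(enumerate(strings, start=1), 2):
        # Stand-in for weighing a tan:diff() result; not the library's measure.
        scores[(i, j)] = SequenceMatcher(None, a, b).ratio()
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Each unordered pair is compared exactly once, so four input strings produce six comparisons.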
If the last parameter is true, then cleanup will not be performed. This parameter was introduced because the cleanup process itself invokes tan:collate(), and one does not want to get into an endless loop because of a mishmash of differences that can never be reconciled or brought closer together.
This version of tan:collate() was written in XSLT 3.0 to take advantage of xsl:iterate, and comes in 3- and 5-parameter versions to distinguish it from its XSLT 2.0 predecessors, which used a different approach to collation. Tests comparing the two versions of tan:collate() may be profitable.
Changes in output from previous version of tan:collate():
- @w is now <wit> with @ref and @pos
- the text node of <u> or <c> is now wrapped in <txt>
- @length is ignored (the value is easily calculated)
With these changes, any witness can be easily reconstructed with the XPath expression tan:collation/()
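The reconstruction idea can be illustrated with a minimal Python sketch. It does not reproduce the XPath expression; it only shows the principle of concatenating, in document order, the <txt> of every <c> or <u> that carries a <wit> for the chosen witness. The sample document, the `id` attribute on <witness>, the absence of a namespace, and the function name `reconstruct` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical collation of two witnesses, "a" and "b" (no namespace, for brevity).
SAMPLE = """<collation>
  <witness id="a"/>
  <witness id="b"/>
  <c><txt>The quick </txt><wit ref="a" pos="1"/><wit ref="b" pos="1"/></c>
  <u><txt>brown</txt><wit ref="a" pos="11"/></u>
  <u><txt>red</txt><wit ref="b" pos="11"/></u>
  <c><txt> fox</txt><wit ref="a" pos="16"/><wit ref="b" pos="14"/></c>
</collation>"""


def reconstruct(collation, witness_id):
    """Rebuild one witness by joining the text of the fragments that attest it."""
    parts = []
    for fragment in collation:
        if fragment.tag not in ("c", "u"):
            continue  # skip <witness> and anything else
        if any(w.get("ref") == witness_id for w in fragment.findall("wit")):
            parts.append(fragment.findtext("txt", default=""))
    return "".join(parts)


collation = ET.fromstring(SAMPLE)
```

With the sample above, `reconstruct(collation, "a")` gives `The quick brown fox` and `reconstruct(collation, "b")` gives `The quick red fox`.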
TAN-core-string-functions.xsl
One-parameter version of the fuller one, below
Input: any number of strings
Output: an element with <c> and <u w="[WITNESS NUMBERS]">, showing where there are common strings and where there are departures. At the beginning are <witness>es identifying the numbers, and providing basic statistics about how much each pair of witnesses agrees.
This function was written to deal with multiple OCR results of the same page of text, to find agreement wherever possible.
This function was rewritten in 2020 as an XSLT 3.0 function of arity 5.
Used by template ŧ clean-up-collation-pass-1.
Used by function tan:collate().
Relies upon tan:adjust-diff(), tan:collate(), tan:collate-loop-outer(), tan:diff(), tan:diff-cache(), tan:diff-to-collation(), tan:most-common-item-count(), tan:trim-long-text(), ŧ clean-up-collation-pass-1, ŧ clean-up-collation-pass-2, ŧ diff-to-collation.