Cross-format functions


Some function definitions differ from one TAN format to another.

`tan:collate()`

`TAN-core-3-0-functions.xsl`

3-parameter version of the fuller one, below

Input: a sequence of strings to be collated; a sequence of strings that label each 
string; a boolean indicating whether the sequence of input strings should be optimized; a 
boolean indicating whether the results of tan:diff() should be processed and weighed; a 
boolean indicating whether the collation should be cleaned up.

Output: a <collation> with (1) one <witness> per string (and, if the third parameter is 
true, a sequence of child <commonality>s signifying how close that string is to every 
other), and (2) a sequence of <c>s and <u>s, each with a <txt> and one or more <wit 
ref="" pos=""/>, indicating which string witnesses attest to the [c]ommon or [u]nique 
reading, and at what position in each string the particular text fragment starts.

If there are not enough labels (2nd parameter) for the input strings, the numerical 
position of the input string will be used as the string label / witness id.
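The label fallback can be illustrated outside XSLT. The following is a minimal Python sketch of the rule only, not the TAN implementation; the function name `resolve_labels` is hypothetical, and positions are assumed to be 1-based, as in XPath.

```python
def resolve_labels(strings, labels):
    """Return one label per string, falling back to the string's position.

    Assumption: positions are 1-based, as in XPath.
    """
    resolved = []
    for position, _ in enumerate(strings, start=1):
        if position <= len(labels):
            # A label was supplied for this string.
            resolved.append(labels[position - 1])
        else:
            # Not enough labels: use the numerical position instead.
            resolved.append(str(position))
    return resolved


print(resolve_labels(["abc", "abd", "abe"], ["ms-A"]))  # ['ms-A', '2', '3']
```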

If the third parameter is true, then tan:diff() will be performed against each pair 
of strings. Each diff output will be weighed by the closeness of the two texts, and sorted 
accordingly. The results of this operation will be stored in collation/witness/commonality. 
Comparing every pair of n strings requires n(n-1)/2 diffs, so this should be efficient for 
a few input strings, but will take progressively longer as the number and size of the 
input strings grow. Preoptimizing strings will likely produce greater congruence in the 
<u> fragments.
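To get a feel for the cost of comparing each pair, here is a rough Python sketch. It is an illustration under stated assumptions, not the TAN code: `difflib.SequenceMatcher.ratio()` stands in for the weighing of tan:diff() output, and the function name `pairwise_closeness` is hypothetical. For n strings there are n(n-1)/2 unordered pairs.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_closeness(labeled_strings):
    """Compare every pair of (label, text) items; return the closest pairs first.

    Assumption: SequenceMatcher.ratio() is only a stand-in for however
    tan:diff() results are actually weighed.
    """
    results = []
    for (label_a, text_a), (label_b, text_b) in combinations(labeled_strings, 2):
        ratio = SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()
        results.append((label_a, label_b, ratio))
    # Sort by closeness, most similar pair first.
    results.sort(key=lambda item: item[2], reverse=True)
    return results


witnesses = [
    ("a", "the quick brown fox"),
    ("b", "the quick brown fax"),
    ("c", "a slow green turtle"),
]
pairs = pairwise_closeness(witnesses)
print(len(pairs))    # 3 pairs for 3 strings
print(pairs[0][:2])  # ('a', 'b'), the closest pair
```

With 10 input strings the same loop would run 45 comparisons, each of which grows more expensive with the length of the strings.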

If the last parameter is false, then cleanup will not be performed. This parameter was 
introduced because the cleanup process itself invokes tan:collate(), and one does not want 
to get into an endless loop because of a mishmash of differences that can never be 
reconciled or brought closer together.

This version of tan:collate() was written in XSLT 3.0 to take advantage of 
xsl:iterate, and comes in 3-parameter and 5-parameter versions to distinguish it from its 
XSLT 2.0 predecessors, which used a different approach to collation. Tests comparing the 
two versions of tan:collate() may be profitable.

Changes in output from the previous version of tan:collate():
- @w is now <wit> with @ref and @pos
- the text node of <u> or <c> is now wrapped in <txt>
- @length is ignored (the value is easily calculated)
With these changes, any witness can be easily reconstructed 
with the XPath expression tan:collation/()
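Because the XPath expression is given here only in truncated form, the idea of reconstructing a witness is sketched below in Python instead. This is an illustration, not the TAN implementation: the sample data is invented, the elements are shown without a namespace, the witness identifier is assumed to sit in an @id attribute, and @pos is assumed to be 1-based. The function name `reconstruct` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical collation of two witnesses, following the output structure
# described above (no namespace, @id on <witness>, 1-based @pos: all assumed).
SAMPLE = """<collation>
  <witness id="a"/>
  <witness id="b"/>
  <c><txt>the quick brown f</txt><wit ref="a" pos="1"/><wit ref="b" pos="1"/></c>
  <u><txt>o</txt><wit ref="a" pos="18"/></u>
  <u><txt>a</txt><wit ref="b" pos="18"/></u>
  <c><txt>x</txt><wit ref="a" pos="19"/><wit ref="b" pos="19"/></c>
</collation>"""


def reconstruct(collation, witness_id):
    """Join, in document order, the <txt> of every <c> or <u> the witness attests."""
    parts = []
    for fragment in collation:
        if fragment.tag not in ("c", "u"):
            continue  # skip <witness> and anything else
        if any(w.get("ref") == witness_id for w in fragment.findall("wit")):
            parts.append(fragment.findtext("txt", default=""))
    return "".join(parts)


collation = ET.fromstring(SAMPLE)
print(reconstruct(collation, "a"))  # the quick brown fox
print(reconstruct(collation, "b"))  # the quick brown fax
```

The sketch also shows why @length is dispensable: the length of each fragment is simply the length of its <txt> value.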

`TAN-core-string-functions.xsl`

One-parameter version of the fuller one, below

Input: any number of strings

Output: an element with <c> and <u w="[WITNESS NUMBERS]">, showing where there are 
common strings and where there are departures. At the beginning are <witness>es 
identifying the numbers and providing basic statistics about how much each pair of 
witnesses agrees.

This function was written to deal with multiple OCR results of the same page of text, 
to find agreement wherever possible.

This function was rewritten in 2020 as an XSLT 3.0 function with an arity of 5.

Used by template ŧ clean-up-collation-pass-1.

Used by function tan:collate().

Relies upon tan:adjust-diff(), tan:collate(), tan:collate-loop-outer(), tan:diff(), tan:diff-cache(), tan:diff-to-collation(), tan:most-common-item-count(), tan:trim-long-text(), ŧ clean-up-collation-pass-1, ŧ clean-up-collation-pass-2, ŧ diff-to-collation.
