Definition: '\P{M}\p{M}*'
Used by function tan:chop-string()
Does not rely upon global variables, keys, functions, or templates.
Definition: ''
Used by variable $special-end-div-chars
Does not rely upon global variables, keys, functions, or templates.
Definition: ($zwj, $soft-hyphen)
Used by variable $special-end-div-chars-regex
Used by function tan:normalize-div-text()
Relies upon $zwj, $soft-hyphen.
Definition: concat('[', string-join($special-end-div-chars, ''), ']$')
Used by function tan:normalize-div-text()
Relies upon $special-end-div-chars.
Definition: $token-definitions-reserved[following-sibling::tan:name = 'nonspace']
Used by function tan:remodel-div-ref()
Relies upon $token-definitions-reserved.
Definition: ''
Used by variable $special-end-div-chars
Does not rely upon global variables, keys, functions, or templates.
tan:analyze-stats($arg as xs:anyAtomicType*) as element()?
Takes a series of integers, doubles, or other numbers and returns basic statistics as attributes in a single element.
Used by function tan:merge-analyzed-stats()
Does not rely upon global variables, keys, functions, or templates.
Option 1 (TAN-class-1-and-2-functions)
tan:analyze-string-length($resolved-class-1-doc-or-fragment as item()*) as item()*
One-parameter version of the two-parameter function below
Used by variable $self-class-1-errors-marked
Used by function tan:remodel-div-ref(), tan:analyze-string-length()
Relies upon tan:analyze-string-length.
Option 2 (TAN-class-1-and-2-functions)
tan:analyze-string-length($resolved-class-1-doc-or-fragment as item()*, $mark-only-leaf-divs as xs:boolean) as item()*
Input: any class-1 document or fragment; an indication whether string lengths should be added only to leaf divs, or to every div.
Output: the same document, with @string-length and @string-pos added to every div (or only to leaf divs, depending on $mark-only-leaf-divs)
Calculates the string length of each leaf element and its relative position, so that a raw text can be segmented proportionally and given the structure of a model exemplar. NB: any $special-end-div-chars that terminate a <div> will not be counted, nor will the assumed space that follows. On the other hand, the lack of a special character at the end means that the nominal space that follows a div will be included in both the length and the position. Thus input...
<div type="m" n="1">abc­</div>
<div type="m" n="2">def‍</div>
<div type="m" n="3">ghi</div>
<div type="m" n="4">xyz</div>
...presumes a raw joined text of "abcdefghi xyz ", and so becomes output:
<div type="m" n="1" string-length="3" string-pos="1">abc­</div>
<div type="m" n="2" string-length="3" string-pos="4">def‍</div>
<div type="m" n="3" string-length="4" string-pos="7">ghi</div>
<div type="m" n="4" string-length="4" string-pos="11">xyz</div>
Used by variable $self-class-1-errors-marked
Used by function tan:remodel-div-ref(), tan:analyze-string-length()
Relies upon ŧ c1-stamp-string-length, ŧ c1-stamp-string-pos.
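The counting rule in the worked example above can be sketched in Python. This is only an illustration, not the XSLT implementation: the name stamp_lengths is hypothetical, divs are reduced to plain strings, and lengths are counted with Python's len() rather than with tan:string-length().

```python
SOFT_HYPHEN = "\u00ad"
ZWJ = "\u200d"
SPECIAL_END_DIV_CHARS = (ZWJ, SOFT_HYPHEN)


def stamp_lengths(div_texts):
    """Return (string_length, string_pos) pairs for a sequence of leaf-div texts.

    Hypothetical helper: assumes each text is already space-normalized.
    """
    stamped = []
    pos = 1  # string-pos is 1-based in the example above
    for text in div_texts:
        if text.endswith(SPECIAL_END_DIV_CHARS):
            # The special character and the notional following space are both uncounted.
            length = len(text) - 1
        else:
            # The notional space after the div is counted.
            length = len(text) + 1
        stamped.append((length, pos))
        pos += length
    return stamped


divs = ["abc" + SOFT_HYPHEN, "def" + ZWJ, "ghi", "xyz"]
print(stamp_lengths(divs))  # [(3, 1), (3, 4), (4, 7), (4, 11)]
```

The four pairs match the @string-length and @string-pos values in the example output.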
Option 1 (TAN-class-1-and-2-functions)
tan:arabic-numerals($strings as xs:string*) as xs:string*
Input: any strings that might be convertible to Arabic numerals, but of unknown format or type
Output: Best-guess Arabic numeral equivalents, as strings. Roman numerals take precedence over alphabet numerals (that is, 'i' is interpreted as 1, not 9)
Used by template ŧ prep-class-1
Relies upon tan:strings-to-numeral-or-numeral-type.
Option 2 (TAN-class-1-and-2-functions)
tan:arabic-numerals($strings as xs:string*, $treat-ambiguous-a-or-i-type-as-roman-numeral as xs:boolean?) as xs:string*
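Input: any strings that might be convertible to Arabic numerals, plus a boolean indicating whether strings that could be either alphabet numerals or Roman numerals (e.g., 'i') should be treated as Roman numerals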
Output: Best-guess Arabic numeral equivalents, as strings.
Used by template ŧ prep-class-1
Relies upon tan:strings-to-numeral-or-numeral-type.
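The precedence rule ('i' is 1, not 9) might look like the following Python sketch. It is not the library's code, and several points are assumptions: the names arabic_numeral and roman_to_int are hypothetical, alphabet numerals are limited to single letters a-z, Roman numerals follow the standard subtractive pattern, and unrecognized strings are passed through unchanged.

```python
import re

# Standard lowercase Roman-numeral pattern, covering 1-4999.
ROMAN_RE = re.compile(r"^m{0,4}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$")
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(s):
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller value before a larger one is subtracted.
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total


def arabic_numeral(s, ambiguous_as_roman=True):
    """Best-guess Arabic numeral for one string, returned as a string."""
    s = s.strip().lower()
    if s.isdigit():
        return s
    is_roman = bool(s) and ROMAN_RE.match(s) is not None
    is_letter = len(s) == 1 and "a" <= s <= "z"
    # A single letter such as 'i' or 'c' is ambiguous; the flag decides the reading.
    if is_roman and (ambiguous_as_roman or not is_letter):
        return str(roman_to_int(s))
    if is_letter:
        return str(ord(s) - ord("a") + 1)
    return s  # assumption: unrecognized strings pass through unchanged


print([arabic_numeral(s) for s in ["i", "iv", "b", "12"]])  # ['1', '4', '2', '12']
print(arabic_numeral("i", ambiguous_as_roman=False))        # 9
```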
tan:chop-string($input as xs:string?) as xs:string*
Input: any string
Output: that string chopped into a sequence of strings, following TAN rules about modifying characters
Used by template ŧ mark-tok-chars, ŧ char-setup
Used by function tan:string-length()
Relies upon $char-reg-exp.
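As a rough illustration of chopping by the pattern '\P{M}\p{M}*' (a base character followed by any modifying marks), here is a Python sketch. Python's re module has no \p{M} class, so the sketch tests Unicode general categories instead. The names chop_string and tan_string_length are hypothetical, and dropping a mark that has no preceding base character is an assumption about how unmatched text is treated.

```python
import unicodedata


def chop_string(text):
    """Split text into base characters, each followed by its combining marks."""
    chunks = []
    for ch in text or "":
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and chunks:
            chunks[-1] += ch   # \p{M}*: attach the mark to the preceding base
        elif not is_mark:
            chunks.append(ch)  # \P{M}: a new base character starts a chunk
        # Assumption: a mark with no preceding base is dropped (the pattern cannot match it).
    return chunks


def tan_string_length(text):
    """Count characters with modifiers folded into the preceding base character."""
    return len(chop_string(text))


print(chop_string("e\u0301a"))        # two chunks: 'e' + combining acute, then 'a'
print(tan_string_length("e\u0301a"))  # 2
```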
tan:div-to-div-transfer($divs-with-content-to-be-transferred as element()*, $divs-to-be-infused-with-new-content as element()*) as element()*
Input: (1) any set of divs with content to be transferred into the structure of (2) another set of divs.
Output: The div structure of (2), infused with the content of (1). The content is allocated proportionately, with preference given to punctuation, within a certain range, and then word breaks.
This function is useful for transforming class-1 documents from one reference system to another. It starts by getting the text content of (1), then string values for (2).
No variables, keys, functions, or named templates depend upon this xsl:function.
Relies upon tan:text-join, ŧ c1-stamp-string-length, ŧ c1-stamp-string-pos, ŧ infuse-tokenized-text, ŧ strip-all-attributes-except.
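One way to picture the proportional allocation is the Python sketch below. It is not the function's actual algorithm. The names snap and transfer, the window size, and the punctuation set are all hypothetical, and the model divs are reduced to a list of string lengths.

```python
import re


def snap(text, ideal, window):
    """Move a cut point to the nearest break in the window: punctuation first, then spaces."""
    lo, hi = max(0, ideal - window), min(len(text), ideal + window)
    for pattern in (r"[.,;:!?]\s+", r"\s+"):
        ends = [m.end() + lo for m in re.finditer(pattern, text[lo:hi])]
        if ends:
            return min(ends, key=lambda e: abs(e - ideal))
    return ideal  # no break nearby: cut at the proportional position


def transfer(text, model_lengths, window=10):
    """Split text into len(model_lengths) pieces, proportionally to the model lengths."""
    total = sum(model_lengths)
    cuts, running = [0], 0
    for length in model_lengths[:-1]:
        running += length
        ideal = round(len(text) * running / total)
        # Keep cut points non-decreasing so that no piece has negative length.
        cuts.append(max(cuts[-1], snap(text, ideal, window)))
    cuts.append(len(text))
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]


text = "One two three. Four five six. Seven eight nine."
print(transfer(text, [10, 10, 10]))
# ['One two three. ', 'Four five six. ', 'Seven eight nine.']
```

Because the pieces are slices between consecutive cut points, joining them always reproduces the original text.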
Option 1 (TAN-class-1-and-2-functions)
tan:flatref($node as element()?) as xs:string?
Simple, one-parameter version of the fuller function below
Used by template ŧ get-mismatched-text
Used by function tan:flatref(), tan:get-ref-seq()
Relies upon tan:flatref.
Option 2 (TAN-class-1-and-2-functions)
tan:flatref($node as element()?, $div-types-to-suppress as xs:string*, $div-ns-to-rename as element()*) as xs:string?
Input: div node in a TAN-T(EI) document; truth value whether references that fit a number pattern should be converted to integers
Output: string value concatenating the reference values from the topmost div ancestor to the node.
This function assumes that @n has already been normalized per tan:resolve-doc(), which converts @n values to Arabic numerals wherever possible.
Used by template ŧ get-mismatched-text
Used by function tan:flatref(), tan:get-ref-seq()
Relies upon $separator-hierarchy.
tan:get-n-types($src-1st-da-resolved as document-node()*) as element()*
Input: any class 1 TAN documents
Calculates the types of @n values per source and per div type
October 2016: this function used to be used for validation, but a better routine is preferred. The function is left here, however, in case it proves useful in other contexts.
Used by template ŧ prep-class-2-doc-pass-2
Relies upon tan:number-type, $n-type.
tan:get-src-skeleton($src-1st-da-prepped as document-node()*) as document-node()?
one-parameter form of the master version below; it results in a merger of sources, but without text and empty leaf divs
Used by function tan:prep-verbosely()
Relies upon tan:merge-sources.
tan:median($numbers as xs:double*) as xs:double?
Input: any sequence of numbers
Output: the median value
It is assumed that the input has already been sorted by tan:numbers-sorted() or the like.
Used by function tan:outliers()
Does not rely upon global variables, keys, functions, or templates.
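A minimal Python sketch of a median over pre-sorted input follows. The name median and the None result for an empty sequence are assumptions, the latter modeled on the optional xs:double? return type.

```python
def median(sorted_numbers):
    """Median of an already-sorted sequence; None for an empty one."""
    n = len(sorted_numbers)
    if n == 0:
        return None
    mid = n // 2
    if n % 2:
        # Odd count: the middle item is the median.
        return float(sorted_numbers[mid])
    # Even count: average the two middle items.
    return (sorted_numbers[mid - 1] + sorted_numbers[mid]) / 2


print(median([1, 2, 3]))     # 2.0
print(median([1, 2, 3, 4]))  # 2.5
```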
tan:merge-analyzed-stats($analyzed-stats as element()*, $add-stats as xs:boolean?) as element()
Takes a group of elements that follow the pattern that results from tan:analyze-stats and synthesizes them into a single element. If $add-stats is true, then they are added; if false, the sum of the second through last elements is subtracted from the first; if neither true nor false, nothing happens. Will work on elements of any name, so long as they have tan:d children, with the data points to be merged.
Used by function tan:merge-sources(), tan:synthesize-merged-group()
Relies upon tan:error, tan:analyze-stats.
tan:merge-source-loop($not-fully-merged-source as document-node()?, $so-far-merged-to-what-depth as xs:integer, $add-stats as xs:boolean?, $order-of-source-ids as xs:string*) as document-node()?
Input: a rough merge (the result of tan:merge-source()); an initial depth (usually 1); a boolean indicating whether statistics, if present, should be added or whether the sum of the tail should be subtracted from the head; and a list of source ids (only if the order of sources should be respected)
Output: a single document that joins sibling <div>s that share a common @ref.
Further, if any statistics are present and $add-stats is true, then the matching attributes in merged <d>s are added or checked for differences, as required. If $add-stats is false then the statistics are subtracted (the head of the sequence minus the sum of the tail of the sequence).
No special provision is made for the order of synthesized <div>s; to control for order, the input unmerged sources in every <div> should have an @r that specifies the relative rank (values 0 to 1) a div takes. The average of the @r values will be calculated in the merged <div>, so that sorting can take place. In some cases, that @r-avg can be misleading, since it excludes any outliers of @r (to avoid the undue influence of <div>s inserted via realignment or of sources that have the work in only a fragmentary state), but the data needed to recalculate the proper average and re-sort the <div>s should all be present.
If, in the course of preparation, all the children <div>s of a <div> have been eliminated, because of <realign>s in a TAN-A-div file, the result is a hollow <div>, with neither <ver> nor <div> children. These are retained in the loop; if they are to be omitted, it should be done by whatever process handles these results.
Used by function tan:merge-tan-a-div-prepped(), tan:merge-sources(), tan:merge-source-loop()
Relies upon tan:merge-source-loop, ŧ synthesize-merged-sources.
Option 1 (TAN-class-1-and-2-functions)
tan:merge-sources($src-1st-da-prepped as document-node()*, $keep-sources-in-order as xs:boolean?) as document-node()?
two-parameter form of the master function below; it results in a merger of sources, but keeping text, juxtaposed in leaf divs and differentiated with new <ver src="[SOURCE NAME]"> to distinguish one version from the next
Used by template ŧ class-1-errors
Used by function tan:get-src-skeleton(), tan:merge-sources()
Relies upon tan:merge-sources.
Option 2 (TAN-class-1-and-2-functions)
tan:merge-sources($src-1st-da-prepped as document-node()*, $keep-text as xs:boolean, $keep-sources-in-order as xs:boolean?, $add-stats as xs:boolean?) as document-node()?
Input: one or more prepped class 1 documents (usually with @ref holding flatref values); a boolean indicating whether text should be kept or dropped (skeleton); a boolean indicating whether the order of sources should be respected; and a boolean indicating whether statistics, if present, should be added
Output: a single document that merges the bodies of the input documents into a single structure based on the values of @ref
This function is useful for determining orphan, defective, and complete <div>s, and in preparation of publishing TAN-A-div files. To that end, this function automatically handles <div>s that have been marked for realignment.
This function assumes that the sources have at the bare minimum gone through the first level of preparation; that is, tei:TEI, tei:body, and tei:div have been converted to TAN equivalents, and the only tei elements in the body are in leaf divs.
Used by template ŧ class-1-errors
Used by function tan:get-src-skeleton(), tan:merge-sources()
Relies upon tan:merge-analyzed-stats, tan:merge-source-loop, ŧ prepare-class-1-doc-for-merge.
tan:no-outliers($numbers as xs:anyAtomicType*) as xs:anyAtomicType*
Input: any sequence of numbers
Output: the same sequence, without outliers
Used by function tan:synthesize-merged-group()
Relies upon tan:outliers.
tan:normalize-div-text($div-strings as xs:string*) as xs:string*
Input: any sequence of strings
Output: the same sequence, normalized according to TAN rules. Each item in the sequence is space normalized and then if its end matches one of the special div-end characters, ZWJ U+200D or SOFT HYPHEN U+AD, the character is removed; otherwise a space is added at the end. Zero-length strings are skipped.
This function is designed specifically for TAN's commitment to nonmixed content. That is, every TAN element contains either elements or non-whitespace text but not both, which also means that whitespace text nodes are effectively ignored. It is assumed that every TAN element is followed by a notional whitespace.
Used by template ŧ compare-copies, ŧ get-mismatched-text
Used by function tan:text-join()
Relies upon $special-end-div-chars-regex.
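The normalization rule described above can be sketched in Python as follows. This is an approximation, not the XSLT source. The name normalize_div_text is a hypothetical stand-in, the regular expression is built the same way as the definition of $special-end-div-chars-regex, and skipping strings that are empty after space normalization is an assumption.

```python
import re

ZWJ = "\u200d"
SOFT_HYPHEN = "\u00ad"
# Mirrors concat('[', string-join($special-end-div-chars, ''), ']$')
SPECIAL_END_RE = re.compile("[" + ZWJ + SOFT_HYPHEN + "]$")


def normalize_div_text(div_strings):
    """Normalize each string: strip a special end character, else add a trailing space."""
    result = []
    for s in div_strings:
        s = " ".join(s.split())      # space-normalize
        if not s:
            continue                 # assumption: empty after normalization means skipped
        if SPECIAL_END_RE.search(s):
            result.append(s[:-1])    # drop the special div-end character
        else:
            result.append(s + " ")   # add the notional trailing space
    return result


divs = ["abc" + SOFT_HYPHEN, "def" + ZWJ, "ghi", "xyz"]
print(repr("".join(normalize_div_text(divs))))  # 'abcdefghi xyz '
```

Joining the normalized strings yields the same raw text, "abcdefghi xyz ", that the tan:analyze-string-length() example presumes.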
tan:number-sort($numbers as xs:anyAtomicType*) as xs:double*
Input: any sequence of items
Output: the same sequence, sorted with string numerals converted to numbers
Used by function tan:outliers()
Does not rely upon global variables, keys, functions, or templates.
tan:number-type($strings as xs:string*) as xs:string*
Version of tan:strings-to-numeral-or-numeral-type() that fetches merely the numeral type
Used by function tan:get-n-types()
Relies upon tan:strings-to-numeral-or-numeral-type.
tan:outliers($numbers as xs:anyAtomicType*) as xs:anyAtomicType*
Input: any sequence of numbers
Output: the outliers in the sequence
Used by function tan:no-outliers()
Relies upon tan:number-sort, tan:median.
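The entry does not say how outliers are identified. Since the function relies on sorting and medians, one plausible reading is a quartile-based test. The Python sketch below uses Tukey's fences (1.5 times the interquartile range) purely as an assumed stand-in; the function names, the multiplier, and the four-item minimum are all hypothetical.

```python
def _median(sorted_values):
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def outliers(numbers):
    """Values outside Tukey's fences (an assumed rule, not confirmed by the source)."""
    values = sorted(float(n) for n in numbers)  # string numerals become numbers
    if len(values) < 4:
        return []                               # assumption: too few points to judge
    half = len(values) // 2
    q1 = _median(values[:half])                 # median of the lower half
    q3 = _median(values[-half:])                # median of the upper half
    spread = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]


def no_outliers(numbers):
    """The input sequence, in its original order, minus the flagged values."""
    flagged = outliers(numbers)
    return [n for n in numbers if float(n) not in flagged]


data = [10, 11, 12, 12, 13, 14, 15, 100]
print(outliers(data))     # [100.0]
print(no_outliers(data))  # [10, 11, 12, 12, 13, 14, 15]
```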
tan:string-length($input as xs:string?) as xs:integer
Input: any string
Output: the number of characters in the string, as defined by TAN (i.e., modifiers are counted with the preceding base character)
Used by template ŧ c1-stamp-string-length
Relies upon tan:chop-string.
tan:strings-to-numeral-or-numeral-type($strings as xs:string*, $convert-to-arabic as xs:boolean, $treat-ambiguous-a-or-i-type-as-roman-numeral as xs:boolean?, $preface-ambiguous-numeral-with-negative-sign as xs:boolean) as xs:string*
Input: any sequence of strings that may be a numeral type, and an indication whether what should be returned is not the type but the Arabic numeral equivalent (as a string)
Output: the same number of strings, with the value of either the $n-type that is the first match or the Arabic numeral equivalent
In general, Roman numerals are checked first, strings last ('i' = 1 not 9); mixed numeral types result in hyphen-joined Arabic numerals (e.g., 1a -> 1-1)
Used by function tan:arabic-numerals(), tan:arabic-numerals(), tan:number-type()
Relies upon $n-type, tan:rom-to-int, $n-type-pattern, tan:aaa-to-int, tan:letter-to-number.
tan:synthesize-merged-group($current-group as element()*, $add-stats as xs:boolean?) as element()?
Input: a group of elements that share the same @ref; a parameter indicating whether stats, if present, should be added
Output: a single element that merges the content of the grouped elements
This function is intended solely for the template synthesize-src-skeleton, to handle in identical ways content that has been chosen and ordered differently.
Used by template ŧ synthesize-merged-sources
Relies upon tan:merge-analyzed-stats, tan:no-outliers.
Option 1 (TAN-class-1-and-2-functions)
tan:text-join($items as item()*) as xs:string
Used by template ŧ c1-stamp-string-length, ŧ tokenize-prepped-class-1, ŧ class-1-errors
Used by function tan:text-join(), tan:div-to-div-transfer(), tan:compare-copies()
Relies upon tan:text-join.
Option 2 (TAN-class-1-and-2-functions)
tan:text-join($items as item()*, $prep-end as xs:boolean) as xs:string
Input: any number of elements, text nodes, or strings; a boolean indicating whether the end of the sequence should also be prepared
Output: a single string that joins and normalizes them according to TAN rules: if the item is (1) a <tok> or <non-tok> that has following siblings or (2) the last leaf element and $prep-end is false then the bare text is used; otherwise the text returned follows the rules of tan:normalize-div-text().
If the second parameter is true, then the end of the resultant string is checked for special div-end characters.
Used by template ŧ c1-stamp-string-length, ŧ tokenize-prepped-class-1, ŧ class-1-errors
Used by function tan:text-join(), tan:div-to-div-transfer(), tan:compare-copies()
Relies upon tan:normalize-div-text, tan:normalize-text.