TAN-class-1-and-2 global variables, keys, and functions summarized

Definition: '\P{M}\p{M}*'

Used by function tan:chop-string()

Does not rely upon global variables, keys, functions, or templates.

Definition: '&#xad;' (soft hyphen, U+00AD)

Used by variable $special-end-div-chars

Does not rely upon global variables, keys, functions, or templates.

Definition: $token-definitions-reserved[following-sibling::tan:name = 'nonspace']

Used by function tan:remodel-div-ref()

Relies upon $token-definitions-reserved .

Definition: '&#x200d;' (zero-width joiner, U+200D)

Used by variable $special-end-div-chars

Does not rely upon global variables, keys, functions, or templates.

Option 1 (TAN-class-1-and-2-functions)

tan:analyze-string-length($resolved-class-1-doc-or-fragment as item()*) as item()*

One-parameter version of the two-parameter function below

Used by variable $self-class-1-errors-marked

Used by function tan:remodel-div-ref() tan:analyze-string-length()

Relies upon tan:analyze-string-length .

Option 2 (TAN-class-1-and-2-functions)

tan:analyze-string-length($resolved-class-1-doc-or-fragment as item()*, $mark-only-leaf-divs as xs:boolean) as item()*

Input: any class-1 document or fragment; an indication whether string lengths should be added only to leaf divs, or to every div.

Output: the same document, with @string-length and @string-pos added to every div

Calculates the string length of each leaf element and its relative position, so that a raw text can be segmented proportionally and given the structure of a model exemplar. NB: any $special-end-div-chars that terminate a <div> are not counted, and neither is the assumed space that follows. Conversely, when a <div> lacks a special character at its end, the nominal space that follows it is included in both the length and the position. Thus input...

<div type="m" n="1">abc&#xad;</div>

<div type="m" n="2">def&#x200d;</div>

<div type="m" n="3">ghi</div>

<div type="m" n="4">xyz</div>

...presumes a raw joined text of "abcdefghi xyz ", and so becomes output:

<div type="m" n="1" string-length="3" string-pos="1">abc&#xad;</div>

<div type="m" n="2" string-length="3" string-pos="4">def&#x200d;</div>

<div type="m" n="3" string-length="4" string-pos="7">ghi</div>

<div type="m" n="4" string-length="4" string-pos="11">xyz</div>
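
The counting rule in the example above can be sketched in Python. This is an illustration only, not the library's XSLT: the function name analyze_string_length and the constant SPECIAL_END_DIV_CHARS are hypothetical, the input is simplified to a list of leaf-div texts, and no whitespace normalization is attempted.

```python
# Assumption: the two special end-div characters are the soft hyphen and the
# zero-width joiner, as in the example above.
SPECIAL_END_DIV_CHARS = ("\u00ad", "\u200d")


def analyze_string_length(leaf_texts):
    """Return one (string_length, string_pos) pair per leaf-div text, in order."""
    results = []
    pos = 1  # positions are 1-based, as in the example output
    for text in leaf_texts:
        if text.endswith(SPECIAL_END_DIV_CHARS):
            # Neither the special character nor the assumed space after it is counted.
            length = len(text) - 1
        else:
            # The nominal space that follows the div is counted.
            length = len(text) + 1
        results.append((length, pos))
        pos += length
    return results
```

Called with the four leaf texts of the example ("abc" plus soft hyphen, "def" plus zero-width joiner, "ghi", "xyz"), the sketch yields the same lengths and positions as the @string-length and @string-pos values shown above.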

Used by variable $self-class-1-errors-marked

Used by function tan:remodel-div-ref() tan:analyze-string-length()

Relies upon ŧ c1-stamp-string-length ŧ c1-stamp-string-pos .

Option 1 (TAN-class-1-and-2-functions)

tan:arabic-numerals($strings as xs:string*) as xs:string*

Input: any strings that might be convertible to Arabic numerals, but of unknown format or type

Output: Best-guess Arabic numeral equivalents, as strings. Roman numerals take precedence over alphabetic numerals (that is, 'i' is interpreted as 1, not 9)

Used by template ŧ prep-class-1

Relies upon tan:strings-to-numeral-or-numeral-type .

Option 2 (TAN-class-1-and-2-functions)

tan:arabic-numerals($strings as xs:string*, $treat-ambiguous-a-or-i-type-as-roman-numeral as xs:boolean?) as xs:string*

Input: any strings that might be convertible to Arabic numerals; a boolean indicating whether strings that could be either alphabetic (a-type) or Roman (i-type) numerals should be treated as Roman numerals

Output: Best-guess Arabic numeral equivalents, as strings.
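
A minimal Python sketch of the best-guess conversion follows. It is not the library's implementation. Everything in it is an assumption made for illustration: the names arabic_numeral and roman_to_int, the restriction of alphabetic numerals to single letters (a = 1 ... z = 26), and the choice to return unconvertible strings unchanged.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}
# Accepts well-formed lowercase Roman numerals only.
ROMAN_PATTERN = re.compile(r"m{0,4}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})")


def roman_to_int(s):
    """Convert a well-formed lowercase Roman numeral to an integer."""
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        value = ROMAN_VALUES[ch]
        # A smaller value before a larger one is subtracted (iv = 4, xc = 90).
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total


def arabic_numeral(s, ambiguous_is_roman=True):
    """Best-guess Arabic numeral for s, returned as a string."""
    t = s.strip().lower()
    if re.fullmatch(r"[0-9]+", t):
        return str(int(t))
    is_roman = bool(t) and ROMAN_PATTERN.fullmatch(t) is not None
    is_alpha = len(t) == 1 and "a" <= t <= "z"  # assumption: single letters only
    if is_roman and (ambiguous_is_roman or not is_alpha):
        return str(roman_to_int(t))
    if is_alpha:
        return str(ord(t) - ord("a") + 1)
    return s  # assumption: leave unconvertible strings untouched
```

With the default setting, 'i' is read as a Roman numeral; passing ambiguous_is_roman=False makes the sketch read it as the ninth letter of the alphabet instead.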

Used by template ŧ prep-class-1

Relies upon tan:strings-to-numeral-or-numeral-type .

Option 1 (TAN-class-1-and-2-functions)

tan:flatref($node as element()?) as xs:string?

Simple one-parameter version of the fuller function below

Used by template ŧ get-mismatched-text

Used by function tan:flatref() tan:get-ref-seq()

Relies upon tan:flatref .

Option 2 (TAN-class-1-and-2-functions)

tan:flatref($node as element()?, $div-types-to-suppress as xs:string*, $div-ns-to-rename as element()*) as xs:string?

Input: div node in a TAN-T(EI) document; truth value whether references that fit a number pattern should be converted to integers

Output: string value concatenating the reference values from the topmost div ancestor to the node.

This function assumes that @n has already been normalized per tan:resolve-doc(), which converts @n values to Arabic numerals wherever possible.
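
The concatenation described in the output can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's code: the separator ":" is a stand-in for $separator-hierarchy, whose actual value is not given here; the function name flatref is reused only for readability; and the parameters $div-types-to-suppress and $div-ns-to-rename are ignored.

```python
import xml.etree.ElementTree as ET

SEPARATOR_HIERARCHY = ":"  # assumption: stand-in for $separator-hierarchy


def flatref(root, node, separator=SEPARATOR_HIERARCHY):
    """Join the @n values from the topmost div ancestor down to node."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    ns = []
    current = node
    while current is not None and current.tag == "div":
        ns.append(current.get("n", ""))
        current = parent_of.get(current)
    return separator.join(reversed(ns))


doc = ET.fromstring(
    '<body><div type="bk" n="1"><div type="ch" n="2">'
    '<div type="v" n="3">text</div></div></div></body>'
)
leaf = doc.find("./div/div/div")
print(flatref(doc, leaf))
```

The walk stops at the first ancestor that is not a div, so only the div hierarchy contributes to the reference.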

Used by template ŧ get-mismatched-text

Used by function tan:flatref() tan:get-ref-seq()

Relies upon $separator-hierarchy .

tan:get-n-types($src-1st-da-resolved as document-node()*) as element()*

Input: any class 1 TAN documents

Calculates the types of @n values per source and per div type

October 2016: this function used to be used for validation, but a better routine is preferred. The function is left here, however, in case it proves useful in other contexts.

Used by template ŧ prep-class-2-doc-pass-2

Relies upon tan:number-type $n-type .

tan:merge-source-loop($not-fully-merged-source as document-node()?, $so-far-merged-to-what-depth as xs:integer, $add-stats as xs:boolean?, $order-of-source-ids as xs:string*) as document-node()?

Input: a rough merge (the result of tan:merge-source()); an initial depth (usually 1); a boolean indicating whether statistics, if present, should be added (true) or whether the sum of the tail should be subtracted from the head (false); and a list of source ids (needed only if the order of sources should be respected)

Output: a single document that joins sibling <div>s that share a common @ref. Further, if any statistics are present and $add-stats is true, then the matching attributes in merged <div>s are added or checked for differences, as required. If $add-stats is false, then the statistics are subtracted (the head of the sequence minus the sum of the tail of the sequence)
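
The two ways of combining statistics can be sketched in Python. The helper merge_stat is hypothetical and covers only the arithmetic; it takes the values of one matching attribute across the merged <div>s and does not model the check for differences.

```python
def merge_stat(values, add_stats=True):
    """Combine the values of one statistic across merged divs.

    add_stats=True:  return the sum of all values.
    add_stats=False: return the head minus the sum of the tail.
    """
    if not values:
        return None
    if add_stats:
        return sum(values)
    head, *tail = values
    return head - sum(tail)
```

For the values 10, 3, and 2, the sketch returns 15 when adding and 5 when subtracting.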

No special provision is made for the order of synthesized <div>s; to control for order, the input unmerged sources in every <div> should have an @r that specifies the relative rank (values 0 to 1) a div takes. The average of the @r's will be calculated in the merged <div>, so that sorting can take place. In some cases, that @r-avg can be misleading, since it excludes any outliers of @r (to avoid the undue influence of <div>s inserted via realignment or of sources that have the work in only a fragmentary state), but the data needed to recalculate the proper average and re-sort the <div>s should all be present.

If, in the course of preparation, all the children <div>s of a <div> have been eliminated, because of <realign>s in a TAN-A-div file, the result is a hollow <div>, with neither <ver> nor <div> children. These are retained in the loop; if they are to be omitted, it should be done by whatever process handles these results.

Used by function tan:merge-tan-a-div-prepped() tan:merge-sources() tan:merge-source-loop()

Relies upon tan:merge-source-loop ŧ synthesize-merged-sources .

Option 1 (TAN-class-1-and-2-functions)

tan:merge-sources($src-1st-da-prepped as document-node()*, $keep-sources-in-order as xs:boolean?) as document-node()?

Two-parameter version of the master function below. It merges the sources while keeping their text, which is juxtaposed in leaf divs and marked with new <ver src="[SOURCE NAME]"> elements to distinguish one version from the next

Used by template ŧ class-1-errors

Used by function tan:get-src-skeleton() tan:merge-sources()

Relies upon tan:merge-sources .

Option 2 (TAN-class-1-and-2-functions)

tan:merge-sources($src-1st-da-prepped as document-node()*, $keep-text as xs:boolean, $keep-sources-in-order as xs:boolean?, $add-stats as xs:boolean?) as document-node()?

Input: one or more prepped class 1 documents (usually with @ref holding flatref values); a boolean indicating whether text should be kept or dropped (skeleton); a boolean indicating whether the order of sources should be respected; and a boolean indicating whether statistics, if present, should be added

Output: a single document that merges the bodies of the input documents into a single structure based on the values of @ref

This function is useful for determining orphan, defective, and complete <div>s, and in preparation of publishing TAN-A-div files. To that end, this function automatically handles <div>s that have been marked for realignment.

This function assumes that the sources have at the bare minimum gone through the first level of preparation; that is, tei:TEI, tei:body, and tei:div have been converted to TAN equivalents, and the only tei elements in the body are in leaf divs.
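
The core idea, grouping content from several sources under shared @ref values, can be sketched in Python. This is a simplified illustration rather than the library's algorithm: the function merge_by_ref and its input shape (a mapping from source id to a list of (ref, text) pairs for leaf divs) are assumptions, and realignment, nesting, and statistics are left out.

```python
def merge_by_ref(sources, keep_text=True):
    """Group leaf-div versions from every source under their shared ref.

    sources: mapping of source id -> list of (ref, text) pairs.
    Returns a mapping of ref -> list of (source id, text) pairs; the text is
    None when keep_text is False (a skeleton merge).
    """
    merged = {}
    for src_id, divs in sources.items():
        for ref, text in divs:
            versions = merged.setdefault(ref, [])
            versions.append((src_id, text if keep_text else None))
    return merged


sources = {
    "a": [("1:1", "In the beginning"), ("1:2", "And the earth")],
    "b": [("1:1", "Au commencement")],
}
merged = merge_by_ref(sources)
```

In the result, a ref carried by only some of the sources (here "1:2") is easy to spot, which is the kind of comparison the merged structure is meant to support.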

Used by template ŧ class-1-errors

Used by function tan:get-src-skeleton() tan:merge-sources()

Relies upon tan:merge-analyzed-stats tan:merge-source-loop ŧ prepare-class-1-doc-for-merge .

tan:synthesize-merged-group($current-group as element()*, $add-stats as xs:boolean?) as element()?

Input: a group of elements that share the same @ref; a parameter indicating whether stats, if present, should be added

Output: a single element that merges the content of the grouped elements

This function is intended solely for the template synthesize-src-skeleton, to handle in identical ways content that has been chosen and ordered differently.

Used by template ŧ synthesize-merged-sources

Relies upon tan:merge-analyzed-stats tan:no-outliers .

Option 1 (TAN-class-1-and-2-functions)

tan:text-join($items as item()*) as xs:string

Used by template ŧ c1-stamp-string-length ŧ tokenize-prepped-class-1 ŧ class-1-errors

Used by function tan:text-join() tan:div-to-div-transfer() tan:compare-copies()

Relies upon tan:text-join .

Option 2 (TAN-class-1-and-2-functions)

tan:text-join($items as item()*, $prep-end as xs:boolean) as xs:string

Input: any number of elements, text nodes, or strings; a boolean indicating whether the end of the sequence should also be prepared

Output: a single string that joins and normalizes them according to TAN rules: if the item is (1) a <tok> or <non-tok> that has following siblings or (2) the last leaf element and $prep-end is false, then the bare text is used; otherwise the text returned follows the rules of tan:normalize-div-text()

If the second parameter is true, then the end of the resultant string is checked for special div-end characters
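
A rough Python sketch of the joining behavior follows, restricted to plain strings. It is not the library's code. The rules attributed to normalize_div_text here are assumptions drawn from the description of tan:analyze-string-length(): whitespace is collapsed, a trailing special end-div character is dropped without adding a space, and otherwise a single space is appended. The handling of <tok> and <non-tok> elements is omitted.

```python
import re

SPECIAL_END_DIV_CHARS = "\u00ad\u200d"  # soft hyphen, zero-width joiner


def normalize_div_text(text):
    """Assumed rule: collapse whitespace, then end with either nothing
    (after a special end-div character, which is dropped) or one space."""
    text = re.sub(r"\s+", " ", text).strip()
    if text and text[-1] in SPECIAL_END_DIV_CHARS:
        return text[:-1]
    return text + " "


def text_join(items, prep_end=False):
    """Join strings; the last item is normalized only when prep_end is True."""
    if not items:
        return ""
    *body, last = items
    parts = [normalize_div_text(t) for t in body]
    parts.append(normalize_div_text(last) if prep_end else last)
    return "".join(parts)
```

With prep_end set to True the final item receives the same end treatment as the others; with False its bare text closes the string.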

Used by template ŧ c1-stamp-string-length ŧ tokenize-prepped-class-1 ŧ class-1-errors

Used by function tan:text-join() tan:div-to-div-transfer() tan:compare-copies()

Relies upon tan:normalize-div-text tan:normalize-text .