TAN-class-1-and-2 global variables, keys, and functions summarized

Function to calculate the string length of each leaf element and its relative position, so that a raw text can be segmented proportionally and given the structure of a model exemplar. NB: any $special-end-div-chars that terminate a <div> will not be counted, nor will the assumed space that follows. On the other hand, the lack of a special character at the end means that the nominal space that follows a div will be included in both the length and the position. Thus input...

...presumes a raw joined text of "abcdefghi xyz ", and so becomes output:

Used by variable $self-class-1-errors-marked

Used by function tan:remodel-div-ref() tan:analyze-string-length()

Relies upon ŧ c1-stamp-string-length ŧ c1-stamp-string-pos .

`tan:arabic-numerals()`

Option 1 (TAN-class-1-and-2-functions)

tan:arabic-numerals($strings as xs:string*) as xs:string*

Input: any strings that might be convertible to Arabic numerals, but of unknown format or type

Output: Best-guess Arabic numeral equivalents, as strings. Roman numerals take precedence over alphabet numerals (that is, 'i' is interpreted as 1, not 9)

Used by template ŧ prep-class-1

Relies upon tan:strings-to-numeral-or-numeral-type .

Option 2 (TAN-class-1-and-2-functions)

tan:arabic-numerals($strings as xs:string*, $treat-ambiguous-a-or-i-type-as-roman-numeral as xs:boolean?) as xs:string*

Input: any strings that might be convertible to Arabic numerals, plus the type they are known to conform to

Output: Best-guess Arabic numeral equivalents, as strings.

Used by template ŧ prep-class-1

Relies upon tan:strings-to-numeral-or-numeral-type .

`tan:chop-string()`

tan:chop-string($input as xs:string?) as xs:string*

Input: any string

Output: that string chopped into a sequence of strings, following TAN rules about modifying characters

Used by template ŧ mark-tok-chars ŧ char-setup

Used by function tan:string-length()

Relies upon $char-reg-exp .
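
The chopping rule can be illustrated outside XSLT. The following is a minimal Python sketch, not TAN's implementation: it assumes that a "modifying character" is any Unicode combining mark, whereas the real rule is whatever $char-reg-exp defines. The names chop_string and tan_string_length are hypothetical stand-ins for tan:chop-string() and tan:string-length().

```python
import unicodedata


def chop_string(text):
    """Split text into chunks: one base character plus any combining marks after it."""
    chunks = []
    for ch in text or "":
        # Assumption: a modifying character is any Unicode combining mark.
        if chunks and unicodedata.combining(ch):
            chunks[-1] += ch  # attach the modifier to the preceding base character
        else:
            chunks.append(ch)
    return chunks


def tan_string_length(text):
    """Count characters the way the chopped sequence suggests: one per chunk."""
    return len(chop_string(text))
```

Under this assumption, "e" followed by U+0301 COMBINING ACUTE ACCENT counts as one character rather than two, which is the behavior the description of tan:string-length() implies.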

`tan:div-to-div-transfer()`

tan:div-to-div-transfer($divs-with-content-to-be-transferred as element()*, $divs-to-be-infused-with-new-content as element()*) as element()*

Input: (1) any set of divs with content to be transferred into the structure of (2) another set of divs.

Output: The div structure of (2), infused with the content of (1). The content is allocated proportionately, with preference given to punctuation, within a certain range, and then word breaks.

This function is useful for transforming class-1 documents from one reference system to another. It starts by getting the text content of (1), then string values for (2).

No variables, keys, functions, or named templates depend upon this xsl:function.

Relies upon tan:text-join ŧ c1-stamp-string-length ŧ c1-stamp-string-pos ŧ infuse-tokenized-text ŧ strip-all-attributes-except .
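
The proportional allocation described above can be sketched roughly as follows. This is an illustrative Python approximation, not the XSLT that TAN uses: the function names, the window size, and the punctuation set are all assumptions, and the model divs are reduced to a list of positive string lengths.

```python
import re


def snap(text, ideal, window):
    """Move an ideal cut point to just after nearby punctuation, else a nearby space."""
    lo, hi = max(0, ideal - window), min(len(text), ideal + window)
    # Punctuation is tried first; word breaks are the fallback.
    for pattern in (r"[.,;:!?]\s*", r"\s+"):
        ends = [m.end() for m in re.finditer(pattern, text) if lo <= m.end() <= hi]
        if ends:
            return min(ends, key=lambda e: abs(e - ideal))
    return ideal  # nothing suitable in range: cut at the proportional point


def transfer(text, model_lengths, window=5):
    """Split text into len(model_lengths) pieces, proportional to the model lengths."""
    total = sum(model_lengths)
    pieces, start, running = [], 0, 0
    for length in model_lengths[:-1]:
        running += length
        ideal = round(len(text) * running / total)
        cut = max(start, snap(text, ideal, window))  # never cut backwards
        pieces.append(text[start:cut])
        start = cut
    pieces.append(text[start:])  # the last piece takes whatever remains
    return pieces
```

For example, transfer("One two. Three four five six.", [10, 20]) cuts after the full stop rather than at the exact proportional offset, and the pieces always concatenate back to the original text.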

`tan:flatref()`

Option 1 (TAN-class-1-and-2-functions)

tan:flatref($node as element()?) as xs:string?

Simple one-parameter version of the fuller function below

Used by template ŧ get-mismatched-text

Used by function tan:flatref() tan:get-ref-seq()

Relies upon tan:flatref .

Option 2 (TAN-class-1-and-2-functions)

tan:flatref($node as element()?, $div-types-to-suppress as xs:string*, $div-ns-to-rename as element()*) as xs:string?

Input: div node in a TAN-T(EI) document; truth value whether references that fit a number pattern should be converted to integers

Output: string value concatenating the reference values from the topmost div ancestor to the node.

This function assumes that @n has already been normalized per tan:resolve-doc(), which converts @n values to Arabic numerals wherever possible

Used by template ŧ get-mismatched-text

Used by function tan:flatref() tan:get-ref-seq()

Relies upon $separator-hierarchy .
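
As a rough illustration of the output, the Python sketch below walks from a div up to its topmost div ancestor and joins the n attributes. It is a simplification under stated assumptions: the separator ":" is a placeholder for whatever $separator-hierarchy holds, the element is assumed to be named div without a namespace, and the $div-types-to-suppress and $div-ns-to-rename parameters are ignored.

```python
import xml.etree.ElementTree as ET


def flatref(root, node, separator=":"):
    """Join the n attributes from the topmost div ancestor down to node."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    values = []
    current = node
    while current is not None and current.tag == "div":
        values.append(current.get("n", ""))
        current = parents.get(current)
    return separator.join(reversed(values))
```

With a body containing nested divs numbered 1, 2, and 3, the innermost div would yield "1:2:3" under the assumed separator.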

`tan:get-n-types()`

tan:get-n-types($src-1st-da-resolved as document-node()*) as element()*

Input: any class 1 TAN documents

Output: the types of @n values, calculated per source and per div type

October 2016: this function used to be used for validation, but a better routine is preferred. The function is left here, however, in case it proves useful in other contexts.

Used by template ŧ prep-class-2-doc-pass-2

Relies upon tan:number-type $n-type .

`tan:get-src-skeleton()`

tan:get-src-skeleton($src-1st-da-prepped as document-node()*) as document-node()?

one-parameter form of the master version below; it results in a merger of sources, but without text and empty leaf divs

Used by function tan:prep-verbosely()

Relies upon tan:merge-sources .

`tan:median()`

tan:median($numbers as xs:double*) as xs:double?

Input: any sequence of numbers

Output: the median value

It is assumed that the input has already been sorted by tan:numbers-sorted() or a similar function

Used by function tan:outliers()

Does not rely upon global variables, keys, functions, or templates.
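
A minimal Python sketch of a median over pre-sorted input follows. The source does not say how an even-length sequence is handled; averaging the two middle values is an assumption here, as is returning None for an empty sequence (mirroring the optional xs:double? return type).

```python
def median(sorted_numbers):
    """Return the median of an already sorted sequence, or None if it is empty."""
    count = len(sorted_numbers)
    if count == 0:
        return None
    middle = count // 2
    if count % 2 == 1:
        return float(sorted_numbers[middle])
    # Assumed convention for an even count: average the two middle values.
    return (sorted_numbers[middle - 1] + sorted_numbers[middle]) / 2
```

Because the function trusts its input to be sorted, passing an unsorted sequence returns a middle element that is not the true median, which is why the sorting step matters.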

`tan:merge-analyzed-stats()`

tan:merge-analyzed-stats($analyzed-stats as element()*, $add-stats as xs:boolean?) as element()

Takes a group of elements that follow the pattern that results from tan:analyze-stats and synthesizes them into a single element. If $add-stats is true, then they are added; if false, the sum of the second through last elements is subtracted from the first; if neither true nor false, nothing happens. Will work on elements of any name, so long as they have tan:d children with the data points to be merged.

Used by function tan:merge-sources() tan:synthesize-merged-group()

Relies upon tan:error tan:analyze-stats .

`tan:merge-source-loop()`

tan:merge-source-loop($not-fully-merged-source as document-node()?, $so-far-merged-to-what-depth as xs:integer, $add-stats as xs:boolean?, $order-of-source-ids as xs:string*) as document-node()?

Input: a rough merge (the result of tan:merge-source()); an initial depth (usually 1); a boolean indicating whether statistics, if present, should be added or whether the sum of the tail should be subtracted from the head; and a list of source ids (only if the order of sources should be respected)

Output: a single document that joins sibling <div>s that share a common @ref. Further, if any statistics are present and $add-stats is true, then the matching attributes in merged <d>s are added or checked for differences, as required. If $add-stats is false, then the statistics are subtracted (the head of the sequence minus the sum of the tail of the sequence)

No special provision is made for the order of synthesized <div>s; to control for order, the input unmerged sources in every <div> should have an @r that specifies the relative rank (values 0 to 1) a div takes. The average of the @r's will be calculated in the merged <div>, so that sorting can take place. In some cases, that @r-avg can be misleading, since it excludes any outliers of @r (to avoid the undue influence of <div>s inserted via realignment or of sources that have the work in only a fragmentary state), but the data needed to recalculate the proper average and re-sort the <div>s should all be present.

If, in the course of preparation, all the children <div>s of a <div> have been eliminated, because of <realign>s in a TAN-A-div file, the result is a hollow <div>, with neither <ver> nor <div> children. These are retained in the loop; if they are to be omitted, it should be done by whatever process handles these results.

Used by function tan:merge-tan-a-div-prepped() tan:merge-sources() tan:merge-source-loop()

Relies upon tan:merge-source-loop ŧ synthesize-merged-sources .

`tan:merge-sources()`

Option 1 (TAN-class-1-and-2-functions)

tan:merge-sources($src-1st-da-prepped as document-node()*, $keep-sources-in-order as xs:boolean?) as document-node()?

two-parameter form of the master function below; it results in a merger of sources, but keeping text, juxtaposed in leaf divs and differentiated with new <ver src="[SOURCE NAME]"> to distinguish one version from the next

Used by template ŧ class-1-errors

Used by function tan:get-src-skeleton() tan:merge-sources()

Relies upon tan:merge-sources .

Option 2 (TAN-class-1-and-2-functions)

tan:merge-sources($src-1st-da-prepped as document-node()*, $keep-text as xs:boolean, $keep-sources-in-order as xs:boolean?, $add-stats as xs:boolean?) as document-node()?

Input: one or more prepped class 1 documents (usually with @ref holding flatref values); a boolean indicating whether text should be kept or dropped (skeleton); and a boolean indicating whether the order of sources should be respected

Output: a single document that merges the bodies of the input documents into a single structure based on the values of @ref

This function is useful for determining orphan, defective, and complete <div>s, and in preparation of publishing TAN-A-div files. To that end, this function automatically handles <div>s that have been marked for realignment.

This function assumes that the sources have at the bare minimum gone through the first level of preparation; that is, tei:TEI, tei:body, and tei:div have been converted to TAN equivalents, and the only tei elements in the body are in leaf divs.

Used by template ŧ class-1-errors

Used by function tan:get-src-skeleton() tan:merge-sources()

Relies upon tan:merge-analyzed-stats tan:merge-source-loop ŧ prepare-class-1-doc-for-merge .
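
The core idea, merging by @ref and keeping each source's text apart, can be sketched with plain dictionaries. This Python sketch is only an analogy under stated assumptions: each source is reduced to a flat mapping from ref to text, realignment and statistics are ignored, and the names merge_sources and refs_missing_a_source are hypothetical.

```python
def merge_sources(sources, keep_text=True):
    """Merge sources (source id -> {ref: text}) into {ref: {source id: text}}."""
    merged = {}
    for src_id, divs in sources.items():
        for ref, text in divs.items():
            versions = merged.setdefault(ref, {})
            # With keep_text false only the skeleton survives: refs and source ids.
            versions[src_id] = text if keep_text else None
    return merged


def refs_missing_a_source(merged, source_ids):
    """List refs that at least one source lacks (hypothetical helper)."""
    return [ref for ref, versions in merged.items() if set(versions) != set(source_ids)]
```

Each inner mapping plays the role of the juxtaposed <ver src="..."> elements in a leaf div, and the helper hints at how a merged structure exposes divs that are not attested in every source.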

`tan:no-outliers()`

tan:no-outliers($numbers as xs:anyAtomicType*) as xs:anyAtomicType*

Input: any sequence of numbers

Output: the same sequence, without outliers

Used by function tan:synthesize-merged-group()

Relies upon tan:outliers .

`tan:normalize-div-text()`

tan:normalize-div-text($div-strings as xs:string*) as xs:string*

Input: any sequence of strings

Output: the same sequence, normalized according to TAN rules. Each item in the sequence is space-normalized; then, if its end matches one of the special div-end characters, ZWJ U+200D or SOFT HYPHEN U+00AD, the character is removed; otherwise a space is added at the end. Zero-length strings are skipped.

This function is designed specifically for TAN's commitment to nonmixed content. That is, every TAN element contains either elements or non-whitespace text but not both, which also means that whitespace text nodes are effectively ignored. It is assumed that every TAN element is followed by a notional whitespace.

Used by template ŧ compare-copies ŧ get-mismatched-text

Used by function tan:text-join()

Relies upon $special-end-div-chars-regex .
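
The normalization rule stated in the output description can be sketched in Python as follows. This is an approximation rather than the XSLT itself: space normalization is modeled on XPath's normalize-space() (space, tab, carriage return, line feed), and the special div-end characters are assumed to be exactly the two named above.

```python
import re

ZWJ = "\u200d"
SOFT_HYPHEN = "\u00ad"


def normalize_div_text(div_strings):
    """Space-normalize each string, then strip a special end character or add a space."""
    result = []
    for item in div_strings:
        # Collapse runs of whitespace and trim both ends, as normalize-space() does.
        text = re.sub(r"[ \t\r\n]+", " ", item).strip(" ")
        if not text:
            continue  # zero-length strings are skipped
        if text.endswith((ZWJ, SOFT_HYPHEN)):
            result.append(text[:-1])  # the next div continues the same word
        else:
            result.append(text + " ")  # the notional space after every div
    return result
```

Joining the result of normalize_div_text(["  alpha  beta ", "gam\u00ad", "ma", ""]) gives "alpha beta gamma ": the soft hyphen suppresses the space between "gam" and "ma", and the empty string contributes nothing.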

`tan:number-sort()`

tan:number-sort($numbers as xs:anyAtomicType*) as xs:double*

Input: any sequence of items

Output: the same sequence, sorted with string numerals converted to numbers

Used by function tan:outliers()

Does not rely upon global variables, keys, functions, or templates.

`tan:number-type()`

tan:number-type($strings as xs:string*) as xs:string*

Version of tan:strings-to-numeral-or-numeral-type() that fetches merely the numeral type

Used by function tan:get-n-types()

Relies upon tan:strings-to-numeral-or-numeral-type .

`tan:outliers()`

tan:outliers($numbers as xs:anyAtomicType*) as xs:anyAtomicType*

Input: any sequence of numbers

Output: outliers in the sequence,

Used by function tan:no-outliers()

Relies upon tan:number-sort tan:median .

`tan:string-length()`

tan:string-length($input as xs:string?) as xs:integer

Input: any string

Output: the number of characters in the string, as defined by TAN (i.e., modifiers are counted with the preceding base character)

Used by template ŧ c1-stamp-string-length

Relies upon tan:chop-string .

`tan:strings-to-numeral-or-numeral-type()`

tan:strings-to-numeral-or-numeral-type($strings as xs:string*, $convert-to-arabic as xs:boolean, $treat-ambiguous-a-or-i-type-as-roman-numeral as xs:boolean?, $preface-ambiguous-numeral-with-negative-sign as xs:boolean) as xs:string*

Input: any sequence of strings that may be a numeral type, and an indication whether what should be returned is not the type but the Arabic numeral equivalent (as a string)

Output: the same number of strings, with the value of either the $n-type that is the first match or the Arabic numeral equivalent

In general, Roman numerals are checked first, strings last ('i' = 1 not 9); mixed numeral types result in hyphen-joined Arabic numerals (e.g., 1a -> 1-1)

Used by function tan:arabic-numerals() tan:arabic-numerals() tan:number-type()

Relies upon $n-type tan:rom-to-int $n-type-pattern tan:aaa-to-int tan:letter-to-number .
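
The precedence rule and the hyphen-joined result for mixed types can be illustrated with a short Python sketch. It is not TAN's algorithm: the Roman numeral pattern, the alphabet scheme (a=1 ... z=26, with each repeated letter adding 26), and all function names are assumptions, and the two boolean switches for ambiguous numerals are ignored, so Roman always wins.

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}
ROMAN_PATTERN = re.compile(r"m{0,4}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})")


def roman_to_int(text):
    """Convert a lowercase Roman numeral, subtracting a value that precedes a larger one."""
    total = 0
    for ch, nxt in zip(text, text[1:] + " "):
        value = ROMAN[ch]
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total


def letters_to_int(text):
    """Assumed alphabet scheme: a=1 ... z=26, aa=27, bb=28, and so on."""
    return (len(text) - 1) * 26 + ord(text[0]) - ord("a") + 1


def part_to_arabic(part):
    if part.isdigit():
        return str(int(part))
    if ROMAN_PATTERN.fullmatch(part):  # Roman numerals are checked first
        return str(roman_to_int(part))
    if len(set(part)) == 1:  # then alphabet numerals such as a, bb, ccc
        return str(letters_to_int(part))
    return part  # not recognized: left unchanged


def arabic_numeral(text):
    """Convert each digit or letter run separately and join the results with hyphens."""
    parts = re.findall(r"[0-9]+|[a-z]+", text.lower())
    return "-".join(part_to_arabic(p) for p in parts)
```

Under these assumptions arabic_numeral("i") gives "1" rather than "9", and arabic_numeral("1a") gives "1-1", matching the two examples in the description.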

`tan:synthesize-merged-group()`

tan:synthesize-merged-group($current-group as element()*, $add-stats as xs:boolean?) as element()?

Input: a group of elements that share the same @ref; a parameter indicating whether stats, if present, should be added

Output: a single element that merges the content of the grouped elements

This function is intended solely for the template synthesize-src-skeleton, to handle in identical ways content that has been chosen and ordered differently.

Used by template ŧ synthesize-merged-sources

Relies upon tan:merge-analyzed-stats tan:no-outliers .

`tan:text-join()`

Option 1 (TAN-class-1-and-2-functions)

tan:text-join($items as item()*) as xs:string

Used by template ŧ c1-stamp-string-length ŧ tokenize-prepped-class-1 ŧ class-1-errors

Used by function tan:text-join() tan:div-to-div-transfer() tan:compare-copies()

Relies upon tan:text-join .

Option 2 (TAN-class-1-and-2-functions)

tan:text-join($items as item()*, $prep-end as xs:boolean) as xs:string

Input: any number of elements, text nodes, or strings; a boolean indicating whether the end of the sequence should also be prepared

Output: a single string that joins and normalizes them according to TAN rules: if the item is (1) a <tok> or <non-tok> that has following siblings or (2) the last leaf element and $prep-end is false, then the bare text is used; otherwise the text returned follows the rules of tan:normalize-div-text()

If the second parameter is true, then the end of the resultant string is checked for special div-end characters

Used by template ŧ c1-stamp-string-length ŧ tokenize-prepped-class-1 ŧ class-1-errors

Used by function tan:text-join() tan:div-to-div-transfer() tan:compare-copies()

Relies upon tan:normalize-div-text tan:normalize-text .
