Definition: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
Used by functions tan:dec-to-hex(), tan:hex-to-dec().
Does not rely upon global variables, keys, functions, or templates.
tan:dec-to-hex($in as xs:integer) as xs:string
Change any integer into a hexadecimal string
Input: xs:integer
Output: hexadecimal equivalent as a string
E.g., 31 -> '1F'
Used by function tan:dec-to-hex().
Relies upon tan:dec-to-hex(), $hex-key.
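Because the function lists itself under both "Used by" and "Relies upon", it is presumably recursive. The following Python sketch illustrates that approach under this assumption; it is not the XSLT source. The names HEX_KEY and dec_to_hex are hypothetical, and the rejection of negative numbers is a choice made for the sketch only.

```python
# Mirrors the sixteen digits listed in the definition of $hex-key.
HEX_KEY = "0123456789ABCDEF"


def dec_to_hex(n: int) -> str:
    """Convert a non-negative integer to an uppercase hexadecimal string."""
    if n < 0:
        # Assumption: the sketch does not define behavior for negatives.
        raise ValueError("this sketch handles non-negative integers only")
    if n < 16:
        return HEX_KEY[n]
    # Recurse on the quotient, then append the digit for the remainder.
    return dec_to_hex(n // 16) + HEX_KEY[n % 16]


print(dec_to_hex(31))  # -> 1F
```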
tan:expand-search($regex as xs:string?) as xs:string?
This function takes a string representation of a regular expression pattern and replaces every unescaped
character with a character class that lists all Unicode characters that would recursively decompose to that base
character.
E.g., 'word' -> '[wŵʷẁẃẅẇẉẘⓦw𝐰𝑤𝒘𝓌𝔀𝔴𝕨𝖜𝗐𝘄𝘸𝙬𝚠][oºòóôõöōŏőơǒǫǭȍȏȫȭȯȱᵒṍṏṑṓọỏốồổỗộớờởỡợₒℴⓞ㍵o𝐨𝑜𝒐𝓸𝔬𝕠𝖔𝗈𝗼𝘰𝙤𝚘][rŕŗřȑȓʳᵣṙṛṝṟⓡ㎭㎮㎯r𝐫𝑟𝒓𝓇𝓻𝔯𝕣𝖗𝗋𝗿𝘳𝙧𝚛][dďdždzᵈḋḍḏḑḓⅆⅾⓓ㍲㍷㍸㍹㎗㏈d𝐝𝑑𝒅𝒹𝓭𝔡𝕕𝖉𝖽𝗱𝘥𝙙𝚍]'
This function is useful for cases where it is more efficient to change the search term rather than to transform
the text to be searched into base characters.
No variables, keys, functions, or named templates depend upon this xsl:function.
Relies upon tan:get-ucd-decomp(), $regex-escaping-characters, tan:string-composite(), ŧ add-square-brackets.
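As a rough illustration of the idea (not TAN's XSLT implementation), the following Python sketch builds a table from each base character to the characters that decompose to it, using the standard unicodedata module in place of tan:get-ucd-decomp(). The names base_char, COMPOSITES, and expand_search are hypothetical, and the isalnum() test is an assumed stand-in for the check against $regex-escaping-characters. The classes produced depend on the Unicode version bundled with Python, so they will not match the example above character for character.

```python
import sys
import unicodedata
from collections import defaultdict


def base_char(ch: str) -> str:
    """Follow decomposition mappings until a character has none."""
    while True:
        fields = unicodedata.decomposition(ch).split()
        # Drop a compatibility tag such as <compat> or <font>.
        if fields and fields[0].startswith("<"):
            fields = fields[1:]
        if not fields:
            return ch
        # Keep only the first code point of the mapping and continue.
        ch = chr(int(fields[0], 16))


def build_composites() -> dict:
    """Map each base character to every character that decomposes to it."""
    table = defaultdict(list)
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.decomposition(ch):
            table[base_char(ch)].append(ch)
    return table


COMPOSITES = build_composites()


def expand_search(regex):
    """Replace each unescaped letter or digit with a class of its variants."""
    if regex is None:
        return None
    out = []
    i = 0
    while i < len(regex):
        ch = regex[i]
        if ch == "\\" and i + 1 < len(regex):
            # Escaped pair such as \d or \.: copy it untouched.
            out.append(regex[i:i + 2])
            i += 2
            continue
        variants = COMPOSITES.get(ch)
        # Assumption: only letters and digits are expanded, so that regex
        # metacharacters such as ( or + are left alone.
        if ch.isalnum() and variants:
            out.append("[" + ch + "".join(variants) + "]")
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

The sketch does not treat characters that already sit inside a bracketed class specially, so it is only suitable for simple search terms.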
tan:get-ucd-decomp()
Used by functions tan:string-base(), tan:string-composite(), tan:expand-search().
Does not rely upon global variables, keys, functions, or templates.
tan:hex-to-dec($hex as xs:string?) as item()*
Change any hexadecimal string into an integer
E.g., '1F' -> 31
Used by function tan:process-regex-escape-k().
Relies upon $hex-key.
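The conversion can be sketched in Python as a digit-by-digit lookup against the same sixteen-character key. This is an illustration only: the names HEX_KEY and hex_to_dec are hypothetical, and returning None for empty input and accepting lowercase digits are assumptions, not documented behavior.

```python
# Mirrors the sixteen digits listed in the definition of $hex-key.
HEX_KEY = "0123456789ABCDEF"


def hex_to_dec(hex_string):
    """Convert a hexadecimal string to an integer by positional lookup."""
    if not hex_string:
        # Assumption: empty input yields nothing, loosely echoing the
        # optional parameter type in the signature.
        return None
    total = 0
    for ch in hex_string.upper():  # assumption: lowercase is accepted
        digit = HEX_KEY.find(ch)
        if digit < 0:
            raise ValueError(f"not a hexadecimal digit: {ch!r}")
        # Shift the running total one hex place and add the new digit.
        total = total * 16 + digit
    return total


print(hex_to_dec("1F"))  # -> 31
```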
Option 1 (regex-ext-tan-functions)
tan:matches($input as xs:string?, $pattern as xs:string) as xs:boolean
Two-parameter version of the three-parameter function below.
Used by functions tan:obeyed-by-m(), tan:get-toks(), tan:matches().
Relies upon tan:matches().
Option 2 (regex-ext-tan-functions)
tan:matches($input as xs:string?, $pattern as xs:string, $flags as xs:string) as xs:boolean
Parallel to fn:matches(), but converts TAN-exceptions into classes. See tan:regex() for details.
Used by functions tan:obeyed-by-m(), tan:get-toks(), tan:matches().
Relies upon tan:regex().
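The relationship between the two signatures, and the preprocessing step that sets tan:matches() apart from fn:matches(), can be sketched in Python as follows. Everything here is an assumption made for illustration: expand_k_hex is a hypothetical stand-in for tan:regex() that handles only hexadecimal values, the flag table covers only three flags, and Python's re module differs in detail from XPath regular expressions.

```python
import re


def expand_k_hex(pattern: str) -> str:
    """Hypothetical stand-in for tan:regex(): hexadecimal \\k{...} values only."""
    def repl(m):
        chars = []
        for part in m.group(1).split(","):
            if not part.strip():
                continue
            # "4d-4f" gives [0x4d, 0x4f]; a lone "51" gives [0x51].
            bounds = [int(b, 16) for b in part.strip().split("-")]
            chars.extend(chr(cp) for cp in range(bounds[0], bounds[-1] + 1))
        return "[" + "".join(re.escape(c) for c in chars) + "]"
    return re.sub(r"\\k\{([0-9A-Fa-f,\s-]+)\}", repl, pattern)


# Assumed mapping from XPath-style flag letters to Python flags.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def matches(input_text, pattern, flags=""):
    """Expand the pattern first, then test it anywhere in the input."""
    compiled_flags = 0
    for f in flags:
        compiled_flags |= FLAG_MAP[f]
    # Missing input is treated as a zero-length string, as fn:matches() does.
    found = re.search(expand_k_hex(pattern), input_text or "", compiled_flags)
    return found is not None


print(matches("right angle M", r"angle \k{4d-4f, 51}"))  # -> True
```

Giving flags a default value plays the role of the two-parameter version, which simply delegates to the three-parameter one.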
tan:process-regex-escape-k($val-inside-braces as xs:string, $unicode-db as document-node()) as xs:string?
Used by function tan:regex().
Relies upon tan:hex-to-dec().
tan:regex($regex as xs:string?) as xs:string?
Input: string of a regex search
Output: the same string, with TAN-reserved escape sequences replaced by character class sequences
E.g., '\k{.greek.capital.perispomeni}' --> '[ἎἏἮἯἾἿὟὮὯᾎᾏᾞᾟᾮᾯ]'
'\k{.latin.cedilla}' --> '[ÇçĢģĶķĻļŅņŖŗŞşŢţȨȩᷗḈḉḐḑḜḝḨḩ]'
'angle \k{4d-4f, 51}' --> 'angle [MNOQ]'
This function grabs entire classes of Unicode characters, either by codepoint or by parts of
their names. It acts upon escape sequences of the form \k{***VALUE***}, where ***VALUE*** is either
(1) one or more hexadecimal numbers joined by commas and hyphens or (2) one or more words, each
preceded by a non-word character. In the first case, the function returns every Unicode character
specified, filling in ranges where indicated by a hyphen. In the second case, it returns
every Unicode character that has all of those words in its official Unicode name or alias.
Other examples:
Any word with an omega, even if not in any of the Greek blocks: '\k{.omega}' (useful if you
wish to find nonstandard uses of the omega, especially in the symbol block)
Any word with two successive omegas, no matter their accentuation or capitalization, or whether they
have an iota subscript: '\k{.greek.omega}{2}' (useful for looking up a Greek word where accentuation
changes depending upon context or inflection)
Every Greek word that attracts an accent from an enclitic:
'[\k{.greek.oxia}\k{.greek.tonos}\k{.greek.perispomeni}]\w*[\k{.greek.tonos}\k{.greek.oxia}]'
Used by functions tan:matches(), tan:replace(), tan:tokenize().
Relies upon tan:process-regex-escape-k(), ŧ add-square-brackets.
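The two forms of \k{...} described above can be sketched in Python with the standard re and unicodedata modules. This is a minimal illustration under stated assumptions, not TAN's implementation: the names expand_k, chars_from_hex, and chars_from_name are hypothetical, name aliases are not consulted because unicodedata does not expose them for searching, and the result is always wrapped in square brackets even when the escape already sits inside a class. Name-based results depend on the Unicode version bundled with Python.

```python
import re
import sys
import unicodedata

K_ESCAPE = re.compile(r"\\k\{([^}]*)\}")
HEX_FORM = re.compile(r"[0-9A-Fa-f]+(?:\s*[-,]\s*[0-9A-Fa-f]+)*")


def chars_from_hex(value: str) -> list:
    """Form 1: hex numbers joined by commas, with hyphens marking ranges."""
    chars = []
    for part in value.split(","):
        # "4d-4f" gives [0x4d, 0x4f]; a lone "51" gives [0x51].
        bounds = [int(b, 16) for b in part.strip().split("-")]
        chars.extend(chr(cp) for cp in range(bounds[0], bounds[-1] + 1))
    return chars


def chars_from_name(value: str) -> list:
    """Form 2: every character whose name contains all the given words."""
    wanted = {w.upper() for w in re.split(r"\W+", value) if w}
    found = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        # Compare whole words, splitting names on spaces and hyphens.
        if name and wanted <= set(re.split(r"[ -]", name)):
            found.append(chr(cp))
    return found


def expand_k(regex):
    """Replace each \\k{...} escape with a bracketed character class."""
    if regex is None:
        return None

    def repl(m):
        value = m.group(1).strip()
        if HEX_FORM.fullmatch(value):
            chars = chars_from_hex(value)
        else:
            chars = chars_from_name(value)
        return "[" + "".join(re.escape(c) for c in chars) + "]"

    return K_ESCAPE.sub(repl, regex)


print(expand_k(r"angle \k{4d-4f, 51}"))  # -> angle [MNOQ]
```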
Option 1 (regex-ext-tan-functions)
tan:replace($input as xs:string?, $pattern as xs:string, $replacement as xs:string) as xs:string
Three-parameter version of the four-parameter function below.
Used by functions tan:batch-replace(), tan:replace().
Relies upon tan:replace().
Option 2 (regex-ext-tan-functions)
tan:replace($input as xs:string?, $pattern as xs:string, $replacement as xs:string, $flags as xs:string) as xs:string
Parallel to fn:replace(), but converts TAN-exceptions into classes. See tan:regex() for details.
Used by functions tan:batch-replace(), tan:replace().
Relies upon tan:regex().
tan:string-base($arg as xs:string?) as xs:string?
This function takes any string and replaces every character with its base Unicode character.
E.g., ἀνθρὠπους -> ανθρωπουσ
This is useful for preparing text to be searched without respect to accents.
No variables, keys, functions, or named templates depend upon this xsl:function.
Relies upon tan:get-ucd-decomp().
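A minimal Python sketch of the same idea follows, using the decomposition mappings in the standard unicodedata module rather than tan:get-ucd-decomp(). The names base_char and string_base are hypothetical. One known difference: the example above shows final sigma folded to σ, which this sketch does not do, because ς carries no decomposition mapping in the Unicode Character Database.

```python
import unicodedata


def base_char(ch: str) -> str:
    """Follow decomposition mappings until a character has none."""
    while True:
        fields = unicodedata.decomposition(ch).split()
        # Drop a compatibility tag such as <compat> or <font>.
        if fields and fields[0].startswith("<"):
            fields = fields[1:]
        if not fields:
            return ch
        # Keep only the first code point of the mapping and continue.
        ch = chr(int(fields[0], 16))


def string_base(arg):
    """Replace every character in the string with its base character."""
    if arg is None:
        return None
    return "".join(base_char(ch) for ch in arg)


print(string_base("fa\u00e7ade"))  # -> facade
```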
tan:string-composite($arg as xs:string?) as xs:string?
This function is the inverse of tan:string-base, in that it replaces every character with
those Unicode characters that use it as a base. If none exist, then the character itself is
returned.
E.g., 'Max' -> 'MᴹḾṀṂℳⅯⓂ㎆㎒㎫㎹㎿㏁M𝐌𝑀𝑴𝓜𝔐𝕄𝕸𝖬𝗠𝘔𝙈𝙼🄼🅋🅪🅫aªàáâãäåāăąǎǟǡǻȁȃȧᵃḁẚạảấầẩẫậắằẳẵặₐ℀℁ⓐ㏂a𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊xˣẋẍₓⅹⅺⅻⓧx𝐱𝑥𝒙𝓍𝔁𝔵𝕩𝖝𝗑𝘅𝘹𝙭𝚡'
This is useful for preparing regex character classes to broaden a search.
Used by function tan:expand-search().
Relies upon tan:get-ucd-decomp().
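The inverse lookup can be sketched in Python by scanning every code point once and grouping characters under their base. This is an illustration under assumptions, not the XSLT source: base_char, COMPOSITES, and string_composite are hypothetical names, the unicodedata module stands in for tan:get-ucd-decomp(), and, following the example above, each character is emitted first and then followed by its composites. The exact characters returned depend on the Unicode version bundled with Python.

```python
import sys
import unicodedata
from collections import defaultdict


def base_char(ch: str) -> str:
    """Follow decomposition mappings until a character has none."""
    while True:
        fields = unicodedata.decomposition(ch).split()
        # Drop a compatibility tag such as <compat> or <font>.
        if fields and fields[0].startswith("<"):
            fields = fields[1:]
        if not fields:
            return ch
        ch = chr(int(fields[0], 16))


# Build the reverse table once: base character -> characters built on it.
COMPOSITES = defaultdict(list)
for cp in range(sys.maxunicode + 1):
    if unicodedata.decomposition(chr(cp)):
        COMPOSITES[base_char(chr(cp))].append(chr(cp))


def string_composite(arg):
    """Follow each character with every character that uses it as a base."""
    if arg is None:
        return None
    # A character with no composites is returned on its own.
    return "".join(ch + "".join(COMPOSITES.get(ch, [])) for ch in arg)
```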
Option 1 (regex-ext-tan-functions)
tan:tokenize($input as xs:string?, $pattern as xs:string) as xs:string*
Two-parameter version of the three-parameter function below.
Used by function tan:tokenize().
Relies upon tan:tokenize().
Option 2 (regex-ext-tan-functions)
tan:tokenize($input as xs:string?, $pattern as xs:string, $flags as xs:string) as xs:string*
Parallel to fn:tokenize(), but converts TAN-exceptions into classes. See tan:regex() for details.
Used by function tan:tokenize().
Relies upon tan:regex().