The element token-definition takes a regular expression that defines a word token. It is used to segment a string into token and non-token components.
Its attributes serve as the parameters of the XSLT instruction xsl:analyze-string (see https://www.w3.org/TR/xslt-30/#element-analyze-string).
For more, see the section called “Defining Words and Tokens”.
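The segmentation described above can be sketched in Python. This is only an illustration of the token / non-token split that xsl:analyze-string performs, not TAN's own implementation: the function name `segment` and the labels "tok" and "non-tok" are hypothetical, and it assumes that @pattern and @flags correspond to the instruction's regex and flags attributes. Python's `re` syntax also differs from XPath regular expressions (for example, `\w` does not cover exactly the same characters), so results may differ for some inputs.

```python
import re

# Subset of XPath flag letters that have a direct Python equivalent (assumption).
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}

def segment(text, pattern, flags=""):
    """Split text into ordered ("tok", s) and ("non-tok", s) pieces."""
    re_flags = 0
    for letter in flags:
        re_flags |= FLAG_MAP[letter]
    pieces = []
    pos = 0
    for match in re.finditer(pattern, text, re_flags):
        if match.start() == match.end():
            continue  # ignore zero-length matches
        if match.start() > pos:
            # Text between two matches is a non-token component.
            pieces.append(("non-tok", text[pos:match.start()]))
        pieces.append(("tok", match.group()))
        pos = match.end()
    if pos < len(text):
        pieces.append(("non-tok", text[pos:]))
    return pieces

# The pattern from the first example below, applied to a sample string.
result = segment("Ring-a-ring o' roses", r"[-\w]+")
```

Here `result` alternates between token and non-token pieces, and concatenating the pieces in order reproduces the input string.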
Formal Definition
~ed-stamp?, (~inclusion | ( {[TAN-A-lm (~sources-ref):] {empty}} OR {[TAN-class-2 (~sources-ref):] @src} OR {[TAN-core (~sources-ref):] {empty}}, (@which | (@pattern, @flags?))))
Defined at: TAN-core.rng
Used by: ~defn-class-1, ~definition-class-2, ~entity-tok-def
Caution: No source may be given more than one token definition.
Example 8.215. <token-definition>
<definitions>
  <comment when="2016-02-22-05:00" who="park">The following token definition treats the
  following as words: sequences of letters, any individual character that is neither a
  letter nor a space (i.e., punctuation).</comment>
  <token-definition src="eng-us" pattern="[-\w]+"/>
  <person xml:id="park">
    .........
  </person>
  .........
</definitions>
Note: Taken from ringoroses.div.1
Example 8.216. <token-definition>
<definitions>
  <token-definition pattern="[\w]+"/>
  <lexicon xml:id="LSJ">
    .........
  </lexicon>
  .........
</definitions>
Example 8.217. <token-definition>
<definitions>
  .........
  <lexicon xml:id="english">
    .........
  </lexicon>
  <token-definition which="letters and punctuation"/>
  <person xml:id="park">
    .........
  </person>
  .........
</definitions>
Note: Taken from ring-o-roses.eng.1881.lm
Example 8.218. <token-definition>
<definitions>
  .........
  <reuse-type xml:id="adaptation">
    .........
  </reuse-type>
  <token-definition src="ring1881 ring1987" which="letters"/>
  <person xml:id="park">
    .........
  </person>
  .........
</definitions>
Note: Taken from ringoroses.01+02.token.1