The element rename
indicates the name of a <div>
@n
that should be changed in a given @type
, and the name to which it should be changed.
There is no need to use this feature to convert Roman, alphabetic, or other numerals, which are detected and converted automatically.
Formal Definition
@old
,@new
Used by: ~decl-rename-div-n
Example 8.140. <rename>
<rename-div-ns src="ger" div-type-ref="Zeile"> <rename old="e" new="4"/> </rename-div-ns>
Note | |
---|---|
Taken from ringoroses.div.1 |
Example 8.141. <rename>
<rename-div-ns src="ger" div-type-ref="Zeile"> <rename old="5" new="4"/> </rename-div-ns>
Note | |
---|---|
Taken from ringoroses.01+03.token.2 |
The element rename-div-ns
provisionally reassigns @n
values for one or more sources and one or more div types. Renaming applies only to the current file.
This element is especially useful for converting Roman numerals or letter numerals into Arabic numerals. See <rename>
for syntax.
This feature is, strictly speaking, a convenience, not a necessity. All TAN-compliant preprocessors are required to automatically detect Roman and alphabetic numbering systems and treat them as Arabic numerals.
It is also useful for div types that use descriptive names for @n
(such as books of the Bible), particularly for reconciling those names with a system that prevails or is preferred (e.g., "mt" to "Matt").
Note for TAN-A-div users: Although this element can reconcile simple differences, it should not be used for more complex inconsistencies that affect alignment, which are best handled in the <body>
of a TAN-A-div file.
For more information see the section called “Class 2 Metadata (<head>)”.
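The interplay between explicit renaming and automatic numeral detection can be sketched as follows. This is a minimal illustration, not the behavior of any actual TAN preprocessor: the function names are hypothetical, only Roman numerals are detected (alphabetic numerals are omitted), and the choice to apply explicit renames before automatic detection is an assumption.

```python
# Hypothetical sketch: normalize an @n value using explicit renames,
# falling back to automatic Roman numeral detection.
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(value):
    """Return the integer for a Roman numeral string, or None if it is not one."""
    text = value.lower()
    if not text or any(ch not in ROMAN_VALUES for ch in text):
        return None
    total = 0
    for i, ch in enumerate(text):
        current = ROMAN_VALUES[ch]
        following = ROMAN_VALUES[text[i + 1]] if i + 1 < len(text) else 0
        # A smaller value before a larger one is subtracted (e.g. iv = 4).
        total += -current if current < following else current
    return total


def normalize_n(n, renames):
    """Apply an explicit rename if one exists (assumed to take precedence)."""
    if n in renames:
        return renames[n]
    if n.isdigit():
        return n
    roman = roman_to_int(n)
    return str(roman) if roman is not None else n


print(normalize_n("e", {"e": "4"}))       # explicit rename
print(normalize_n("xiv", {}))             # automatic detection
print(normalize_n("mt", {"mt": "Matt"}))  # descriptive name reconciled
```

The rename table here plays the role of the old/new pairs in a single rename-div-ns declaration; a fuller version would scope it by source and div type.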
Formal Definition
~ed-stamp
?, (~inclusion
| ( {[TAN-class-2 (~source-refs
):]@src
} OR {[TAN-core (~source-refs
):] {empty}} OR {[TAN-LM-core (~source-refs
):] {empty}},@div-type-ref
,<rename>
+))
Used by: ~declaration-items
Caution | |
---|---|
Every div type reference must be valid in every source |
Example 8.142. <rename-div-ns>
<declarations> ......... <token-definition src="eng-us" regex="[-\w]+"/> <rename-div-ns src="ger" div-type-ref="Zeile"> <rename old="e" new="4"/> </rename-div-ns> </declarations>
Note | |
---|---|
Taken from ringoroses.div.1 |
Example 8.143. <rename-div-ns>
<declarations> ......... <token-definition src="eng ger" which="letters and punctuation"/> <rename-div-ns src="ger" div-type-ref="Zeile"> <rename old="5" new="4"/> </rename-div-ns> </declarations>
Note | |
---|---|
Taken from ringoroses.01+03.token.2 |
The element suppress-div-types
marks div types in a source that should be suppressed in references. Suppression is shallow: it does not suppress any descendants of a div of that type. But if the suppression applies to a leaf div, that div and its text are effectively suppressed.
Any suppression of a div type must preserve the Leaf Div Uniqueness Rule (LDUR). See the section called “Flattened References, and the Leaf Div Uniqueness Rule”.
This element will seldom be used; it is meant for cases where a source has a div type that is dispensable in text references.
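The effect of shallow suppression on flattened references, and the check that leaf references stay unique, can be sketched as follows. This is an interpretation under stated assumptions: divs are modeled as hypothetical dictionaries with "type", "n", and optional "children" keys, and references are the space-joined @n values of a leaf div and its ancestors.

```python
# Hypothetical sketch of shallow div-type suppression in flattened references.
def leaf_refs(divs, suppressed, prefix=()):
    """Return the flattened reference of every leaf div that survives suppression."""
    refs = []
    for div in divs:
        hidden = div["type"] in suppressed
        # A suppressed div contributes no @n, but its descendants are kept.
        ref = prefix if hidden else prefix + (div["n"],)
        children = div.get("children", [])
        if children:
            refs.extend(leaf_refs(children, suppressed, ref))
        elif not hidden:
            # A suppressed leaf div drops out entirely, along with its text.
            refs.append(" ".join(ref))
    return refs


def preserves_ldur(divs, suppressed):
    """True if no two leaf divs share a flattened reference after suppression."""
    refs = leaf_refs(divs, suppressed)
    return len(refs) == len(set(refs))


poems = [
    {"type": "poem", "n": "1", "children": [
        {"type": "line", "n": "1"}, {"type": "line", "n": "2"}]},
    {"type": "poem", "n": "2", "children": [
        {"type": "line", "n": "1"}, {"type": "line", "n": "2"}]},
]
print(leaf_refs(poems, set()))
print(preserves_ldur(poems, {"poem"}))
```

In this invented source, suppressing the poem div type would leave two leaf divs sharing each line reference, so the suppression would not be allowed.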
Formal Definition
~ed-stamp
?, (~inclusion
| ( {[TAN-class-2 (~source-refs
):]@src
} OR {[TAN-core (~source-refs
):] {empty}} OR {[TAN-LM-core (~source-refs
):] {empty}},@div-type-ref
))
Used by: ~declaration-items
Caution | |
---|---|
Every div type reference must be valid in every source |
Example 8.144. <suppress-div-types>
<declarations> <suppress-div-types src="fra" div-type-ref="sec"/> </declarations>
Note | |
---|---|
Taken from ar.cat.tan-a-div |
Example 8.145. <suppress-div-types>
<declarations> <suppress-div-types src="eng-1790" div-type-ref="poem"/> <comment when="2016-02-22-05:00" who="park">The following token definition treats the following as words: sequences of letters, any individual character that is neither a letter nor a space (i.e., punctuation).</comment> ......... </declarations>
Note | |
---|---|
Taken from ringoroses.div.1 |
The element tok
identifies one or more words or word fragments. It is used by class 2 files to make assertions about specific words.
In TAN-A-div and TAN-A-tok files, <tok>
has no linguistic connotations; in TAN-LM, it normally does.
<tok>
s are of two types: simple and complex.
SIMPLE: <tok>
s that are restricted to a single token, or a portion of a single token. This is the normal behavior of <tok>
. Multiple values in @src
, @ref
, and @pos
will result in expansion across all values. But multiple values of @chars
are taken to refer to the constituent parts of a single <tok>
and so no expansion occurs on @chars.
For example, a <tok>
with 2 values for @src
, 3 for @ref
, 4 for @pos
, and 5 for @chars
will result in a <tok>
that points to 24 tokens, each of which is filtered to the same five characters (by position, not content). This syntax, then, allows multiple <tok>
s to be collapsed into a single one, to save space and perhaps enhance legibility. Put another way, <tok src="X" ref="a" pos="1"/> and <tok src="X" ref="a" pos="2"/> together are always equivalent to <tok src="X" ref="a" pos="1-2"/>.
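The expansion arithmetic described above can be sketched as a Cartesian product. The function name and the dictionary representation are hypothetical, and the inputs are assumed to be lists of values that have already been split out of the attributes.

```python
# Hypothetical sketch: expand a simple tok across @src, @ref, and @pos,
# while @chars stays attached to every resulting token unchanged.
from itertools import product


def expand_simple_tok(src, ref, pos, chars=None):
    """Return one entry per (src, ref, pos) combination; chars is not expanded."""
    return [
        {"src": s, "ref": r, "pos": p, "chars": chars}
        for s, r, p in product(src, ref, pos)
    ]


expanded = expand_simple_tok(
    src=["X", "Y"],
    ref=["a", "b", "c"],
    pos=[1, 2, 3, 4],
    chars=[1, 2, 3, 4, 5],
)
print(len(expanded))  # 2 * 3 * 4 combinations
```

Under this model, two separate toks with pos 1 and pos 2 yield the same entries as one tok with pos 1-2.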
COMPLEX: There are cases where one wishes to treat more than one token, in whole or part, as a single entity. In this case, @cont
should be used, and it must join <tok>
s that have only single values for @src
, @ref
, and @pos.
@chars
may take multiple values.
The behavior of <tok>
differs from <div-ref>
. The former is never treated as a group, whereas the latter is. For more, see <div-ref>
.
Formal Definition
~tok-attr-core
, {[TAN-A-div (~tok-source-ref-opt
):] {empty}} OR {[TAN-class-2 (~tok-source-ref-opt
):] {{[TAN-class-2 (~source-refs
):]@src
}} OR {{[TAN-core (~source-refs
):] {empty}}} OR {{[TAN-LM-core (~source-refs
):] {empty}}}}, {[TAN-LM-lang (~pointer-to-div-range
):] {empty}} OR {[TAN-class-2 (~pointer-to-div-range
):]@ref
}, (@val
| {[TAN-LM-lang (~seq-pos-ref
):] {empty}} OR {[TAN-class-2 (~seq-pos-ref
):]@pos
} | (@val
, {[TAN-LM-lang (~seq-pos-ref
):] {empty}} OR {[TAN-class-2 (~seq-pos-ref
):]@pos
})), {[TAN-A-div (~tok-cert-opt
):] {empty}} OR {[TAN-class-2 (~tok-cert-opt
):] (@cert
| (@cert
,@cert2
))?}~tok-sequence-attr-core
,@src
, {[TAN-A-div (~continuation-opt
):] {empty}} OR {[TAN-class-2 (~continuation-opt
):]@cont
}, (@cert
| (@cert
,@cert2
))?~tok-sequence-attr-core
, {[TAN-A-div (~continuation-opt
):] {empty}} OR {[TAN-class-2 (~continuation-opt
):]@cont
}~tok-sequence-attr-core
Used by: ~split
, ~complex-text-ref
, ~alignment-content-non-class-2
, ~tok-sequence
, ~TAN-LM-item
Caution | |
---|---|
Every token must be locatable in every cited ref in every source. |
Caution | |
---|---|
No source may be split more than once in the same place. |
Caution | |
---|---|
Splits may not be made at the first token in a div. |
Caution | |
---|---|
Any ana with an |
Example 8.146. <tok>
<TAN-A-div TAN-version="1 dev" id="tag:parkj@textalign.net,2015:ar.cat.tan-a-div:claims"> ......... <body claimant="lmp"> ......... <claim subject="andronicus boethus" adverb="perhaps" verb="omits" claim-basis="dexippus porphyry"> <locus work="grc"> <tok ref="1 a 2" pos="3-4"/> </locus> </claim> <claim subject="herminus comm-omnes" verb="agrees"> <locus work="grc"> <tok ref="1 a 2" pos="3-4"/> </locus> </claim> ......... <claim subject="B" verb="replaces"> <locus work="grc"> <tok ref="1 a 5" pos="1-2"/> </locus> ......... </claim> <claim subject="Λ" adverb="perhaps" verb="replaces"> <locus work="grc"> <tok ref="1 a 5" pos="1-2"/> </locus> ......... </claim> <claim subject="π α φ ο" verb="agrees"> <locus work="grc"> <tok ref="1 a 5" pos="1-2"/> </locus> </claim> </body> </TAN-A-div>
Note | |
---|---|
Taken from ar.cat.tan-a-div.claims |
The attribute chars
lists one or more characters, specified through Arabic numerals, the keyword 'last', or 'last-X' (where X is a valid number), joined with commas or hyphens.
Examples: '1', 'last', 'last-3 - last-1', '1, 3, 5, 7 - 11, last-8, last'
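How such a sequence resolves to concrete positions can be sketched as follows. This is a minimal illustration, assuming comma-separated parts only (the formal pattern also admits other keywords and separators, which are ignored here) and a hypothetical `maximum` argument giving the number of characters available.

```python
# Hypothetical sketch: resolve a sequence such as '1, 3, 7 - 11, last-8, last'
# into a list of integer positions, given the maximum position available.
import re

ITEM = r"(?:last-\d+|last|\d+)"
PART = re.compile(rf"\s*({ITEM})(?:\s*-\s*({ITEM}))?\s*")


def resolve_item(item, maximum):
    """Turn '5', 'last', or 'last-X' into an integer position."""
    if item == "last":
        return maximum
    if item.startswith("last-"):
        return maximum - int(item[len("last-"):])
    return int(item)


def expand_sequence(value, maximum):
    """Expand every comma-separated item or range into explicit positions."""
    positions = []
    for part in value.split(","):
        match = PART.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognized sequence part: {part!r}")
        start = resolve_item(match.group(1), maximum)
        end = resolve_item(match.group(2), maximum) if match.group(2) else start
        if min(start, end) < 1:
            raise ValueError("sequences may not include values less than 1")
        if max(start, end) > maximum:
            raise ValueError("sequences may not exceed the maximum allowed")
        if start > end:
            raise ValueError("ranges may not go from a larger value to a smaller")
        positions.extend(range(start, end + 1))
    return positions


print(expand_sequence("last-3 - last-1", 10))
print(expand_sequence("1, 3, 5, 7 - 11, last-8, last", 12))
```

The three error branches mirror the restrictions on sequence values: nothing below 1, nothing above the maximum, and no descending ranges.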
Formal Definition
string (pattern ((last|max|all|\*)|((last|max)-\d+)|(\d+))(\s*-\s*(((last|max))|((last|max)-\d+)|(\d+)))?(\s*[, ]\s*(((last|max))|((last|max)-\d+)|(\d+))(\s+-\s+(((last|max))|((last|max)-\d+)|(\d+)))?)*|.*\?\?\?.*)
Used by: ~tok-attr-core
Caution | |
---|---|
Sequences may not include values less than 1. |
Caution | |
---|---|
Sequences may not include values greater than the maximum allowed. |
Caution | |
---|---|
Sequences may not include ranges that go from a larger value to a smaller, e.g., 4 - 2. |
The attribute cont
indicates whether the current element is continued by the next one and to be treated as a single one. Value must be 1 or true, implied by the very presence of the attribute. If you wish to declare it to be false, delete the attribute altogether.
This feature is useful in <tok>
for rejoining the portion of a word split across two <div>
s, or for uniting into a single linguistic token multiple tokens separated by the tokenization process, e.g., "pom pom".
This feature is useful in <div-ref>
for creating groups of references that cannot be expressed in a single <div-ref>.
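The chaining behavior of this attribute can be sketched as a grouping pass over sibling elements. The dictionary representation and function name are hypothetical; the only rule modeled is that an element carrying the attribute is joined to the one that follows it.

```python
# Hypothetical sketch: join consecutive elements into groups wherever
# an element is marked as continued by the next one.
def group_by_cont(elements):
    """Return a list of groups; each group is a run of elements chained by cont."""
    groups, current = [], []
    for element in elements:
        current.append(element)
        if not element.get("cont"):
            # No continuation marker: the current group ends here.
            groups.append(current)
            current = []
    if current:
        # A trailing element marked cont has nothing to join; keep it as a group.
        groups.append(current)
    return groups


toks = [
    {"ref": "1", "pos": "last", "cont": True},  # first half of a split word
    {"ref": "2", "pos": "1"},                   # second half
    {"ref": "2", "pos": "3"},                   # unrelated token
]
print([len(group) for group in group_by_cont(toks)])
```

Here the last token of one div and the first token of the next end up in the same group, which is the split-word case described above.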
Formal Definition
boolean (pattern true|1)
Used by: ~continuation-opt
Caution | |
---|---|
Any element taking |
The attribute div-type-ref
is used by class-2 files to point to one or more <div-type>
s in class-1 files. Permits multiple values separated by spaces.
Formal Definition
Used by: ~div-type-ref-cluster
, ~decl-supp-div-type
, ~decl-rename-div-n
Caution | |
---|---|
Every div type reference must be valid in every source |
Example 8.147. @div-type-ref
<declarations> <suppress-div-types src="fra" div-type-ref="sec"/> </declarations>
Note | |
---|---|
Taken from ar.cat.tan-a-div |
Example 8.148. @div-type-ref
<TAN-A-div TAN-version="1 dev" id="tag:parkj@textalign.net,2015:ring01-TAN-A-ring02"> <head> ......... <declarations> <suppress-div-types src="eng-1790" div-type-ref="poem"/> <comment when="2016-02-22-05:00" who="park">The following token definition treats the following as words: sequences of letters, any individual character that is neither a letter nor a space (i.e., punctuation).</comment> <token-definition src="eng-us" regex="[-\w]+"/> <rename-div-ns src="ger" div-type-ref="Zeile"> <rename old="e" new="4"/> </rename-div-ns> </declarations> ......... </head> <body> ......... <equate-div-types> <div-type-ref src="ger" div-type-ref="Zeile"/> <div-type-ref src="eng-uk" div-type-ref="line"/> </equate-div-types> ......... </body> </TAN-A-div>
Note | |
---|---|
Taken from ringoroses.div.1 |
The attribute new
provides the new name for an @n
that is to be renamed.
Formal Definition
string (pattern (\w+|\d+-\d+)(\s+(\w+|\d+-\d+))*)
Used by: <rename>
Example 8.149. @new
<rename-div-ns src="ger" div-type-ref="Zeile"> <rename old="e" new="4"/> </rename-div-ns>
Note | |
---|---|
Taken from ringoroses.div.1 |
Example 8.150. @new
<rename-div-ns src="ger" div-type-ref="Zeile"> <rename old="5" new="4"/> </rename-div-ns>
Note | |
---|---|
Taken from ringoroses.01+03.token.2 |
The attribute old
provides the name of an @n
to be renamed.
Formal Definition
string (pattern (\w+|\d+-\d+)(\s+(\w+|\d+-\d+))*)
Used by: <rename>
Example 8.151. @old
<rename-div-ns src="ger" div-type-ref="Zeile"> <rename old="e" new="4"/> </rename-div-ns>
Note | |
---|---|
Taken from ringoroses.div.1 |
Example 8.152. @old
<rename-div-ns src="ger" div-type-ref="Zeile"> <rename old="5" new="4"/> </rename-div-ns>
Note | |
---|---|
Taken from ringoroses.01+03.token.2 |
The attribute pos
lists one or more items, specified through Arabic numerals and the keyword 'last' or 'last-X' (where X is a valid number), joined with commas or hyphens.
Examples: '1', 'last', 'last-3 - last-1', '1, 3, 5, 7 - 11, last-8, last'
For more see the section called “@pos and @val”
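A quick syntactic check against the first pattern given in this attribute's formal definition can be sketched as follows. The pattern is written for XML Schema regular expressions; running it through Python's `re` module is an assumption that happens to hold for the constructs it uses. The function name is hypothetical.

```python
# Hypothetical sketch: test whether a value is syntactically acceptable,
# using the sequence pattern copied from the formal definition.
import re

SEQUENCE_PATTERN = re.compile(
    r"((last|max|all|\*)|((last|max)-\d+)|(\d+))"
    r"(\s*-\s*(((last|max))|((last|max)-\d+)|(\d+)))?"
    r"(\s*[, ]\s*(((last|max))|((last|max)-\d+)|(\d+))"
    r"(\s+-\s+(((last|max))|((last|max)-\d+)|(\d+)))?)*"
    r"|.*\?\?\?.*"
)


def is_valid_syntax(value):
    """True if the whole value matches the pattern (XML Schema patterns are anchored)."""
    return SEQUENCE_PATTERN.fullmatch(value) is not None


for candidate in ["1", "last", "last-3 - last-1", "4 - 2", "first"]:
    print(candidate, is_valid_syntax(candidate))
```

Passing this check says nothing about the value-level restrictions: a descending range such as 4 - 2 is well formed but would still be rejected by the rules on sequences.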
Formal Definition
string (pattern ((last|max|all|\*)|((last|max)-\d+)|(\d+))(\s*-\s*(((last|max))|((last|max)-\d+)|(\d+)))?(\s*[, ]\s*(((last|max))|((last|max)-\d+)|(\d+))(\s+-\s+(((last|max))|((last|max)-\d+)|(\d+)))?)*|.*\?\?\?.*)
string (pattern ((last|max)|((last|max)-\d+)|(\d+))|.*\?\?\?.*)
Used by: ~tok-regular
, ~tok-sequence-attr-core
Caution | |
---|---|
Sequences may not include values less than 1. |
Caution | |
---|---|
Sequences may not include values greater than the maximum allowed. |
Caution | |
---|---|
Sequences may not include ranges that go from a larger value to a smaller, e.g., 4 - 2. |
Example 8.153. @pos
<TAN-A-div TAN-version="1 dev" id="tag:parkj@textalign.net,2015:ar.cat.tan-a-div:claims"> ......... <body claimant="lmp"> ......... <claim subject="andronicus boethus" adverb="perhaps" verb="omits" claim-basis="dexippus porphyry"> <locus work="grc"> <tok ref="1 a 2" pos="3-4"/> </locus> </claim> <claim subject="herminus comm-omnes" verb="agrees"> <locus work="grc"> <tok ref="1 a 2" pos="3-4"/> </locus> </claim> ......... <claim subject="B" verb="replaces"> <locus work="grc"> <tok ref="1 a 5" pos="1-2"/> </locus> ......... </claim> <claim subject="Λ" adverb="perhaps" verb="replaces"> <locus work="grc"> <tok ref="1 a 5" pos="1-2"/> </locus> ......... </claim> <claim subject="π α φ ο" verb="agrees"> <locus work="grc"> <tok ref="1 a 5" pos="1-2"/> </locus> </claim> </body> </TAN-A-div>
Note | |
---|---|
Taken from ar.cat.tan-a-div.claims |
The attribute ref
lists references to one or more <div>
s. It consists of one or more simple references joined by commas or hyphens. A simple reference is a string value that points to a single <div>
.
It is assumed that any simple reference that has fewer @n
values than preceding simple references has been truncated. The abbreviated form will be checked before the form actually stated. For example, 1 1 - 3 will be interpreted first as 1 1 through 1 3; if that is invalid, it will be interpeted as 1 1 through 3. Examples: '2.4 - 7, 9', 'iv 7 - 9'
In a range with members of uneven depth, those <div>
s that are closest to the shallowest member are retrieved. For example, 2 - 3 2 2 might fetch 2, 3 1, 3 2 1, 3 2 2 (and not 3 or 3 1 1).
For more, see the section called “Class 2 Data Patterns (<body>)”
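The rule for truncated references can be sketched as a two-step lookup. This is a minimal illustration under assumptions: references are space-separated, the set of valid references in the source is supplied by the caller, and the function name is hypothetical.

```python
# Hypothetical sketch: resolve the end of a range whose second member may be
# an abbreviated form that borrows leading @n values from the first member.
def resolve_range_end(start, end, valid_refs):
    """Try the abbreviated reading first, then the reference as actually stated."""
    start_parts, end_parts = start.split(), end.split()
    candidates = []
    if len(end_parts) < len(start_parts):
        # Borrow as many leading values from the start as the end is missing.
        keep = len(start_parts) - len(end_parts)
        candidates.append(" ".join(start_parts[:keep] + end_parts))
    candidates.append(end)
    for candidate in candidates:
        if candidate in valid_refs:
            return candidate
    return None  # neither reading points to an existing div


with_1_3 = {"1 1", "1 2", "1 3", "2", "3"}
without_1_3 = {"1 1", "1 2", "2", "3"}
print(resolve_range_end("1 1", "3", with_1_3))
print(resolve_range_end("1 1", "3", without_1_3))
```

With the first set, 1 1 - 3 ends at 1 3; with the second, where 1 3 does not exist, the stated form 3 is used instead.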
Formal Definition
string (pattern (\w+([^\w\-]\w+)*)(((\s*-\s*)|(\s*,\s+))(\w+([^\w\-]\w+)*))*|.*\?\?\?.*)
string (pattern (\w+([^\w\-]\w+)*)|.*\?\?\?.*)
Used by: ~anchor-div-ref-item
, ~reanchor-div-ref-item
, ~simple-textual-reference
, ~claim-div-ref-item
, ~tok-regular
, ~tok-sequence-attr-core
Caution | |
---|---|
No single set of references may mix Roman numerals, alphabetic numerals, and numerals that are ambiguously either. |
Caution | |
---|---|
Every atomic reference in a |
Caution | |
---|---|
Every range in a |
Caution | |
---|---|
If |
Important | |
---|---|
A defective reference is a value of |
Example 8.154. @ref
<TAN-A-div TAN-version="1 dev" id="tag:parkj@textalign.net,2015:ar.cat.tan-a-div:claims"> ......... <body claimant="lmp"> ......... <claim subject="andronicus boethus" adverb="perhaps" verb="omits" claim-basis="dexippus porphyry"> <locus work="grc"> <tok ref="1 a 2" pos="3-4"/> </locus> </claim> <claim subject="herminus comm-omnes" verb="agrees"> <locus work="grc"> <tok ref="1 a 2" pos="3-4"/> </locus> </claim> ......... <claim subject="B" verb="replaces"> <locus work="grc"> <tok ref="1 a 5" pos="1-2"/> </locus> ......... </claim> <claim subject="Λ" adverb="perhaps" verb="replaces"> <locus work="grc"> <tok ref="1 a 5" pos="1-2"/> </locus> ......... </claim> <claim subject="π α φ ο" verb="agrees"> <locus work="grc"> <tok ref="1 a 5" pos="1-2"/> </locus> </claim> </body> </TAN-A-div>
Note | |
---|---|
Taken from ar.cat.tan-a-div.claims |
The attribute src
refers to the ID of one or more <source>
s
The attribute src
refers to the ID of only one <source>
Formal Definition
NCName
Used by: ~div-type-ref-cluster
, ~split
, ~anchor-div-ref-item
, ~reanchor-div-ref-item
, ~simple-textual-reference
, ~complex-textual-reference-set
, ~decl-supp-div-type
, ~decl-rename-div-n
, ~tok-source-ref-opt
, ~tok-with-src-and-cont
, ~decl-tok-def
Caution | |
---|---|
Every idref in an attribute must point to the |
Caution | |
---|---|
All idrefs in an attribute must be unique. |
Caution | |
---|---|
Every atomic reference in a |
Caution | |
---|---|
Every range in a |
Example 8.155. @src
<TAN-A-div TAN-version="1 dev" id="tag:parkj@textalign.net,2015:ar.cat.tan-a-div"> <head> ......... <declarations> <suppress-div-types src="fra" div-type-ref="sec"/> </declarations> ......... </head> <body> <split-leaf-div-at> <tok src="fra" ref="5 5" val="Ceci"/> <tok src="fra" ref="5 5" val="Il"/> <tok src="fra" ref="5 6" val="Si" pos="1"/> <tok src="fra" ref="5 12" val="Ainsi" pos="1, last"/> ......... </split-leaf-div-at> ......... </body> </TAN-A-div>
Note | |
---|---|
Taken from ar.cat.tan-a-div |
The attribute val
specifies a particular word token by means of its string value. Permits regular expressions.
For more see the section called “@pos and @val”
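Selecting tokens by string value, optionally narrowed by position, can be sketched as follows. Several points here are assumptions rather than documented behavior: the regular expression must match the whole token, positions count only the matching tokens, a missing position defaults to the first match, and Python's `re` stands in for XML Schema regular expressions. The token list is invented for illustration.

```python
# Hypothetical sketch: pick tokens in a div by regular expression and position.
import re


def select_tokens(tokens, val, pos=(1,)):
    """Return 1-based indexes of tokens matching val, filtered by pos among the matches."""
    hits = [i for i, token in enumerate(tokens, start=1) if re.fullmatch(val, token)]
    chosen = []
    for p in pos:
        index = len(hits) if p == "last" else p
        if not 1 <= index <= len(hits):
            raise ValueError(f"no match number {p!r} for {val!r}")
        chosen.append(hits[index - 1])
    return chosen


tokens = ["Ainsi", "donc", "Ainsi", "soit", "Ainsi"]
print(select_tokens(tokens, "Ainsi", [1, "last"]))
print(select_tokens(tokens, r"s\w+"))
```

The error branch corresponds to the requirement that every token be locatable in the cited div.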
Formal Definition
string (pattern .+)
Used by: ~tok-regular
, ~tok-sequence-attr-core
Caution | |
---|---|
Attributes that take a regular expression must use escape sequences recognized by XML schema or TAN escape extensions (\k{}). See http://www.w3.org/TR/xmlschema-2/#regexs for details. |
Important | |
---|---|
A |
Example 8.156. @val
<body> <split-leaf-div-at> <tok src="fra" ref="5 5" val="Ceci"/> <tok src="fra" ref="5 5" val="Il"/> <tok src="fra" ref="5 6" val="Si" pos="1"/> <tok src="fra" ref="5 12" val="Ainsi" pos="1, last"/> <tok src="fra" ref="5 12" val="Quant"/> ......... </split-leaf-div-at> ......... </body>
Note | |
---|---|
Taken from ar.cat.tan-a-div |