The element <ana> contains one or more assertions about the lexical or morphological properties of one or more tokens.
Claims within an <ana> are distributive. That is, every combination of <l> and <m> within an <lm> is asserted of every <tok>.
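The distributive reading can be sketched in Python. This is a minimal illustration, not part of TAN's validation machinery. It assumes the <ana> has been parsed without namespaces (real TAN-LM files declare a TAN namespace, so tag names would need qualifying), it handles only <tok> children and ignores ~tok-sequence, and the function name expand_ana is hypothetical. Each <lm> contributes every <l>/<m> pairing, each pairing is applied to every <tok>, and a missing <l> or <m> side is recorded as None.

```python
import itertools
import xml.etree.ElementTree as ET

# A trimmed <ana> modeled on the examples in this section.
# The ref/pos values are illustrative only.
ANA = """
<ana>
   <tok ref="10 6 3 2" pos="4"/>
   <tok ref="10 6 3 3" pos="15"/>
   <lm>
      <l>Σωκράτης</l>
      <m>n e - s - - - m g -</m>
   </lm>
</ana>
"""

def expand_ana(ana):
    """Return one (ref, pos, lexeme, morph) tuple per distributed claim.

    Hypothetical helper: every <l>/<m> pairing inside each <lm> is
    asserted of every <tok>. An absent <l> or <m> side becomes None.
    """
    toks = [(t.get("ref"), t.get("pos")) for t in ana.findall("tok")]
    claims = []
    for lm in ana.findall("lm"):
        lexemes = [l.text for l in lm.findall("l")] or [None]
        morphs = [m.text for m in lm.findall("m")] or [None]
        for (ref, pos), lex, morph in itertools.product(toks, lexemes, morphs):
            claims.append((ref, pos, lex, morph))
    return claims

claims = expand_ana(ET.fromstring(ANA.strip()))
for claim in claims:
    print(claim)
```

Two tokens and one <l>/<m> pair yield two claims; adding a second <m> to the same <lm> would double that to four.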
Formal Definition
((~ed-stamp?, ~inclusion) | (~certainty-stamp, @group?, @xml:id?, (<comment>* & ((<tok> | ~tok-sequence)+, <lm>+))))
Used by: ~item
Example 8.186. <ana>
<TAN-LM TAN-version="1 dev" id="tag:kalvesmaki.com,2014:tan-t:ar.cat.grc.1949.minio-paluello:semantic-refs:TAN-LM:2016-04-05T07:07:40.033-04:00">
   .........
   <body lexicon="LSJ Lampe new" morphology="Perseus">
      <ana>
         <tok ref="11 2 1 1" pos="1"/>
         <lm> ......... </lm>
      </ana>
      <ana>
         <tok ref="10 6 3 2" pos="4"/>
         <tok ref="10 6 3 3" pos="15"/>
         <tok ref="10 6 4 2" pos="37"/>
         .........
      </ana>
      <ana>
         <tok ref="8 3 5 4" pos="6"/>
         <tok ref="8 3 7 3" pos="7"/>
         <lm> ......... </lm>
      </ana>
      <ana>
         <tok ref="7 1 2 1" pos="12"/>
         <tok ref="7 3 1 3" pos="22"/>
         <tok ref="7 3 1 3" pos="24"/>
         .........
      </ana>
      <ana> ......... </ana>
      .........
   </body>
</TAN-LM>
The element <l> names a lexeme by pointing to the main word entry in the lexicon defined by the element's inherited value of @lexicon.
This element should not be used to point to roots, only to lexical headwords.
In many languages, especially those that are lightly inflected, this word will be identical to the word token itself. In those cases, <l> may be left empty, indicating that the value of <tok> is to be supplied.
Because there is no TAN format for lexicons, values in this element will not be validated.
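A sketch of how a consumer might honor an empty <l>, written in Python. It assumes a hypothetical TOKEN_TEXT lookup from a token's (@ref, @pos) to its string; resolving @ref and @pos against the source text is out of scope here, and treating a whitespace-only <l> as empty is also an assumption. The function name lexemes_for is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from (ref, pos) to the token string. In practice the
# token would come from tokenizing the source text, which is not shown.
TOKEN_TEXT = {("1 1", "2"): "ring"}

def lexemes_for(ana, token_text):
    """Map each <tok> in an <ana> to the lexemes claimed for it.

    An empty (or, by assumption, whitespace-only) <l> means the token itself
    is the lexeme, so the token string is supplied in its place. Raises
    KeyError if a token is missing from token_text.
    """
    result = {}
    for tok in ana.findall("tok"):
        key = (tok.get("ref"), tok.get("pos"))
        lexemes = []
        for l in ana.iter("l"):  # every <l> in every <lm> of this <ana>
            text = (l.text or "").strip()
            lexemes.append(text if text else token_text[key])
        result[key] = lexemes
    return result

ana = ET.fromstring('<ana><tok ref="1 1" pos="2"/><lm><l/><m>NN</m></lm></ana>')
print(lexemes_for(ana, TOKEN_TEXT))  # {('1 1', '2'): ['ring']}
```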
Formal Definition
@lexicon?, @def-ref?, ~certainty-stamp, text
Used by: ~TAN-LM-item
Example 8.187. <l>
<TAN-LM TAN-version="1 dev" id="tag:kalvesmaki.com,2014:tan-t:ar.cat.grc.1949.minio-paluello:semantic-refs:TAN-LM:2016-04-05T07:07:40.033-04:00">
   .........
   <body lexicon="LSJ Lampe new" morphology="Perseus">
      <ana>
         .........
         <lm>
            <l>Δῆλος</l>
            <m>n e - s - - - f a -</m>
         </lm>
      </ana>
      <ana>
         .........
         <lm>
            <l>Σωκράτης</l>
            <m>n e - s - - - m g -</m>
         </lm>
      </ana>
      <ana>
         .........
         <lm>
            <l>αἰσχύνω</l>
            <m>v - - - a n p - - -</m>
         </lm>
      </ana>
      <ana>
         .........
         <lm>
            <l>αἴσθησις</l>
            <m>n - - s - - - f n -</m>
         </lm>
      </ana>
      <ana>
         .........
         <lm>
            <l>αἴσθησις</l>
            <m>n - - s - - - f g -</m>
         </lm>
      </ana>
      .........
   </body>
</TAN-LM>
The element <lexicon> names a lexicographical authority. This element is optional, because the lexical information could be based upon the knowledge of the <agent>s who wrote the data.
Formal Definition
~ed-stamp?, (~inclusion | (@xml:id, <for-lang>*, (((<IRI>+, ~metadata-human, <checksum>*, <location>+) | @which) | ((<IRI>+, ~metadata-human) | @which))))
Used by: ~declaration-items
Example 8.188. <lexicon>
<head>
   .........
   <declarations>
      <token-definition regex="[\w]+"/>
      <lexicon xml:id="LSJ">
         <for-lang>grc</for-lang>
         <IRI>http://lccn.loc.gov/95032369</IRI>
         <name xml:lang="eng">Liddell-Scott-Jones, 9th ed. plus rev. supplement</name>
      </lexicon>
      <lexicon xml:id="Lampe">
         <for-lang>grc</for-lang>
         <IRI>http://lccn.loc.gov/77372171</IRI>
         <name xml:lang="eng">G.H.W. Lampe, A Patristic Greek Lexicon, Oxford 1961.</name>
      </lexicon>
      <lexicon xml:id="new">
         <for-lang>grc</for-lang>
         <IRI>urn:uuid:d6558d00-8f68-11e3-950a-425861b86ab6</IRI>
         <name xml:lang="eng">Lexicon generated from words in this document not to be found in any other lexicon.</name>
      </lexicon>
      <morphology xml:id="Perseus"> ......... </morphology>
      .........
   </declarations>
   .........
</head>
Example 8.189. <lexicon>
<declarations>
   <morphology xml:id="penn" ed-when="2015-08-20-04:00" ed-who="park"> ......... </morphology>
   <lexicon xml:id="test">
      <IRI>tag:kalvesmaki@gmail.com,2014:lexicon:eng:test</IRI>
      <name>test lexicon</name>
   </lexicon>
   <token-definition which="letters and punctuation"/>
</declarations>
Note: Taken from ring-o-roses.eng.1881.lm
The element <lm> contains lexical or morphological data.
Claims within an <lm> are distributive. That is, every <l> is paired with every <m> within the <lm>, and each such combination is asserted of every <tok> in the parent <ana>.
Formal Definition
~certainty-stamp, (<comment>* & ((<l>+, <m>*) | (<l>*, <m>+)))
Used by: <ana>
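The formal definition interleaves <comment> freely with a sequence in which all <l> elements precede all <m> elements, and at least one <l> or <m> must be present. The rough Python check below encodes that reading. The function name is hypothetical, ~certainty-stamp is ignored, namespaces are ignored, and the TAN schemas remain authoritative.

```python
import xml.etree.ElementTree as ET

def lm_is_well_formed(lm):
    """Rough check of the <lm> content model (hypothetical helper).

    <comment> may appear anywhere; among the remaining children, every <l>
    must precede every <m>, and at least one <l> or <m> must be present.
    """
    tags = [child.tag for child in lm if child.tag != "comment"]
    if not tags or any(tag not in ("l", "m") for tag in tags):
        return False
    # Once an <m> has appeared, no further <l> is allowed.
    first_m = tags.index("m") if "m" in tags else len(tags)
    return "l" not in tags[first_m:]

print(lm_is_well_formed(ET.fromstring("<lm><l>a</l><m>b</m></lm>")))  # True
print(lm_is_well_formed(ET.fromstring("<lm><m>b</m><l>a</l></lm>")))  # False
```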
Example 8.190. <lm>
<TAN-LM TAN-version="1 dev" id="tag:kalvesmaki.com,2014:tan-t:ar.cat.grc.1949.minio-paluello:semantic-refs:TAN-LM:2016-04-05T07:07:40.033-04:00">
   .........
   <body lexicon="LSJ Lampe new" morphology="Perseus">
      <ana>
         <tok ref="11 2 1 1" pos="1"/>
         <lm>
            <l>Δῆλος</l>
            <m>n e - s - - - f a -</m>
         </lm>
      </ana>
      <ana>
         .........
         <tok ref="10 6 4 2" pos="37"/>
         <lm>
            <l>Σωκράτης</l>
            <m>n e - s - - - m g -</m>
         </lm>
      </ana>
      <ana>
         .........
         <tok ref="8 3 7 3" pos="7"/>
         <lm>
            <l>αἰσχύνω</l>
            <m>v - - - a n p - - -</m>
         </lm>
      </ana>
      <ana>
         .........
         <tok ref="7 4 9 2" pos="4"/>
         <lm>
            <l>αἴσθησις</l>
            <m>n - - s - - - f n -</m>
         </lm>
      </ana>
      .........
   </body>
</TAN-LM>
The element <m> carries a morphological code that conforms to the rules or patterns defined in the TAN-mor file upon which the data depends.
Codes are space-delimited. If a value of <m> violates the rules established by the TAN-mor file, an error will be generated. For more about how codes are built and how they function, see the section called “Lexico-Morphology”.
Formal Definition
~certainty-stamp, @morphology?, string (pattern [^\+\s]+(\s+[^\+\s]+)*)
Used by: ~TAN-LM-item
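The pattern in the formal definition checks only the surface form of an <m>: one or more codes, none containing + or whitespace, separated by whitespace. Conformity to the TAN-mor rules is a separate step not shown here. A minimal Python sketch, assuming schema-style anchoring of the pattern (hence re.fullmatch rather than re.search); m_codes is a hypothetical helper name.

```python
import re

# Pattern from the formal definition of <m>. Schema regexes are implicitly
# anchored, so fullmatch is the closest Python equivalent. Note that Python's
# \s is broader than the XML Schema \s (space, tab, CR, LF only).
M_PATTERN = re.compile(r"[^\+\s]+(\s+[^\+\s]+)*")

def m_codes(value):
    """Return the space-delimited feature codes of an <m> value.

    Raises ValueError if the value does not match the surface pattern.
    """
    if not M_PATTERN.fullmatch(value):
        raise ValueError(f"invalid <m> value: {value!r}")
    return value.split()

print(m_codes("n e - s - - - f a -"))
# ['n', 'e', '-', 's', '-', '-', '-', 'f', 'a', '-']
print(m_codes("NNS ;"))  # ['NNS', ';']
```

Values such as "n+e", an empty string, or a code list with a leading or trailing space fail the pattern.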
Caution: When using a category-based morphology, the number of feature codes in an …
Caution: Every feature code in an …
The element <morphology> identifies a <TAN-mor> file that defines the parts of speech for a language, the codes for those parts, and the rules for combining them.
Formal Definition
~ed-stamp?, (~inclusion | (@xml:id, <for-lang>*, (@which | (@href | (<IRI>, ~metadata-human, <checksum>*, <location>+)))))
Used by: ~declaration-items
Example 8.191. <morphology>
<declarations>
   .........
   <lexicon xml:id="new"> ......... </lexicon>
   <morphology xml:id="Perseus">
      <for-lang>grc</for-lang>
      <IRI>tag:kalvesmaki.com,2014:tan-r-mor:grc:perseus</IRI>
      <name xml:lang="eng">Perseus Greek morphology</name>
      .........
   </morphology>
   <group-type xml:id="status" which="status"/>
</declarations>
Example 8.192. <morphology>
<declarations>
   <morphology xml:id="penn" ed-when="2015-08-20-04:00" ed-who="park">
      <IRI>tag:kalvesmaki.com,2014:tan-r-mor:eng:penn</IRI>
      <name>Penn Treebank tag set</name>
      <location href="../TAN-mor/eng.kalvesmaki.com%2C2014.2.xml" when-accessed="2015-11-03-05:00"/>
   </morphology>
   <lexicon xml:id="test"> ......... </lexicon>
   .........
</declarations>
Note: Taken from ring-o-roses.eng.1881.lm
The element <TAN-LM> specifies that the file is a TAN file containing lexico-morphological data about a text. Root element.
Formal Definition
~TAN-root
Important: Every validated TAN file will include the following message at its root. “This version of TAN is under development, and is subject to change. Participants in developing the TAN schemas, functions, and guidelines are welcome. See http://textalign.net for details.”
Example 8.193. <TAN-LM>
<TAN-LM TAN-version="1 dev" id="tag:kalvesmaki.com,2014:tan-t:ar.cat.grc.1949.minio-paluello:semantic-refs:TAN-LM:2016-04-05T07:07:40.033-04:00">
   <head> ......... </head>
   <body lexicon="LSJ Lampe new" morphology="Perseus"> ......... </body>
</TAN-LM>
Example 8.194. <TAN-LM>
<TAN-LM TAN-version="1 dev" id="tag:parkj@textalign.net,2015:ring01-lm">
   <head> ......... </head>
   <body lexicon="test" morphology="penn" in-progress="false"> ......... </body>
</TAN-LM>
Note: Taken from ring-o-roses.eng.1881.lm
The attribute @def-ref identifies which definition is meant. This attribute is essential in cases where a lexicon has multiple entries for lexemes that are orthographically indistinguishable.
Because there is no TAN format for lexicons, the value in this attribute will not be validated.
Formal Definition
Used by: <l>
The attribute @lexicon points to one or more <lexicon> or <agent> IDs.
This attribute is inheritable. See the section called “Interpretation of inheritable attributes”.
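A Python sketch of one plausible reading of inheritance: each element takes the value declared on its nearest ancestor-or-self, so an @lexicon on an <l> overrides the one on <body>. This is an assumption for illustration only; the section on inheritable attributes gives the actual rules. The function name inherited and the idref other are hypothetical, and namespaces are ignored.

```python
import xml.etree.ElementTree as ET

def inherited(root, attr):
    """Map every element under root to the nearest ancestor-or-self value of attr.

    Assumes "closest declaration wins"; elements with no declared value
    anywhere above them map to None.
    """
    values = {}

    def walk(elem, current):
        current = elem.get(attr, current)  # a local value overrides the inherited one
        values[elem] = current
        for child in elem:
            walk(child, current)

    walk(root, None)
    return values

BODY = """<body lexicon="test" morphology="penn">
   <ana><tok ref="1" pos="1"/><lm><l>ring</l><m>NN</m></lm></ana>
   <ana><tok ref="1" pos="2"/><lm><l lexicon="other">rose</l><m>NN</m></lm></ana>
</body>"""

body = ET.fromstring(BODY)
lex = inherited(body, "lexicon")
for l in body.iter("l"):
    print(l.text, lex[l].split())  # the value is a space-separated idref list
# ring ['test']
# rose ['other']
```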
Formal Definition
Used by: ~other-body-attributes, ~lexeme
Caution: Every idref in an attribute must point to the …
Caution: All idrefs in an attribute must be unique.
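A minimal Python sketch of the two idref checks for @lexicon. It assumes, from the description of @lexicon, that each idref must match the @xml:id of a declared <lexicon> or <agent>, and that no idref may repeat. check_idrefs is a hypothetical helper, namespaces other than xml: are ignored, and the sample declarations are trimmed from Example 8.188.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def check_idrefs(value, root, allowed_tags):
    """Return a list of problems with a space-separated idref attribute value.

    Each idref must be declared as @xml:id on an element whose tag is in
    allowed_tags, and no idref may appear more than once.
    """
    declared = {
        e.get(XML_ID) for e in root.iter()
        if e.tag in allowed_tags and e.get(XML_ID)
    }
    problems = []
    seen = set()
    for ref in value.split():
        if ref in seen:
            problems.append(f"duplicate idref: {ref}")
        seen.add(ref)
        if ref not in declared:
            problems.append(f"unresolved idref: {ref}")
    return problems

HEAD = """<head><declarations>
   <lexicon xml:id="LSJ"/><lexicon xml:id="Lampe"/><lexicon xml:id="new"/>
   <morphology xml:id="Perseus"/>
</declarations></head>"""

head = ET.fromstring(HEAD)
print(check_idrefs("LSJ Lampe new", head, {"lexicon", "agent"}))  # []
print(check_idrefs("LSJ LSJ Perseus", head, {"lexicon", "agent"}))
# ['duplicate idref: LSJ', 'unresolved idref: Perseus']
```

Perseus is rejected here because it is declared on a <morphology>, which is not a valid target for @lexicon.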
Example 8.195. @lexicon
<TAN-LM TAN-version="1 dev" id="tag:kalvesmaki.com,2014:tan-t:ar.cat.grc.1949.minio-paluello:semantic-refs:TAN-LM:2016-04-05T07:07:40.033-04:00">
   <head> ......... </head>
   <body lexicon="LSJ Lampe new" morphology="Perseus">
      <ana> ......... </ana>
      <ana> ......... </ana>
      <ana> ......... </ana>
      .........
   </body>
</TAN-LM>
Example 8.196. @lexicon
<TAN-LM TAN-version="1 dev" id="tag:parkj@textalign.net,2015:ring01-lm">
   <head> ......... </head>
   <body lexicon="test" morphology="penn" in-progress="false">
      <ana>
         .........
         <lm>
            <l lexicon="test">ring-a-ring-a-rose</l>
            <m>NNS ;</m>
         </lm>
      </ana>
      <ana> ......... </ana>
      <ana xml:id="anatest"> ......... </ana>
      .........
   </body>
</TAN-LM>
Note: Taken from ring-o-roses.eng.1881.lm
The attribute @morphology points to one or more <morphology> IDs.
This attribute is inheritable. See the section called “Interpretation of inheritable attributes”.
Formal Definition
Used by: ~other-body-attributes, ~morph
Caution: Every idref in an attribute must point to the …
Caution: All idrefs in an attribute must be unique.
Example 8.197. @morphology
<TAN-LM TAN-version="1 dev" id="tag:kalvesmaki.com,2014:tan-t:ar.cat.grc.1949.minio-paluello:semantic-refs:TAN-LM:2016-04-05T07:07:40.033-04:00">
   <head> ......... </head>
   <body lexicon="LSJ Lampe new" morphology="Perseus">
      <ana> ......... </ana>
      <ana> ......... </ana>
      <ana> ......... </ana>
      .........
   </body>
</TAN-LM>
Example 8.198. @morphology
<TAN-LM TAN-version="1 dev" id="tag:parkj@textalign.net,2015:ring01-lm">
   <head> ......... </head>
   <body lexicon="test" morphology="penn" in-progress="false">
      <ana> ......... </ana>
      <ana> ......... </ana>
      <ana xml:id="anatest"> ......... </ana>
      .........
   </body>
</TAN-LM>
Note: Taken from ring-o-roses.eng.1881.lm