TAN-LM-core elements and attributes summarized

The element ana contains one or more assertions about the lexical or morphological properties of one or more tokens.

Claims within an <ana> are distributive. That is, every combination of <l> and <m> within an <lm> is asserted of every <tok>.
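
The distributive reading can be spelled out mechanically. The sketch below expands one <ana> into the individual claims it makes. It assumes tokens are plain strings and that each <lm> is a (lexemes, codes) pair. The helper name expand_ana is hypothetical and is not part of the TAN function library.

```python
from itertools import product

def expand_ana(toks, lms):
    """Spell out every claim made by one <ana>.

    toks: the tokens picked out by <tok> (plain strings in this sketch).
    lms:  one (lexemes, codes) pair per <lm>; either list may be empty,
          in which case None stands in for the missing half of the claim.
    Returns a list of (tok, l, m) triples.
    """
    claims = []
    for tok in toks:                      # every claim applies to every <tok>
        for lexemes, codes in lms:        # pairs are formed only within one <lm>
            for l, m in product(lexemes or [None], codes or [None]):
                claims.append((tok, l, m))
    return claims

# One <ana> with two tokens and one ambiguous <lm>: 2 x 1 x 2 = 4 claims.
print(expand_ana(["tok-1", "tok-2"], [(["be"], ["v 3 s", "v 1 s"])]))
```

Values from different <lm>s are never combined with each other. Each <lm> is a self-contained set of alternatives.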

Formal Definition

   ((~ed-stamp?, ~inclusion) | 
      (~certainty-stamp, @group?, @xml:id?, 
         (<comment>* & ((<tok> | ~tok-sequence)+, <lm>+))))

Used by: ~item

Caution

A <tok> may not duplicate any sibling <tok>.
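
A rough way to detect the duplication forbidden above is to compare the attribute sets of sibling <tok>s. This minimal sketch makes two assumptions. First, two <tok>s are duplicates when their attributes are identical. Second, the XML carries no namespace. Real TAN files are namespaced, and the attribute names ref and val here are illustrative only. The helper duplicate_toks is hypothetical.

```python
import xml.etree.ElementTree as ET

def duplicate_toks(ana):
    """Return the attribute dicts of <tok> children that repeat an earlier sibling."""
    seen = set()
    dups = []
    for tok in ana.findall("tok"):
        key = frozenset(tok.attrib.items())   # order-insensitive attribute set
        if key in seen:
            dups.append(dict(tok.attrib))
        seen.add(key)
    return dups

ana = ET.fromstring('<ana><tok ref="1.1" val="in"/><tok ref="1.1" val="in"/>'
                    '<lm><l>in</l></lm></ana>')
print(duplicate_toks(ana))
```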


The element l names a lexeme by pointing to the main word entry in the lexicon identified by the element's inherited value of @lexicon. This element should point only to lexical headwords, never to roots.

In many languages, especially those that are lightly inflected, the lexeme will be identical to the word token itself. In those cases, <l> may be left empty, indicating that the value of the <tok> is to be supplied as the lexeme.
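
Resolving an empty <l> amounts to a simple fallback. The sketch below is a minimal one. The helper resolve_lexeme is hypothetical. Treating a whitespace-only <l> as empty is an assumption, not a documented rule.

```python
def resolve_lexeme(l_text, tok_value):
    """Return the lexeme asserted by an <l>.

    An empty <l> (None, "", or, by assumption, whitespace only) stands for
    the token itself, so the token's value is returned instead.
    """
    text = (l_text or "").strip()
    return text if text else tok_value

print(resolve_lexeme("", "dog"))     # empty <l>: the token supplies the lexeme
print(resolve_lexeme("go", "went"))  # explicit <l> wins
```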

Because there is no TAN format for lexicons, values in this element will not be validated.

Formal Definition

@lexicon?, @def-ref?, ~certainty-stamp, text

Used by: ~TAN-LM-item


The element lexicon names a lexicographical authority. This element is optional, because the lexical information could be based upon the knowledge of the <agent>s who wrote the data.

Formal Definition

~ed-stamp?, 
   (~inclusion | 
      (@xml:id, <for-lang>*, (
         
            ((<IRI>+, ~metadata-human, <checksum>*, <location>+) | @which) | 
         
            ((<IRI>+, ~metadata-human) | @which))))

Used by: ~declaration-items



The element lm contains lexical or morphological data.

Claims within an <lm> are distributive. That is, every <l> is paired with every <m> within the <lm>, and each resulting combination is asserted of every <tok> in the parent <ana>.

Formal Definition

~certainty-stamp, 
   (<comment>* & 
      ((<l>+, <m>*) | (<l>*, <m>+)))

Used by: <ana>


The element m carries a morphological code that conforms to the rules or patterns defined in the TAN-mor file upon which the data depends.

Codes are space-delimited. If a value of <m> violates the rules established by the TAN-mor file, an error will be generated. For more about how codes are built, and how they function, see the section called “Lexico-Morphology”.

Formal Definition

~certainty-stamp, @morphology?, string (pattern [^\+\s]+(\s+[^\+\s]+)*)

Used by: ~TAN-LM-item
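
The pattern in the formal definition of <m> can be exercised directly with Python's re module. In this minimal sketch, the helper parse_m is hypothetical. Splitting on whitespace to obtain the individual codes is our reading of "space-delimited".

```python
import re

# Pattern copied from the formal definition of <m>: one or more codes, none
# containing "+" or whitespace, separated by runs of whitespace.
M_PATTERN = re.compile(r"[^\+\s]+(\s+[^\+\s]+)*")

def parse_m(value):
    """Return the list of codes in an <m>, or raise ValueError if malformed."""
    if not M_PATTERN.fullmatch(value):
        raise ValueError(f"invalid <m> value: {value!r}")
    return value.split()

print(parse_m("n m s"))   # three codes
```

A value such as "n+m" or one with leading whitespace is rejected, because the pattern must match the entire string.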

Caution

When using a category-based morphology, the number of feature codes in an <m> may not exceed the number of categories.
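
The count rule can be checked once the morphology's categories are known. This is a sketch only. It assumes the categories have already been extracted from the TAN-mor file as an ordered list, and both the helper check_category_count and the sample category names are hypothetical.

```python
def check_category_count(codes, categories):
    """Raise ValueError if an <m> supplies more codes than there are categories.

    codes:      the whitespace-separated codes of one <m>
    categories: the categories of a category-based morphology, in order
                (a hypothetical list extracted from the TAN-mor file)
    """
    if len(codes) > len(categories):
        raise ValueError(
            f"{len(codes)} codes but only {len(categories)} categories"
        )

categories = ["part of speech", "gender", "number"]   # illustrative only
check_category_count("n m s".split(), categories)     # passes: 3 <= 3
```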

Caution

Every feature code in an <m> must be found in the target morphology file.
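
Checking that every code is defined reduces to a membership test. The minimal sketch below assumes all codes defined by the target TAN-mor file have been collected into a set. The helper unknown_codes and the sample codes are hypothetical.

```python
def unknown_codes(codes, known_codes):
    """Return the codes in an <m> that the target morphology does not define."""
    return [code for code in codes if code not in known_codes]

known = {"n", "v", "m", "f", "s", "p"}           # hypothetical codes from a TAN-mor file
print(unknown_codes("n m x".split(), known))     # 'x' is not defined
```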

Caution

Every condition of a relevant <assert> (<report>) must be true (false); otherwise, an error will be returned.

Important

Every condition of an uncertain but relevant <assert> (<report>) must be true (false); otherwise, a warning will be returned.
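
The two rules above can be summarized as a small decision table. In this sketch, each condition is assumed to have already been evaluated to a boolean. The sketch does not model how relevance or certainty is determined from the TAN-mor file. The names evaluate_rule and Outcome are hypothetical.

```python
from enum import Enum

class Outcome(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"

def evaluate_rule(kind, conditions, certain=True):
    """Evaluate one relevant <assert> or <report>.

    kind:       "assert" or "report"
    conditions: the boolean results of the rule's conditions
    certain:    False if the rule is marked uncertain
    An <assert> fails if any condition is false; a <report> fails if any
    condition is true. A failure is an error, or a warning if uncertain.
    """
    if kind == "assert":
        failed = not all(conditions)
    elif kind == "report":
        failed = any(conditions)
    else:
        raise ValueError(f"unknown rule kind: {kind!r}")
    if not failed:
        return Outcome.OK
    return Outcome.ERROR if certain else Outcome.WARNING

print(evaluate_rule("report", [False, True], certain=False))
```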

The element morphology identifies a <TAN-mor> file that defines the parts of speech for a language, the codes for those parts, and the rules for combining them.

Formal Definition

~ed-stamp?, 
   (~inclusion | 
      (@xml:id, <for-lang>*, (@which | 
         
            (@href | (<IRI>, ~metadata-human, <checksum>*, <location>+)))))

Used by: ~declaration-items



The element TAN-LM specifies that the file is a TAN file containing lexico-morphology data about a text. Root element.

Formal Definition

~TAN-root
Important

Every validated TAN file will include the following message at its root. This version of TAN is under development, and is subject to change. Participants in developing the TAN schemas, functions, and guidelines are welcome. See http://textalign.net for details.



The attribute def-ref identifies which definition is meant. This attribute is essential in cases where a lexicon has multiple entries for lexemes that are orthographically indistinguishable.

Because there is no TAN format for lexicons, the value in this attribute will not be validated.

Formal Definition

Used by: <l>

The attribute lexicon points to one or more <lexicon> or <agent> IDs.

This attribute is inheritable. See the section called “Interpretation of inheritable attributes”.
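
As a rough illustration of inheritance, the sketch below assigns each element the value of @lexicon that is in scope for it. It uses a simplified model in which an element's own value overrides whatever it inherits from its nearest ancestor. The section cited above remains authoritative on the details. The helper effective_attr is hypothetical, and the sample XML omits the TAN namespace.

```python
import xml.etree.ElementTree as ET

def effective_attr(root, name):
    """Map every element under root to its in-scope value of attribute `name`.

    Simplified model: an element's own value, if present, overrides the value
    inherited from its nearest ancestor that carries the attribute.
    """
    result = {}

    def walk(elem, inherited):
        value = elem.get(name, inherited)
        result[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, None)
    return result

doc = ET.fromstring(
    '<body lexicon="lex1">'
    '<ana><tok val="x"/><lm><l>x</l></lm></ana>'
    '<ana lexicon="lex2"><tok val="y"/><lm><l>y</l></lm></ana>'
    '</body>')
values = effective_attr(doc, "lexicon")
print([values[l] for l in doc.findall(".//l")])
```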

Formal Definition

Used by: ~other-body-attributes, ~lexeme

Caution

Every idref in an attribute must point to the @xml:id value of the appropriate corresponding element.

Caution

All idrefs in an attribute must be unique.
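
Both idref cautions can be checked together on a space-separated attribute value. In this minimal sketch, valid_ids is assumed to hold the @xml:id values of the appropriate elements: for @lexicon, those of <lexicon> and <agent>. The helper check_idrefs is hypothetical.

```python
def check_idrefs(attr_value, valid_ids):
    """Return a list of problems with a space-separated idref attribute value."""
    refs = attr_value.split()
    problems = []
    for ref in refs:                                   # every idref must resolve
        if ref not in valid_ids:
            problems.append(f"no element with xml:id {ref!r}")
    for dup in sorted({r for r in refs if refs.count(r) > 1}):   # no repeats
        problems.append(f"duplicate idref {dup!r}")
    return problems

print(check_idrefs("lex1 lex1 lex9", {"lex1", "agent1"}))
```

The same check applies unchanged to @morphology, with valid_ids drawn from the <morphology> elements.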



The attribute morphology points to one or more <morphology> IDs.

This attribute is inheritable. See the section called “Interpretation of inheritable attributes”.

Formal Definition

Used by: ~other-body-attributes, ~morph

Caution

Every idref in an attribute must point to the @xml:id value of the appropriate corresponding element.

Caution

All idrefs in an attribute must be unique.