Division-Based Annotations and Alignments (<TAN-A>)

Chapter 6. Class-2 TAN Files, Annotations of Texts

TAN-A is the format for macroscopic, division-based alignments and annotations. It is dedicated to aligning any number of versions of any number of works on the basis of the <div>s in its sources. The A also stands for annotations, because the TAN-A format allows you to make general assertions, usually but not necessarily about texts. TAN-A can therefore be thought of as an advanced form of RDF for textual scholarship (see the section called “Resource Description Framework (RDF) and Linked Open Data”).

Root Element and Header

The root element of a TAN division-based alignment file is <TAN-A>.

TAN-A's <head> has zero or more <source>s.

Any concepts that will be mentioned in the <claim>s (the only children of <body>) need to be supplied in <vocabulary-key>.

Data (`<body>`)

The <body> of a TAN-A file takes, in addition to the customary optional attributes (see the section called “Edit Stamp”), the optional attributes @claimant, @object, @subject, and @verb, which stipulate default values for the enclosed claims.
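The defaulting behavior can be sketched with Python's standard `xml.etree.ElementTree`. This is a simplified illustration, not TAN's own validation logic: it ignores namespaces, uses invented IDrefs (`me`, `A`, `B`, `taught`, `promoted`, `X`, `Y`), and assumes that a value given on a <claim> overrides the default on <body>.

```python
import xml.etree.ElementTree as ET

# Attributes that <body> may set as defaults for its <claim>s.
DEFAULTABLE = ("claimant", "subject", "verb", "object")

def resolve_defaults(body: ET.Element) -> list[dict[str, str]]:
    """Return each child <claim>'s attributes, with gaps filled from <body>.

    Assumption (hypothetical sketch): a value on <claim> overrides the <body> default.
    """
    defaults = {name: body.get(name) for name in DEFAULTABLE if body.get(name)}
    resolved = []
    for claim in body.findall("claim"):
        attrs = dict(defaults)      # start from the <body> defaults
        attrs.update(claim.attrib)  # the claim's own values take precedence
        resolved.append(attrs)
    return resolved

body = ET.fromstring(
    '<body claimant="me" verb="taught">'
    '<claim subject="A" object="X"/>'
    '<claim subject="B" verb="promoted" object="Y"/>'
    '</body>'
)
for attrs in resolve_defaults(body):
    print(attrs)
```

Here the first claim inherits both @claimant and @verb, while the second inherits only @claimant because it supplies its own @verb.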

The rest of the body consists of zero or more <claim>s, each of which represents one or more individual assertions. Claims can be used for a variety of purposes, for example:

to list quotations and allusions;
to indicate which passages deal with what general subjects and topics;
to connect commentary or notes from one source to another;
to indicate where other scripta have different readings (apparatus criticus).

<claim>'s data model is inspired by the Resource Description Framework (RDF; see the section called “Resource Description Framework (RDF) and Linked Open Data”), where each statement consists of three items termed a subject, a predicate, and an object. The first and third are thought of as nodes, and the second as a connector (or edge) between the nodes. RDF follows a graph model, where the connector (edge) always links exactly two nodes.
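For comparison, a plain RDF statement can be modeled as a three-item tuple, and a set of such tuples forms a graph in which every edge joins exactly two nodes. This minimal sketch uses invented identifiers and no RDF library; it only illustrates the shape of the model.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # first node
    predicate: str  # the edge connecting the two nodes
    obj: str        # second node

# A tiny graph: a set of triples (identifiers are illustrative only).
graph = {
    Triple("Plato", "taught", "Aristotle"),
    Triple("Aristotle", "taught", "Alexander"),
}

def nodes(g: set[Triple]) -> set[str]:
    """Collect every node; each triple contributes exactly two."""
    return {n for t in g for n in (t.subject, t.obj)}

def edges_from(g: set[Triple], node: str) -> list[tuple[str, str]]:
    """List the (predicate, object) pairs leaving a given subject node."""
    return sorted((t.predicate, t.obj) for t in g if t.subject == node)

print(nodes(graph))
print(edges_from(graph, "Aristotle"))
```

Note that the tuple has no slot for a claimant, a certainty, or a negation, which is the limitation discussed next.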

RDF is adequate for only a limited range of scholarly assertions. An RDF statement lacks context and qualifiers: a bare triple cannot indicate who made the assertion, or when, or whether it was made with any doubt or nuance. Sometimes we also wish to claim a bare negation, e.g., "Aristotle was not the author of De mundo", which cannot be expressed directly in RDF.

TAN's <claim> extends RDF's graph model into a hypergraph, in which a connector (edge) may link two or more nodes. The following adjustments are made:

1. Every claim must have at least one claimant: some person, organization, or algorithm to be credited (or blamed) for the assertion.
2. Every claim must have at least one subject, the topic of the claim.
3. Every claim must have at least one verb (in RDF, the predicate), specifying something about the subject.
4. Every claim may have one or more adverbs, qualifying the verb.
5. Every claim may assert a level or range of certainty, between zero and one, reflecting how certain the claimant is of the claim.
6. Every claim may have one or more objects: entities or values expected by the verb.
7. Every claim may have one or more temporal qualifiers, restricting the claim to a specific time.
8. Every claim may have one or more locative qualifiers, restricting the claim to a specific geographical region.
9. Every claim may have other components, if so defined by the verb. Currently, this means, for select verbs, a language qualifier (@in-lang, <in-lang>) and a reference qualifier (<at-ref>).
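The components above can be mirrored in a simple Python data class. This is a hypothetical in-memory model for illustration: the field names are invented (they are not TAN element names), and the sketch enforces only the rules that every claim needs a claimant, a subject, and a verb, and that certainty lies between zero and one. Verb-specific strictures are left out.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Claim:
    """Hypothetical model of one TAN <claim>; field names are illustrative."""
    claimants: list[str]
    subjects: list[str]
    verbs: list[str]
    adverbs: list[str] = field(default_factory=list)
    certainty: Optional[tuple[float, float]] = None  # (low, high); a single level is (x, x)
    objects: list[str] = field(default_factory=list)
    times: list[str] = field(default_factory=list)    # temporal qualifiers
    places: list[str] = field(default_factory=list)   # locative qualifiers
    in_langs: list[str] = field(default_factory=list)  # only for verbs that use it
    at_refs: list[str] = field(default_factory=list)   # only for verbs that use it

    def __post_init__(self) -> None:
        # Claimant, subject, and verb are required in every claim.
        for name in ("claimants", "subjects", "verbs"):
            if not getattr(self, name):
                raise ValueError(f"a claim needs at least one value in {name!r}")
        # Certainty, if given, must lie between zero and one.
        if self.certainty is not None:
            low, high = self.certainty
            if not 0.0 <= low <= high <= 1.0:
                raise ValueError("certainty must satisfy 0 <= low <= high <= 1")

c = Claim(claimants=["me"], subjects=["A"], verbs=["taught"], objects=["X"],
          certainty=(0.6, 0.8))
print(c)
```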

Items 1-3 above are required parts of any claim. Items 4-9 may be made required, optional, or disallowed by a <verb>'s definition. For example, a <verb> representing an idea that in normal discourse is intransitive (e.g., sleep) can be defined so that <object> is not allowed.

Furthermore, a <verb> may be defined to restrict what kinds of subjects or objects are allowed. For example, the standard TAN verb lacks_text_at (see vocabularies/verbs.TAN-voc.xml) is defined to allow only scripta as subjects, and no object is allowed. A <claim> with this verb expects one or more <at-ref>s, which restrict the claim to a particular passage in a TAN-T file. A <verb> can also specify that an object must be data, and define the type of data allowed and its permitted lexical form.
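The strictures a verb definition imposes can be pictured as a small rule table checked against each claim. In this hypothetical sketch, `VerbRule`, its fields, and `check` are invented names, not TAN-voc markup; the `lacks_text_at` entry merely restates the constraints described above, and the `sleep` entry restates the intransitive example. The actual definitions in vocabularies/verbs.TAN-voc.xml, enforced by Schematron, remain authoritative.

```python
from dataclasses import dataclass
from typing import Literal

Rule = Literal["required", "optional", "disallowed"]

@dataclass(frozen=True)
class VerbRule:
    """Hypothetical summary of what a <verb> definition permits."""
    obj: Rule = "optional"
    at_ref: Rule = "disallowed"
    subject_types: frozenset[str] = frozenset()  # empty set = any subject type

VERBS = {
    # Restates the text: scripta only as subject, no object, <at-ref> expected.
    "lacks_text_at": VerbRule(obj="disallowed", at_ref="required",
                              subject_types=frozenset({"scriptum"})),
    # An intransitive verb defined so that <object> is not allowed.
    "sleep": VerbRule(obj="disallowed"),
}

def check(verb: str, subject_type: str, n_objects: int, n_at_refs: int) -> list[str]:
    """Return human-readable problems; an empty list means the claim passes."""
    rule = VERBS[verb]
    problems = []
    if rule.subject_types and subject_type not in rule.subject_types:
        problems.append(f"{verb}: subject must be one of {sorted(rule.subject_types)}")
    for label, setting, count in (("object", rule.obj, n_objects),
                                  ("at-ref", rule.at_ref, n_at_refs)):
        if setting == "required" and count == 0:
            problems.append(f"{verb}: {label} is required")
        if setting == "disallowed" and count > 0:
            problems.append(f"{verb}: {label} is not allowed")
    return problems

print(check("lacks_text_at", "scriptum", n_objects=0, n_at_refs=1))  # -> []
print(check("lacks_text_at", "work", n_objects=1, n_at_refs=0))
```

The second call reports three problems: a disallowed subject type, a forbidden object, and a missing <at-ref>.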

Claims may refer to other claims in two ways. <claim>s can nest inside each other (e.g., X claims that Y claims that Z claims that...), or a <claim> may take an @xml:id, whose value can then be cited as the subject or object of any other <claim>.
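Resolving a claim that is cited by its @xml:id can be sketched with `xml.etree.ElementTree`, which reports `xml:id` under the reserved XML namespace `http://www.w3.org/XML/1998/namespace`. The sample markup is hypothetical (the verb IDref `doubts` in particular is invented), and the sketch covers only the @xml:id route, not nesting.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

body = ET.fromstring(
    '<body>'
    '<claim xml:id="c1" claimant="me" subject="A" verb="taught" object="X"/>'
    '<claim claimant="you" subject="c1" verb="doubts"/>'
    '</body>'
)

# Index every claim (at any depth) that carries an @xml:id.
claims_by_id = {c.get(XML_ID): c for c in body.iter("claim") if c.get(XML_ID)}

def cited_claims(claim: ET.Element) -> list[ET.Element]:
    """Return the claims whose ids appear in this claim's @subject or @object."""
    refs = (claim.get("subject", "") + " " + claim.get("object", "")).split()
    return [claims_by_id[r] for r in refs if r in claims_by_id]

second = body.findall("claim")[1]
for target in cited_claims(second):
    print(target.get("subject"), target.get("verb"), target.get("object"))
```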

If a <claim> is about a work or source as a whole, one or more IDrefs may be placed in @subject or @object. But if the claim is about a specific part of the text, more information is needed than these attributes can carry, so the <subject> or <object> element must be used instead.

Such textual references come in three flavors: assertions pertaining to a work, assertions pertaining to a work in only some versions, and assertions pertaining to scripta. In the first case, <subject> or <object> must take @work, with IDrefs pointing to vocabulary items for <work>s. In the second case, @src is used, pointing by IDref to the applicable <source>s. In the third case, @scriptum is used, pointing to vocabulary items for <scriptum>. Remember, you may combine commonly grouped IDrefs in an <alias>.

A @work means that the claim applies to every version of the work, whether or not that version is a source; a @src specifies that the claim applies only to the specific <source>. In either case, <subject> or <object> may be given further attributes and elements (@ref, <tok>, @val, @pos, and @chars) to restrict the claim to specific parts of the work or source, following the conventions used in pointing to parts of texts (see the section called “Class 2 Pointer Syntax: Referencing Texts”).
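The three flavors can be told apart by which attribute a <subject> or <object> carries. The helper below is a hypothetical sketch: the IDrefs (`iliad`, `grc`, `eng`, `ms1`) and the @ref value are invented, it assumes exactly one of @work, @src, or @scriptum is present, and it reports only @ref among the restricting attributes.

```python
import xml.etree.ElementTree as ET

SCOPES = {
    "work": "every version of the work, whether a source or not",
    "src": "only the cited <source>(s)",
    "scriptum": "the cited scripta",
}

def scope_of(node: ET.Element) -> str:
    """Describe what a <subject> or <object> pointer covers (illustrative only)."""
    present = [name for name in SCOPES if node.get(name) is not None]
    if len(present) != 1:
        # Assumption of this sketch, not a rule stated by the guidelines.
        raise ValueError(f"expected exactly one of @work/@src/@scriptum, found {present}")
    kind = present[0]
    ids = node.get(kind).split()  # multiple IDrefs are space-separated
    ref = node.get("ref")
    where = f", restricted to @ref {ref!r}" if ref and kind != "scriptum" else ""
    return f"{', '.join(ids)}: {SCOPES[kind]}{where}"

print(scope_of(ET.fromstring('<subject work="iliad"/>')))
print(scope_of(ET.fromstring('<subject src="grc eng" ref="1.1"/>')))
print(scope_of(ET.fromstring('<object scriptum="ms1"/>')))
```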

If a <subject> or <object> points via @scriptum to a scriptum, restricting the claim necessarily takes a different approach from that used for @work or @src. Bear in mind that these guidelines encourage you to avoid scriptum-oriented methods of dividing class 1 files. Identifying a portion of a scriptum (e.g., a particular manuscript folio) therefore requires an apparatus that likely does not correspond to any TAN file. Instead, a <subject> or <object> with a @scriptum can be restricted through descendant <div>s that specify, via @n and @type, a specific region of the scriptum. These scriptum filters, unlike TAN-T <div>s, are always empty; their sole purpose is to point, in the scriptum's native terms, to a specific region on it.
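Because scriptum filters are empty <div>s carrying only @type and @n, reading one amounts to walking the nesting and collecting those pairs. In this hypothetical sketch, the scriptum IDref `ms1` and the division types `folio` and `line` are invented for illustration.

```python
import xml.etree.ElementTree as ET

subject = ET.fromstring(
    '<subject scriptum="ms1">'
    '<div type="folio" n="12r"><div type="line" n="3"/></div>'
    '</subject>'
)

def scriptum_paths(node: ET.Element, prefix: tuple = ()) -> list[tuple]:
    """Return each innermost region as a path of (type, n) pairs."""
    paths = []
    for div in node.findall("div"):
        step = prefix + ((div.get("type"), div.get("n")),)
        # Filters hold no text, so a div without child divs ends a path.
        paths.extend(scriptum_paths(div, step) or [step])
    return paths

print(subject.get("scriptum"), scriptum_paths(subject))
```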

Multiple values in any component of a <claim> are distributed, which means that one <claim> might contain multiple assertions. For example, <claim subject="A B" verb="taught promoted" object="X Y Z"/> contains twelve claims (2 × 2 × 3, one for every combination of the three attributes' individual values). The exception to this general rule is @adverb, whose multiple values are not distributed but are taken together as ampliative and restrictive. For example, <claim subject="A" adverb="probably not" verb="taught" object="X"/> is a single claim, not two, even though @adverb has two values.
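Distribution is a Cartesian product over the space-separated values of each attribute, with @adverb held together as a single qualifier. This sketch uses `itertools.product` from the Python standard library; the function names are invented, and only @subject, @verb, @object, and @adverb are handled.

```python
from itertools import product

def distribute(subject: str, verb: str, obj: str) -> list[tuple[str, str, str]]:
    """Expand space-separated attribute values into individual claims."""
    return list(product(subject.split(), verb.split(), obj.split()))

def distribute_with_adverb(subject: str, adverb: str, verb: str,
                           obj: str) -> list[tuple[str, tuple[str, ...], str, str]]:
    """Like distribute(), but @adverb is not distributed: all of its values
    qualify every expanded claim together."""
    adverbs = tuple(adverb.split())
    return [(s, adverbs, v, o)
            for s, v, o in product(subject.split(), verb.split(), obj.split())]

claims = distribute("A B", "taught promoted", "X Y Z")
print(len(claims))  # 12
print(claims[0])    # ('A', 'taught', 'X')
print(distribute_with_adverb("A", "probably not", "taught", "X"))
```

The last call yields a single claim whose adverb slot holds both values, matching the "probably not" example above.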

A limited set of verbs has been defined in the standard TAN vocabulary; see the section called “TAN keywords for verbs (<verb>)”. The strictures defined in these verbs are checked during Schematron validation. For a brief discussion of defining your own verbs in a TAN-voc file, see the section called “Data (<body>)”.