<TAN-A>
TAN-A is the format for macroscopic, division-based alignment and annotations. It is dedicated to aligning any number of versions of any number of works on the basis of the <div>s in its sources. The A also stands for annotations, because the TAN-A format allows you to make general assertions, usually but not necessarily about texts. TAN-A is a type of advanced RDF for textual scholarship (see the section called “Resource Description Framework (RDF) and Linked Open Data”).
The root element of a TAN division-based alignment file is <TAN-A>.
TAN-A's <head> has zero or more <source>s.
Any concepts that will be mentioned in the <claim>s (the only children of <body>) need to be supplied in <vocabulary-key>.
<body>
The <body> of a TAN-A file takes, in addition to the customary optional attributes (see the section called “Edit Stamp”), any of @claimant, @object, @subject, and @verb, which stipulate the default values for the enclosed claims.
The rest of the body consists of zero or more <claim>s, each of which represents one or more claims. Claims can be used for a variety of purposes, e.g.:
to list quotations and allusions;
to indicate which passages deal with what general subjects and topics;
to connect commentary or notes from one source to another;
to indicate where other scripta have different readings (apparatus criticus).
The data model of <claim> is inspired by the Resource Description Framework (RDF; see the section called “Resource Description Framework (RDF) and Linked Open Data”), where each statement consists of three items termed a subject, a predicate, and an object. The first and third are thought of as nodes, and the second as a connector (or edge) between the nodes. RDF follows a graph model, where the connector (edge) always links exactly two nodes.
RDF is adequate for but a limited range of scholarly assertions. An RDF statement lacks context or qualifiers. No RDF statement can indicate who made the assertion, or when, or if it was uttered with any doubt or nuance. Sometimes we wish to claim a bare negation, e.g., "Aristotle was not the author of De mundo"—which cannot be expressed in RDF.
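The triple model just described can be pictured with a minimal Python sketch. The class name `Triple`, the helper `nodes`, and the sample values are hypothetical placeholders, not part of RDF or TAN; the point is only that each statement has exactly three slots and no place for a claimant, a date, or a degree of certainty.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: an edge joining exactly two nodes."""
    subject: str
    predicate: str
    obj: str


# Hypothetical statements; the names are placeholders, not real identifiers.
statements = [
    Triple("work-a", "quotes", "work-b"),
    Triple("work-b", "comments-on", "work-c"),
]


def nodes(graph):
    """Collect every node touched by at least one edge."""
    found = set()
    for triple in graph:
        found.update((triple.subject, triple.obj))
    return found
```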
TAN's <claim> extends the RDF graph model into a hypergraph, where the connector (edge) links two or more nodes. The following adjustments are made:
1. Every claim must have at least one claimant, some person, organization, or algorithm to be credited/blamed for the assertion.
2. Every claim must have at least one subject, the topic of the claim.
3. Every claim must have at least one verb (in RDF called predicate), specifying something about the subject.
4. Every claim may have one or more adverbs, qualifying the verb.
5. Every claim may assert a level or range of certainty, between zero and one, reflecting how certain the claimant is of the claim.
6. Every claim may have one or more objects, each an entity or value expected by the verb.
7. Every claim may have one or more temporal qualifiers, restricting the claim to a specific time.
8. Every claim may have one or more locative qualifiers, restricting the claim to a specific geographical region.
9. Every claim may have other components, if so defined by the verb. Currently, this entails for select verbs a language qualifier (@in-lang, <in-lang>) and a reference qualifier (<at-ref>).
Items 1-3 above are required parts of any claim. Items 4-9 may be made required, optional, or disallowed by a <verb>'s definition. For example, a <verb> representing an idea that in normal discourse is intransitive (e.g., sleep) can be defined such that <object> is not allowed.
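As a rough illustration of the claim components listed above, the following Python sketch models a claim as a record with three required, non-empty components and several optional ones. The class and field names (`Claim`, `claimants`, `when`, `where`, and so on) are assumptions made for this example and are not TAN element or attribute names; actual TAN-A files are XML and are validated by the TAN schemas.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Claim:
    # Items 1-3: required, each with at least one value.
    claimants: List[str]
    subjects: List[str]
    verbs: List[str]
    # Remaining items: optional here; a verb's definition may tighten this.
    adverbs: List[str] = field(default_factory=list)
    # A single level is stored as (x, x); a range as (low, high).
    certainty: Optional[Tuple[float, float]] = None
    objects: List[str] = field(default_factory=list)
    when: List[str] = field(default_factory=list)
    where: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject claims that lack any of the three required components.
        for name in ("claimants", "subjects", "verbs"):
            if not getattr(self, name):
                raise ValueError(f"a claim needs at least one value in {name}")
        # Certainty, when given, must lie between zero and one.
        if self.certainty is not None:
            low, high = self.certainty
            if not 0.0 <= low <= high <= 1.0:
                raise ValueError("certainty must lie between zero and one")
```

For instance, `Claim(["ed1"], ["A"], ["taught"], objects=["X"])` is accepted, while `Claim([], ["A"], ["taught"])` raises `ValueError` because no claimant is given.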
Furthermore, a <verb> may be defined to restrict what kinds of objects or subjects are allowed. For example, the standard TAN verb lacks_text_at (see vocabularies/verbs.TAN-voc.xml) is defined to allow only scripta as a subject, and an object is not allowed. A <claim> with this verb expects one or more <at-ref>s, which restrict the claim to a particular passage in a TAN-T file. A <verb> can specify that an object must be data, and it can also define the type of data allowed and its permitted lexical form.
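The idea that a verb's definition governs which components a claim may carry can be sketched in Python as follows. This is not how TAN implements the check (verb constraints live in TAN vocabulary files and are enforced during validation); `Rule`, `VerbDefinition`, and `problems` are hypothetical names, and treating "expects one or more <at-ref>s" as a hard requirement is an assumption of this example.

```python
from dataclasses import dataclass
from enum import Enum


class Rule(Enum):
    REQUIRED = "required"
    OPTIONAL = "optional"
    DISALLOWED = "disallowed"


@dataclass(frozen=True)
class VerbDefinition:
    name: str
    object_rule: Rule = Rule.OPTIONAL
    at_ref_rule: Rule = Rule.OPTIONAL


def problems(verb, objects, at_refs):
    """Return a list of rule violations; an empty list means none were found."""
    found = []
    checks = (
        ("object", verb.object_rule, objects),
        ("at-ref", verb.at_ref_rule, at_refs),
    )
    for label, rule, values in checks:
        if rule is Rule.REQUIRED and not values:
            found.append(f"{verb.name} requires at least one {label}")
        if rule is Rule.DISALLOWED and values:
            found.append(f"{verb.name} does not allow {label}")
    return found


# Mirrors the prose description of lacks_text_at: no object, one or more at-refs.
lacks_text_at = VerbDefinition("lacks_text_at", Rule.DISALLOWED, Rule.REQUIRED)
```

Calling `problems(lacks_text_at, [], ["1.2"])` returns an empty list, whereas supplying an object or omitting every at-ref produces a message for each violation.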
Claims may refer to other claims. That is, <claim>s can nest inside each other (e.g., X claims that Y claims that Z claims that...). Alternatively, a <claim> may take an @xml:id, whose value can then be cited as the object or subject of any other <claim>.
If a <claim> is about a work or source in general, as a whole, one or more IDrefs may be placed in @subject or @object. But if the claim is about a specific part of the textual object, then more information is needed, so the attributes cannot be used; the <subject> or <object> element is used instead.
Such textual references come in three flavors: assertions pertaining to a work, assertions pertaining to a work in only some versions, and assertions pertaining to scripta. In the first case, <subject> or <object> must take @work, with IDrefs pointing to vocabulary items for <work>s. In the second case, @src is used, pointing by IDref to the applicable <source>s. In the third case, @scriptum is used, pointing to vocabulary items for <scriptum>. Remember, you may combine commonly grouped IDrefs in an <alias>.
A @work means that the claim applies to any versions of the work, whether a source or not; a @src specifies that the claim applies only to the specific <source>. In each case, <subject> or <object> may be given more attributes and elements to restrict the claim to specific parts of the work or source, with @ref, <tok>, @val, @pos, and @chars, following the conventions used in pointing to parts of texts (see the section called “Class 2 Pointer Syntax: Referencing Texts”).
If a <subject> or <object> points via @scriptum to a scriptum, specifying the claim necessarily takes a different approach than that used for @work or @src. Bear in mind that these guidelines encourage you to avoid scriptum-oriented methods of dividing class 1 files. As a result, clarifying a portion of a scriptum (e.g., a particular manuscript folio number) requires an apparatus that likely does not correspond to a TAN file.
Therefore, a <subject> or <object> with a @scriptum can be restricted through descendant <div>s that specify via @n and @type a specific region on the scriptum. These scriptum filters, unlike TAN-T <div>s, are always empty; their sole purpose is to point in native terms to a specific region on a scriptum.
Multiple values in any component of a <claim> are distributed, which means that one <claim> might contain multiple assertions. For example, <claim subject="A B" verb="taught promoted" object="X Y Z"/> has within it twelve claims (every combination of the three attributes' individual values: 2 × 2 × 3). The exception to this general rule is @adverb, whose multiple values are taken as ampliative and restrictive. For example, <claim subject="A" adverb="probably not" verb="taught" object="X"/> is a single claim, not two, even though @adverb has two values.
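The distribution rule can be checked with a short Python sketch that expands the attribute values from the two examples above. The function name `expand` and the tuple layout are assumptions for illustration only; TAN's own processing is not shown here.

```python
from itertools import product


def expand(subject, verb, obj, adverb=None):
    """Distribute space-separated values into individual claims."""
    combos = product(subject.split(), verb.split(), obj.split())
    # The adverb is deliberately not split: its values qualify the verb together.
    return [(s, adverb, v, o) for s, v, o in combos]


claims = expand("A B", "taught promoted", "X Y Z")
single = expand("A", "taught", "X", adverb="probably not")
```

Here `len(claims)` is 12, matching the twelve claims counted above, while `single` holds exactly one claim whose adverb remains the undivided string `"probably not"`.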
A limited set of verbs has been defined in the standard TAN vocabulary; see the section called “TAN keywords for verbs (<verb>)”. The strictures defined in these verbs are checked during Schematron validation. For a brief discussion of defining your own verbs in a TAN-voc file, see the section called “Data (<body>)”.