This chapter provides general background to the elements and attributes that are common to all TAN files. For more detailed discussion, see Chapter 9, TAN patterns, elements, and attributes defined.
This chapter does not discuss TAN catalog files, on which see the section called “TAN Catalog Files (collection)”.
Both humans and computers need to read and write TAN metadata. Very often what is readable to humans is unreadable to computers, and vice versa. So the TAN format requires that all metadata be provided whenever possible in both forms. Although this rule may appear to introduce redundancy and therefore opportunities for error, the clarity is critical. It is the only way at present to ensure that any person or algorithm that approaches the data can parse and use it. In addition, doubly expressed metadata provides a safeguard much like a checksum: human- and computer-readable descriptions should comport. Any discrepancy signals an error that should be checked.
Some metadata, such as that inside <comment> or <change>, are neither easily nor profitably translated into a computer-actionable string. In such cases only the human-readable form is required. Other metadata involve regular expressions (e.g., @pattern) or ISO-compliant dates (e.g., @when), both of which are well formed and are usually human-legible. Such data are not repeated, although they may be explained via <desc> or <comment>.
Those exceptions aside, all other metadata takes what is called the IRI + name pattern: one or more <IRI>s and <name>s and zero or more <desc>s. This is the core pattern for nearly all TAN vocabulary items.
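The cardinality rule above can be sketched as a small check. This is only an illustration, not part of TAN's own validation: the sample <person> item, its IRI value, and the helper names are assumptions, and the TAN namespace is left out of the sample for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary item; the element name and all values are
# illustrative, and the TAN namespace is omitted for brevity.
SAMPLE = """
<person>
  <IRI>tag:example.com,2015:person:jdoe</IRI>
  <name>Jane Doe</name>
  <desc>Editor of the sample transcription</desc>
</person>
"""

def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]

def follows_iri_name_pattern(element):
    """True if the element has one or more <IRI> and one or more <name>.

    <desc> is counted but never required, since zero or more are allowed.
    """
    counts = {"IRI": 0, "name": 0, "desc": 0}
    for child in element:
        key = local_name(child.tag)
        if key in counts:
            counts[key] += 1
    return counts["IRI"] >= 1 and counts["name"] >= 1

item = ET.fromstring(SAMPLE.strip())
print(follows_iri_name_pattern(item))
```

Matching on local names keeps the sketch usable whether or not the input declares a namespace.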
Some entities identified by the IRI + name pattern (see the section called “IRI + name Pattern”) will be digital resources. In those cases, the IRI + name pattern is extended as follows.
There must be one or more <location>s, with @href and @accessed-when, which signal where the resource is and when it was last consulted. In validation, only the first document available will be used. Extra <location>s might prove helpful for applications.
A <checksum> may optionally be included, to specify more precisely which version of a file was consulted.
If the entity is a TAN file, then <IRI> (one and only one) must be a valid tag URN that matches the @id value of the TAN file being referred to. If the entity is not a TAN file, then any IRI may be used, including its resolved URL.
@accessed-when indicates when a file was last accessed. During validation, the target file will be checked. Any changes before that date will be ignored, and any after will be reported, normally as warnings. See the section called “TAN file versions”.
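Two of the rules above, using the first available <location> and reporting only changes made after @accessed-when, can be sketched as follows. This is a rough illustration under stated assumptions, not the TAN validation routine: the <source> wrapper, URLs, dates, and function names are invented, the availability test is supplied by the caller rather than performed over the network, dates are assumed to be plain ISO dates (YYYY-MM-DD), and a change made on the @accessed-when date itself is treated as not reportable.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical pointer to a digital resource; the wrapper element, URLs and
# dates are invented, and the TAN namespace is omitted for brevity.
SAMPLE = """
<source>
  <IRI>tag:example.com,2015:sample-text</IRI>
  <name>Sample transcription</name>
  <location href="http://example.com/gone.xml" accessed-when="2016-01-10"/>
  <location href="http://example.com/sample.xml" accessed-when="2016-02-01"/>
</source>
"""

def first_available(resource, is_available):
    """Return the first <location> whose @href passes the caller's test, else None."""
    for loc in resource.findall("location"):
        if is_available(loc.get("href")):
            return loc
    return None

def changes_to_report(change_dates, accessed_when):
    """Keep only change dates strictly later than @accessed-when."""
    cutoff = date.fromisoformat(accessed_when)
    return [d for d in change_dates if date.fromisoformat(d) > cutoff]

resource = ET.fromstring(SAMPLE.strip())
# Stand-in for a real fetch: pretend only sample.xml can be retrieved.
loc = first_available(resource, lambda href: href.endswith("sample.xml"))
# Hypothetical dates on which the target file was changed.
late = changes_to_report(["2016-01-20", "2016-03-05"], loc.get("accessed-when"))
print(loc.get("href"), late)
```

Passing the availability test in as a function keeps the sketch independent of any particular way of fetching files.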
All these requirements may seem excessive, since in other contexts (HTML, TEI) one needs simply a link, via @href or @src. But TAN files are meant to be valid long after their creation, when @href values may point to broken links. An <IRI> might allow one to find a missing file, and it also serves as a check in case the original file has been deleted and another, with a different name, has taken its place.
Most TAN elements allow for an optional edit stamp, an @ed-who and an @ed-when, stating who created or edited the enclosed data and when. Neither attribute is allowed without the other. @ed-when is one of the attributes that help determine a file's version. See the section called “TAN file versions”.
An edit stamp is much like a <change> without a description. The attributes simply mark the element where a change has been made. If a description of the alteration is considered necessary, <change> should be used, perhaps in addition to the edit stamp.
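The rule that neither edit-stamp attribute may appear without the other can be checked mechanically. The sketch below is only an illustration: the sample fragment, its values, and the function name are assumptions, and the TAN namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; element names, ids and dates are illustrative only.
SAMPLE = """
<head>
  <name ed-who="jdoe" ed-when="2016-02-01">Sample transcription</name>
  <desc ed-who="jdoe">Missing its ed-when</desc>
  <desc>No edit stamp at all</desc>
</head>
"""

def incomplete_edit_stamps(root):
    """Return every element carrying exactly one of @ed-who / @ed-when."""
    return [
        el for el in root.iter()
        if ("ed-who" in el.attrib) != ("ed-when" in el.attrib)
    ]

root = ET.fromstring(SAMPLE.strip())
for el in incomplete_edit_stamps(root):
    print(el.tag, dict(el.attrib))
```

Elements with both attributes or with neither pass the check; only a lone @ed-who or a lone @ed-when is flagged.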