This chapter provides general background to the elements and attributes that are common to all TAN files. For detailed discussion see Chapter 8, TAN patterns, elements, and attributes defined.
Both humans and computers need to read and write TAN metadata. Very often what is readable to humans is unreadable to computers, and vice versa. So the TAN format requires that all metadata be provided whenever possible in both forms. Although this rule may appear to introduce redundancy and therefore new opportunities for error, the clarity is critical. It is the only way at present to ensure that anyone who approaches the data—computer or human—can parse and use it. In addition, doubly expressed metadata provides a safeguard much like a checksum: human- and computer-readable descriptions should correspond. Any discrepancy is a signal that an error should be diagnosed and fixed.
Some metadata, such as comments, are neither easily nor profitably translated into a computer-actionable string. In such cases only the human-readable form is required. Other metadata involve regular expressions or ISO-compliant dates, both of which are well formed and are usually human-legible. In those cases the data is not repeated. In cases where a datum is not understandable to humans, such as a complex regular expression, a <comment> may be provided.
Those exceptions aside, all other metadata takes what is called the IRI + name pattern: one or more <IRI> elements, one or more <name> elements, and zero or more <desc> elements. If the thing being described is a digital file, then the IRI + name pattern is part of a larger pattern, the Digital Entity Metadata Pattern.
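The cardinality rule of the IRI + name pattern can be sketched as a small check. This is only an illustration, not part of TAN's own validation: the wrapper element `<person>`, the sample values, and the function name `follows_iri_name_pattern` are all hypothetical, and XML namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <person> wrapper and every value are invented
# for illustration; a real TAN file would also declare a namespace.
FRAGMENT = """
<person>
  <IRI>tag:example.org,2015:person:jdoe</IRI>
  <name>Jane Doe</name>
  <desc>Editor of the sample transcription</desc>
</person>
"""


def follows_iri_name_pattern(element):
    """Return True if the element has at least one non-empty <IRI> and <name>.

    <desc> is optional (zero or more), so it is not checked here.
    """
    iris = [e for e in element.findall("IRI") if (e.text or "").strip()]
    names = [e for e in element.findall("name") if (e.text or "").strip()]
    return len(iris) >= 1 and len(names) >= 1


if __name__ == "__main__":
    entity = ET.fromstring(FRAGMENT.strip())
    print(follows_iri_name_pattern(entity))
```

The `<IRI>` carries the computer-readable identity and the `<name>` the human-readable one, so the check requires both and treats `<desc>` as optional.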
Some entities identified by the IRI + name Pattern will be digital resources. In those cases, the IRI + name Pattern is extended in two different ways, according to whether the entity is a TAN file or not.
If the entity is a TAN file, then <IRI> (one and only one) must be a valid tag URN that matches the @id value of the TAN file being referred to. This may seem excessive, since in other contexts (HTML, TEI) one needs only the @href or @src. This extra measure has been introduced because TAN files are meant to be valid long after their creation, when they may be separated from their original context, or when a server no longer has the files referred to. Without the @id value, recovering the referred-to file would be difficult or impossible; with it, recovery becomes easier and may at least be possible.
If the entity is not a TAN file, then any IRI may be used. If you choose to use the digital resource's URL as its name (and as its location; see below), then it will be inferred that you mean to identify the digital resource that appeared at that URL at the date or time you accessed it.
In either case, the pattern adds to the IRI + name pattern one or more <location>s and an optional <checksum>.
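The rules for a reference to a TAN file (exactly one `<IRI>`, a tag URN equal to the target's @id, at least one `<location>`, at most one `<checksum>`) might be checked along the following lines. Everything named here is an assumption made for illustration: the `<source>` wrapper, the `TAN-T` root in the sample, the placement of the URL as the text of `<location>`, the function name, and the regular expression, which is only a rough approximation of tag URN syntax.

```python
import re
import xml.etree.ElementTree as ET

# Rough approximation of tag URN syntax (authority, date, specific part).
# It is deliberately simplified and is not a full validator.
TAG_URN = re.compile(r"^tag:[A-Za-z0-9.@-]+,\d{4}(-\d{2}){0,2}:\S*$")

# Hypothetical referring element; namespaces are omitted for brevity.
REFERENCE = """
<source>
  <IRI>tag:example.org,2015:sample-transcription</IRI>
  <name>Sample transcription</name>
  <location>http://example.org/sample.xml</location>
</source>
"""

# Hypothetical root element of the TAN file being referred to.
TARGET = '<TAN-T id="tag:example.org,2015:sample-transcription"/>'


def check_tan_file_reference(reference, target_root):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    iris = [(e.text or "").strip() for e in reference.findall("IRI")]
    if len(iris) != 1:
        problems.append("expected exactly one <IRI>")
    elif not TAG_URN.match(iris[0]):
        problems.append("<IRI> is not a tag URN")
    elif iris[0] != target_root.get("id"):
        problems.append("<IRI> does not match the target's @id")
    if not reference.findall("location"):
        problems.append("expected at least one <location>")
    if len(reference.findall("checksum")) > 1:
        problems.append("expected at most one <checksum>")
    return problems


if __name__ == "__main__":
    reference = ET.fromstring(REFERENCE.strip())
    target = ET.fromstring(TARGET)
    print(check_tan_file_reference(reference, target))
```

Because the comparison is made against @id rather than against a location, the check would still succeed if the target file were moved, which is the point of the extra measure described above.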
Most TAN elements allow for an optional edit stamp, an @ed-who and an @ed-when, stating who created or edited the enclosed data and when. Neither attribute is allowed without the other. @ed-when, @when, and @when-accessed are the attributes through which a TAN file's version is calculated. The latest date serves as the version number.
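As a rough sketch of the two rules just stated, the following finds elements with an incomplete edit stamp and picks the latest date among @ed-when, @when, and @when-accessed. The sample document, its element nesting, and the function names are hypothetical. The sketch also assumes that every date value is an ISO date or dateTime without a time zone, so that Python's `datetime.fromisoformat` can compare them.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# The three attributes named in the text as feeding the version calculation.
DATE_ATTRIBUTES = ("ed-when", "when", "when-accessed")

# Hypothetical document; structure and values are invented for illustration,
# and namespaces are omitted for brevity.
SAMPLE = """
<TAN-T id="tag:example.org,2015:sample-transcription">
  <head>
    <name ed-who="jdoe" ed-when="2015-03-01">Sample transcription</name>
    <location when-accessed="2015-02-10">http://example.org/sample.xml</location>
    <change when="2015-04-12T09:30:00">Corrected typos</change>
  </head>
</TAN-T>
"""


def unpaired_edit_stamps(root):
    """Return tags of elements that carry only one of @ed-who / @ed-when."""
    return [
        e.tag
        for e in root.iter()
        if ("ed-who" in e.attrib) != ("ed-when" in e.attrib)
    ]


def latest_date(root):
    """Return the latest date string among the version-bearing attributes.

    Assumes time-zone-free ISO values; returns None if no dates are present.
    """
    values = [
        e.get(attr)
        for e in root.iter()
        for attr in DATE_ATTRIBUTES
        if e.get(attr) is not None
    ]
    return max(values, key=datetime.fromisoformat) if values else None


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    print(unpaired_edit_stamps(root))
    print(latest_date(root))
```

In this sample the latest value comes from the @when on `<change>`, so that value would serve as the version under the rule above.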
An edit stamp performs the same function as <change>, except that no description can be provided, and it points precisely to the element where a change has been made. If a description of the alteration is necessary, <change> should be used.