This chapter provides general background to the elements and attributes that are common to all TAN files. For more detailed discussion, see Chapter 12, TAN patterns, elements, and attributes defined.
This chapter does not discuss TAN catalog files, on which see the section called “TAN Catalog Files (collection)”.
Both humans and computers need to read and write TAN metadata. Very often what is readable to humans is unreadable to computers, and vice versa. So the TAN format requires that all metadata be provided whenever possible in both forms. Although this rule may appear to introduce redundancy and therefore opportunities for error, the clarity is critical. It is the only way at present to ensure that any person or algorithm that approaches the data can parse and use it. In addition, doubly expressed metadata provides a safeguard much like a checksum: the human-readable and computer-readable descriptions should agree with each other. Any discrepancy signals a problem that should be checked.
Some metadata, such as that inside <comment> or <change>, are neither easily nor profitably translated into a computer-actionable string. In such cases only the human-readable form is required. Other metadata involve regular expressions (e.g., @pattern) or ISO-compliant dates (e.g., @when), both of which are well formed and are usually human-legible. Such data are not repeated, although they may be explained via <desc> or <comment>.
Those exceptions aside, all other metadata take what is called the IRI + name pattern: one or more <IRI>s, followed by one or more <name>s, then zero or more <desc>s. This is the core pattern for nearly all TAN vocabulary items.
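As a minimal sketch of the IRI + name pattern, the fragment below identifies a person. The wrapper element <person>, the xml:id value, and the example.org IRIs are assumptions made for illustration. Only the sequence of <IRI>, <name>, and <desc> comes from the description above.

```xml
<!-- Hypothetical vocabulary item; the element name and all values are illustrative only -->
<person xml:id="jdoe">
   <!-- one or more IRIs: the computer-readable identifiers -->
   <IRI>http://example.org/people/jdoe</IRI>
   <IRI>http://example.org/authority/12345</IRI>
   <!-- one or more names: the human-readable identifiers -->
   <name>Jane Doe</name>
   <!-- zero or more descriptions -->
   <desc>A hypothetical editor used in this example.</desc>
</person>
```

A reader can check such an item the way one checks a checksum: if the IRIs resolve to someone other than the person named, the discrepancy signals a problem.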
Some entities identified by the IRI + name pattern will be digital resources. In those cases, the pattern is extended.
There must be one or more <location>s, each with @href and @accessed-when, which signal where the resource is and when it was last consulted. In validation, only the first document available will be used. Extra <location>s might prove helpful for applications.
A <checksum> may optionally be included, to specify more precisely which version of a file was consulted.
If the entity is a TAN file, then <IRI> must be a valid tag URN that matches the @id value of the TAN file being referred to. Because there is only one @id in a TAN file, any IRI + name pattern that points to it will have only one <IRI>. If the entity is not a TAN file, then any IRI may be used, including its resolved URL.
@accessed-when states when a file was last accessed. During validation, the target file will be checked. Any changes before that date will be ignored; those after will be reported, normally as warnings. See the section called “TAN file versions”.
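The extended pattern for a digital resource might look like the following sketch, which points to a TAN file. The wrapper element <source>, the tag URN, the URLs, the dates, and the placement of <location> after <name> are all assumptions for illustration. The content model of <checksum> is not described in this chapter, so it is indicated only by a comment.

```xml
<!-- Hypothetical reference to a TAN file; the element name and all values are illustrative only -->
<source xml:id="sample-src">
   <!-- exactly one IRI: a tag URN assumed to match the @id of the target TAN file -->
   <IRI>tag:example.org,2015:sample-transcription</IRI>
   <name>Sample transcription</name>
   <!-- one or more locations; in validation only the first available document is used -->
   <location href="http://example.org/tan/sample-transcription.xml" accessed-when="2015-06-01"/>
   <location href="../local-copies/sample-transcription.xml" accessed-when="2015-06-01"/>
   <!-- an optional checksum element could be added here to pin down the version consulted -->
</source>
```

In this sketch, a change made to the target file after 2015-06-01 would be reported during validation, while earlier changes would be ignored.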
All these requirements may seem excessive, since in other formats (HTML, TEI), to refer to another file one needs simply a link, via @href or @src. But TAN files are meant to be valid long after their creation, by which time an @href may point to a broken link. An <IRI> might allow one to find a missing file. It also helps specify which file is intended, since one file is sometimes overwritten by a different one.
Most TAN elements allow for an optional edit stamp, an @ed-who and an @ed-when, stating who created or edited the enclosed data and when. Neither attribute is allowed without the other. @ed-when is one of the attributes that help determine a file's version. See the section called “TAN file versions”.
An edit stamp is much like a <change> without a narrative. The attributes simply mark the element where a change has been made. If a description of the alteration is considered necessary, <change> should be used.
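A minimal sketch of an edit stamp follows. It assumes that <desc> is among the elements that accept one, that the value of @ed-who refers to an identifier declared elsewhere in the file (here the hypothetical jdoe), and that @ed-when takes an ISO-style date. None of these details is confirmed by this chapter.

```xml
<!-- Hypothetical item; the element names and all values are illustrative only -->
<person xml:id="jdoe">
   <IRI>http://example.org/people/jdoe</IRI>
   <name>Jane Doe</name>
   <!-- edit stamp: ed-who and ed-when appear together, never one without the other -->
   <desc ed-who="jdoe" ed-when="2016-02-14">A description revised by the person named in ed-who.</desc>
</person>
```

The stamp records only who changed the <desc> and when. An account of what was changed and why would belong in a <change>.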