Chapter 4. Patterns and Structures Common to All TAN Encoding Formats

Table of Contents

Common Patterns
IRI + name Pattern
Digital Entity Metadata Pattern
Edit Stamp
Overall Structure (root)
@id and a TAN file's IRI Name
Metadata (<head>)
Rights and Licenses
Inclusions and Keys
Distinguishing <source>s and <see-also>s
Interpretation of inheritable attributes
Defining Words and Tokens

This chapter provides general background to the elements and attributes that are common to all TAN files. For detailed discussion see Chapter 8, TAN patterns, elements, and attributes defined.

Both humans and computers need to read and write TAN metadata. Very often what is readable to humans is unreadable to computers, and vice versa. So the TAN format requires that all metadata be provided whenever possible in both forms. Although this rule may appear to introduce redundancy and therefore new opportunities for error, the clarity is critical. It is the only way at present to ensure that anyone who approaches the data—computer or human—can parse and use it. In addition, doubly expressed metadata provides a safeguard much like a checksum: the human- and computer-readable descriptions should correspond. Any discrepancy is a signal that an error should be diagnosed and fixed.

Some metadata, such as comments, are neither easily nor profitably translated into a computer-actionable string. In such cases only the human-readable form is required. Other metadata use regular expressions or ISO-compliant dates, both of which are well formed and usually human-legible. In those cases a single value serves both humans and computers, so nothing is duplicated. Where such a datum is nevertheless hard for humans to understand, as with a complex regular expression, a <comment> may be provided.

Those exceptions aside, all other metadata takes what is called the IRI + name pattern: one or more <IRI>s, one or more <name>s, and zero or more <desc>s. If the thing being described is a digital file, then the IRI + name pattern is part of a larger pattern; see the section called “Digital Entity Metadata Pattern”.
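The cardinality rule of the IRI + name pattern can be sketched as a small check. This is only an illustration, not part of the TAN schemas: the <person> wrapper, the sample values, and the function name are hypothetical, and the TAN namespace is omitted to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; <person> is just an illustrative wrapper element.
GOOD_SAMPLE = """
<person>
  <IRI>http://example.org/people/jane-doe</IRI>
  <name>Jane Doe</name>
  <desc>Editor of the sample transcription</desc>
</person>
"""

# Same entity, but with no computer-readable identifier.
BAD_SAMPLE = "<person><name>Jane Doe</name></person>"


def follows_iri_name_pattern(element):
    """Return True if the element has at least one <IRI> and one <name>.

    <desc> is optional (zero or more), so it is not checked.
    """
    return bool(element.findall("IRI")) and bool(element.findall("name"))


good = ET.fromstring(GOOD_SAMPLE)
bad = ET.fromstring(BAD_SAMPLE)
print(follows_iri_name_pattern(good))
print(follows_iri_name_pattern(bad))
```

The second fragment fails because it offers only the human-readable half of the pair.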

Some entities identified through the pattern described in the section called “IRI + name Pattern” will be digital resources. In those cases, the IRI + name pattern is extended in one of two ways, according to whether the entity is a TAN file or not.

If the entity is a TAN file, then exactly one <IRI> is allowed, and it must be a valid tag URN that matches the @id value of the TAN file being referred to.

If the entity is not a TAN file, then any IRI may be used. If you choose to use the digital resource's URL as its name (as well as its location; see below), then it will be inferred that you mean to identify the digital resource that appeared at that URL at the date or time you accessed it.

In either case, the pattern adds to the IRI + name pattern one or more <location>s and an optional <checksum>.
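The rules above can be gathered into one rough check. It is a sketch under stated assumptions rather than the TAN validation routine: the <source> wrapper, the href and when-accessed attributes on <location>, the sample values, and the function name are all assumed for illustration. The regular expression only approximates tag URN syntax (tag:authority,date:specific) and the TAN namespace is omitted.

```python
import re
import xml.etree.ElementTree as ET

# Approximate shape of a tag URN: tag:authority,date:specific
TAG_URN = re.compile(r"^tag:[^,:\s]+,\d{4}(-\d{2}){0,2}:\S*$")

# Hypothetical reference to a TAN file.
tan_ref = ET.fromstring("""
<source>
  <IRI>tag:example.com,2014:tan-t:sample-text</IRI>
  <name>Sample transcription</name>
  <location href="sample-text.xml" when-accessed="2014-08-13"/>
</source>
""")

# Hypothetical reference to a non-TAN resource, missing its <location>.
web_ref = ET.fromstring("""
<source>
  <IRI>http://example.org/page.html</IRI>
  <name>Sample web page</name>
</source>
""")


def check_digital_entity(element, is_tan_file, target_id=None):
    """Return a list of problems; an empty list means the pattern is met."""
    problems = []
    iris = [(e.text or "").strip() for e in element.findall("IRI")]
    if not iris:
        problems.append("needs at least one <IRI>")
    if not element.findall("name"):
        problems.append("needs at least one <name>")
    if not element.findall("location"):
        problems.append("needs at least one <location>")
    if len(element.findall("checksum")) > 1:
        problems.append("allows at most one <checksum>")
    if is_tan_file and iris:
        if len(iris) != 1:
            problems.append("a TAN file takes exactly one <IRI>")
        elif not TAG_URN.match(iris[0]):
            problems.append("the <IRI> of a TAN file must be a tag URN")
        elif target_id is not None and iris[0] != target_id:
            problems.append("the <IRI> must match the target file's @id")
    return problems


print(check_digital_entity(tan_ref, True, "tag:example.com,2014:tan-t:sample-text"))
print(check_digital_entity(web_ref, False))
```

Passing the @id of the target file as target_id keeps the check independent of how that file is located and loaded.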

Most TAN elements allow for an optional edit stamp, an @ed-who and an @ed-when, stating who created or edited the enclosed data and when. Neither attribute is allowed without the other.
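The both-or-neither rule for edit stamps is easy to test mechanically. The following is a minimal sketch, assuming a namespace-free fragment; the element names, attribute values, and function name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the first <name> carries a complete edit stamp,
# the second carries none, and <desc> carries only half of one.
SAMPLE = """
<head>
  <name ed-who="editor1" ed-when="2015-03-02">Sample title</name>
  <name>Alternative title</name>
  <desc ed-who="editor1">A description edited at an unknown time</desc>
</head>
"""


def incomplete_edit_stamps(root):
    """Return the tags of elements that have @ed-who or @ed-when, but not both."""
    return [
        el.tag
        for el in root.iter()
        if (el.get("ed-who") is None) != (el.get("ed-when") is None)
    ]


root = ET.fromstring(SAMPLE)
print(incomplete_edit_stamps(root))
```

Only the <desc> element is reported, since it names an editor without a date.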

@ed-when, @when, and @when-accessed are the attributes from which a TAN file's version is calculated: the latest date found among them serves as the version number.
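One way to picture the version calculation is to collect every value of the three attributes and keep the latest. This is a sketch only. It assumes that all values are ISO dates or date-times without time zone offsets, that the attributes sit on the elements shown (the placement is illustrative), and that the namespace is omitted. The function name is hypothetical.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

DATE_ATTRS = ("ed-when", "when", "when-accessed")

# Hypothetical fragment carrying one value for each of the three attributes.
SAMPLE = """
<head>
  <source>
    <location href="sample.xml" when-accessed="2014-08-13"/>
  </source>
  <change who="editor1" when="2015-01-20">Initial version</change>
  <name ed-who="editor1" ed-when="2015-03-02T14:30:00">Revised title</name>
</head>
"""


def latest_date(root):
    """Return the latest date string found in any of the three attributes."""
    stamps = [
        el.get(attr)
        for el in root.iter()
        for attr in DATE_ATTRS
        if el.get(attr) is not None
    ]
    if not stamps:
        return None
    # Compare parsed values, not raw strings, so dates and date-times mix safely.
    return max(stamps, key=datetime.fromisoformat)


root = ET.fromstring(SAMPLE)
print(latest_date(root))
```

Values that carry time zone offsets would need to be normalized first, because Python refuses to compare offset-aware and offset-naive date-times.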

An edit stamp performs the same function as <change>, except that no description can be provided, and it points precisely to the element where a change has been made. If a description of the alteration is necessary, <change> should be used.