Chapter 4. Patterns and Structures Common to All TAN Encoding Formats

This chapter provides general background to the elements and attributes that are common to all TAN files. For detailed discussion see Chapter 8, TAN patterns, elements, and attributes defined.

Common Patterns

IRI + name Pattern

Both humans and computers need to read and write TAN metadata. Very often what is readable to humans is unreadable to computers, and vice versa. So the TAN format requires that all metadata be provided whenever possible in both forms. Although this rule may appear to introduce redundancy and therefore new opportunities for error, the clarity is critical. It is the only way at present to ensure that anyone who approaches the data—computer or human—can parse and use it. In addition, doubly expressed metadata provides a safeguard much like a checksum: human- and computer-readable descriptions should correspond. Any discrepancy is a signal that an error should be diagnosed and fixed.

Some metadata, such as comments, are neither easily nor profitably translated into a computer-actionable string. In such cases only the human-readable form is required. Other metadata involve regular expressions or ISO-compliant dates, both of which are well formed and are usually human-legible. In those cases the data is not repeated. In cases where a datum is not understandable to humans, such as a complex regular expression, a <comment> may be provided.

Those exceptions aside, all other metadata takes what is called the IRI + name pattern: one or more <IRI>s and <name>s, and zero or more <desc>s. If the thing being described is a digital file, then the IRI + name pattern is part of a larger pattern, described in the section called “Digital Entity Metadata Pattern”.
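
The cardinality rule above can be checked mechanically. The following is a minimal Python sketch, not part of the TAN toolchain: the function name, the <person> wrapper element, and the example.org IRI are hypothetical, and the sample omits XML namespaces for brevity. It reads the rule as requiring at least one <IRI> and at least one <name>.

```python
import xml.etree.ElementTree as ET


def follows_iri_name_pattern(element):
    """Return True if the element has at least one <IRI> and one <name> child.

    <desc> children are optional (zero or more), so they are not checked.
    """
    has_iri = len(element.findall("IRI")) >= 1
    has_name = len(element.findall("name")) >= 1
    return has_iri and has_name


# Hypothetical fragments; the <person> wrapper is illustrative only.
complete = ET.fromstring(
    "<person>"
    "<IRI>http://example.org/people/jane-doe</IRI>"
    "<name>Jane Doe</name>"
    "<desc>Hypothetical editor used only for illustration</desc>"
    "</person>"
)
missing_name = ET.fromstring(
    "<person><IRI>http://example.org/people/jane-doe</IRI></person>"
)

print(follows_iri_name_pattern(complete))      # True
print(follows_iri_name_pattern(missing_name))  # False
```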

Digital Entity Metadata Pattern

Some entities identified through the pattern described in the section called “IRI + name Pattern” will be digital resources. In those cases, the IRI + name Pattern is extended in two different ways, according to whether the entity is a TAN file or not.

If the entity is a TAN file, then <IRI> (one and only one) must be a valid tag URN that matches the @id value of the TAN file being referred to. This may seem excessive, since in other contexts (HTML, TEI) one needs only an @href or @src. This extra measure has been introduced because TAN files are meant to remain valid long after their creation, when they may be separated from their original context, or when a server no longer holds the files referred to. Without the @id value, recovering the referred-to file would be difficult or impossible; with it, recovery is easier and may be possible.

If the entity is not a TAN file, then any IRI may be used. If you choose to use the digital resource's URL as its name (and as its location; see below), then it will be inferred that you mean to identify the digital resource that appeared at that URL at the date or time you accessed it.

In either case, the pattern adds to the IRI + name pattern one or more <location>s and an optional <checksum>.
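
The extended pattern can be sketched as a checker in the same spirit. This is an illustrative Python sketch only: the function name, the <source> wrapper, and the @href attribute on <location> are assumptions, the regular expression is a loose approximation of tag URI syntax (RFC 4151) rather than a full validator, and comparing the <IRI> against the target file's @id is left out.

```python
import re
import xml.etree.ElementTree as ET

# Loose approximation of tag URI syntax: tag:authority,date:specific
TAG_URN = re.compile(r"^tag:[^,\s]+,\d{4}(-\d{2}(-\d{2})?)?:\S*$")


def digital_entity_problems(element, is_tan_file):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    iris = [(e.text or "").strip() for e in element.findall("IRI")]
    if not element.findall("name"):
        problems.append("at least one <name> is required")
    if is_tan_file:
        # A TAN file takes exactly one <IRI>, and it must be a tag URN.
        if len(iris) != 1:
            problems.append("a TAN file takes exactly one <IRI>")
        elif not TAG_URN.match(iris[0]):
            problems.append("the <IRI> of a TAN file must be a tag URN")
    elif not iris:
        problems.append("at least one <IRI> is required")
    if not element.findall("location"):
        problems.append("at least one <location> is required")
    if len(element.findall("checksum")) > 1:
        problems.append("at most one <checksum> is allowed")
    return problems


# Hypothetical fragments; element content and @href are illustrative only.
good = ET.fromstring(
    "<source>"
    "<IRI>tag:example.org,2015:sample-text</IRI>"
    "<name>Sample text</name>"
    '<location href="http://example.org/sample-text.xml"/>'
    "</source>"
)
bad = ET.fromstring(
    "<source>"
    "<IRI>http://example.org/sample-text.xml</IRI>"
    "<name>Sample text</name>"
    "</source>"
)

print(digital_entity_problems(good, is_tan_file=True))  # []
print(len(digital_entity_problems(bad, is_tan_file=True)))  # 2
```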

Edit Stamp

Most TAN elements allow for an optional edit stamp, an @ed-who and an @ed-when, stating who created or edited the enclosed data and when. Neither attribute is allowed without the other.
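
The pairing rule lends itself to a simple check. The sketch below is a hypothetical Python helper, not a TAN validator; the sample element names and attribute values are invented for illustration. It lists every element that carries one half of an edit stamp without the other.

```python
import xml.etree.ElementTree as ET


def incomplete_edit_stamps(root):
    """Return the tags of elements that have @ed-who or @ed-when but not both."""
    incomplete = []
    for element in root.iter():
        has_who = "ed-who" in element.attrib
        has_when = "ed-when" in element.attrib
        if has_who != has_when:  # exactly one of the pair is present
            incomplete.append(element.tag)
    return incomplete


# Hypothetical fragment: the <desc> stamp lacks @ed-when.
doc = ET.fromstring(
    "<head>"
    '<name ed-who="jd" ed-when="2016-01-05">Sample text</name>'
    '<desc ed-who="jd">Stamp without a date</desc>'
    "<IRI>tag:example.org,2015:sample-text</IRI>"
    "</head>"
)

print(incomplete_edit_stamps(doc))  # ['desc']
```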

@ed-when, @when, and @when-accessed are the attributes through which a TAN file's version is calculated. The latest date among them serves as the version number.
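
As a rough illustration of that calculation, the following Python sketch gathers the three attributes from a document and returns the latest value. It is not the official TAN procedure: the function names and the sample fragment are hypothetical, values are assumed to be ISO dates or date-times, and a value without a time zone is treated as UTC purely so that all values can be compared.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

DATE_ATTRIBUTES = ("ed-when", "when", "when-accessed")


def parse_stamp(value):
    """Parse an ISO date or date-time into a timezone-aware datetime."""
    value = value.strip()
    if value.endswith("Z"):
        # Older Python versions do not accept a trailing "Z".
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption for this sketch: values without a zone are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def latest_stamp(root):
    """Return the latest date string found in the three attributes, or None."""
    stamps = [
        element.attrib[name]
        for element in root.iter()
        for name in DATE_ATTRIBUTES
        if name in element.attrib
    ]
    return max(stamps, key=parse_stamp) if stamps else None


# Hypothetical fragment; elements and values are illustrative only.
doc = ET.fromstring(
    "<head>"
    '<name ed-who="jd" ed-when="2016-01-05T10:00:00Z">Sample text</name>'
    '<location href="http://example.org/sample-text.xml" when-accessed="2016-02-10"/>'
    '<change who="jd" when="2015-03-01">Started file</change>'
    "</head>"
)

print(latest_stamp(doc))  # 2016-02-10
```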

An edit stamp performs the same function as <change>, except that no description can be provided, and it points precisely to the element where a change has been made. If a description of the alteration is necessary, <change> should be used.
