Chapter 4. Patterns and Structures Common to All TAN Encoding Formats


This chapter provides general background to the elements and attributes that are common to all TAN files. For more detailed discussion, see Chapter 9, TAN patterns, elements, and attributes defined.

This chapter does not discuss TAN catalog files, on which see the section called “TAN Catalog Files (collection)”.

Common Patterns

IRI + name Pattern

Both humans and computers need to read and write TAN metadata. Very often what is readable to humans is unreadable to computers, and vice versa. So the TAN format requires that all metadata be provided, whenever possible, in both forms. Although this rule may appear to introduce redundancy and therefore opportunities for error, the clarity is critical. It is the only way at present to ensure that any person or algorithm that approaches the data can parse and use it. In addition, doubly expressed metadata provide a safeguard much like a checksum: the human-readable and computer-readable descriptions should agree, and any discrepancy signals an error that should be checked.

Some metadata, such as that inside <comment> or <change>, are neither easily nor profitably translated into a computer-actionable string. In such cases only the human-readable form is required. Other metadata involve regular expressions (e.g., @pattern) or ISO-compliant dates (e.g., @when), both of which are well formed and are usually human-legible. Such data are not repeated, although they may be explained via <desc> or <comment>.

Those exceptions aside, all other metadata take what is called the IRI + name pattern: one or more <IRI>s, one or more <name>s, and zero or more <desc>s. This is the core pattern for nearly all TAN vocabulary items.
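The cardinalities of the pattern can be illustrated with a minimal Python sketch. It is not part of the TAN validation routines; the function name, the example IRI, and the text values are invented for illustration, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

def follows_iri_name_pattern(element):
    """Return True if the element has one or more <IRI> and one or more <name>.

    <desc> is optional (zero or more), so it is not checked.
    """
    iris = element.findall("IRI")
    names = element.findall("name")
    return len(iris) >= 1 and len(names) >= 1

# Hypothetical vocabulary item: the IRI and the text values are invented.
good = ET.fromstring(
    "<person>"
    "<IRI>tag:example.com,2015:agent:jane-doe</IRI>"
    "<name>Jane Doe</name>"
    "<desc>Editor of the transcription</desc>"
    "</person>"
)

# The same item without the human-readable half of the pattern.
missing_name = ET.fromstring(
    "<person><IRI>tag:example.com,2015:agent:jane-doe</IRI></person>"
)

print(follows_iri_name_pattern(good))          # True
print(follows_iri_name_pattern(missing_name))  # False
```

The check counts only direct children, on the assumption that <IRI>, <name>, and <desc> sit immediately inside the item they describe.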

Digital Entity Metadata Pattern

Some entities identified by the pattern described in the section called “IRI + name Pattern” will be digital resources. In those cases, the IRI + name pattern is extended.

There must be one or more <location>s, each with @href and @accessed-when, which signal where the resource is and when it was last consulted. In validation, only the first document available will be used. Extra <location>s might prove helpful for applications.

A <checksum> may be included, to specify more precisely which version of a file was consulted.

If the entity is a TAN file, then <IRI> (one and only one) must be a valid tag URN that matches the @id value of the TAN file being referred to. If the entity is not a TAN file, then any IRI may be used, including its resolved URL.
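The requirements for <location> and <IRI> can be sketched as a small Python check. This is an illustration under stated assumptions, not the TAN validator: the function name, its parameters, the <source> wrapper, and the example values are hypothetical, and the regular expression is a simplified approximation of the tag URN syntax defined in RFC 4151.

```python
import re
import xml.etree.ElementTree as ET

# Simplified shape of a tag URN: tag:<authority>,<date>:<specific>
# (an approximation of RFC 4151, not a full implementation of its grammar).
TAG_URN = re.compile(r"^tag:[A-Za-z0-9.@_-]+,\d{4}(-\d{2}(-\d{2})?)?:\S*$")

def check_digital_entity(element, is_tan_file, target_id=None):
    """Return a list of problems found in a digital entity description."""
    problems = []

    # One or more <location>s, each with @href and @accessed-when.
    locations = element.findall("location")
    if not locations:
        problems.append("at least one <location> is required")
    for loc in locations:
        for attr in ("href", "accessed-when"):
            if attr not in loc.attrib:
                problems.append(f"<location> lacks @{attr}")

    # For TAN files: exactly one <IRI>, a tag URN equal to the target's @id.
    iris = [iri.text.strip() for iri in element.findall("IRI") if iri.text]
    if is_tan_file:
        if len(iris) != 1:
            problems.append("a TAN file must be identified by exactly one <IRI>")
        elif not TAG_URN.match(iris[0]):
            problems.append("<IRI> is not a tag URN")
        elif target_id is not None and iris[0] != target_id:
            problems.append("<IRI> does not match the target file's @id")
    return problems

# Hypothetical reference to a TAN file; all values are invented.
source = ET.fromstring(
    "<source>"
    "<IRI>tag:example.com,2015:text:demo</IRI>"
    "<name>Demo transcription</name>"
    "<location href='demo.xml' accessed-when='2017-03-01'/>"
    "</source>"
)

print(check_digital_entity(source, True, "tag:example.com,2015:text:demo"))
print(check_digital_entity(source, True, "tag:example.com,2015:text:other"))
```

The first call returns an empty list; the second reports that the <IRI> and the target's @id disagree, which is the kind of discrepancy the pattern is designed to expose.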

@accessed-when indicates when a file was last accessed. During validation, the target file will be checked. Any changes made before that date will be ignored, and any made after it will be reported, normally as warnings. See the section called “TAN file versions”.
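The comparison can be sketched in Python as follows. The function name and the dates are hypothetical, the values are assumed to be plain ISO dates (no time component), and treating a change made on the same day as @accessed-when as already seen is an assumption of this sketch, not a rule stated here.

```python
from datetime import date

def changes_to_report(accessed_when, change_dates):
    """Return the change dates that fall after the @accessed-when date.

    Earlier changes are ignored; a change on the same day is also ignored,
    which is an assumption of this sketch.
    """
    cutoff = date.fromisoformat(accessed_when)
    return [d for d in change_dates if date.fromisoformat(d) > cutoff]

# Hypothetical change history of a target file.
history = ["2016-12-11", "2017-03-01", "2017-05-20"]

print(changes_to_report("2017-03-01", history))  # ['2017-05-20']
```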

All these requirements may seem excessive, since in other contexts (HTML, TEI) one needs only a link, via @href or @src. But TAN files are meant to remain valid long after their creation, by which time an @href may point to a broken link. An <IRI> might allow one to find a missing file, and it also serves as a check in case the original file has been deleted and another, with a different name, has taken its place.

Edit Stamp

Most TAN elements allow for an optional edit stamp, an @ed-who and an @ed-when, stating who created or edited the enclosed data and when. Neither attribute is allowed without the other.
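The rule that neither attribute may appear without the other can be expressed as a one-line check. The following Python sketch is illustrative only; the function name, the <div> element, and the attribute values are invented.

```python
import xml.etree.ElementTree as ET

def edit_stamp_ok(element):
    """Return True if @ed-who and @ed-when are both present or both absent."""
    return ("ed-who" in element.attrib) == ("ed-when" in element.attrib)

# Hypothetical elements; the attribute values are invented.
stamped = ET.fromstring("<div ed-who='jdoe' ed-when='2017-03-01'>text</div>")
unstamped = ET.fromstring("<div>text</div>")
half_stamped = ET.fromstring("<div ed-who='jdoe'>text</div>")

print(edit_stamp_ok(stamped))       # True
print(edit_stamp_ok(unstamped))     # True
print(edit_stamp_ok(half_stamped))  # False
```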

@ed-when is one of the attributes that help determine a file's version. See the section called “TAN file versions”.

An edit stamp is much like a <change> without a description. The attributes simply mark the element where a change has been made. If a description of the alteration is considered necessary, <change> should be used, perhaps in addition to the edit stamp.
