Chapter 4. Common patterns and structures

This chapter provides general background to the elements and attributes that are common to all TAN files. For more detailed discussion, see Chapter 12, TAN patterns, elements, and attributes defined.

This chapter does not discuss TAN catalog files, on which see the section called “TAN Catalog Files (collection)”.

Common patterns

IRI + name pattern

Both humans and computers need to read and write TAN metadata. Very often what is readable to humans is unreadable to computers, and vice versa. So the TAN format requires that all metadata be provided, whenever possible, in both forms. Although this rule may appear to introduce redundancy, and therefore opportunities for error, the clarity is critical. It is the only way at present to ensure that any person or algorithm that approaches the data can parse and use it. In addition, doubly expressed metadata provides a safeguard much like a checksum: the human-readable and computer-readable descriptions should agree. Any discrepancy signals a problem that should be checked.

Some metadata, such as that inside <comment> or <change>, are neither easily nor profitably translated into a computer-actionable string. In such cases only the human-readable form is required. Other metadata involve regular expressions (e.g., @pattern) or ISO-compliant dates (e.g., @when), both of which are well formed and are usually human-legible. Such data are not repeated, although they may be explained via <desc> or <comment>.

Those exceptions aside, all other metadata takes what is called the IRI + name pattern: one or more <IRI>s followed by one or more <name>s then zero or more <desc>s. This is the core pattern for nearly all TAN vocabulary items.
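The ordering rule can be illustrated with a minimal sketch in Python. It is not part of the TAN validation routines; the function name follows_iri_name_pattern, the <person> wrapper, and every IRI and name in the sample data are assumptions made for illustration. The check looks only at the sequence of child elements described above.

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def follows_iri_name_pattern(element):
    """Return True if the children are IRI+, then name+, then desc*."""
    # Spell the child sequence as a string such as "IRI name name desc ".
    sequence = "".join(local_name(child.tag) + " " for child in element)
    return re.fullmatch(r"(IRI )+(name )+(desc )*", sequence) is not None


# Hypothetical vocabulary items; the IRI and the names are invented.
good = ET.fromstring(
    "<person>"
    "<IRI>tag:example.com,2015:person:jdoe</IRI>"
    "<name>Jane Doe</name>"
    "<name>J. Doe</name>"
    "<desc>Editor of the transcription</desc>"
    "</person>"
)
bad = ET.fromstring(
    "<person>"
    "<name>Jane Doe</name>"
    "<IRI>tag:example.com,2015:person:jdoe</IRI>"
    "</person>"
)

print(follows_iri_name_pattern(good))  # True
print(follows_iri_name_pattern(bad))   # False: <name> precedes <IRI>
```

The second item fails only because of ordering: the same two children in the reverse order would pass.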

Digital entity metadata pattern

Some entities identified by the IRI + name pattern (see the section called “IRI + name pattern”) will be digital resources. In those cases, the IRI + name pattern is extended.

There must be one or more <location>s, each with @href and @accessed-when, which signal where the resource is and when it was last consulted. In validation, only the first <location> that points to an available document will be used. Extra <location>s might prove helpful for applications.

There may be an optional <checksum>, to more accurately specify which version of a file was consulted.

If the entity is a TAN file, then <IRI> must be a valid tag URN that matches the @id value of the TAN file being referred to. Because there is only one @id in a TAN file, any IRI + name pattern that points to it will have only one <IRI>. If the entity is not a TAN file, then any IRI may be used, including its resolved URL.
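As a rough sketch of how these requirements fit together, the Python function below reports departures from the digital entity pattern. Several things here are assumptions rather than TAN definitions: the function name digital_entity_problems, the target_id parameter (standing in for the @id of the TAN file referred to), the <source> wrapper, and all sample values. The sketch compares the <IRI> to the target @id but does not test whether it is a syntactically valid tag URN.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def digital_entity_problems(element, target_id=None):
    """List departures from the digital entity pattern (empty list: none found).

    target_id is the @id of the TAN file referred to, or None when the
    entity is not a TAN file, in which case any IRI is accepted.
    """
    problems = []
    locations = [c for c in element if local_name(c.tag) == "location"]
    if not locations:
        problems.append("no <location>")
    for loc in locations:
        for attr in ("href", "accessed-when"):
            if attr not in loc.attrib:
                problems.append("<location> lacks @" + attr)
    if target_id is not None:
        iris = [(c.text or "").strip() for c in element
                if local_name(c.tag) == "IRI"]
        # A TAN file has a single @id, so exactly one matching <IRI> is expected.
        if iris != [target_id]:
            problems.append("expected exactly one <IRI> equal to the target @id")
    return problems


# Hypothetical reference to a TAN file; the second <location> is incomplete.
entity = ET.fromstring(
    "<source>"
    "<IRI>tag:example.com,2015:demo-transcription</IRI>"
    "<name>Demo transcription</name>"
    '<location href="http://example.com/demo.xml" accessed-when="2016-02-01"/>'
    '<location href="backup/demo.xml"/>'
    "</source>"
)

print(digital_entity_problems(entity, "tag:example.com,2015:demo-transcription"))
# ['<location> lacks @accessed-when']
```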

@accessed-when states when a file was last accessed. During validation, the target file will be checked. Any changes before that date will be ignored; those after will be reported, normally as warnings. See the section called “TAN file versions”.
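The date comparison can be sketched in Python as follows. This is an illustration only: the function name changes_to_report is hypothetical, the dates are invented, and the sketch assumes plain ISO dates (YYYY-MM-DD) rather than full date-times. A change made on the very day of @accessed-when is treated here as not reportable, which is an assumption; the text above does not settle that case.

```python
from datetime import date


def changes_to_report(accessed_when, change_dates):
    """Return the change dates that fall after @accessed-when.

    accessed_when: the ISO date from the referring file's @accessed-when.
    change_dates:  ISO dates of changes recorded in the target file.
    """
    cutoff = date.fromisoformat(accessed_when)
    # Earlier changes are ignored; later ones would be reported,
    # normally as warnings.
    return [d for d in change_dates if date.fromisoformat(d) > cutoff]


# Hypothetical change dates found in a target file.
recorded = ["2015-03-10", "2015-06-01", "2016-01-15"]
print(changes_to_report("2015-06-01", recorded))  # ['2016-01-15']
```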

All these requirements may seem excessive, since in other formats (HTML, TEI) one refers to another file with nothing more than a link, via @href or @src. But TAN files are meant to remain valid long after their creation, by which time an @href may point to a broken link. An <IRI> might allow one to find a missing file. It also helps specify which file is intended, since sometimes one file gets overwritten by a different one.

Edit stamp

Most TAN elements allow for an optional edit stamp, an @ed-who and an @ed-when, stating who created or edited the enclosed data and when. Neither attribute is allowed without the other.
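The pairing rule is simple to check mechanically. The Python sketch below is one possible way to do so, not the TAN validation code; the function name incomplete_edit_stamps, the <head> fragment, and its values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def incomplete_edit_stamps(root):
    """Return the names of elements that carry only half of an edit stamp."""
    offenders = []
    for element in root.iter():
        has_who = "ed-who" in element.attrib
        has_when = "ed-when" in element.attrib
        # Both present or both absent is fine; exactly one is an error.
        if has_who != has_when:
            offenders.append(local_name(element.tag))
    return offenders


# Hypothetical fragment: <desc> has @ed-who but no @ed-when.
fragment = ET.fromstring(
    "<head>"
    '<name ed-who="jdoe" ed-when="2016-02-01">Demo transcription</name>'
    '<desc ed-who="jdoe">A sample description</desc>'
    "<comment>No edit stamp here, which is allowed.</comment>"
    "</head>"
)

print(incomplete_edit_stamps(fragment))  # ['desc']
```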

@ed-when is one of the attributes that help determine a file's version. See the section called “TAN file versions”.

An edit stamp is much like a <change> without a narrative. The attributes simply mark the element where a change has been made. If a description of the alteration is considered necessary, <change> should be used.
