Note: This section is to be read in conjunction with Chapter 5, Class-1 TAN Files, Representations of Textual Objects (Scripta) and the section called “The Text Encoding Initiative”, which address related technical issues.
Some creators and editors of transcriptions will find the rather stripped-down TAN-T format inadequate. Some may wish to mark up the text further. Some may already have a library of transcriptions whose annotations are worth keeping, even if they are uninteresting to most users. In these cases, you should use TAN-TEI, an extension to the Text Encoding Initiative (TEI) format, which is well known for its expressiveness, its stability, its flexibility, and its widespread use in scholarship.
TEI was designed to be maximally expressive and flexible, to serve the detailed needs of humanities scholars. In serving this mission, TEI has come to define more than five hundred different element names, and more than two hundred attributes (roughly six times more than are defined in TAN). Of course, any given TEI file uses only a small subset of those elements and attributes, and TEI itself comes in different flavors, from TEI Lite, which uses only 75 attributes and 140 elements, to TEI All, which opens up almost the entire library.
Although the TEI format is often seen as a standard, it lacks some of the characteristics one normally expects in a standard. It is very flexible, admits flavors and interpretation, and has been designed to encourage customization. Individuals and projects may define their own subset of TEI elements, to constrict or expand the allowable rules as they see fit. TAN-TEI is one of those customizations. The major difference is that TAN-TEI attempts to impose extra strictures not defined in TEI, to ensure that transcriptions are maximally likely to be interchangeable with other TAN-TEI files.
TAN's customization of the TEI can be summarized as follows (the default namespace in this section is the TEI namespace, http://www.tei-c.org/ns/1.0):
Table 5.1. Synopsis of TAN-TEI customization
TEI element | summary of alteration |
---|---|
<TEI> | |
<text> | |
<body> | |
<div> | |
Like all other TAN files, the root elements of TAN-TEI files must take an @id, the IRI name. See above, the section called “Tag URNs”.
TAN-TEI files have two heads, which may strike you as odd. The TEI head and the TAN head were designed for different purposes. Whereas the TAN <head> is meant to be brief and keyed to both IRIs and human-readable data, the <teiHeader> permits quite an expansive range of metadata, including matters that bear only indirectly on the transcription (e.g., manuscript descriptions). Further, <teiHeader> was designed to be read principally by humans.
Processors of TAN-TEI files will in general ignore the contents of <teiHeader>, since the contents are unpredictable. If your <teiHeader> has any kind of metadata relevant to TAN users, you will first need to create a standard TAN <head> (see the section called “Metadata (<head>)” and the section called “Principles and Assumptions”). This conversion needs to be performed manually, since the two headers are incommensurate, and writing each one requires a different mindset.
In a TAN-TEI file, the TAN <head> must take the TAN namespace, i.e., <head xmlns="tag:textalign.net,2015:ns"> or <tan:head> if the prefix tan: has been defined in the root element.
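As a rough illustration of the two requirements above (an @id on the root element and a <head> in the TAN namespace), the following Python sketch checks a tiny, invented TAN-TEI skeleton. The sample document, its tag URN, the empty <teiHeader>, the placement of <tan:head> directly under the root, and the function name check_tan_tei_skeleton are all assumptions made for the example; they are not taken from the TAN schemas, and real validation should rely on the official TAN-TEI validation files.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
TAN_NS = "tag:textalign.net,2015:ns"

# Hypothetical, heavily abbreviated sample; the id value is invented and
# the placement of <tan:head> is an assumption made for this sketch.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:tan="tag:textalign.net,2015:ns"
     id="tag:example.com,2015:sample-transcription">
  <teiHeader/>
  <tan:head/>
  <text><body><div type="chapter" n="1">Sample text.</div></body></text>
</TEI>"""


def check_tan_tei_skeleton(xml_text):
    """Return a list of problems; an empty list means all checks passed."""
    root = ET.fromstring(xml_text)
    problems = []
    # The root must be <TEI> in the TEI namespace.
    if root.tag != "{%s}TEI" % TEI_NS:
        problems.append("root element is not TEI in the TEI namespace")
    # The root must carry an @id (the IRI name).
    if not root.get("id"):
        problems.append("root element lacks @id")
    # A <head> in the TAN namespace must appear somewhere in the file.
    if root.find(".//{%s}head" % TAN_NS) is None:
        problems.append("no <head> in the TAN namespace")
    return problems


print(check_tan_tei_skeleton(SAMPLE))  # prints []
```

Replacing <tan:head/> in the sample with a bare <head/> would leave that element in the default (TEI) namespace, and the third check would report it.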
Within any leaf <div>, you may use whatever TEI markup you wish, to whatever level of depth or complexity. All users of your TAN-TEI file will be interested in the text; only a subset will care about any markup within leaf <div>s. For this reason, even if you change the value of @xml:lang within a leaf <div>, there is no guarantee that readers or processors of your data will take it into account.
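To make the point concrete, here is a minimal Python sketch of how a text-oriented consumer might flatten each leaf <div> to a plain string, discarding inline markup along the way. It is not how any actual TAN processor is implemented; the sample fragment and the helper name leaf_div_texts are invented for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
DIV = "{%s}div" % TEI_NS

# Hypothetical fragment: two leaf divs containing inline TEI markup.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="book" n="1">
    <div type="chapter" n="1">In the <hi rend="italic">beginning</hi>.</div>
    <div type="chapter" n="2">A <foreign xml:lang="la">verbum</foreign> here.</div>
  </div>
</body>"""


def leaf_div_texts(xml_text):
    """Return (n, plain text) for each div that contains no other div."""
    root = ET.fromstring(xml_text)
    results = []
    for div in root.iter(DIV):
        if div.find(".//" + DIV) is None:  # no descendant div: a leaf
            text = "".join(div.itertext())  # drops all inline markup
            results.append((div.get("n"), " ".join(text.split())))
    return results


print(leaf_div_texts(SAMPLE))
```

In this sketch the <hi> and <foreign> wrappers vanish, and with them the @xml:lang change on the Latin word, which is exactly the kind of information a consumer interested only in the text may never see.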
TAN-TEI should not be used to try to represent the physical appearance of the text on the object.
You may need to prepare a TEI file to be TAN compliant. As a practical matter, it is helpful to envision the conversion process as falling into three steps:
1. Structure: insert new processing instructions (TAN-TEI validation files); adjust the root element by supplying the IRI name to @id and the TAN namespace to @xmlns:tan.
2. Metadata: create a new <head> and populate it.
3. Data: edit <body> to restrict the content to a single work; restructure the <body> content into nesting <div>s with correct @type and @n values.
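Part of the data step can be checked mechanically. The Python sketch below lists every TEI <div> that lacks @type or @n. It tests only for the presence of the attributes, not whether their values are correct for a given work, and both the sample fragment and the function name divs_missing_type_or_n are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
DIV = "{%s}div" % TEI_NS

# Hypothetical fragment: the second chapter has no @n.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="book" n="1">
    <div type="chapter" n="1">First chapter.</div>
    <div type="chapter">Second chapter, missing n.</div>
  </div>
</body>"""


def divs_missing_type_or_n(xml_text):
    """Return (type, n) for every TEI div lacking @type or @n."""
    root = ET.fromstring(xml_text)
    return [
        (div.get("type"), div.get("n"))
        for div in root.iter(DIV)
        if not div.get("type") or not div.get("n")
    ]


print(divs_missing_type_or_n(SAMPLE))  # prints [('chapter', None)]
```

A check like this can narrow down where restructuring is needed, but deciding on the right @type and @n values remains an editorial task.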
It has been the experience of those who have made TEI to TAN-TEI conversions that step 2 is the most time-consuming. The TAN <head> requires one to curate the metadata more carefully than does <teiHeader>. But step 3 should not be underestimated, either. Many people write TEI files with a focus on the original textual object, and they do not normalize to the level expected in a TAN file. In general, the simpler the TEI file, the better.