Note | |
---|---|
This section is to be read in conjunction with Chapter 5, Class-1 TAN Files, Representations of Textual Objects (Scripta) and the section called “The Text Encoding Initiative”, which address some technical issues that relate to TAN-compliant TEI to XML and validation generally. |
Some creators and editors of transcriptions will find the rather stripped-down TAN-T format inadequate. Some may wish to mark up the text further, or may already have a library of transcriptions whose annotations are worth keeping, even if some users may not be interested in them. To serve these needs, you should use TAN-TEI, an extension of the Text Encoding Initiative (TEI) format, which is well known for its expressiveness, its stability, its flexibility, and its widespread use in scholarship.
TEI was designed to be maximally expressive and flexible, to serve the detailed needs of humanities scholars. In serving this mission, TEI has come to define more than five hundred different element names, and more than two hundred attributes (roughly six times more than are defined in TAN). Of course, any given TEI file uses only a small subset of those elements and attributes, and TEI itself comes in different flavors, from TEI Lite, which uses only 75 attributes and 140 elements, to TEI All, which opens up almost the entire library.
Although the TEI format is often seen as a standard, it lacks some of the characteristics expected in a standard. It is greatly flexible, admits flavors and interpretation, and has been designed to encourage customization. Individuals and projects may define their own subset of TEI elements, to constrict or expand the allowable rules as they see fit. TAN-TEI is one of those customizations. The major difference is that TAN-TEI attempts to impose extra strictures not defined in TEI, to ensure that transcriptions are maximally likely to be interchangeable with other TAN files.
TAN's customization of the TEI can be summarized as follows (the default namespace in this section is the TEI namespace, http://www.tei-c.org/ns/1.0):
Table 5.1. Synopsis of TAN-TEI customization
TEI element | summary of alteration |
---|---|
<TEI> | |
<text> | |
<body> | |
<div> | |
Like all other TAN files, the root elements of TAN-TEI files must take an @id, the IRI name. See above, the section called “Tag URNs”.
TAN-TEI files have two heads, which may strike you as odd. The TEI head and the TAN head were designed for different purposes. Whereas the TAN <head> is meant to be brief and keyed to both IRIs and human-readable data, the <teiHeader> has been designed principally for human readability. It permits an expansive range of metadata, including matters that bear on the transcription only indirectly (e.g., manuscript descriptions).
Processors of TAN-TEI files will in general ignore the contents of <teiHeader>, since the contents are unpredictable. If your <teiHeader> has any kind of metadata relevant to TAN users, you will need to adapt it for the standard TAN <head> (see the section called “Metadata (<head>)” and the section called “Principles and Assumptions”). You may find that some of the material you put in <teiHeader> is not suitable for <head> and vice versa. This conversion needs to be performed manually, since the two headers are incommensurate, and writing each one requires a different kind of outlook.
In a TAN-TEI file, the TAN <head> must be placed in the TAN namespace. Either declare that namespace as the element's default, i.e., <head xmlns="tag:textalign.net,2015:ns">, or write <tan:head> if the prefix tan: has been defined in the root element.
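The default-namespace form can be sketched as follows. This is only an illustrative skeleton: the @id value is a made-up tag URN, the contents of both heads are elided with comments, and the placement of the TAN <head> between <teiHeader> and <text> is an assumption of this sketch.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0" id="tag:example.com,2015:sample-text">
   <teiHeader>
      <!-- TEI metadata (elided) -->
   </teiHeader>
   <!-- The TAN namespace becomes the default for <head> and its descendants -->
   <head xmlns="tag:textalign.net,2015:ns">
      <!-- TAN metadata (elided) -->
   </head>
   <text>
      <body>
         <!-- transcription (elided) -->
      </body>
   </text>
</TEI>
```

In the prefixed form, xmlns:tan="tag:textalign.net,2015:ns" would instead be declared on <TEI> and the element written as <tan:head>. In that case every descendant of the head needs the tan: prefix as well, because unprefixed elements would otherwise fall into the default TEI namespace.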
Within any leaf <div>, you may use whatever TEI markup you wish, to whatever level of depth or complexity. All users of your TAN-TEI file will be interested in the text; only a subset will care about any markup within leaf <div>s. For this reason, even if you change the value of @xml:lang within a leaf <div>, there is no guarantee that readers or processors of your data will take it into account.
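As a minimal sketch of this point, the fragment below shows a leaf <div> that carries ordinary TEI markup, including a change of @xml:lang. The @type and @n values and the sample text are hypothetical, chosen only for illustration.

```xml
<!-- Hypothetical fragment; the @type and @n values are illustrative only -->
<div xmlns="http://www.tei-c.org/ns/1.0" type="chapter" n="1">
   <!-- Leaf div: it contains no further div, so any TEI markup may appear inside -->
   <div type="section" n="1">
      <p>The author cites the term
         <foreign xml:lang="grc">λόγος</foreign>
         and marks a <hi rend="italic">key phrase</hi>.</p>
   </div>
</div>
```

A processor that cares only about the text may treat the inner <div> as a plain string and disregard both <foreign> and its @xml:lang.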
TAN-TEI should not be used to try to represent the physical appearance of the text on the object. Write a separate TEI (non-TAN) file first, and then use TAN-TEI to create a more normalized version.
You may need to prepare a TEI file to be TAN compliant. As a matter of practicality, it is helpful to envision the conversion process as falling into three steps:
1. Structure: insert new processing instructions (TAN-TEI validation files); adjust the root element by supplying the IRI name to @id and the TAN namespace to @xmlns:tan.
2. Metadata: create a new <head> and populate it.
3. Data: edit <body> to restrict the content to a single work; restructure <body> content into nesting <div>s with correct @type and @n values.
It has been the experience of those who have made TEI to TAN-TEI conversions that step 2 is the most time-consuming. The TAN <head> requires more careful curation of metadata than does <teiHeader>. But step 3 should not be overlooked, either. Many people write TEI files with a focus on the original textual object, and they make editorial decisions that look toward the scriptum and not the intertextual ecosystem that TAN supports. It is advisable to trim from the body of your TEI file any elements that would interfere with direct comparison with other versions of the text in the TAN format.
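The three conversion steps might leave a file shaped roughly like the sketch below. It is a skeleton under stated assumptions, not a validated TAN-TEI file: the schema paths in the processing instructions are placeholders, the @id is a made-up tag URN, the @type and @n values are hypothetical, the position of <tan:head> is assumed, and the contents of both heads are elided.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Step 1 (structure): validation processing instructions.
     The href values are placeholders, not actual schema locations. -->
<?xml-model href="path/to/TAN-TEI.rnc" type="application/relax-ng-compact-syntax"?>
<?xml-model href="path/to/TAN-TEI.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<!-- Step 1 (structure): IRI name in @id, TAN namespace bound to the tan: prefix -->
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:tan="tag:textalign.net,2015:ns"
     id="tag:example.com,2015:sample-transcription">
   <teiHeader>
      <!-- original TEI metadata, left as it was -->
   </teiHeader>
   <!-- Step 2 (metadata): new TAN head, written by hand -->
   <tan:head>
      <!-- TAN metadata (elided) -->
   </tan:head>
   <text>
      <!-- Step 3 (data): a single work, in nested divs with @type and @n -->
      <body>
         <div type="book" n="1">
            <div type="chapter" n="1">
               <p>Text of book 1, chapter 1.</p>
            </div>
            <div type="chapter" n="2">
               <p>Text of book 1, chapter 2.</p>
            </div>
         </div>
      </body>
   </text>
</TEI>
```

Keeping the <teiHeader> untouched while adding a separate TAN head reflects the point that the two headers serve different purposes and are not derived from one another mechanically.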