This chapter presents ways to manage, create, edit, and share TAN files. These suggestions, based upon the experience of users, are both brief and general. To get into specifics, read the other chapters in this part of the guidelines, as well as the appendixes.
The TAN suite can be downloaded from a master data repository listed at http://textalign.net/. The project has been developed using the version-control software Git. Whether you download the files directly or you use Git, place the TAN code wherever is most convenient on your computer. No extra steps are necessary. Once you've downloaded the files, you have everything you need.[18]
Unlike many other applications, you do not install the TAN suite, and you do not have to put it in a specific place on your local drive. There is no executable file in the suite. You will work with TAN through Oxygen, another XML editor, a text editor, or (if you are a power user) the command line.
You will be creating and editing TAN files. Those files may be set up in whatever directory structure you prefer. Because TAN files are part of a network, they are meant to be shared and interlinked. So it is beneficial to develop predictable directory structures. However you organize your TAN files, keep them separate from the suite of core TAN files.
Many TAN projects will involve dozens of versions of a particular work, and it is easy to get confused as to what file does what. Naming files becomes a challenge (the filename, not the @id, on which see the section called “Identifying TAN files: @id”). In projects with many text versions, it is recommended that your names for class-1 files start with an acronym or short abbreviation for the author and work, followed by the language code, the date when the scriptum was created or published, and the last name of the editor/author of the scriptum. If you have a transcription that has been redivided into multiple TAN files linked to each other via <redivision>, the reference system might need to be mentioned in the filename. Some suggestive examples:
ar.cat.grc.1949.minio-paluello.ref-logical.xml: Aristotle's Categories, in Greek, from the 1949 edition by Minio-Paluello, following a reference system based on semantic units (paragraphs, sentences, independent clauses).
apocr.eng.kjv.1760.xml: apocrypha, English, King James Version, 1760 edition. If the file adopted an unusual reference system, that would be important to include in the name.
tlg0059.tlg031.perseus-grc1-Pl.Ti.xml: Plato's Timaeus in Greek. This filename has some duplication in that the catalog number tlg0059 already implies Pl and tlg031, Ti, but only an elite few know the meaning of the numerical codes used by the Thesaurus Linguae Graecae.
pl.ti.grc.1905.burnet.stephanus.xml: Plato's Timaeus in Greek, Burnet's 1905 edition divided into a system that approximates Stephanus numbers.[19]
Some TAN applications, such as the section called “Diff+”, use filenames to order output. If you wish your class-1 files to be read in chronological order according to source, then it is a good practice to put the date in ISO form (YYYY(-MM(-DD)?)?), placed before any alphabetizable elements that are less important.
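To see why an ISO-form date sorts well, consider the following minimal Python sketch. It assumes, for illustration only, that ordering is a plain lexicographic sort of filenames; how any particular TAN application orders its input is not specified here. The function name is_iso_date and all filenames except the Burnet one are hypothetical.

```python
import re

# Pattern from the text: YYYY, optionally followed by -MM, then optionally -DD.
ISO_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")

def is_iso_date(component: str) -> bool:
    """Return True if a filename component is a date in ISO form."""
    return ISO_DATE.fullmatch(component) is not None

# Class-1 filenames sharing the same work and language prefix.
# The second and third names are hypothetical, invented for this sketch.
names = [
    "pl.ti.grc.1905.burnet.stephanus.xml",
    "pl.ti.grc.1578.editor-a.xml",
    "pl.ti.grc.1833-05.editor-b.xml",
]

# Because the date precedes the editor's name, a plain string sort
# (an assumption about the consumer) is also a chronological sort.
ordered = sorted(names)
```

If the editor's name came before the date, the same sort would group files by editor rather than by year.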
In sum, a good sequence for ordering components in a filename would be: collection, work, language/version, date, editor/author, reference system.
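The recommended sequence can be sketched as a small Python helper. This is an illustration, not part of the TAN suite: the function class1_filename, its parameter names, and its normalization rules (lowercasing, replacing spaces with hyphens, joining with periods) are assumptions inferred from the examples above.

```python
from typing import Optional

def class1_filename(collection: Optional[str], work: str, language: str,
                    date: str, editor: str,
                    ref_system: Optional[str] = None) -> str:
    """Join filename components in the order recommended in the text.

    Empty or missing components (e.g. no collection, no reference
    system) are skipped.
    """
    parts = [collection, work, language, date, editor, ref_system]
    # Hypothetical normalization: lowercase, spaces become hyphens.
    cleaned = [p.strip().lower().replace(" ", "-") for p in parts if p]
    return ".".join(cleaned) + ".xml"

# Rebuilds one of the example filenames given earlier.
example = class1_filename("ar", "cat", "grc", "1949",
                          "Minio-Paluello", "ref-logical")
```

Keeping the order fixed in one place makes it harder for filenames in a large project to drift apart over time.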
Class-2 files are tougher. They unite multiple files and concepts, so comprehensive filenames could become very long or unpredictably structured. One approach is to make sure that each class-2 file is given a brief but meaningful name that points to the research question that motivated its creation. Some examples:
ar.cat.grc.1949.minio-paluello-sem-TAN-LM-sample.xml: a sample of lexico-morphological data for Aristotle's Categories, in Greek. Each source-specific TAN-A-lm file has no more than one source, so including the source in the filename does not pose a challenge.
nt.grc-syr.selections.TAN-A-tok.xml: a selection of word-for-word correspondences between the Syriac and Greek New Testaments.
plato.general.TAN-A.xml: a general alignment and annotation file concerning Plato's works.
Class-3 filenames are a bit easier. It is recommended that TAN-mor files begin with the language code then an acronym for the person or group responsible for creating the rules and codes. TAN-voc files are written generally to serve a specific project or collection, so the collection name and the type of vocabulary should suffice. Examples:
eng.example.com,2014.1.xml: tagging scheme #1 for English, by the owner of the domain example.com in 2014.
ar.cat.general.TAN-voc.xml: general vocabulary items serving a project dealing with Aristotle's Categories.
If you have a local copy of someone else's TAN collection, and you wish to create TAN files that depend on them, you will in all likelihood use relative URLs pointing to copies of the files stored locally. If those files have <master-location>s pointing to their master copies, you should occasionally validate them, to see if there have been any updates.
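As a rough aid for finding which master copies a local file points to, the following Python sketch lists the locations declared in a file. It is not a substitute for TAN validation. It assumes that each <master-location> carries its URL in an @href attribute, and it matches elements by local name so that no particular namespace has to be assumed. The function name master_locations, the sample document, and its URL are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

def master_locations(tan_xml: str) -> List[str]:
    """Return the href of every master-location element in a document."""
    root = ET.fromstring(tan_xml)
    hrefs = []
    for el in root.iter():
        # Strip any "{namespace}" prefix so the check is namespace-agnostic.
        local_name = el.tag.rsplit("}", 1)[-1]
        if local_name == "master-location" and el.get("href"):
            hrefs.append(el.get("href"))
    return hrefs

# A deliberately minimal, hypothetical skeleton; not a valid TAN file.
sample = """
<TAN-T>
  <head>
    <master-location href="http://example.com/tan/pl.ti.grc.1905.burnet.stephanus.xml"/>
  </head>
</TAN-T>
"""

locations = master_locations(sample)
```

The resulting list can then be compared against the local copies by whatever means you prefer, for example by fetching each URL and comparing it with the file on disk.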
If you need to move a TAN file from one directory to another, you should think about any internal links that might need to be updated. A standard TAN utility, the section called “File Copier”, will copy a file for you and update any relative values of @href. That application does not delete the old file, because file deletion is treated as a security risk in XSLT.
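The kind of adjustment a relative @href needs after a move can be illustrated with a short Python sketch. It is not the File Copier's implementation, only one way to compute the new value. The function rebase_href and the directory names are hypothetical, and the directories are assumed to be absolute POSIX-style paths.

```python
import posixpath
from urllib.parse import urlsplit

def rebase_href(href: str, old_dir: str, new_dir: str) -> str:
    """Rewrite a relative href so it still resolves after a file moves
    from old_dir to new_dir."""
    # Absolute URLs and rooted paths do not depend on the file's location.
    if urlsplit(href).scheme or href.startswith("/"):
        return href
    # Resolve the link against the old location, then express the same
    # target relative to the new location.
    target = posixpath.normpath(posixpath.join(old_dir, href))
    return posixpath.relpath(target, start=new_dir)

# A file moves one level deeper, so the link needs one more "..".
new_href = rebase_href("../voc/ar.cat.general.TAN-voc.xml",
                       "/proj/texts", "/proj/archive/texts")
```

The sketch ignores query strings and fragment identifiers, which a fuller treatment would need to split off before normalizing the path.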
[18] The one exception pertains to the output/js directory, which has JavaScript libraries that are designed to handle certain types of output from TAN applications. Documentation in a TAN application will let you know what JavaScript dependencies are required.
[19] Many classicists refer to Stephanus numbers in Plato's corpus and Bekker numbers in Aristotle's as canonical, as if the systems are immutable and unambiguous. But any edition that claims to follow Stephanus or Bekker numbers always makes slight adjustments to that system. Words do not always break exactly where they do in the 19th-century edition, and words and phrases here and there get transposed, inserted, or deleted, inevitably throwing off the lineation. Making one's edition conform exactly to the original line numbers is frequently a fool's errand.