In this chapter we discuss ways to manage, create, edit, and share TAN files. The material discussed here is non-normative. That is, these are suggestions based upon the experience of TAN users.
TAN files may be set up in any kind of structure one wishes, but because those
files are meant to be shared and interlinked, it is beneficial to use similar local
conventions, so that relative URLs remain intact from one person's system to another.
It is especially important that collections be able to "talk" to each other via local
URLs in @href, so it is a good idea to name collection subdirectories as predictably
as possible.
Below is one way to organize the subdirectories of a typical setup for local TAN work:
library-[abbreviated name of creator 1]
    [abbreviated name of collection 1] (TAN-T(EI) files here)
        TAN-A-div (for TAN-A-div files)
        TAN-A-tok (for TAN-A-tok files)
        [etc.]
    [abbreviated name of collection 2]
    [etc.]
library-[abbreviated name of creator 2]
output (saved results from transformations, tests)
pre-TAN (third-party files to be used to populate TAN files, or to be converted into them)
TAN-2018 (the core TAN files, downloaded from the website or the Git repository)
stylesheets (stylesheets you have created)
tools (third-party tools)
Under this approach, you create a library subdirectory for each provider or creator (including one for yourself). For any TAN corpus you publish, you should advise what name should be used for the library subdirectory. Likewise, for any TAN corpus you download, you should use the library name suggested by the provider.
Any time you create or download a collection of TAN files, you save them in a subdirectory within the creator's library subdirectory. Once again, you should advise on the name to be used, and use the names that are advised.
If you use Git, it is advisable to make each collection its own Git repository. If you use GitHub, it is advisable to use your username for the library subdirectory.
This two-step approach to subdirectories anticipates cases where different people
will want to encode the same body of texts, particularly heavily quoted collections
that will commonly be given very brief, descriptive names, e.g., bible, quran.
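The benefit of a shared layout can be sketched in a few lines of Python. The following is only an illustration, not part of TAN: the library names `library-creator1` and `library-creator2`, the filenames, and the helper functions are hypothetical. It creates the suggested subdirectories and computes the relative URL that a file in one creator's collection would use to point to a file in another creator's collection.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical subdirectory names, chosen only for illustration.
LAYOUT = [
    "library-creator1/bible/TAN-A-div",
    "library-creator1/bible/TAN-A-tok",
    "library-creator2/bible",
    "output",
    "pre-TAN",
    "TAN-2018",
    "stylesheets",
    "tools",
]


def make_layout(root: Path) -> None:
    """Create the suggested subdirectories under root."""
    for sub in LAYOUT:
        (root / sub).mkdir(parents=True, exist_ok=True)


def relative_href(from_file: Path, to_file: Path) -> str:
    """Return a relative URL from one file to another, with forward slashes."""
    rel = os.path.relpath(to_file, start=from_file.parent)
    return Path(rel).as_posix()


def demo() -> str:
    """Build the layout in a temporary directory and link two collections."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_layout(root)
        # Hypothetical filenames; the files themselves need not exist.
        alignment = root / "library-creator1/bible/TAN-A-div/bible.TAN-A-div.xml"
        source = root / "library-creator2/bible/bible.eng.xml"
        return relative_href(alignment, source)


if __name__ == "__main__":
    print(demo())
```

Because the result depends only on the library and collection names, and not on where the root directory sits, the same relative URL works on any system that follows the same naming conventions.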
When you name class 1 files (the filename, not the IRI name; see the section called “@id and a TAN file's IRI Name”), it is a good idea to start with an acronym or abbreviation for the work, followed by the language code, the editor's last name, and the date when the source scriptum was created or published. If a work lends itself to multiple reference schemes, you may need to include the scheme in the filename. Some examples:
ar.cat.grc.1949.minio-paluello-sem.xml (Aristotle's Categories, in Greek, 1949, edition by Minio Paluello, following a reference system based on semantic units [paragraphs, sentences, independent clauses])
apocr.eng.kjv.1760.xml (apocrypha, English, King James Version, 1760 edition)
tlg0059.tlg031.perseus-grc1-Pl.Ti.xml (Plato's Timaeus in Greek)
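If you generate filenames by script, a small helper can keep them consistent. The sketch below is a hypothetical convenience function, not part of TAN. It assumes the component order of the first example above (work, language, date, editor, optional reference scheme); other orders are equally possible, as the other examples show.

```python
from typing import Optional


def class1_filename(work: str, lang: str, year: int, editor: str,
                    scheme: Optional[str] = None) -> str:
    """Assemble a class 1 filename from its parts.

    The order (work, language, year, editor, scheme) is an assumption
    modeled on one example filename, not a TAN rule.
    """
    # Lowercase the editor's name and join its words with hyphens.
    editor_slug = "-".join(editor.lower().split())
    stem = ".".join([work, lang, str(year), editor_slug])
    if scheme:
        stem += "-" + scheme
    return stem + ".xml"


if __name__ == "__main__":
    print(class1_filename("ar.cat", "grc", 1949, "Minio Paluello", "sem"))
```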
Class 2 files are tougher. Because they bring two or more files or concepts together, filenames could become very long or unpredictably structured. At this time, the best recommendation is to make sure that each class 2 file is put into a subdirectory, separate from class 1 files, and given a brief but meaningful name that points to the research question that motivated its creation. Some examples:
ar.cat.grc.1949.minio-paluello-sem-TAN-LM-sample.xml (lexico-morphology for Aristotle's Categories, in Greek)
nt.grc-syr.selections.TAN-A-tok.xml (word-for-word correspondences between the Syriac and Greek New Testaments)
plato.TAN-A-div.xml
Class 3 files are a bit easier. It is recommended that TAN-mor filenames begin with the language code, then an acronym for the person or group responsible for creating the features. TAN-key files are generally written to serve a specific project or collection, so the collection name and the TAN type should suffice. Examples:
ar.cat.TAN-key.xml
eng.kalvesmaki.com,2014.1.xml (tagging scheme #1 for English)
If you have a local copy of someone else's TAN collection, and you wish to create
TAN files that depend on it, you will in all likelihood use relative URLs
to the copies of the files stored on your local drive. It is recommended that you also
include an absolute URL in a secondary <location>. The validation routine checks only
the first document available, so from time to time you might comment out the first
<location> and run the validation process again, to make sure the absolute URL still
works. If you share your dependent TAN file with someone else who does not
have a local copy of the collection, the second <location>, with the absolute URL,
will point to the original copy of the document.
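The fallback behavior described here can be imitated in a short Python sketch. This is not the TAN validation routine, only an illustration of the "first document available" idea; the function names and the `url_is_reachable` parameter are hypothetical. By default the sketch treats every absolute URL as reachable, so that it runs without network access.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional
from urllib.parse import urlparse


def is_absolute_url(href: str) -> bool:
    """True for http(s) URLs; anything else is treated as a local relative path."""
    return urlparse(href).scheme in ("http", "https")


def first_available(
    hrefs: Iterable[str],
    base_dir: Path,
    url_is_reachable: Callable[[str], bool] = lambda href: True,
) -> Optional[str]:
    """Return the first href, in document order, that points to an available copy.

    Local paths are resolved against base_dir, the directory of the
    dependent file. Absolute URLs are checked with url_is_reachable,
    which by default assumes success.
    """
    for href in hrefs:
        if is_absolute_url(href):
            if url_is_reachable(href):
                return href
        elif (base_dir / href).is_file():
            return href
    return None
```

With a list such as `["../library-x/coll/text.xml", "http://example.org/coll/text.xml"]` (both hypothetical), a user who has the local copy gets the relative path, and a user who does not gets the absolute URL.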
In a given project, you are likely to repeat basic information, particularly
<person>, <role>, and <work>. If you find yourself repeating elements such as these,
which follow the pattern described in the section called “IRI + name Pattern”,
consider moving them to a TAN-key file.
It is almost always preferable to develop TAN-keys before resorting to <inclusion>s.
Sorting out lines of inclusion can be confusing.