This chapter presents ways to manage, create, edit, and share TAN files. The material discussed here is non-normative. That is, these are suggestions based upon the experience of TAN users.
Descriptions in this chapter are both brief and general. To understand the underlying framework better, study the files in the subdirectory functions, or their reformatted versions in Chapter 11, TAN variables, keys, functions, and templates.
TAN can be downloaded from a master data repository listed at http://textalign.net/. The project has been developed using the version-control software Git. Whether you download the files directly or use Git, place the TAN code wherever is most convenient on your computer.
The TAN files you create may be set up in whatever structure you want. Because TAN files are meant to be shared and interlinked, it is beneficial to develop predictable directory structures. In the 2018 version of TAN, advice was given on how to organize directories and files. But experience with a variety of projects, each with their own needs and preferences, has shown that such advice is shortsighted. One point does still seem valid, however: keep your TAN libraries separate from the core TAN files.
Many TAN projects will find it necessary to work with dozens of versions of a particular work, and it is easy to get confused as to what file does what. In projects with many text versions, it is recommended that your names for class-1 files (the filename, not the @id; see the section called “Identifying TAN files: @id”) start with an acronym or short abbreviation for the author and work, followed by the language code, the last name of the editor/author of the scriptum, and the date when the scriptum was created or published. If you have multiple TAN files that refer to each other via <redivision>, because each has a different reference system, you may need to include the reference system in the filename. Some examples:
ar.cat.grc.1949.minio-paluello.ref-logical.xml (Aristotle's Categories, in Greek, 1949, edition by Minio Paluello, following a reference system based on semantic units [paragraphs, sentences, independent clauses]).
apocr.eng.kjv.1760.xml (apocrypha, English, King James Version, 1760 edition)
tlg0059.tlg031.perseus-grc1-Pl.Ti.xml (Plato's Timaeus in Greek). This filename has some duplication in that tlg0059 already implies Pl and tlg031, Ti, but only die-hard users of the Thesaurus Linguae Graecae know the meaning of the numerical codes.
pl.ti.grc.1905.burnet.stephanus.xml (Plato's Timaeus in Greek). This filename is an alternative way to construct the previous example.
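The naming pattern in the examples above can be sketched in a few lines of Python. This is only an illustration and not part of TAN: the function name class1_filename and its parameters are hypothetical, and the order of parts (work, language, date, editor, reference system) is an assumption taken from the first and last examples.

```python
import re


def class1_filename(work, lang, year, editor, ref_system=None):
    """Build a class-1 filename from its parts (hypothetical helper).

    Assumed order, following the examples above:
    work . language . date . editor [. reference system] .xml
    """
    parts = [work, lang, str(year), editor]
    if ref_system:
        # Only needed when several files differ by reference system
        parts.append(ref_system)
    # Lowercase each part and replace internal whitespace with hyphens
    cleaned = [re.sub(r"\s+", "-", part.strip().lower()) for part in parts]
    return ".".join(cleaned) + ".xml"


print(class1_filename("ar.cat", "grc", 1949, "Minio Paluello", "ref-logical"))
```

Generating names from one function, rather than typing them by hand, is one way to keep a large collection of text versions predictable. Adapt the order and separators to whatever your project has agreed upon.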
Class-2 files are tougher. They bring together multiple files and concepts, so filenames could become very long or unpredictably structured, especially if they try to express which class-1 sources they use. At this time, the best recommendation is to make sure that each class-2 file is put into its own subdirectory, separate from class-1 files, and given a brief but meaningful name that points to the research question that motivated its creation. Some examples:
ar.cat.grc.1949.minio-paluello-sem-TAN-LM-sample.xml (a sample of lexico-morphological data for Aristotle's Categories, in Greek)
nt.grc-syr.selections.TAN-A-tok.xml (a selection of word-for-word correspondences between the Syriac and Greek New Testaments)
plato.general.TAN-A.xml (a general alignment and annotation file on Plato's works)
Class-3 filenames are a bit easier. It is recommended that TAN-mor files begin with the language code, followed by an acronym for the person or group responsible for creating the features. TAN-voc files are generally written to serve a specific project or collection, so the collection name and the type of vocabulary should suffice. Examples:
eng.example.com,2014.1.xml (tagging scheme #1 for English, by the owner of the domain example.com in 2014)
ar.cat.general.TAN-voc.xml (general vocabulary items for a project for Aristotle's Categories)
If you have a local copy of someone else's TAN collection, and you wish to create TAN files that depend on it, you are in all likelihood going to use relative URLs pointing to copies of the files stored on your local drive. It is recommended that you also point to the master versions through absolute URLs in extra <location>s. The validation routine checks only the first document available. From time to time, you might comment out the first <location> and run the validation process again. This will tell you if there have been any updates since you last accessed the file. Alternatively, occasionally validate the other TAN files you have downloaded. If the <master-location> is intact, you will be notified of any updates.
In a given project, you are likely to repeat basic information, particularly <person>, <role>, and <work>. If you find yourself repeating such information, for example in elements that follow the IRI + name pattern (see the section called “IRI + name Pattern”), consider moving it to a project TAN-voc file. It is almost always preferable to develop TAN-vocs before resorting to <inclusion>s. <inclusion>s are powerful, but they can quickly become complex and confusing to navigate.
If you use an advanced XML editor such as oXygen, you can set up a project so that TAN validation files can be easily located and validation can be automatically applied. A sample oXygen project file is included within the TAN library to get you started. You may wish to create a copy of that project file for yourself before developing it.
TAN also includes select oXygen frameworks files, which provide editing tools for oXygen's Author mode. The Author mode includes a variety of editing tools, primarily for class-1 files. After opening the supplied oXygen project file, tan.xpr, use Author mode to view a sample TAN file and look at the options in the menu, the toolbars, and the context-click menu, to see what is possible.
Both the project file and the frameworks files are in their infancy, and are therefore incomplete and imperfect. They have tremendous potential, and further development is slated for future versions of TAN.