Chapter 8. Working with TAN Files


This chapter presents ways to manage, create, edit, and share TAN files. The material discussed here is non-normative. That is, these are suggestions based upon the experience of TAN users.

Descriptions in this chapter are both brief and general. To better understand the underlying framework, study the files in the subdirectory functions, or their reformatted versions in Chapter 11, TAN variables, keys, functions, and templates.

Local Setup

TAN can be downloaded from a master data repository listed at http://textalign.net/. The project has been developed using the version-control software Git. Whether you download the files directly or use Git, place the TAN code wherever is most convenient on your computer.

The TAN files you create may be set up in whatever structure you want. Because TAN files are meant to be shared and interlinked, it is beneficial to develop predictable directory structures. In the 2018 version of TAN, advice was given on how to organize directories and files. But experience with a variety of projects, each with their own needs and preferences, has shown that such advice is shortsighted. One point does still seem valid, however: keep your TAN libraries separate from the core TAN files.

Many TAN projects will find it necessary to work with dozens of versions of a particular work, and it is easy to get confused as to what file does what. In projects with many text versions, it is recommended that your names for class-1 files (the filename, not the @id; see the section called “Identifying TAN files: @id”) start with an acronym or short abbreviation for the author and work, followed by the language code, the last name of the editor/author of the scriptum, and the date when the scriptum was created or published. If you have multiple TAN files that refer to each other via <redivision>, because each has a different reference system, you may need to include the reference system in the filename. Some examples:

ar.cat.grc.1949.minio-paluello.ref-logical.xml (Aristotle's Categories, in Greek, 1949, edition by Minio Paluello, following a reference system based on semantic units [paragraphs, sentences, independent clauses]).
apocr.eng.kjv.1760.xml (apocrypha, English, King James Version, 1760 edition)
tlg0059.tlg031.perseus-grc1-Pl.Ti.xml (Plato's Timaeus in Greek). This filename has some duplication in that tlg0059 already implies Pl and tlg031, Ti, but only die-hard users of the Thesaurus Linguae Graecae know the meaning of the numerical codes.
pl.ti.grc.1905.burnet.stephanus.xml (Plato's Timaeus in Greek). This filename is an alternative way to construct the previous example.
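
The naming pattern above can be sketched as a small helper. This is only an illustration, not part of TAN: the function name class1_filename and its parameters are hypothetical, and the order of the parts (date before editor) is an assumption taken from the first and last examples above rather than a rule.

```python
def class1_filename(work, language, date, editor, ref_system=None):
    """Build a class-1 filename from its parts (hypothetical helper, not a TAN function)."""
    parts = [work, language, str(date), editor]
    # The reference system is optional; add it only when several
    # redivisions of the same text need to be told apart.
    if ref_system:
        parts.append(ref_system)
    # Lowercase each part and turn spaces into hyphens so that
    # filenames stay predictable across a project.
    cleaned = [part.strip().lower().replace(" ", "-") for part in parts]
    return ".".join(cleaned) + ".xml"


print(class1_filename("ar.cat", "grc", 1949, "Minio Paluello", "ref-logical"))
print(class1_filename("pl.ti", "grc", 1905, "Burnet", "stephanus"))
```

A helper of this kind is useful mainly for keeping a large collection consistent; any ordering works as long as it is applied uniformly.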

Class-2 files are tougher. They bring together multiple files and concepts, so filenames could become very long or unpredictably structured, especially if they try to express which class-1 sources they use. At this time, the best recommendation is to make sure that each class-2 file is put into its own subdirectory, separate from class-1 files, and given a brief but meaningful name that points to the research question that motivated its creation. Some examples:

ar.cat.grc.1949.minio-paluello-sem-TAN-LM-sample.xml (a sample of lexico-morphological data for Aristotle's Categories, in Greek)
nt.grc-syr.selections.TAN-A-tok.xml (a selection of word-for-word correspondences between the Syriac and Greek New Testaments)
plato.general.TAN-A.xml (a general alignment and annotation file on Plato's works)

Class-3 filenames are a bit easier. It is recommended that TAN-mor files begin with the language code then an acronym for the person or group responsible for creating the features. TAN-voc files are written generally to serve a specific project or collection, so the collection name and the type of vocabulary should suffice. Examples:

eng.example.com,2014.1.xml (tagging scheme #1 for English, by the owner of the domain example.com in 2014)
ar.cat.general.TAN-voc.xml (general vocabulary items for a project for Aristotle's Categories)

If you have a local copy of someone else's TAN collection, and you wish to create TAN files that depend on it, you will in all likelihood use relative URLs pointing to copies of the files stored on your local drive. It is recommended that you also point to the master versions through absolute URLs in extra <location>s. The validation routine checks only the first document available. From time to time, you might comment out the first <location> and run the validation process again. This will tell you whether there have been any updates since you last accessed the file. Alternatively, occasionally validate the other TAN files you have downloaded. If the <master-location> is intact, you will be notified of any updates.
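
The effect of listing several <location>s can be pictured with a short sketch. It is not TAN's validation code; it only models the behavior described above, in which the first available document wins. The function first_available, the availability check, and the sample paths and URL are all hypothetical.

```python
def first_available(locations, is_available):
    """Return the first location that is_available accepts, or None if none is."""
    for location in locations:
        if is_available(location):
            return location
    return None


# Hypothetical locations: a local relative copy first, then the master copy.
locations = [
    "../library/ar.cat.grc.1949.minio-paluello.ref-logical.xml",
    "http://example.org/tan/ar.cat.grc.1949.minio-paluello.ref-logical.xml",
]

# Stand-in for a real availability check; here both copies are reachable.
reachable = set(locations)

# With both locations listed, only the local copy is consulted.
print(first_available(locations, lambda loc: loc in reachable))

# Dropping the first entry (the equivalent of commenting out the first
# <location>) makes the master copy the one that is checked.
print(first_available(locations[1:], lambda loc: loc in reachable))
```

This is why a stale local copy can pass validation unnoticed until the first <location> is temporarily disabled.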

In a given project, you are likely to repeat basic information, particularly <person>, <role>, and <work>. For elements such as these, which follow the pattern described in the section called “IRI + name Pattern”, consider moving them to a project TAN-voc file. It is almost always preferable to develop TAN-voc files before resorting to <inclusion>s. <inclusion>s are powerful, but they can quickly become complex and confusing to navigate.

Using TAN with Oxygen XML Editor

If you use an advanced XML editor such as oXygen, you can set up a project so that TAN validation files can be easily located and validation can be automatically applied. A sample oXygen project file is included within the TAN library to get you started. You may wish to create a copy of that project file for yourself before developing it.

TAN also includes select oXygen frameworks files, which provide editing tools for oXygen's Author mode. The Author mode includes a variety of editing tools, primarily for class-1 files. After opening the supplied oXygen project file, tan.xpr, use Author mode to view a sample TAN file and look at the options in the menu, the toolbars, and the context-click menu to see what is possible.

Both the project file and the frameworks files are in their infancy, and are therefore incomplete and imperfect. They have considerable potential, and further development is slated for future versions of TAN.