Creating and Editing TAN Files

When starting a new TAN file, it is almost always best to begin with a copy of the existing TAN file that most resembles what you have in mind, and to edit from there.

If you find that you are creating, within a single collection, numerous files that repeat basic information, such as elements that follow the IRI + name pattern (see the section called “IRI + name Pattern”), consider moving that information to a TAN-key file. It is almost always preferable to use TAN-key as a device for sharing such material before resorting to <inclusion>s, mainly because sorting out different lines of inclusion can become rather confusing.

Suppose you have a digital predecessor (a Word file, a web page, or plain text) that you intend to serve as the basis for a TAN file. Your first impulse may be to find the content you want, copy it, paste it into the body of your TAN file, and then begin correcting and changing things. That is the most common way to convert any file, but it also means that if changes are later made to your source, you may face an enormous task figuring out exactly what was changed where. Further, some transformations involve complex processes, and you may find, in the course of correcting the intermediate file, that you made a major mistake that cannot, at that point, be undone. Perhaps you accidentally deleted all punctuation. Or perhaps you eliminated line breaks that were useful signals about where <div>s should be separated. Even success can create problems. After you have converted data to the TAN format, you may be given a new version of the pre-TAN data with errors corrected. If significant time has elapsed since the last transformation, you may have forgotten what procedure you followed to convert the data.

For all these reasons, when you first begin to import data into a new TAN file, it is recommended that you create an XSLT stylesheet, or the equivalent. Use that stylesheet to analyze the digital source and transform it into TAN-compliant data. If you then discover mistakes such as those described above, no harm is done: you can adjust your algorithm and re-run the process as many times as you need, improving the results each time. This approach requires extra work up front. You will need to get to know XSLT (or whatever language you have chosen) well, and establishing a good transformation process can be time-consuming. But it is an investment that pays off in the long run. What you write for one set of files might work for the next, and the more alike the original files are, the easier it is to reuse and tailor the process.
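As an illustration of the "or the equivalent" route, here is a minimal Python sketch (not XSLT, and not part of TAN's own tooling) of a re-runnable conversion step. It assumes the pre-TAN source is plain text in which blank lines mark where <div>s should break, and it keeps all punctuation. The function name text_to_divs and the div type value "paragraph" are hypothetical placeholders. Because the whole conversion lives in one function, fixing a mistake means editing the function and running it again, not hand-correcting the output.

```python
import re
import xml.etree.ElementTree as ET


def text_to_divs(source_text, div_type="paragraph"):
    """Convert plain text into a list of <div> elements (hypothetical helper).

    Assumption: blank lines in the source mark <div> boundaries.
    Punctuation is kept as-is so nothing is silently lost.
    """
    divs = []
    # Split on one or more blank lines and drop empty chunks.
    chunks = [c.strip() for c in re.split(r"\n\s*\n", source_text) if c.strip()]
    for n, chunk in enumerate(chunks, start=1):
        div = ET.Element("div", {"type": div_type, "n": str(n)})
        # Within a div, collapse line breaks and runs of spaces to one space.
        div.text = re.sub(r"\s+", " ", chunk)
        divs.append(div)
    return divs


if __name__ == "__main__":
    sample = "In the beginning was the Word.\n\nAnd the Word was with God,\nand the Word was God."
    for d in text_to_divs(sample):
        print(ET.tostring(d, encoding="unicode"))
```

If a corrected version of the source arrives later, running the same script on it reproduces the conversion exactly, with no need to remember the steps.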

If you begin to use stylesheets to create or populate your TAN files, it is almost always best to start with a template TAN file that resembles, even if only skeletally, the TAN file you want to produce. You feed this template, along with the pre-TAN file, into the stylesheet, which generates a copy of the template populated with the pre-TAN data. Under this scenario, the stylesheet becomes an <agent> in its own right. You are encouraged to give your XSLT file a unique identifier, and to stamp the resulting TAN file with an <agent>, a <role>, and a <change> that documents the changes made.
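The template-driven workflow might look like the following minimal Python sketch, again offered as a stand-in for an XSLT stylesheet. The template is deliberately skeletal and hypothetical: it omits the TAN namespace and most required <head> content, and the attribute names on <agent>, <role>, and <change>, the example IRI, and the function name infuse_template are illustrative assumptions rather than TAN's actual vocabulary. The script replaces the template's placeholder body with the extracted passages and stamps the head with an <agent> and <role> for itself, plus a <change> recording what it did.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serializes as xml:id

# Hypothetical, skeletal template: no TAN namespace, minimal <head>.
TEMPLATE = """<TAN-T id="tag:example.org,2024:demo-text">
  <head/>
  <body xml:lang="eng"><div type="p" n="0">placeholder</div></body>
</TAN-T>"""


def infuse_template(template_xml, passages, script_iri, when):
    """Return a copy of the template with <body> filled from the passages.

    passages: (n, text) pairs extracted from the pre-TAN source.
    script_iri: hypothetical unique identifier for this conversion script.
    """
    root = ET.fromstring(template_xml)  # parsing gives a fresh copy each run
    body = root.find("body")
    for child in list(body):  # drop the template's placeholder content
        body.remove(child)
    for n, text in passages:
        div = ET.SubElement(body, "div", {"type": "p", "n": str(n)})
        div.text = text

    # Stamp the result: the script is an agent in its own right.
    # Attribute names here are assumptions, not TAN's actual schema.
    head = root.find("head")
    agent = ET.SubElement(head, "agent", {XML_ID: "conv", "IRI": script_iri})
    agent.text = "conversion script"
    role = ET.SubElement(head, "role", {XML_ID: "converter"})
    role.text = "converter"
    change = ET.SubElement(head, "change", {"who": "conv", "when": when})
    change.text = f"Populated body with {len(passages)} divs from the pre-TAN source."
    return root


if __name__ == "__main__":
    result = infuse_template(
        TEMPLATE,
        [(1, "First passage."), (2, "Second passage.")],
        "tag:example.org,2024:script:convert",
        "2024-01-01",
    )
    print(ET.tostring(result, encoding="unicode"))
```

Keeping the template as a separate input means the shape of the output can be revised without touching the extraction logic, and vice versa.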

The XSLT approach to creating and populating TAN files, described above, has been used successfully to handle not only historical documents but living ones as well, e.g., a working, evolving scholarly translation of ancient texts. In those situations, the traditional cut-paste-and-edit method is not merely unproductive but foolish.

Such transformations may seem laborious at first, because it can be difficult to work out how best to handle and manipulate a TAN file. But there is a good chance that the labor you have in mind has already been done for you in the built-in TAN functions (see the next chapter).