Converting to TAN from an irregular format can be a chore. Suppose you have a
Word file, a web page, or plain text that you intend to serve as the basis for a TAN
file. A common first impulse is to copy the desired content, paste it into the body
of your TAN file, and then correct and change things by hand. Although this
is the most common approach, it means that if your source changes,
you may face an enormous task figuring out exactly what was changed
where. Further, some transformations involve complex processes, and you may find, in
the course of correcting the intermediary, that you made a major mistake that cannot,
at that point, be undone. Perhaps you accidentally deleted all punctuation when
you didn't mean to. Or you eliminated line breaks that were useful signals about
where <div>s should be separated.
Even if all goes well, after all that hard work you might find out that the
pre-TAN data source has been updated, with errors corrected. If any significant time
has elapsed since the last transformation, you may have forgotten what procedure you
followed to convert the data. And even if you remember, you have to repeat the steps
again, and plan for the next time the pre-TAN source is updated.
For all these reasons, it is recommended that you convert data to a TAN file by means of an XSLT stylesheet that analyzes the digital source and transforms it into TAN-compliant data. When you find mistakes such as those described above, no harm is done. You can adjust your algorithm and re-run the process as many times as you need, getting better results each time. This approach requires extra initial work: you will need to get to know XSLT (or an alternative) well, and establishing a good transformation process can be time-consuming. But the investment pays off in the long run. All or part of what you write for one set of files may work for the next.
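The re-runnable workflow described above can be sketched as follows. This is only a minimal illustration, written in Python as one possible "alternative" to XSLT; the same idea carries over to a stylesheet. The function name text_to_divs, the one-div-per-line rule, and the attribute names and values (type="line", n="1", and so on) are assumptions made for the example, not requirements of the TAN schemas. Note that the script never edits the source: punctuation and line breaks survive in the original file, so a mistaken rule can be corrected and the conversion simply run again.

```python
import re
from xml.sax.saxutils import escape


def text_to_divs(raw: str) -> str:
    """Turn plain text into a <body> holding one <div> per non-blank line.

    Hypothetical conversion rule for illustration: each source line
    break is treated as a signal for a <div> boundary.
    """
    divs = []
    n = 0
    for line in raw.splitlines():
        # Normalize runs of whitespace; punctuation is left untouched.
        text = re.sub(r"\s+", " ", line).strip()
        if not text:
            continue  # blank lines separate divs but carry no content
        n += 1
        # escape() protects &, < and > so the output stays well-formed.
        divs.append(f'  <div type="line" n="{n}">{escape(text)}</div>')
    return "<body>\n" + "\n".join(divs) + "\n</body>"


if __name__ == "__main__":
    sample = "Arma virumque cano,\n\nTroiae qui primus ab oris\n"
    print(text_to_divs(sample))
```

Because the output depends only on the source text and the rules in the script, running it again on an updated source reproduces every earlier decision without any manual rework.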
Whether or not you use stylesheets to create or populate your TAN files, it is
almost always best to begin the process with a sample TAN file that resembles, even
if skeletally, your desired output, then populate it with the proper content. If you
feed the TAN template along with the pre-TAN data into a stylesheet, the stylesheet
becomes an <agent> in its own
right. You are encouraged to give your XSLT file a unique identifier, and to stamp
the resultant TAN file with an <agent>, a <role>, and a <change>
that documents the changes that were made.
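A minimal sketch of the template-first approach, again in Python rather than XSLT, might look like the following. Everything specific here is an assumption made for illustration: the skeletal template (which omits the namespace and most of the head a real TAN file would carry), the function name populate, the identifier "converter1", and the who and when attributes on <change>. Consult the TAN schemas for the actual structure. The point is only that the process starts from a template, fills the body from the pre-TAN data, and records what was done and by which agent.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified template; not a valid TAN file.
TEMPLATE = """<TAN-T>
  <head>
    <name>Sample transcription</name>
  </head>
  <body/>
</TAN-T>"""


def populate(template_xml: str, lines: list[str], agent_id: str, when: str) -> str:
    """Fill the template's <body> with <div>s and stamp a <change> in <head>."""
    root = ET.fromstring(template_xml)
    head = root.find("head")
    body = root.find("body")
    for n, text in enumerate(lines, start=1):
        div = ET.SubElement(body, "div", {"type": "line", "n": str(n)})
        div.text = text
    # Record which agent made the change and when (assumed attribute names).
    change = ET.SubElement(head, "change", {"who": agent_id, "when": when})
    change.text = f"Populated body with {len(lines)} divs from the pre-TAN source."
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(populate(TEMPLATE, ["First line.", "Second line."], "converter1", "2020-01-01"))
```

Keeping the template separate from the conversion logic means the same script can populate a different TAN file by swapping in a different template.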
The XSLT approach to creating and populating TAN files, described above, has been used successfully to handle not only historical documents but living ones as well, e.g., a working, evolving scholarly translation of ancient texts. In those situations, where updates are made very frequently, the traditional cut-paste-and-edit method is not only unproductive; it is foolish.
Writing transformations may seem laborious at first, because it is difficult to think about how best to handle and manipulate a TAN file. But there is a good chance that the labor you have in mind has already been done for you in the built-in TAN functions (see Chapter 11, TAN variables, keys, functions, and templates).