TAN is a representational format. Every TAN file models some source. If those
sources are non-digital, it is a relatively straightforward task to create and
populate a TAN file. Just start editing, using a template (e.g., a file from the
examples directory). In some cases, you might benefit by starting
with an algorithm. For example, optical character recognition (OCR) on an edition
might give you a dirty but useful start for a TAN-T file. Applying OCR to a printed
index of quotations might be the first step to a TAN-A file. Despite the computer's
assistance, most of your time will be spent correcting the converted output.
Making these files suitable for use takes thoughtful attention.
In many other cases, you want to take something that already exists digitally and
convert it into a TAN format. Many times, when you find a Word file, a web page, or a
plain text file that can serve as the basis for a TAN file, the first impulse is to
copy the desired content, paste it into the body of a new TAN file, then manually
adjust and correct it. That solution is quick and easy, but short-sighted. Hours into
the task, you may find that you made a major mistake so early in the process that you
cannot backtrack. Perhaps you have accidentally deleted all
punctuation when you didn't mean to. Or you eliminated line breaks that you didn't
realize at the time were useful signals about where <div>s should be separated.
Even if all goes well, after all that hard work you might discover that the pre-TAN data sources you started out with have since been updated and corrected. If any significant time has elapsed, you may have forgotten what procedure you followed to convert the data. And even if you do remember, you will have to repeat every step, and dread the day when those pre-TAN sources are updated yet again.
Save yourself time and hassle. Stop fixing files by hand. Instead, build a system to convert the files. Create an automated or semiautomated workflow that can be applied when needed, so that pre-TAN files can be channeled at will into your TAN library. This approach to the editorial task takes some extra investment at the outset, but in the long run it can save you many hours of labor.
A very useful utility is Body Builder (see the section called “Body Builder”), which allows you to create a list of changes to be made to a particular document, to convert it to TAN-T or TAN-TEI (or even generic TEI). Or, if you or a project member has experience in XSLT, develop your own stylesheets.
With an automated workflow, no harm is done when you find mistakes such as those described above. You can simply adjust the Body Builder configuration or XSLT file and re-run your process, each time getting better results. Establishing a stable transformation process can be time-consuming, since it requires repeated rounds of trial, error, and diagnosis. But the investment pays off, especially if you are dealing with dozens, hundreds, or thousands of files. The routines you write for one set of files might be useful for the next.
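To illustrate the general idea of a repeatable conversion (not Body Builder itself, and not an XSLT stylesheet), here is a minimal Python sketch. It assumes a plain text source in which blank lines separate the blocks that should become <div>s. The rule list, the function names, and the bare `<div n="…">` output are all hypothetical simplifications; the output is not a valid TAN-T body, only a starting point.

```python
import re
from xml.sax.saxutils import escape

# Ordered (pattern, replacement) rules. These are illustrative assumptions;
# edit the list and re-run the script instead of fixing the output by hand.
RULES = [
    (r"\r\n?", "\n"),      # normalize line endings
    (r"[ \t]+", " "),      # collapse spaces and tabs, but keep line breaks
    (r" ?\n ?", "\n"),     # trim spaces around line breaks
    (r"\n{3,}", "\n\n"),   # allow at most one blank line between blocks
]


def apply_rules(text, rules=RULES):
    """Apply each rule in order and return the cleaned text."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text.strip()


def to_divs(text):
    """Wrap each blank-line-separated block in a numbered <div>.

    Line breaks inside a block become single spaces; blank lines are kept
    as the signal for where one <div> ends and the next begins.
    """
    blocks = [b for b in apply_rules(text).split("\n\n") if b]
    divs = []
    for n, block in enumerate(blocks, start=1):
        content = escape(" ".join(block.split("\n")))
        divs.append(f'<div n="{n}">{content}</div>')
    return "\n".join(divs)


if __name__ == "__main__":
    sample = "First  line\r\nsecond line\r\n\r\n\r\n\r\nNext & last\n"
    print(to_divs(sample))
```

Because the source file is never edited, a mistake in one rule (for example, a pattern that strips punctuation you wanted to keep) is fixed by changing that rule and running the script again, including when the source itself is updated.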