Assumptions in the Creation of TAN Data

All creators and users of TAN files are expected to share a few basic assumptions.

First, all TAN-compliant data is to be understood as largely derivative. That is, data files have no originality or creativity independent of their sources (but see below about interpretation). TAN-compliant data is to be created with the intent of adhering as closely as possible to some model or archetype. For example, a transcription should faithfully replicate some earlier digital edition or text-bearing material object (e.g., stone, papyrus, manuscript, printed book for written text; audiovisual media for oral or performative texts). Morphological files and alignment files should describe their source transcriptions as clearly and as reliably as possible. In creating and publishing a TAN file you claim to have offered a good-faith representation or description of something; in using a TAN file, you hold the creator to that expectation.

Second, all core TAN files are interpretive. That is, they are permeated by editorial assumptions and opinions that might not be shared by everyone. If there is any originality or creativity in a TAN file, it is in that interpretive outlook. For example, if you edit a transcription file you must decide how to handle unusual letterforms and other visible marks. Your decisions will be informed by how you view the original text and its native writing system, and how you interpret and use Unicode. If you write an alignment file, you must make decisions about what factors caused one text to be transformed into another. Lexicomorphological files require you to commit to one or more grammars and dictionaries, and you must discern how best to handle cases of vagueness and ambiguity. As a general rule, the TAN classes go from least interpretive (class 1) to most (class 3). But no matter which class, no TAN data file ever stands completely outside the interpretive act. In creating and publishing a TAN file you claim to have disclosed as best you can the assumptions behind your interpretive outlook; in using a TAN file, you hold the creator to that expectation.

Third, all core TAN files are useful. That is, the interpretive impulse is assumed to be coupled with an equally strong desire to make the data as useful to as many users as possible, even those who may not share your assumptions or interpretation. A creator of a transcription file, for example, should normalize and segment texts with a minimum of idiosyncrasies, adopting when possible reference systems that are widely used so as to optimize the alignment process. Morphological files should depend whenever possible upon commonly accepted grammars and lexica. Alignment files should work with comprehensible categories of text reuse. No TAN file will always be useful to everyone, but it should be as useful to as many as possible, as frequently as possible. In creating a TAN file you claim to use common, shared conventions whenever possible, and to note any departures; in using a TAN file, you hold the creator to that expectation.

There are other important assumptions that can and should be declared in a TAN file, and they are addressed in the course of these guidelines.