Assumptions in the creation of TAN data

All creators and users of TAN files are expected to share a few basic assumptions.

First, all TAN-compliant data is to be understood as largely derivative. That is, data files express no originality or creativity independent of their sources (but see below about interpretation). A TAN file should be created with the intent of adhering as closely as possible to some model or archetype. For example, a transcription is assumed to replicate faithfully some earlier digital edition or text-bearing material object (e.g., stone, papyrus, manuscript, printed book for written text; audiovisual media for oral or performative texts). Morphological files and alignment files should describe as clearly and as reliably as possible their source transcriptions. In creating and publishing a TAN file you claim to have offered a good-faith representation or description of something; in using a TAN file, you hold the creator to that expectation.

Second, all core TAN files are interpretive. That is, they are permeated by editorial assumptions and opinions that might not be shared by everyone. If there is any semblance of originality or creativity in a TAN file, it is in that interpretive outlook. For example, if you edit a transcription file, you must decide how to handle unusual letterforms and other visible marks. Your decisions will be influenced by your perspective on the original text and its native writing system, and by how you interpret and use Unicode. If you write an alignment file, you must make decisions about what factors caused one text to be transformed into another. Lexicomorphological files require you to commit to one or more grammars and dictionaries, which adopt certain perspectives on language, and you must discern how best to handle cases of vagueness and ambiguity. No TAN file ever stands completely outside the interpretive act. In creating and publishing a TAN file you claim to have disclosed as best you can the assumptions behind your interpretive outlook; in using a TAN file, you hold the creator to that expectation.

Third, all core TAN files are applicable. That is, the interpretive impulse is assumed to be coupled with an equally strong desire to make the data useful to as many users as possible, even those who may not share your assumptions or interpretation. TAN files are intended for intertextual comparison, so idiosyncrasies of a particular text-bearing object will be regarded by some users as either uninteresting or an obstacle. A creator of a transcription file should normalize and segment texts, adopting the most widely used reference systems, so as to optimize the alignment process. Morphological files should depend whenever possible upon commonly accepted grammars and lexica. Alignment files should work with comprehensible categories of text reuse. No TAN file will be applicable to everyone in every case, but it should be suitable to as many users as possible, for as many purposes as possible. In creating a TAN file you claim to use common, shared conventions whenever possible, and to note any departures; in using a TAN file, you hold the creator to that expectation.

Fourth, TAN data is to be considered accurate, but not necessarily precise or complete. For example, if a TAN-A file claims that the opening of Plato's Republic book 3 quotes from Homer's Iliad, the claim is true and accurate, but is neither precise nor complete. Parts of the opening of book 3 are certainly not quotations, and the whole of the Iliad is not quoted in the Republic. Or take a TAN-A-tok file. The token-for-token alignment of two texts might be selective, and focus only on the points of interest to the editor. Although the TAN formats permit a great deal of both precision and comprehensiveness, neither is mandated, except where explicitly noted by the TAN specifications. In creating a TAN file you claim to make accurate assertions; in using a TAN file, you should hold the creator to that expectation, but you must assess for yourself how precise and complete it is.
