Chapter 5. Class-1 TAN Files, Representations of Textual Objects (Scripta)


General

(For more general principles and assumptions applying to all TAN files, not just class 1, see the section called “Design Principles”.)

Class-1 formats are designed for faithful but judiciously normalized digital transcriptions. Each TAN-T(EI) file is devoted exclusively to a single version of a single work found in a single scriptum (text-bearing object), segmented and uniquely labeled with a (preferably familiar) reference system.

Editors of TAN-T(EI) files should be able to read, write, and proofread texts in the languages of the transcriptions. They should understand the texts well enough to segment them and label them according to the conventions used for those works. They should be able to distinguish the text of a primary source from its editorial apparatus. They should be familiar with normalizing conventions for texts from the period, language, and culture. They should know how the transcription might be used in other contexts, especially translation studies or a study of quotations.

Editors need not understand everything about their texts, and they need not have any specialized skill in grammar or lexicography. They need not know the morphology of individual words, or how individual parts of the text have been translated. Those skills are more profitably spent editing other TAN formats.

TAN-T(EI) editors stand at the foundation level of the Text Alignment Network. Because other files will depend upon them, careful proofreading is important. Eliminating as many typographical errors as possible before publication will maximize the utility of a TAN-T(EI) file. On the other hand, TAN has been designed with the assumption that most files in circulation have typographical errors that can and should be corrected as they are found. If you are aware that a text needs proofreading, but you still want to make it available, simply leave a <comment> in the <to-do> part of the <head>.

If you are creating a TAN-T(EI) file, you are doing so primarily to facilitate alignment and annotation, which requires use of a suitable reference system (see reference systems). Transcription files should be segmented and labeled according to a reference system that is familiar and can be easily applied to other versions of the same text in other languages. If possible, semantic mileposts (clauses, sentences, paragraphs, chapters) should be prioritized over visual ones (lines, columns, pages, volumes). Any transcription can be furnished with multiple reference systems, but it is advisable to do so on the basis of separate files, linked by <redivision>s in the <head>. See the section called “One Reference System”.

Domain model

Contributors and users of TAN files must sharply distinguish between a scriptum (text-bearing object) and a conceptual work, e.g., between a specific printed copy of the Iliad and the Iliad conceived generally. The former has materiality (digital files are treated here as being material) and the latter does not. Even though both are constitutively necessary for any transcription, the two are always differentiated in the TAN format: <source> and @src point to physical exemplars; <work>, @work, and <version> to the conceptual. Adherence to this distinction is quite important.

Some readers may be reminded at this point of the domain model defined by the Functional Requirements for Bibliographic Records (FRBR), which identifies in its Group 1 (Products of intellectual & artistic endeavor) four types of entities: work, expression, manifestation, and item. A work is "a distinct intellectual or artistic creation" and an expression is the conceptual, immaterial realization of a work. Both work and expression are terms for conceptual, non-material entities. A manifestation, on the other hand, is "the physical embodiment of an expression" and an item is a single exemplar of a manifestation.

	Note
	Quotations in this section come from International Federation of Library Associations and Institutions, Functional Requirements for Bibliographic Records: Final Report, amended and corrected (February 2009), http://www.ifla.org/VII/s13/frbr/.

Table 5.1. Examples of FRBR Group 1 Entities

Work	Expression	Manifestation	Item
Iliad	Caroline Alexander's English translation of the Iliad.	the print run identified with ISBN 978-0062046284	A specific copy
The Psalms	The (Hebrew) Masoretic Psalter	The 1820 printing of George Offor's edition of the Hebrew Psalms	Biblioteca Palatina Cod. Parm. 1699
A River Runs Through It	Norman Maclean's original version	Print run ISBN 0226500608	Author's personal print copy
A River Runs Through It	The 1992 film version	Blu-ray disc UPC code 004339632533	Reference print CGB 7432-7438 (deposited in the Library of Congress)

TAN's domain model differs slightly. The most important difference is the abandonment of FRBR's expression, which proved problematic in the development of sample TAN data. The term expression was intended to describe a conceptual, non-material entity, but the FRBR guidelines defined and explained it in vague or material terms.

Note

"Expression encompasses, for example, the specific words, sentences, paragraphs, etc. that result from the realization of a work in the form of a text....defined, however, so as to exclude aspects of physical form, such as typeface and page layout, that are not integral to the intellectual or artistic realization of the work as such." (ibid., p. 19, emphasis added) That is, expression includes integral aspects of physical form (e.g., typeface that is integral to the realization). "Inasmuch as the form of expression is an inherent characteristic of the expression, any change in form (e.g., from alpha-numeric notation to spoken word) results in a new expression." (p. 20, emphasis added)

Even the very term expression and FRBR's preferred synonym, realization, imply materiality (without which nothing can be expressed or realized). Further, FRBR's expression does not easily handle creative adaptations of works that are themselves arguably works in their own right. For example, Euripides' Medea was adapted several centuries later by Seneca the Younger. Seneca's Medea is arguably merely an expression, but has itself been subject to various editions and performances, i.e., expressions. But FRBR does not accommodate expressions of expressions. If Seneca's Medea is treated as a work in its own right, its expression relationship to Euripides' original is lost, since FRBR does not accommodate works that are expressions of other works.

In the TAN domain model, expression is altogether dropped. There is only one type of conceptual, non-material entity, namely, a work.

The term version in TAN is applied to a work that substantially follows but varies another work, e.g., translations and adaptations. But such versions are themselves still works. One work is indicated to be the version of another in a class-1 file through the <work> and <version> declarations.

As for material entities, FRBR's manifestation and item are combined in TAN through the term scriptum. A scriptum is a text-bearing object, e.g., book, manuscript, pamphlet, tombstone, traffic sign, digital file (digital media is interpreted as being material). When scriptum is used in a TAN file, it points either to a single physical item or to a set of physical items that are for all intents and purposes indistinguishable (i.e., a scriptum reproduced mechanically). A scriptum that points to a manuscript points only to that one particular manuscript. But a scriptum that points to a printed book or a digital file is understood as applying to all copies of that printed book or digital file.

There is at present no formal mechanism to specify whether a scriptum points to one object or a set of objects. The distinction must be inferred from a scriptum's IRI + name pattern. In cases of potential ambiguity, it is up to creators of a TAN file to assign to the scriptum IRIs that avoid confusion. For example, to point to Edward Gibbon's personally annotated copy of the 1763 edition of Herodotus (now held by the Wren Library, Trinity College, Cambridge University), one should not use https://lccn.loc.gov/92189906 or http://www.worldcat.org/oclc/27188122, which point to the set of all copies. In this case, one may need to mint their own IRI, based on the Wren Library's acquisition number, RW.50.15.

In summary, the TAN domain model defines two kinds of entities: works and scripta. Works, which are immaterial, conceptual entities, may contain other works, or they may be versions of other works (or work-versions). Scripta, which are material entities, may contain other scripta, and they may refer either to a single object or to a set of copies. A work may be instantiated in many scripta, and similarly, any scriptum may contain many works. Most work-scriptum relationships can be inferred from the <head> of a class-1 file, and they may be expressed in a <TAN-A> file.
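The two kinds of entities summarized above can be pictured with a small Python sketch. It is only an illustration of the relationships, not part of TAN: the class names, field names, and the helper `scripta_bearing` are hypothetical, and the `set_of_copies` flag is an assumption made for the example (TAN itself has no formal mechanism for that distinction).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Work:
    """An immaterial, conceptual entity."""
    name: str
    version_of: Optional["Work"] = None  # set when this work is a version of another work
    contains: List["Work"] = field(default_factory=list)  # works within this work


@dataclass
class Scriptum:
    """A material, text-bearing object, or a set of indistinguishable copies."""
    name: str
    set_of_copies: bool = False  # illustrative flag only; not a TAN feature
    contains: List["Scriptum"] = field(default_factory=list)  # scripta within this scriptum
    works: List[Work] = field(default_factory=list)  # works instantiated in this scriptum


def scripta_bearing(work, scripta):
    """Return the scripta in which the given work is instantiated."""
    return [s for s in scripta if any(w is work for w in s.works)]


# Example data taken from the tables in this section.
iliad = Work("Iliad")
alexander = Work("Caroline Alexander's English translation of the Iliad", version_of=iliad)
print_run = Scriptum("Print run ISBN 978-0062046284", set_of_copies=True, works=[alexander])
```

In this sketch the translation is itself a work that records which work it is a version of, so no separate expression entity is needed.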

Table 5.2. Examples of TAN Entities

Work	Scriptum
Iliad; Caroline Alexander's English translation of the Iliad	the print run identified with ISBN 978-0062046284; a specific copy
The Psalms; The (Hebrew) Masoretic Psalter	The 1820 printing of George Offor's edition of the Hebrew Psalms; Biblioteca Palatina Cod. Parm. 1699
Norman Maclean's A River Runs Through It; The 1992 film A River Runs Through It	Print run ISBN 0226500608; Author's personal print copy; Blu-ray disc UPC code 004339632533; Reference print CGB 7432-7438 (deposited in the Library of Congress)

One Version, One Work, One Object, One Reference System

Every TAN-T(EI) file must be restricted to a transcription of a single version of a single work found on a single scriptum, segmented and labeled according to a single reference system.

The principle above is critical to the success of the network. It reduces the risk of confusion and simplifies the files. It follows the generally advisable principle that master data should be disaggregated.

One Scriptum

Each TAN-T(EI) file must transcribe one and only one text-bearing object or scriptum. It may be a digital file, a book, a manuscript, a stone, a sign, or a bottlecap. If the object you've chosen has been made mechanically and is virtually indistinguishable from other objects created by the same process (e.g., copies of a printed book or copies of a digital file), then the entire set of copies (what some librarians call a manifestation) is to be regarded as the scriptum.

Identifying and naming a scriptum might require an editor's discernment and judgment. For example, some manuscripts have been split up, their parts now residing in multiple libraries around the world; other manuscripts are composites, made of several manuscripts. In such cases, you may need to define your scriptum in a way that might not match the way others define it. But the decision is your prerogative, not theirs. You have both the right and responsibility to define your object in the way that you think will most benefit users of your files.

The scriptum is declared via <source>, which either takes the IRI + name pattern, or points to a <scriptum> vocabulary item. It is a good idea to name your scriptum with an <IRI> value in the form of an http URL that points to a detailed entry in a library catalogue. Doing so allows users to retrieve extensive, structured bibliographical information. You also save yourself the hassle of having to write a detailed, structured bibliographical description. If a URL cannot be found for <IRI>, you may simply coin a tag URN or a UUID. Alternatively, if you find another TAN file that uses the same scriptum-source, incorporate its <name>s and <IRI>s with your own (multiple <name>s and <IRI>s are a virtue).

If you need to specify exactly where on a scriptum a work-version appears (e.g., page range), <comment> or <desc> should be used.

One Work

The transcription must be restricted to a single creative work, identified by <work> (part of the declarations section of <head>).

Many scripta have more than one work. Identifying the creative work you transcribe is, once again, your prerogative. Suppose the scriptum you have is a Bible. You define the work. Perhaps you wish to encode the entire Bible and treat it as a single work. Or maybe you wish to treat only the New Testament as the work, or the Tetraevangelion, or the Gospel of Matthew, or a specific episode in that gospel, or merely the Beatitudes. Use whichever work you like, but make sure that the TAN-T(EI) file contains nothing but the work you have declared. It should be a complete representation of what is found on the object, even if only partially preserved, and respect as far as is practical the order of the text in the scriptum.

The requirement to provide the entirety of the work-version on the scriptum is a significant departure from the fourth principle of the section called “Assumptions in the Creation of TAN Data”. Users should be able to assume that the transcription in a class-1 file covers the entirety of the work-version chosen, within the particular scriptum. If you are aware that the transcription is incomplete, leave a <comment> to that effect in the <head>'s <to-do>, identifying which portions are missing from the transcription.

Well-known works may have a suitable IRI already assigned to them, say by means of a DBPedia entry. Most works have not been assigned IRIs or are named in IRI vocabularies that are not well known. You may assign any work your own URN, through a UUID or a tag URN.

One Version

The transcription must be restricted to a single version of the creative work, optionally identified by <version> (part of the declarations section of <head>). In most cases, <version> is unnecessary, because <work> in conjunction with <source> is sufficient to identify a particular work-version. But if the source carries multiple versions (e.g., a bilingual edition of a text), then <version> should be included, to specify which version has been transcribed. <version> can also be used to declare explicitly that the work mentioned in <version> is a version of the work mentioned in <work>.

If you have a scriptum with multiple versions of a work, and you wish to transcribe them all, each version should be in its own separate TAN-T(EI) file.

There may be cases where individual textual divisions are repeated, not so much because they represent a different version, but because they are variants that are integral to the work-version chosen. Creating a separate file for such individual cases would be both impractical and misleading. Standard TAN vocabulary for div types includes the item variant, which may be used to wrap every variant in its own <div>, e.g.,

. . . . .
<div type="title" n="title">
   <div type="variant" n="orig">The Place</div>
   <div type="variant" n="subscript" xml:lang="grc">Ὁ Τόπος</div>
</div>
. . . . .

Notes should be included only if they are an integral part of the primary work (i.e., by the same author, not by a later editor). If you think the notes to a work are important, and legitimately a work in their own right, consider putting them in their own TAN-T(EI) file, or converting them to claims in a TAN-A file.

Very few work-versions have IRIs. It is advisable to assign a tag URN or a UUID. If the IRI you have used for <work> is in a namespace that you own or control, then you are entitled to modify it, and you may wish merely to add a suffix to the work IRI. For example, you might have tag:urn:example.com,2001:work:a defined for the work; a 1987 German translation might be specified as tag:urn:example.com,2001:work:a:ver:1987:deu.
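A minimal Python sketch of the two options just described: minting a UUID-based URN, or suffixing a work IRI that you control. The function names `mint_uuid_urn` and `version_iri` are hypothetical, and the `ver` suffix pattern simply copies the example above; it is one possible convention, not a TAN requirement.

```python
import uuid


def mint_uuid_urn():
    """Mint a fresh random UUID expressed as a URN (urn:uuid:...)."""
    return uuid.uuid4().urn


def version_iri(work_iri, *parts):
    """Derive a version IRI by appending ':ver:' plus the given parts to a work IRI.

    Only appropriate when the work IRI lies in a namespace you own or control.
    """
    return ":".join([work_iri, "ver", *(str(part) for part in parts)])


# Reproduces the example in the text above.
german_1987 = version_iri("tag:urn:example.com,2001:work:a", 1987, "deu")
```

Whichever route you take, the resulting string is what would be supplied as the <IRI> of the <version>.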

One Reference System

Every TAN transcription must be segmented into a hierarchy of labeled divisions, defined in the <body> through <div>s and their @n values.

Those divisions, whenever possible, should align with the reference system that prevails for the work across different versions or translations, in what is sometimes called a canonical reference system. Because even the most familiar reference system admits degrees and dispute, the term canonical is problematic. It is avoided in these guidelines; we refer simply to a work's reference system.

If you have your choice, preference should be given to reference systems that follow the semantic contours of the work, not the physical features of a particular scriptum. Chapter, paragraph, and sentence numbers are preferable to volume, page, and line numbers, because other versions of the work (e.g., translations, paraphrases) will only roughly, if at all, follow a reference system based on features found in a particular scriptum.

Sometimes a scriptum-based reference system is inescapable, or is the most common reference system for a work (e.g., Porphyry's commentary on the Categories). It is perfectly acceptable to adopt that system, but it may entail more labor during the alignment process.

If a given work has more than one common reference system (e.g., the works of Plato and Aristotle, which have two reference systems—logical and scriptum-oriented—both of which are standard and important), then the recommended practice is to create two class-1 files with identical transcriptions, each one structured by its own reference system. Place in each file a <redivision> pointing to the other. Under verbose validation, you will be notified if there are textual discrepancies between the transcriptions, and Schematron Quick Fixes will allow you to automatically update one text to match the other.
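The kind of check described above can be approximated outside TAN with a short Python sketch. This is not the TAN validation routine: the function names `flatten` and `first_discrepancy` are hypothetical, each file is assumed to have been reduced to a list of leaf-division texts in document order, and word-break characters at the ends of divisions are ignored for simplicity.

```python
import re


def flatten(leaf_texts):
    """Join leaf-division texts and collapse all runs of whitespace to one space."""
    return re.sub(r"\s+", " ", " ".join(leaf_texts)).strip()


def first_discrepancy(leaves_a, leaves_b):
    """Return the character position where two transcriptions first differ, or None."""
    text_a, text_b = flatten(leaves_a), flatten(leaves_b)
    if text_a == text_b:
        return None
    for position, (char_a, char_b) in enumerate(zip(text_a, text_b)):
        if char_a != char_b:
            return position
    # One text is a prefix of the other: they differ where the shorter one ends.
    return min(len(text_a), len(text_b))
```

Two files that divide the same text differently should yield `None`; any other result points to the place where one transcription has drifted from the other.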

Having two or more alternatively divided editions can be quite useful. They could serve as the basis for reference cross-indexes, or to help convert other versions of the work from one reference system to the other.

If there is a good reference system, but the divisions are overly lengthy, you may introduce subdivisions. But there is no guarantee that the provisional subdivisions you introduce will be adopted by other editors who create or edit TAN versions of the same work. Editors who work independently on the same text and subdivide it will likely produce discordant schemes. Class-2 formats provide a mechanism via <adjustments> to reconcile some basic differences. But a discordant scheme might be best handled simply by creating a copy, and restructuring it according to the preferred system, making sure related files refer to each other through <redivision>.

If a work does not have a reference system, or if you think that the ones that exist are inadequate or misguided, create one of your own. If you develop your own reference system, be sure to design it so that it can be easily applied to any version of the work, including translations. Prefer logical divisions of text over scriptum-based divisions.

TAN supports five major methods of numeration in reference systems:

Arabic numerals. 1, 2, 3, etc.
Roman numerals. Values up to 5000, utilizing i, v, x, l, c, d, and m, uppercase or lowercase, with liberal syntactic rules (within a roman numeral, any digit preceding one of a higher value will be deducted from the total value; all others are added).
Alphabetic sequences. The 26-letter Latin alphabet, with numbers higher than 26 (or any multiple of 26) beginning with the letter a incrementally repeated, e.g., y (25), z (26), aa (27), bb (28), … aaa (53). Uppercase or lowercase allowed. (Note, this is not the hexavigesimal (base 26) system, where a is 0, b is 1, z is 25, aa is 00, ab is 01, etc.)
Arabic numerals + alphabetic sequences. Arabic numerals followed immediately by an alphabetic sequence. The second item is to be calculated as a subsequence of the first item, with the lack of a second item taking highest priority. E.g., 4, 4a, 4b, 4c....
Alphabetic sequences + Arabic numerals. As above, but with alphabetic sequence preceding Arabic numerals.

See tan:letter-to-number() and references there to TAN functions for converting numbering systems.
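For readers who want to see the arithmetic, here is a minimal Python sketch of the alphabetic and the Arabic + alphabetic schemes. It is not the TAN implementation; the function names `alphabetic_to_int` and `mixed_sort_key` are hypothetical, and representing a mixed value as a pair of integers is an assumption made for illustration.

```python
import re


def alphabetic_to_int(value):
    """Convert a repeated-letter sequence (a ... z, aa, bb, ...) to an integer."""
    letters = value.lower()
    # The scheme repeats a single letter; a mixed string such as 'ab' is rejected.
    if not re.fullmatch(r"([a-z])\1*", letters):
        raise ValueError(f"not an alphabetic sequence: {value!r}")
    position = ord(letters[0]) - ord("a") + 1  # a = 1 ... z = 26
    repeats = len(letters) - 1                 # each extra repetition adds 26
    return repeats * 26 + position


def mixed_sort_key(value):
    """Sort key for an Arabic numeral with an optional alphabetic suffix (4, 4a, 4b)."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", value)
    if not match:
        raise ValueError(f"not an Arabic + alphabetic value: {value!r}")
    number, suffix = match.groups()
    # A missing suffix is ranked 0, so that 4 sorts before 4a.
    return int(number), alphabetic_to_int(suffix) if suffix else 0
```

Sorting a list of labels with `sorted(labels, key=mixed_sort_key)` places a bare numeral before its lettered subdivisions, which is how the rule about the missing second item is read here.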

The TAN validation process attempts to convert all values of @n to Arabic numerals. Some values are ambiguously Roman numerals or alphabetic sequences. For example, c could mean 3 (alphabetic sequence) or 100 (Roman numeral). Such numerals are assumed to be Roman, unless you supply a <numerals> and assign @priority to specify letters (or roman).
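The ambiguity can be made concrete with a short Python sketch. It is an illustration only, not the TAN validation code: `roman_to_int`, `alphabetic_to_int`, and `resolve_n` are hypothetical names, the `priority` parameter merely imitates the effect of @priority, and the liberal Roman rule is read as comparing each digit with the digit immediately after it (an assumption). The upper limit of 5000 is not enforced.

```python
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(value):
    """Liberal Roman numeral parsing: a digit before a higher digit is deducted."""
    if not value:
        raise ValueError("empty value")
    digits = [ROMAN_VALUES[ch] for ch in value.lower()]  # KeyError if not a Roman digit
    total = 0
    for index, digit in enumerate(digits):
        following = digits[index + 1] if index + 1 < len(digits) else 0
        total += -digit if digit < following else digit
    return total


def alphabetic_to_int(value):
    """Repeated-letter sequence: a = 1 ... z = 26, aa = 27, bb = 28, ..."""
    letters = value.lower()
    if len(set(letters)) != 1 or not "a" <= letters[0] <= "z":
        raise ValueError(f"not an alphabetic sequence: {value!r}")
    return (len(letters) - 1) * 26 + ord(letters[0]) - ord("a") + 1


def resolve_n(value, priority="roman"):
    """Resolve a letter-based value, preferring the given reading when both apply."""
    readings = {}
    for name, convert in (("roman", roman_to_int), ("letters", alphabetic_to_int)):
        try:
            readings[name] = convert(value)
        except (KeyError, ValueError):
            pass  # this reading does not apply to the value
    if not readings:
        raise ValueError(f"cannot interpret {value!r}")
    # Use the preferred reading if available, otherwise the only one that applies.
    return readings.get(priority, next(iter(readings.values())))
```

With the default, `resolve_n("c")` is read as a Roman numeral; passing `priority="letters"` switches the same value to the alphabetic reading.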

Extra `@n` vocabulary

If you are using @n to label the names of books of the Bible or Surahs of the Qur'an, you will run into the issue of different conventions for @n. To avoid this long-standing problem, you may want to use extra TAN vocabulary for @n. If you include in the head of your TAN file <vocabulary which="bible eng"/>, then any non-numeric values of @n will be checked against the corresponding TAN-voc file (in this case, the TAN-voc file at /vocabularies/extra/n.bible.eng.tan-voc.xml). This, in turn, will allow other files to refer to that <div> by any other <name> that is a synonym. For example, in a class-1 file pointing to the TAN English Bible vocabulary above, a <div type="book" n="matt">...</div> would be regarded as containing the work the Gospel of Matthew. Any class-2 file that refers to that class-1 file as a source may use any synonym listed in the extra vocabulary file n.bible.eng.tan-voc.xml, i.e., Mt, Mat, Matt, or Matthew (or their lowercase equivalents). An extra benefit of this method is that such <div>s are also marked as the works, identified by the <IRI>s of the target TAN vocabulary items.
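The effect of such aliasing can be pictured with a small Python sketch. It does not read a TAN-voc file: the `VOCABULARY` list is a hand-written stand-in, the IRI in it is a placeholder rather than the identifier used by the real vocabulary, and the function names are hypothetical. Only the four synonyms come from the text above.

```python
# Hypothetical stand-in for one vocabulary item; the IRI is a placeholder.
VOCABULARY = [
    {"iri": "tag:example.com,2014:work:matthew", "names": ["Mt", "Mat", "Matt", "Matthew"]},
]


def build_alias_index(items):
    """Map every name, lowercased, to the IRI of its vocabulary item."""
    index = {}
    for item in items:
        for name in item["names"]:
            index[name.lower()] = item["iri"]
    return index


def same_div(n_a, n_b, index):
    """True when two non-numeric @n values name the same vocabulary item."""
    iri_a = index.get(n_a.lower())
    return iri_a is not None and iri_a == index.get(n_b.lower())


ALIASES = build_alias_index(VOCABULARY)
```

Because matching goes through the item's IRI rather than the literal string, a reference to `Mt` and a division labeled `matt` resolve to the same thing.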

If you use extra TAN vocabulary, it is recommended you include in the declarations section of your <head> an <n-alias>. This element, along with its @div-type, specifies exactly which types of <div>s are eligible for this kind of aliasing on @n. Supplying this element considerably speeds the validation process on long files.

The goal behind the extra vocabularies is to eliminate the need to worry about what abbreviations are used to name well-known, unnumbered <div>s. It is hoped that in future releases of TAN these extra vocabularies will grow in number and quality.

Extra TAN @n vocabularies:

Normalizing Transcriptions

You should declare how you have normalized the transcription via <adjustments> and its children, e.g., <normalization> or <replace>. (For suggestions on values of <IRI> for <normalization> see the section called “TAN keywords for types of normalizations (<normalization>)”.)

Generally speaking, normalization entails the suppression of things extraneous to or separable from the work-version you have chosen. You are encouraged to omit parenthetical editorial insertions (especially quotation references), stray handwritten remarks, discretionary word-breaking hyphens, editorial comments, inserted cross-references, and reference numerals (page numbers, section numbers, etc.). If chapter 4 of a text begins "4." or "IV" then leave out that labeling numeral—you've already indicated it in @n, so there's no need to clutter the transcription with it. Remember, scholars who use your file will be concerned with things like word-for-word alignments and lexico-morphological analysis, and putting in a modern editor's "4" might contaminate research results. For the same reason, you should resolve ligatures and correct unintended typographical errors.

The goal is a transcription whose text is free of the interpretive voice of later editors. You should remove from the text anything that is not part of the work proper and would interfere with detailed word-for-word alignment, or would require extra preprocessing or postprocessing work for other users. If you are breaking a transcription into individual lines, and you are required to break a word, do so with either the soft hyphen (U+00AD), the zero-width space (U+200B), or the zero-width joiner (U+200D). TAN processors that handle the text within a leaf <div> will automatically normalize its space. If either of those two characters is found at the end, then it will be deleted and the text from the next leaf <div> (if there is one) will immediately follow without intervening space; if those two characters do not occur at the end, then a single space will be added, and all other space will be normalized. For more details, see the section called “Space characters and normalization”.
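The joining behavior described above can be sketched in Python as follows. This is an approximation, not the TAN processor: the function names are hypothetical, and the default set of word-break characters is an assumption (the passage does not state unambiguously which characters trigger joining), so the set is exposed as a parameter that can be narrowed.

```python
import re

SOFT_HYPHEN = "\u00ad"
ZERO_WIDTH_SPACE = "\u200b"
ZERO_WIDTH_JOINER = "\u200d"
# Assumed default: every word-break character named in the text.
BREAK_MARKERS = (SOFT_HYPHEN, ZERO_WIDTH_SPACE, ZERO_WIDTH_JOINER)


def normalize_leaf_text(text, markers=BREAK_MARKERS):
    """Normalize space in one leaf division and settle how it joins the next one."""
    collapsed = re.sub(r"\s+", " ", text).strip()
    if collapsed.endswith(markers):
        # A word was broken here: drop the marker and add no space.
        return collapsed[:-1]
    # Otherwise the division ends at a word boundary: add one space.
    return collapsed + " "


def join_leaf_divs(texts, markers=BREAK_MARKERS):
    """Concatenate the normalized texts of consecutive leaf divisions."""
    return "".join(normalize_leaf_text(text, markers) for text in texts)
```

For instance, three line-based divisions whose first line ends in a soft hyphen are joined so that the broken word is restored and the remaining lines are separated by single spaces.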

In a digital source, variable lengths of special spacing marks (e.g., General Punctuation U+2000..U+200B) should be converted to ordinary spaces (see the section called “Unicode points not allowed”), and superscript combining Roman letters (U+0363..U+036F) should probably be converted to their non-combining counterparts. All Unicode must be normalized to NFC forms (see the section called “Unicode Normalization”).
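A minimal Python sketch of two of the steps above, using only the standard library. The function name `normalize_source_text` is hypothetical; the character range is the one given in the text, and the conversion of superscript combining letters is left out because it calls for editorial judgment.

```python
import re
import unicodedata

# General Punctuation spacing marks, using the range cited in the text.
SPECIAL_SPACES = re.compile("[\u2000-\u200b]")


def normalize_source_text(text):
    """Replace special spacing marks with ordinary spaces, then apply NFC."""
    spaced = SPECIAL_SPACES.sub(" ", text)
    return unicodedata.normalize("NFC", spaced)
```

NFC composes a base letter and a following combining mark into a single precomposed character wherever Unicode defines one, so two visually identical strings compare as equal afterwards.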

Variant readings should not be transcribed. For example, a manuscript may have correctors' marks. Or a set of footnotes (or apparatus criticus) might provide an alternative reading. In those cases, each set of corrections should be moved to a separate TAN-T file, or rewritten as <claim>s of a TAN-A file.

In some ambiguous areas, you can use TAN-TEI both to normalize and to preserve what is in the scriptum. Suppose, for example, a manuscript has reference numerals that are sui generis. That is, these reference numbers do not correspond to the "canonical" reference scheme, and are scribal adjustments to the text's structure (sometimes mistaken). On the one hand, such reference numerals are metadata, and should arguably be deleted; on the other, they are part of the text, and witness to how a text was read and changed over time. A middle-ground approach would move these references to TAN-TEI's <milestone rend="[TEXT]">, substituting [TEXT] for the reference text. In that way, the numerals are properly removed from the main text, but the information is retained. Generally speaking, TEI's @rend is an excellent way to remove something from a transcription while keeping it in the file.

Overall, normalization is a difficult, understudied topic. Scholars are not in the habit of documenting everything they normalize, and sometimes have so internalized a set of normalizations that they are unaware of them. Not all decisions will be clear-cut. You may justly hesitate before normalizing orthography, punctuation, accentuation, or capitalization. Some aspects of Unicode that permit different conventions may need special consideration. You may need to deliberate on whether an unusual or rarely used Unicode character might be misinterpreted or hinder searches. Document any decisions in the <adjustments>. Whether you use <normalization> or <replace> is up to you. The former can be used to apply a class of changes to a vocabulary item. The latter provides a precise, regular-expression-based method of describing exactly what has been changed, and the order in which those changes took place. Note, a <replace> might help one to reconstruct the path that led from the input to the output, but not the reverse. If it is important to document exactly what the pre-normalized version of a text was like, use <predecessor> or a similar element available in the key links section of the <head> (see the section called “Other Related Files”) to point to the original.
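The point about ordered, one-way replacements can be illustrated with a short Python sketch. It only imitates the idea behind <replace> using Python's `re` module, whose regular-expression syntax differs in places from the one TAN uses; the patterns in `REPLACEMENTS` and the function name `apply_replacements` are hypothetical examples.

```python
import re

# Hypothetical (pattern, replacement) pairs, applied strictly in this order.
REPLACEMENTS = [
    (r"^\s*(?:\d+|[IVXLCDM]+)\.\s+", ""),  # drop a leading section numeral: "4. " or "IV. "
    (r"\s*\[[^\]]*\]", ""),                # drop bracketed editorial insertions
    ("\ufb01", "fi"),                      # resolve the fi ligature
]


def apply_replacements(text, replacements=REPLACEMENTS):
    """Apply each regular-expression replacement in sequence."""
    for pattern, replacement in replacements:
        text = re.sub(pattern, replacement, text)
    return text
```

Two different inputs, one beginning "4." and one beginning "IV.", produce the same output, which shows why the list documents the path from input to output but cannot be run backwards to recover the original.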

If you find it very difficult to bring yourself to normalize to the depth advised above, try first making a (non-TAN) TEI file, and create the transcription you have in mind as the ideal. Once that is finished, create a second, TAN version, and be more aggressive in your normalization, with <see-also> pointing to the first approach. Users of your TAN transcription will be more interested in your TAN version than the TEI version, but you will have at least satisfied your craving to avoid normalizing.

Normalizing Annotations

The footnotes or endnotes in a scriptum should be normalized. Many, most, or all should likely be deleted. Before deciding, distinguish those that are an intrinsic part of the work you're transcribing from those that aren't. Those that aren't can be removed, or they can be put into a separate TAN-T(EI) file, perhaps linking the two through <see-also>, and ideally structuring both files with the same reference system, to facilitate alignment. Another way to approach the task is to convert some or all of the notes you're removing into <TAN-A> <claim>s.

Footnotes, endnotes, glosses, or marginalia that are intrinsic parts of the work present special challenges for encoding in general, and normalization in particular.

First is the issue of connecting an annotation to the text annotated. When we encounter a superscript number—a note signal—while reading the text of a printed book, we infer that we are being invited to find a companion footnote, and that footnote comments on the text we have just read. But specifically what text? Is it only the preceding word? Is it a word or phrase that occurs earlier in the sentence? Does the annotation cover earlier sentences, the entire paragraph, or even prior paragraphs? For some notes, identifying the text being annotated requires interpretation.

In a digital file, connecting an annotation to its text cannot be so vague; it requires a decision and a commitment. Here are three possible ways to approach annotations in a TAN file:

1. Use the <note> feature of TAN-TEI (see related TEI documentation). This will allow you to connect the annotation to merely an anchor in the text, i.e., to no text whatsoever.
```
<div n="1" type="p">
   <p>The process occurred in New York, among other places.<ref rend="1"/>
      <note><p><ref rend="1"/>On New York, see: X.</p></note>
   </p>
</div>
```
2. Move each annotation into a <div> with a @type that implies that it is an annotation (e.g., scholium) and place it immediately after the <div> it annotates.
```
<div n="1" type="p">The process occurred in New York, among other places.</div>
<div n="n1" type="footnote">On New York, see: X.</div>
```
Note in the example above that n1 is used to make sure that 1 unambiguously points to only one <div>.

3. As #2, but also write a <TAN-A> file that more precisely connects each annotation to the text it annotates.

```
<claim verb="annotates">
   <subject src="text" ref="n1"/>
   <object src="text">
      <from-tok ref="1" val="The"/>
      <through-tok ref="1" val="York"/>
   </object>
</claim>
```

The first option is expeditious, and will allow you to be as precise or imprecise as you like. Validation is not affected, but you should be aware that the <note> will be treated as a constituent part of its parent <div>. The second option is also relatively easy, but it entails a decrease in precision. The third option provides immense precision, permits multiple annotations on the same text range, and allows notes to target overlapping ranges of text. But the task could be time-consuming, if only because you will need to determine the range of text targeted by each annotation, and the targeted text might be quite messy or vague. You will need to take stock of how precise and comprehensive you choose to make your connections. (See also accuracy, precision, and comprehensiveness.)

Remember that the note signals in the main text and in the footnote area are metadata meant to help readers link corresponding passages of texts, and in the spirit of normalizing should be deleted. In a TAN-TEI file you can replace a note signal with <ref> (see above).
