All TAN-compliant files, no matter the type or class, follow a common basic structure: (1) at least three processing instruction nodes, (2) a namespace node, and (3) a root element.
Processing instruction nodes: The first of the three required processing instruction nodes is the standard declaration made in every XML file's prolog: <?xml version="1.0" encoding="UTF-8"?>
After that come two more processing instruction nodes specifying the two schema files required for validation:
<?xml-model href="[PATH]/[ROOT-ELEMENT-NAME].rn[g OR c]"
type="application/relax-ng-compact-syntax"?>
<?xml-model href="[PATH]/[ROOT-ELEMENT-NAME].sch"
type="application/xml"
schematypens="http://purl.oclc.org/dsdl/schematron"?>
The first of these two processing instructions points to the RELAX-NG schema that declares the major, structural rules. The second points to the finely tuned rules, written in Schematron. Both processing instructions are required. [PATH] represents the path to the schema file, whether local or on a server, and [ROOT-ELEMENT-NAME] stands for the name of the root element (the element that is the ancestor of all other elements in the document and the descendant of none).
Note: An exception to this rule is that a TAN-LM file may alternatively point to
It is your choice whether you use .rnc or .rng as the extension for the RELAX-NG schema. The former is the compact syntax and the latter, the XML format. They are equivalent. The schemas are written primarily in the compact syntax, then converted to the XML format.
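As an illustration of the prolog described above, here is a minimal sketch for a TAN-A-div file that uses the compact-syntax schema. The directory ../schemas/ and the @id value are hypothetical placeholders, and the content of the root element is omitted, so this shows only the shape of the prolog rather than a file that would validate.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "../schemas/" is a hypothetical path; substitute the location of your schema copies. -->
<?xml-model href="../schemas/TAN-A-div.rnc"
   type="application/relax-ng-compact-syntax"?>
<?xml-model href="../schemas/TAN-A-div.sch"
   type="application/xml"
   schematypens="http://purl.oclc.org/dsdl/schematron"?>
<!-- The @id value below is an invented example of a tag URN. -->
<TAN-A-div xmlns="tag:textalign.net,2015:ns"
   id="tag:example.com,2015:my-alignment"
   TAN-version="1 dev">
   <!-- <head> and <body> omitted from this sketch -->
</TAN-A-div>
```

If you point to the .rng version instead, be aware that many validators expect the XML form of RELAX-NG to be declared with type="application/xml" and schematypens="http://relaxng.org/ns/structure/1.0". That is an assumption about common tooling, not a requirement stated in these guidelines.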
Some files admit different levels of validation, sorted into what Schematron calls phases. TAN-A-div phases are termed basic and verbose, and are chosen by specifying the phase in the prolog, e.g., <?xml-model href="TAN-A-div.sch" phase="basic" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>. The verbose phase makes extra calculations that go beyond mere validation and analyzes the differences between source files. In most cases, if you have not specified which phase you prefer in the prolog, you will be prompted for a choice when you validate your file.
Master files are kept at the TAN git repository and website, but anyone may cache, save, serve, and use copies of the TAN schema files anywhere.
Namespace node: All TAN elements take the namespace tag:textalign.net,2015:ns. In most cases, this value is placed in the root element. (The only exception is TAN-TEI transcription files, which take as a default namespace http://www.tei-c.org/ns/1.0 everywhere but in /TEI/head, which takes the TAN namespace.) For more about namespaces, see the section called “Namespaces”.
Root element: The name of the root element identifies the type of TAN file:
Each root element takes a mandatory @id and @TAN-version.
The root element takes only two mandatory children: <head> and <body>, the latter containing data and the former, metadata (data about the data). TAN-TEI files are the only exception to this rule: they take three children, <teiHeader>, <head>, and <text>, because the TEI header is inadequate for TAN purposes. See the section called “Transcriptions Using the Text Encoding Initiative (<TEI>)”.
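The TAN-TEI exception can be pictured with the following skeleton. It is only a sketch under stated assumptions: the @id value is an invented tag URN, the prolog is left out, and the contents of all three children are replaced by comments, so it will not validate as written.

```xml
<!-- Default namespace is TEI; only <head> switches to the TAN namespace. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0"
   id="tag:example.com,2015:my-transcription"
   TAN-version="1 dev">
   <teiHeader>
      <!-- ordinary TEI header content -->
   </teiHeader>
   <head xmlns="tag:textalign.net,2015:ns">
      <!-- TAN metadata -->
   </head>
   <text>
      <!-- TEI transcription -->
   </text>
</TEI>
```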
All TAN files may take one final optional child, <tail>, a private use element that allows any well-formed XML. Nothing in a TAN file should be dependent upon the <tail>. That is, if you are editing a TAN file and you add a <tail>, assume that it will be disregarded by other users. Similarly, you may delete any TAN file's <tail> without consequence.
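Putting the pieces together, a non-TEI TAN file has roughly the following outline. This is a hedged sketch: the @id value and the <my-notes> element inside <tail> are hypothetical, the prolog is left out, and the required contents of <head> and <body> are replaced by comments.

```xml
<TAN-A-div xmlns="tag:textalign.net,2015:ns"
   id="tag:example.com,2015:my-alignment"
   TAN-version="1 dev">
   <head>
      <!-- metadata (data about the data) -->
   </head>
   <body>
      <!-- data -->
   </body>
   <tail>
      <!-- Private use: any well-formed XML. <my-notes> is an invented element. -->
      <my-notes xmlns="">scratch material that other users may ignore or delete</my-notes>
   </tail>
</TAN-A-div>
```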
@id and a TAN file's IRI Name
Every TAN file requires in its root element an @id. Its value, termed the TAN file's IRI name, must take the form of a tag URN (see the section called “Tag URNs” for syntax). The file's IRI name is the primary way other TAN files will refer to it.
The namespace of the current file's IRI name must match at least one namespace in one <agent>'s <IRI> value. This helps tie the responsibility for the TAN file to at least one person. The first such <agent> is called the key agent.
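The matching rule might look like this in practice. The sketch assumes an invented namespace, tag:example.com,2015, shared by the file's @id and by one agent's <IRI>. Any other children or attributes that <agent> and <head> require are omitted, so treat it as an illustration of the rule rather than a valid file.

```xml
<TAN-A-div xmlns="tag:textalign.net,2015:ns"
   id="tag:example.com,2015:my-alignment"
   TAN-version="1 dev">
   <head>
      <!-- other metadata omitted -->
      <agent>
         <!-- Shares the namespace tag:example.com,2015 with @id above,
              so this would be the key agent. -->
         <IRI>tag:example.com,2015:agent:jdoe</IRI>
      </agent>
   </head>
   <body>
      <!-- data omitted -->
   </body>
</TAN-A-div>
```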
In choosing a value for @id you might borrow the filename, but you do not have to. Indeed, it is probably not a good idea, since files are frequently renamed, often with good reason. A TAN file's IRI name should not be changed, especially after publication, because the name is supposed to be permanent and stable.
On occasion during editing, it will become clear that revisions are so deep that the file is substantially different from how it began. If a previous version has been published, then coining a new IRI name is advised, to dissociate the file from its ancestry. You may always document the connection by supplying a <see-also> element in the <head>, specifying the <relationship> between the two.
If you take someone else's data and alter it, you should not change the IRI name, not even its namespace. To avoid suggesting that the owner of that namespace is responsible for the revised file, you should add yourself as an <agent> and then document your alterations through <change> or @ed-when and @ed-who. You should also probably add a <see-also> element, pointing to a version of the file that predates your intervention.
The version of a TAN file is identified by the most recent date in the file's @when, @ed-when, or @when-accessed. It is important, therefore, whenever you change a TAN file that has already been published, to provide at least an edit stamp (see the section called “Edit Stamp”) in the part of the file you changed or in a <comment> or <change>, so that anyone validating a TAN file dependent upon yours will be warned that changes have been made. The user may then either continue to process the file (the changes may be minor or inconsequential) or investigate the changes before deciding what to do.
Because the IRI name is stable, it is suitable for use outside of TAN, in, for example, RDFa, JSON-LD, and linked open data (see the section called “Identifiers and Their Use”).
The IRI name kept at @id is the only metadatum positioned outside <head>. It is placed as rootward in the document as possible to emphasize that it names the entire document.
@TAN-version must be 1 dev, indicating that the file has been made in light of the development files of version one.