Text Alignment Network
The Text Alignment Network (TAN) is a framework that allows users, working
independently and collaboratively, to share, find, create, edit, and explore digital
texts and annotations.
A customized extension of Text Encoding Initiative (TEI) XML, TAN is
particularly suited for organizing and
aligning texts with multiple versions (copies, translations, paraphrases), and for
creating and editing text annotations such as quotations, translation clusters
(word-to-word), and linguistic features.
The foundation of TAN is a suite of XML formats, each designed for a specific
task. Extensive validation routines maximize the syntactic and semantic
interoperability of texts, annotations, and language resources. TAN comes with
applications and utilities that open new frontiers in scholarly publishing, research,
and teaching.
Why use TAN?
Extensive error checking. Built-in TAN validation
rules go well beyond the customary error-checking performed by other formats. Files
linked in the network "talk" to each other, to let users know about changes and
updates. More than one hundred types of content-based errors are checked. Through
Schematron Quick Fixes, many of the problems can be corrected in a matter of
seconds.
Time-saving utilities. Enjoy enhanced editing
functions in Oxygen XML Editor's Author mode. Highly customizable TAN utilities help
you create, edit, and maintain TEI and TAN files. For example:
- Body Builder: write rules to convert plain text or
  Word docx files into a preferred TAN/TEI structure and markup.
- Body Remodeler: incrementally restructure a text to
  imitate an existing TAN/TEI file. In conjunction with Oxygen Author tools,
  this utility can save hours of labor in creating a collection of many
  versions of the same work.
- Body Sync: update a TAN/TEI file so its
  transcription exactly matches that of another TAN/TEI file.
- TAN-A-lm Builder: generate lexico-morphological data
  for a TAN/TEI file.
Pathbreaking applications. Core TAN applications,
written in XSLT, provide cutting-edge tools for textual research and analysis. For
example:
- Diff+: identify, analyze, and visualize text
  differences between any number of versions of a text.
- Parabola: juxtapose in a single interactive HTML
  page all the versions of a work, along with annotations.
- Tangram: identify quotations, paraphrases, and
  common text between two groups of texts.
Intuitive text referencing. Unlike TEI, HTML, or
other markup systems that rely heavily upon arbitrary identifiers that can be
difficult to navigate and maintain, TAN points to text portions using familiar
reference systems, or user-customized tokenization rules.
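As a rough illustration of the idea (not TAN's actual syntax or implementation), the following Python sketch points to a word by a familiar reference ("1.1"), the token itself, and its occurrence number, rather than by an arbitrary identifier. The sample text, the `tokenize` and `resolve` names, and the `\w+` token pattern are all assumptions made for this example.

```python
import re

# Hypothetical sample text, divided by a familiar reference system (chapter.verse).
TEXT = {
    "1.1": "In the beginning was the Word",
    "1.2": "The same was in the beginning with God",
}


def tokenize(s, pattern=r"\w+"):
    """Split a string into tokens. The pattern stands in for a
    user-customized tokenization rule (an assumption of this sketch)."""
    return re.findall(pattern, s)


def resolve(ref, token, occurrence=1):
    """Return the 1-based position of the nth occurrence of `token`
    (case-insensitive) within the division named by `ref`."""
    tokens = tokenize(TEXT[ref])
    hits = [i for i, t in enumerate(tokens, start=1) if t.lower() == token.lower()]
    return hits[occurrence - 1]


# The second "the" in 1.1 is the fifth token of that division.
print(resolve("1.1", "the", 2))
```

A pointer of this kind stays meaningful to a human reader and survives edits elsewhere in the file, which is the advantage the paragraph above claims over opaque identifiers.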
Application development. TAN is built upon an
extensive and robust XSLT function library, one of the few of its kind. Do you
already use Natural Language Toolkit, Classical Language Toolkit, or
comparable packages in programming languages to develop tools for textual and
linguistic research? Do you have to process, analyze, and transform texts that are
in tree structures? With more than 250 public functions, covering a range of tasks, from
numerics to maps, checksums to tree manipulation, the TAN function library might have
everything you need, and more, and help you stay within an XML environment. Many TAN
functions are extremely useful, even outside of TEI or TAN.
Semantic Web. TAN was designed at the outset to
ensure that texts and their annotations would be rooted in the practices of the
Semantic Web. Unlike many other formats, whose attribute values are almost always
only human-readable, most TAN file components are tied to URIs, making them
suitable for use in Semantic Web applications.
Select TAN libraries
Lexico-morphology
(rules for Latin, Greek, Syriac, Coptic, and English; TAN-A-lm files in Latin and
Greek)
Presentations
Publications
A number of end-use applications for TAN have been designed. Listed
here are assorted output files from select TAN applications, past and present.
The output is intended as proof-of-concept, not as polished product, illustrative
of how TAN files can be manipulated for publishing, studying, creating, and editing.
Sample output may contain known errors in content, structure, styling, and JavaScript.
Files may be added, renamed, or deleted at any time. If the output looks irregular,
try using a different browser.
All output content is believed to be available under a license that permits this type
of use.
Participation
Changes are made regularly to TAN, mainly in its development
branch. If you have a TAN library, sharing it with other participants,
particularly via Git, will help developers test any changes that have been made to
the function library, and encourage others to contribute to your project.
The TAN project is by no means finished. This version of TAN merely scratches the
surface of what is possible. New participants to test, use, and develop TAN's
schemas, functions, guidelines, and applications are welcome. Inquiries about
participation should be sent to the project director,
Joel Kalvesmaki, by email: director at textalign.net.
Previous versions