TAN Tutorial 1

Preparing a TEI Corpus for the Text Alignment Network

Session 4

Application

Objectives

  • Configure and use the TAN application Parabola.
  • Learn the basics of data types and regular expressions.
  • Work with class 2 adjustments.
  • Group works and sources via aliases.

TAN Applications

  • TAN utilities help you create and edit TAN files
  • TAN applications leverage your TAN files for research, teaching, publication
  • applications directory
  • Single XSLT file with companion configuration file
  • Parameters only, with ample documentation
  • Subdirectories hide the real work

Using XSLT

Running an Application

  • Read application documentation first!
  • Catalyzing document = ?
  • Oxygen: Configure Transformation Scenario(s) (Ctrl+Shift+C), then apply (Ctrl+Shift+T)
  • Command line: Saxonica instructions

Oxygen Notes

  • Project orientation
  • Frameworks are very powerful
  • Setup:
    1. Options > Preferences > Document Type Association > Locations
    2. Decide: global or project?
    3. Additional frameworks directories > Add
    4. Point to TAN 2021 root directory

Demonstration

TAN-A file from session 3
Method 1: TAN project
Method 2: tutorial project

Exercise 1

Features of Parabola output

XSLT and HTML

  • Many TAN applications generate output in HTML
  • HTML offers limitless possibilities that cannot be accounted for in TAN parameters
  • JavaScript dependencies (local and libraries)
  • CSS dependencies
  • You might want something completely different
  • TAN apps provide proof of concept output
  • Advanced configuration or website integration may need specialized development

Configuring XSLT

Data types

XSLT supports many data types. Those below are the ones you will encounter in the parameters used by TAN applications.

Type     Example
string   <xsl:param name="exstring" as="xs:string" select="'45'"/>
boolean  <xsl:param name="exbool" as="xs:boolean" select="false()"/>
integer  <xsl:param name="exint" as="xs:integer" select="45"/>
decimal  <xsl:param name="exdec" as="xs:decimal" select="45.0"/>
element  <xsl:param name="exel" as="element()" select="/*/tan:head"/>
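The type of a parameter matters as soon as its value is compared or computed with. The following is a minimal sketch, assuming an XSLT 3.0 processor (such as Saxon) started at the initial template. The stylesheet is illustrative only and is not part of TAN; the parameter names simply reuse those in the examples above.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- The inner quotation marks make this a string, not a number -->
  <xsl:param name="exstring" as="xs:string" select="'45'"/>
  <xsl:param name="exbool" as="xs:boolean" select="false()"/>
  <xsl:param name="exint" as="xs:integer" select="45"/>
  <xsl:param name="exdec" as="xs:decimal" select="45.0"/>

  <xsl:template name="xsl:initial-template">
    <!-- A string must be cast before it can be compared with a number: true -->
    <xsl:value-of select="xs:integer($exstring) = $exint"/>
    <xsl:text>&#10;</xsl:text>
    <!-- An integer and a decimal are compared by value: true -->
    <xsl:value-of select="$exint = $exdec"/>
    <xsl:text>&#10;</xsl:text>
    <!-- A boolean can be tested directly: off -->
    <xsl:value-of select="if ($exbool) then 'on' else 'off'"/>
  </xsl:template>

</xsl:stylesheet>
```

Comparing `$exstring = $exint` without the cast would raise a type error, which is why a value such as `'45'` and a value such as `45` are not interchangeable in a configuration file.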

Quantifiers and data types

Quantity      Symbol  Atomic type example  Node type example
zero or one   ?       xs:string?           element()?
exactly one   (none)  xs:boolean           document-node()
zero or more  *       xs:dateTime*         attribute()*
one or more   +       xs:integer+          comment()+
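A quantifier tells you how many values a parameter will accept. The sketch below assumes an XSLT 3.0 processor started at the initial template; the parameter names (`optional-title`, `source-ids`, `column-widths`) are hypothetical and do not come from any TAN application.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- ? : zero or one value, so the empty sequence () is allowed -->
  <xsl:param name="optional-title" as="xs:string?" select="()"/>

  <!-- * : any number of values, including none -->
  <xsl:param name="source-ids" as="xs:string*" select="('grc', 'eng', 'lat')"/>

  <!-- + : at least one value; supplying () here would raise a type error -->
  <xsl:param name="column-widths" as="xs:integer+" select="(40, 60)"/>

  <xsl:template name="xsl:initial-template">
    <!-- Writes the number of items in each parameter: 0 3 2 -->
    <xsl:value-of separator=" "
        select="count($optional-title), count($source-ids), count($column-widths)"/>
  </xsl:template>

</xsl:stylesheet>
```

When a parameter has no quantifier, supply exactly one value. When it ends in `*` or `+`, wrap multiple values in parentheses and separate them with commas.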

Demonstration

Parabola master file

Exercise 2

Configure Parabola

Regular expressions

  • Latin regula, "rule"
  • Patterned
  • Large, complicated topic
  • Learn enough to configure Parabola's parameters
Symbol          Meaning
.               any character
|               or (union)
?               zero or one
*               zero or more
+               one or more
[ ]             a class of characters
( )             a group
\w              any word character
\W              any nonword character
\s              any of the four standard spacing characters: space (U+0020), tab (U+0009), newline (U+000A), carriage return (U+000D)
\S              anything not a spacing character
\d              any digit (0-9, plus other Unicode decimal digits)
\D              anything not a digit
\n              a newline (U+000A)
\r              a carriage return (U+000D)
\p{IsGujarati}  any character from the Unicode block named Gujarati
^               beginning of a line or string (doesn't capture any characters)
$               end of a line or string (doesn't capture any characters)
\\              literal backslash (an escaped escape character)
\^              literal caret sign (must be escaped with the backslash)
\$              literal dollar sign (escaped)
\(              literal opening parenthesis (escaped)
\[              literal opening square bracket (escaped)
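To see a few of these symbols at work, here is a minimal sketch that uses the standard XPath functions `matches()`, `replace()`, and `tokenize()`. It assumes an XSLT 3.0 processor started at the initial template; the sample strings are invented for illustration and have nothing to do with Parabola's actual parameters.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- Word characters, one space, digits, anchored at both ends: true -->
    <xsl:value-of select="matches('Parabola 2021', '^\w+\s\d+$')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- An escaped dot matches only a literal period: 1-2-3 -->
    <xsl:value-of select="replace('1.2.3', '\.', '-')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- An unescaped dot matches every character: five hyphens -->
    <xsl:value-of select="replace('1.2.3', '.', '-')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Escaped parentheses joined by "or" strip literal brackets: ch. 1 intro -->
    <xsl:value-of select="replace('ch. 1 (intro)', '\(|\)', '')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- One or more spacing characters as a separator: grc|eng|lat -->
    <xsl:value-of select="tokenize('grc  eng lat', '\s+')" separator="|"/>
  </xsl:template>

</xsl:stylesheet>
```

The second and third calls differ only by a backslash, which is the most common source of surprises when a regular expression is typed into a parameter.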

Special TAN extension

Symbol              Meaning
\u{+b}              all composites of the letter b: bᵇḃḅḇ⒝ⓑ㍴㏔㏝b𝐛𝑏𝒃𝒷𝓫𝔟𝕓𝖇𝖻𝗯𝘣𝙗𝚋
\u{.symbol}         all characters that have "symbol" as part of their Unicode name
\u{!latin}          all characters that do not have "latin" as part of their Unicode name
\u{.small.a!latin}  all characters that have "small" and "a" as part of their Unicode name, but not "latin"
Limited support
Kalvesmaki, Joel. “A New \u: Extending XPath Regular Expressions for Unicode.” Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). https://doi.org/10.4242/BalisageVol25.Kalvesmaki01.

Demonstration

Parabola master file

Exercise 3

Configure Parabola regular expressions

Adjustments

  • Class 2 files
  • Prune and reshape sources nonintrusively
  • Does not entail meaning
  • Can be applied selectively to sources
  • Four types

Adjustments

  1. skip
  2. rename
  3. equate
  4. reassign

Demonstration

TAN-A file from session 3
results

Exercise 4

Add TAN-A adjustments

Aliases

  • Available only in head/vocabulary-key
  • Allows you to assign one or more @xml:ids to a single id
  • @xml:id or @id (to allow punctuation)
  • TAN-A files: create sigla for families of manuscripts
  • Parabola: group your sources
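As a rough illustration of the points above, an alias that gathers several manuscript ids under one siglum might look like the fragment below. This is a sketch under stated assumptions, not a tested TAN file: the `idrefs` attribute name and every id value are assumptions that should be checked against the TAN guidelines before use.

```xml
<!-- Fragment only: belongs inside the head of a TAN file -->
<vocabulary-key>
  <!-- Hypothetical siglum "family-a" standing for three hypothetical source ids -->
  <alias xml:id="family-a" idrefs="ms-a1 ms-a2 ms-a3"/>
  <!-- Using id instead of xml:id allows punctuation in the siglum -->
  <alias id="family-b*" idrefs="ms-b1 ms-b2"/>
</vocabulary-key>
```

Wherever a source id is expected, the siglum can then stand in for the whole group, which is how Parabola can be told to treat several sources as one.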

Demonstration

Ring around the Rosie

Exercise 5

Configure aliases for Parabola

Recap

What we've learned

  • Configure and use the TAN application Parabola.
  • Learn the basics of data types and regular expressions.
  • Work with class 2 adjustments.
  • Group works and sources via aliases.

General discussion

Finish, evaluate exercises