TAN keywords for types of normalizations (<normalization>)

Definitive list of key terms for the normalizations applied to texts.

Master location: http://textalign.net/release/TAN-1-dev/TAN-key/normalizations.TAN-key.xml

Table 9.6. TAN keywords for types of normalizations

keywords (optional values of @which) | IRIs | Comments
  • no hyphens

  • tag:textalign.net,2015:normalization:hyphens-discretionary-removed

Discretionary word-break line-end hyphens have been deleted.

  • norm space

  • tag:textalign.net,2015:normalization:space-typographer-converted

General Punctuation spaces (U+2000..U+200B) have been replaced with a regular space. Equivalent to applying fn:replace() with the pattern '[\x{2000} \x{2001} \x{2002} \x{2003} \x{2004} \x{2005} \x{2006} \x{2007} \x{2008} \x{2009} \x{200A} \x{200B}]' and the replacement ' '.

  • no note callouts

  • tag:textalign.net,2015:normalization:annotation-signals-removed

Footnote or endnote signals (frequently superscript numbers or letters) have been deleted.

  • no notes

  • tag:textalign.net,2015:normalization:annotation-content-removed

Footnotes or endnotes have been deleted.

  • no comments

  • tag:textalign.net,2015:normalization:comments-editorial-removed

Editorial comments have been deleted.

  • no pointers

  • tag:textalign.net,2015:normalization:pointers-reference-removed

Reference pointers to other texts, both internal (cross-references) and external (citations of primary or secondary sources) have been deleted.

  • no milestones

  • tag:textalign.net,2015:normalization:milestones-reference-removed

Reference milestones such as page numbers and section numbers have been deleted.

  • no ligatures

  • tag:textalign.net,2015:normalization:ligatures-converted

All ligatures have been converted into constituent letters.

  • no combining chars

  • tag:textalign.net,2015:normalization:letters-combining-converted

All combining letters (U+0363..U+036F) have been converted to their corresponding ASCII counterparts.

  • corrected spelling

  • tag:textalign.net,2015:normalization:orthography-corrected

All orthography (spelling) has been tacitly corrected to standard forms.

  • corrected punctuation

  • tag:textalign.net,2015:normalization:punctuation-corrected

All punctuation has been tacitly corrected to standard forms.

  • no punctuation

  • tag:textalign.net,2015:normalization:punctuation-removed

All punctuation has been removed.

  • corrected capitalization

  • tag:textalign.net,2015:normalization:capitalization-corrected

All letters have been tacitly capitalized according to standard forms.

  • changed to lowercase

  • tag:textalign.net,2015:normalization:case-upper-to-lower

All uppercase letters have been converted to lowercase.

  • changed to uppercase

  • tag:textalign.net,2015:normalization:case-lower-to-upper

All lowercase letters have been converted to uppercase.

  • no music

  • tag:textalign.net,2015:normalization:music-printed-removed

Printed music has been removed.

  • no prepunctuation space

  • tag:textalign.net,2015:normalization:space-prepunctuation-corrected

All prepunctuation space has been corrected according to standard forms.

  • normalized unicode

  • tag:textalign.net,2015:normalization:nfc

All non-NFC-compliant Unicode has been converted to normalized (NFC) Unicode. Same effect as applying normalize-unicode().

  • converted html to tan

  • tag:textalign.net,2015:normalization:html-to-tan-t

HTML has been converted to TAN-T format.
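
Several of the normalizations in the table are defined mechanically enough to be approximated in a few lines. The following is a minimal Python sketch, not part of TAN itself; the function names are hypothetical and simply echo the keywords. The ligature routine rests on an assumption: it treats as a ligature only a character whose Unicode name contains LIGATURE and that has a compatibility decomposition, so characters such as œ are left untouched.

```python
import re
import unicodedata

# "norm space": General Punctuation spaces U+2000..U+200B become a regular space.
_TYPOGRAPHIC_SPACES = re.compile("[\u2000-\u200B]")


def norm_space(text: str) -> str:
    return _TYPOGRAPHIC_SPACES.sub(" ", text)


# "no combining chars": combining Latin letters U+0363..U+036F become plain letters.
# The letter is read from the Unicode character name, e.g.
# "COMBINING LATIN SMALL LETTER A" yields "a".
_COMBINING_LETTERS = {
    cp: unicodedata.name(chr(cp)).rsplit(" ", 1)[-1].lower()
    for cp in range(0x0363, 0x0370)
}


def no_combining_chars(text: str) -> str:
    return text.translate(_COMBINING_LETTERS)


# "no ligatures" (assumption: only characters named LIGATURE that carry a
# <compat> decomposition are expanded; everything else passes through).
def no_ligatures(text: str) -> str:
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if "LIGATURE" in unicodedata.name(ch, "") and decomp.startswith("<compat>"):
            # The decomposition lists the constituent code points in hex.
            out.append("".join(chr(int(h, 16)) for h in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


# "normalized unicode": convert to Normalization Form C.
def normalized_unicode(text: str) -> str:
    return unicodedata.normalize("NFC", text)
```

Each function takes and returns a plain string, so they can be chained in whatever order a given text's declared normalizations require.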