Creating TAN Metadata (<head>)

Now that we have explored various IRI vocabularies for concepts related to our files concerning Ring-a-ring-a-roses, we can complete the metadata in our four TAN files. Let us start with the TAN-T file of the 1881 version:

<TAN-T xmlns="tag:textalign.net,2015:ns" TAN-version="2020" 
    id="tag:parkj@textalign.net,2015:ring01">
    <head>
        <name>TAN transcription of Ring a Ring o' Roses</name>
        <master-location 
            href="http://textalign.net/release/TAN-2020/examples/ring-o-roses.eng.1881.xml"/>
        <license licensor="park">
            <IRI>http://creativecommons.org/licenses/by/4.0/</IRI>
            <name>Attribution 4.0 International</name>
        </license>
        <work>
            <IRI>http://dbpedia.org/resource/Ring_a_Ring_o%27_Roses</IRI>
            <name>"Ring a Ring o' Roses" or "Ring Around the Rosie"</name>
        </work>
        <source>
            <IRI>http://lccn.loc.gov/12032709</IRI>
            <name>Kate Greenaway, Mother Goose, New York, G. Routledge and sons [1881]</name>
        </source>
        <vocabulary-key>
            <person xml:id="park">
                <IRI>tag:parkj@textalign.net,2015:self</IRI>
                <name>Jenny Park</name>
            </person>
            <div-type xml:id="line">
                <IRI>http://dbpedia.org/resource/Line_(poetry)</IRI>
                <name>line of poetry</name>
            </div-type>
            <role xml:id="creator">
                <IRI>http://schema.org/creator</IRI>
                <name xml:lang="eng">creator</name>
            </role>
        </vocabulary-key>
        <file-resp who="park"/>
        <resp roles="creator" who="park"/>
        <change when="2014-08-13" who="park">Started file</change>
        <to-do/>
    </head>
    . . . . . . .
</TAN-T>

<name>, the human-readable counterpart to the @id inside the root element, can be anything. We can supply more than one <name>, in case we wish to provide alternative names of the file in different spellings or languages.
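
As a rough illustration of multiple names, the fragment below gives the file a second <name>. The wording of the second name is invented for this sketch, and @xml:lang is used on it in the same way as on the <name> of <role> in the example above.

```xml
<head>
    <name>TAN transcription of Ring a Ring o' Roses</name>
    <!-- Hypothetical alternative name, added only for illustration -->
    <name xml:lang="eng">Ring a Ring o' Roses (Greenaway, 1881), TAN transcription</name>
    <!-- remaining metadata as in the example above -->
</head>
```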

One or more <master-location>s provide URLs where master versions of the file are kept (and maintained). We provide this as a courtesy to others who might be using our data. Anyone who validates their local copy of the file will be warned if it does not match the master version, and they will be told of the most recent changes. This lets us silently and conveniently notify other users of changes. We do not have to keep track of the users of our file, and users do not have to pester us with questions about what changed when.
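
A file that is maintained in more than one place might, as a sketch, declare each location in turn. The second URL below is a hypothetical mirror on a placeholder domain, not a real address.

```xml
<head>
    <name>TAN transcription of Ring a Ring o' Roses</name>
    <master-location
        href="http://textalign.net/release/TAN-2020/examples/ring-o-roses.eng.1881.xml"/>
    <!-- Hypothetical mirror; example.org is a placeholder domain -->
    <master-location href="http://example.org/tan/ring-o-roses.eng.1881.xml"/>
    <!-- remaining metadata as in the example above -->
</head>
```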

<master-location> is mandatory only if we are finished with our to-do list, which is specified at <to-do>. If that element is empty, then we imply that we do not know of anything further that should be done to the file. Conversely, any elements in <to-do> specify what remains to be done, and details will be returned to other users. That way you can release data that is useful but not completely perfect, and let users know about its deficiencies.
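
A non-empty <to-do> might look something like the sketch below. It assumes that pending tasks are recorded as <comment> children, built like the <comment> discussed later in this section; the task itself is invented for illustration.

```xml
<!-- Assumption: each pending task is a <comment> child of <to-do> -->
<to-do>
    <!-- Hypothetical task, not part of the original example file -->
    <comment when="2014-08-13" who="park">Proofread the transcription against the 1881 printing.</comment>
</to-do>
```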

One day the link in <master-location> will be dead. But perhaps a copy of our file will be in circulation in other quarters. The document @id in the root element provides a way to identify and find files, independent of links.

<license> specifies the license under which we are releasing our data. This element has nothing to do with the copyright of the source we have used (although, having been published in 1881, the book is clearly in the public domain). That is, we are specifying what rights are attached to the data, not to its source, i.e., whether we have placed additional strictures on the content in <body>. In this example, we have released the data under a Creative Commons license. The child element <IRI> specifies a Creative Commons IRI, and <name> is the human-readable form.

@licensor specifies who has granted the license, in this case our fictive Jenny Park (see below).

The conjunction of <IRI> and <name>, the IRI + name pattern, recurs throughout TAN files. The pattern is used to provide identifiers for vocabulary items. In an element that takes the IRI + name pattern, we may include as many child <IRI>s or <name>s as we like. But if we do so, we are stating that they are synonymous, i.e., that they all name the same thing. (Once again, an IRI is unique, so it should never be used to identify more than one thing.)
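
To make the synonymy concrete, here is a sketch of a <license> that carries two <IRI>s and two <name>s. Both IRIs and the name by 4.0 are the ones TAN's standard license vocabulary lists for this license; combining them in a single <license> of our file is our own illustration.

```xml
<license licensor="park">
    <!-- Both IRIs name the same license, so they are declared as synonyms -->
    <IRI>http://creativecommons.org/licenses/by/4.0/</IRI>
    <IRI>tag:textalign.net,2015:license:by/4.0/</IRI>
    <!-- Likewise, both names refer to that one license -->
    <name>Attribution 4.0 International</name>
    <name>by 4.0</name>
</license>
```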

<work> uses the IRI + name pattern to name the work we have chosen to transcribe. <source> points, through its IRI + name pattern, to a computer- and human-readable description of the book we have chosen.

<vocabulary-key> contains vocabulary that we are using in our file. Inside, we can place vocabulary items and attach locally unique ids to them. For example, an IRI + name pattern is used for <person>, which identifies Jenny Park through a tag URN. The value of @xml:id allows us to use park any time we want to mention Jenny. In fact, we already have, at @licensor. Any mention of park will point to the appropriate item in <vocabulary-key>.

There are a few other parts of <vocabulary-key>. <div-type> specifies an IRI + name pattern for line divisions, and the value of @xml:id means that we can use line any time we want to invoke the concept. Similarly we have a <role>. The <IRI> value of <role> comes from the vocabulary of schema.org, which is maintained by Bing, Google, and Yahoo! in conjunction with the W3C (the nonprofit organization dedicated to universal Internet standards), but we could have used Dublin Core or some other IRI vocabulary describing behaviors, responsibilities, and roles.
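
If we preferred Dublin Core, the <role> could be sketched as follows. The IRI is the creator term from the DCMI Metadata Terms namespace; whether it suits a given project better than the schema.org term is a judgment call, and nothing else in the file would need to change.

```xml
<role xml:id="creator">
    <!-- Dublin Core (DCMI Metadata Terms) alternative to http://schema.org/creator -->
    <IRI>http://purl.org/dc/terms/creator</IRI>
    <name xml:lang="eng">creator</name>
</role>
```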

After the <vocabulary-key>, we get into parts of the file that specify who did what, when. First is a <file-resp>, whose value of @who, park, indicates that Jenny Park is the one primarily responsible for the file. <resp> specifies further who was responsible for doing what.

Note

If you decide to modify someone else's TAN file, you should credit / blame yourself for the changes. Your first point of order should be to add a <person> to the <vocabulary-key>, identifying yourself. You can then either add a <change> (see below) or a <resp> (you might need to specify a <role> in the <vocabulary-key>). You should not change the document's @id, unless your changes are so significant that it becomes altogether a new document. TAN does not try to broker the age-old problem of determining when a thing that undergoes changes becomes something altogether different. Use your best intuition.
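
A minimal sketch of such a modification, assuming a second contributor: the person Jane Doe, her tag URN, and the text of her <change> are all hypothetical. She adds herself to <vocabulary-key> and logs a <change>, leaving the document's @id untouched.

```xml
<vocabulary-key>
    <person xml:id="park">
        <IRI>tag:parkj@textalign.net,2015:self</IRI>
        <name>Jenny Park</name>
    </person>
    <!-- Hypothetical second contributor, added by the person modifying the file -->
    <person xml:id="doe">
        <IRI>tag:doe@example.com,2020:self</IRI>
        <name>Jane Doe</name>
    </person>
    <!-- other vocabulary items as before -->
</vocabulary-key>
<file-resp who="park"/>
<resp roles="creator" who="park"/>
<change when="2014-08-13" who="park">Started file</change>
<!-- Hypothetical change credited to the new contributor -->
<change when="2020-03-01" who="doe">Corrected a transcription error in the second line</change>
```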

Remember that <head> is focused on the data, not its sources, so the claim that Jenny Park is the creator pertains only to the data. No inference should be made about who was responsible for the printed source. If someone wants to know anything about the book, they should pursue the IRI identifier we have provided under <source>.

<change> has attributes @when and @who to specify when the change was made and by whom. The value of @when is always a date or a date + time, formatted according to ISO 8601 syntax: [YYYY]-[MM]-[DD] or [YYYY]-[MM]-[DD]T[HH]:[MM]:[SS]. @who always carries an IDref that points to a person or organization. <change> does not take the IRI + name pattern, or even any children at all.
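
The two permitted forms of @when can be sketched side by side. The first <change> comes from the example above; the second, with its time of day and its text, is invented for illustration.

```xml
<!-- Date only: [YYYY]-[MM]-[DD] -->
<change when="2014-08-13" who="park">Started file</change>
<!-- Date + time: [YYYY]-[MM]-[DD]T[HH]:[MM]:[SS] (hypothetical entry) -->
<change when="2014-08-13T14:30:00" who="park">Finished first pass of transcription</change>
```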

So now we have finished one transcription file's metadata. The next one will look similar, but we'll take a couple of shortcuts:

<TAN-T xmlns="tag:textalign.net,2015:ns" TAN-version="2020" 
    id="tag:parkj@textalign.net,2015:ring02">
    <head>
      <name>TAN transcription of Ring around the Rosie</name>
      <master-location>ring-o-roses.eng.1987.xml</master-location>
      <license which="by 4.0" licensor="park"/>
      <work>
         <IRI>http://dbpedia.org/resource/Ring_a_Ring_o%27_Roses</IRI>
         <name>Ring around the Rosie</name>
      </work>
      <source>
         <IRI>http://lccn.loc.gov/87042504</IRI>
         <name>Mother Goose, from nursery to literature / by Gloria T. Delamar, 1987.</name>
      </source>
      <adjustments>
         <normalization which="no hyphens"/>
      </adjustments>
      <vocabulary-key>
         <div-type xml:id="l" which="line (verse)"/>
         <person xml:id="park" roles="creator">
            <IRI>tag:parkj@textalign.net,2015:self</IRI>
            <name xml:lang="eng">Jenny Park</name>
         </person>
      </vocabulary-key>
      <resp roles="creator" who="park"/>
      <change when="2014-10-24" who="park">Started file</change>
      <comment when="2014-10-24" who="park">See p. 39 of source.</comment>
      <to-do/>
   </head>
   . . . . . .
</TAN-T>

In this example, <name>, <master-location>, and <source> have been modified to describe this file. Note that the <IRI> in <work> is unchanged, because both files transcribe versions of the same work; only its <name> differs.

<license> looks different, but it is equivalent to the one in our previous example, because the IRI + name pattern has been replaced with @which. You may replace any IRI + name pattern with @which; its value should match a <name> in a customized or standard vocabulary (a TAN-voc file). In TAN's standard vocabulary for licenses (see the section called “TAN keywords for types of rights (<license>)”) is the following item:

<TAN-voc xmlns="tag:textalign.net,2015:ns" TAN-version="2020" 
   id="tag:textalign.net,2015:tan-voc:licenses">
    . . . . . . .
   <body affects-element="license">
      <item>
         <IRI>http://creativecommons.org/licenses/by/4.0/</IRI>
         <IRI>tag:textalign.net,2015:license:by/4.0/</IRI>
         <name>by 4.0</name>
         <desc>attribution 4.0 international</desc>
      </item>
    . . . . . . .
   </body>
</TAN-voc>

Because the validation rules for TAN-voc files require every <name> to be unique, that element can be treated as a unique identifier, similar to @xml:id. We could have repeated the <license> from the previous TAN-T file. But the @which method is much quicker and cleaner.
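
For comparison, the two forms of <license> used in our TAN-T files can be set side by side. Going by the vocabulary item above, both resolve to the same Creative Commons IRI.

```xml
<!-- Long form: explicit IRI + name pattern, as in the 1881 file -->
<license licensor="park">
    <IRI>http://creativecommons.org/licenses/by/4.0/</IRI>
    <name>Attribution 4.0 International</name>
</license>

<!-- Short form: @which matches the <name> "by 4.0" in the standard vocabulary -->
<license which="by 4.0" licensor="park"/>
```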

Before <vocabulary-key> comes a new element, <adjustments>, which contains a <normalization> statement whose @which says no hyphens. That too points to a standard TAN vocabulary for normalizations that provides an item with an IRI + name pattern for eliminating discretionary hyphens (see the section called “TAN keywords for types of normalizations (<normalization>)”):

<TAN-voc xmlns="tag:textalign.net,2015:ns" TAN-version="2020" id="tag:textalign.net,2015:tan-voc:normalizations">
    . . . . . . .
   <body affects-element="normalization">
      <item>
         <IRI>tag:textalign.net,2015:normalization:hyphens-discretionary-removed</IRI>
         <name>no hyphens</name>
         <desc>Discretionary word-break line-end hyphens have been deleted.</desc>
      </item>
    . . . . . . .
   </body>
</TAN-voc>

As you might have inferred, the element <normalization> specifies how we have changed the data, namely, that we have opted to remove word-break line-end hyphenation. In other transcriptions we could use <normalization> to declare other kinds of changes we felt compelled to make, such as removing editorial comments or footnote signals. A healthy list of <normalization>s is a courtesy to users of our data, some of whom might passionately care about keeping or removing line-end hyphenation.

Back to our example. <div-type> has a new value for @xml:id, the letter l, and in it too the IRI + name pattern has been replaced by @which, whose value, line (verse), is a standard vocabulary item (see the section called “TAN keywords for types of divisions (<div-type>)”).

There is also a new <comment> element, which is built much the same as <change>. (A <change>, after all, is just a comment about what has been changed.)

That seems to be all there is. But if you've been attentive, you will have noticed that <role> from our first TAN-T file (inside <vocabulary-key>) is missing. That's because we don't need it, based on the same principle that lets us resolve @which. A vocabulary <name> can be invoked not only in @which, but in any attribute that points to values of @xml:id, in this case @roles. There is already a standard TAN vocabulary item with the <name> creator, so we can use it directly without having to go through an intermediate vocabulary item with an @xml:id. If we had defined something else in <vocabulary-key> with a @xml:id of creator, that item would take precedence and override the built-in TAN vocabulary. But we haven't, so the standard TAN vocabularies are the default.
