<head>)
Now that we have explored various IRI vocabularies for concepts related to our files concerning Ring-a-ring-a-roses, we can complete the metadata in our four TAN files. Let us start with the TAN-T file of the 1881 version:
<TAN-T xmlns="tag:textalign.net,2015:ns" TAN-version="2020"
   id="tag:parkj@textalign.net,2015:ring01">
   <head>
      <name>TAN transcription of Ring a Ring o' Roses</name>
      <master-location
         href="http://textalign.net/release/TAN-2020/examples/ring-o-roses.eng.1881.xml"/>
      <license licensor="park">
         <IRI>http://creativecommons.org/licenses/by/4.0/</IRI>
         <name>Attribution 4.0 International</name>
      </license>
      <work>
         <IRI>http://dbpedia.org/resource/Ring_a_Ring_o%27_Roses</IRI>
         <name>"Ring a Ring o' Roses" or "Ring Around the Rosie"</name>
      </work>
      <source>
         <IRI>http://lccn.loc.gov/12032709</IRI>
         <name>Kate Greenaway, Mother Goose, New York, G. Routledge and sons [1881]</name>
      </source>
      <vocabulary-key>
         <person xml:id="park">
            <IRI>tag:parkj@textalign.net,2015:self</IRI>
            <name>Jenny Park</name>
         </person>
         <div-type xml:id="line">
            <IRI>http://dbpedia.org/resource/Line_(poetry)</IRI>
            <name>line of poetry</name>
         </div-type>
         <role xml:id="creator">
            <IRI>http://schema.org/creator</IRI>
            <name xml:lang="eng">creator</name>
         </role>
      </vocabulary-key>
      <file-resp who="park"/>
      <resp roles="creator" who="park"/>
      <change when="2014-08-13" who="park">Started file</change>
      <to-do/>
   </head>
   . . . . . . .
</TAN-T>
<name>, the human-readable counterpart to the @id inside the root element, can be anything. And we can supply more than one <name>, in case we wish to provide alternative names of the file in different spellings or languages.
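As a minimal sketch of that last point, a <head> might begin with several <name> elements. The two alternative names and the use of @xml:lang on a file-level <name> are assumptions made for illustration (the example above shows @xml:lang only on names inside <vocabulary-key>):

```xml
<name>TAN transcription of Ring a Ring o' Roses</name>
<!-- Hypothetical alternative names, invented for illustration -->
<name xml:lang="eng">TAN transcription of Ring Around the Rosie</name>
<name xml:lang="fra">Transcription TAN de Ring a Ring o' Roses</name>
```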
One or more <master-location>s provide URLs where master versions of the file are kept (and maintained). We provide this as a courtesy to others who might be using our data. Anyone who validates their local copy of the file will be warned if it does not match the master version, and they will be told of the most recent changes. This lets us silently and conveniently notify other users of changes. We do not have to keep track of the users of our file, and users do not have to pester us with questions about what changed when.
<master-location> is mandatory only if we are finished with our to-do list, which is specified at <to-do>. If that element is empty, then we imply that we do not know of anything further that should be done to the file. Conversely, any elements in <to-do> specify what remains to be done, and details will be returned to other users. That way we can release data that is useful but not yet perfect, and let users know about its deficiencies.
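A non-empty <to-do> might look something like the sketch below. This section does not show which children <to-do> accepts, so the use of <comment> here is an assumption, and the note itself is hypothetical:

```xml
<to-do>
   <!-- Hypothetical entry; assumes <to-do> accepts <comment> children -->
   <comment when="2014-08-13" who="park">Proofread the transcription against the printed source.</comment>
</to-do>
```

With an entry like this in place, the file declares itself unfinished, so by the rule above <master-location> would no longer be mandatory.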
One day the link in <master-location> will be dead. But perhaps a copy of our file will be in circulation in other quarters. The document @id in the root element provides a way to identify and find files, independent of links.
<license> specifies the license under which we are releasing our data. This element has nothing to do with the copyright of the source we have used (although, having been published in 1881, the book is clearly in the public domain). That is, we are specifying what rights are attached to the data, not to its source, including any additional strictures we have placed on the content in <body>. In this example, we have released the data under a Creative Commons license. The child element <IRI> specifies a Creative Commons IRI, and <name> is the human-readable form.
@licensor specifies who has granted the license, in this case our fictive Jenny Park (see below).
The conjunction of <IRI> and <name>, the IRI + name pattern, recurs throughout TAN files. The pattern is used to provide identifiers for vocabulary items. In an element that takes the IRI + name pattern, we may include as many children <IRI>s or <name>s as we like. But if we do so, we are stating that they are synonymous, i.e., that they all name the same thing. (Once again, an IRI is unique, so it should never be used to identify more than one thing.)
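To illustrate synonymous identifiers, here is a sketch of our <license> carrying two <IRI>s and two <name>s. The second IRI and the second name are the ones TAN's standard license vocabulary pairs with the Creative Commons IRI (quoted later in this section); combining them in a TAN-T <license> this way is an illustration, not something the original file does:

```xml
<license licensor="park">
   <!-- Both IRIs are asserted to identify one and the same license -->
   <IRI>http://creativecommons.org/licenses/by/4.0/</IRI>
   <IRI>tag:textalign.net,2015:license:by/4.0/</IRI>
   <!-- Both names are human-readable labels for that same license -->
   <name>Attribution 4.0 International</name>
   <name>by 4.0</name>
</license>
```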
<work> uses the IRI + name pattern to name the work we have chosen to transcribe. <source> points, through its IRI + name pattern, to a computer- and human-readable description of the book we have chosen.
<vocabulary-key> contains vocabulary that we are using in our file. Inside, we can place more vocabulary items, and attach locally unique ids. For example, an IRI + name pattern is used for <person>, which identifies Jenny Park through a tag URN. The value of @xml:id allows us to use park any time we want to mention Jenny. In fact, we already have, at @licensor. Any mention of park will point to the appropriate item in <vocabulary-key>.
There are a few other parts of <vocabulary-key>. <div-type> specifies an IRI + name pattern for line divisions, and the value of @xml:id means that we can use line any time we want to invoke the concept. Similarly we have a <role>. The <IRI> value of <role> comes from the vocabulary of schema.org, which is maintained by Bing, Google, and Yahoo! in conjunction with the W3C (the nonprofit organization dedicated to universal Internet standards), but we could have used Dublin Core or some other IRI vocabulary describing behaviors, responsibilities, and roles.
After the <vocabulary-key>, we get into parts of the file that specify who did what, when. First is a <file-resp>, whose @who value, park, indicates that Jenny Park is the one primarily responsible for the file. <resp> specifies further who was responsible for doing what.
Note
If you decide to modify someone else's TAN file, you should credit / blame yourself for the changes. Your first point of order should be to add a
Remember that <head> is focused on the data, not its sources, so the claim that Jenny Park is the creator pertains only to the data. No inference should be made about who was responsible for the printed source. If someone wants to know anything about the book, they should pursue the IRI identifier we have provided under <source>.
<change> has attributes @when and @who to specify who made the change and when. The value of @when is always a date or a date + time, formatted according to the ISO 8601 syntax: [YYYY]-[MM]-[DD] or [YYYY]-[MM]-[DD]T[hh]:[mm]:[ss].
@who always carries an IDref that points to a person or organization. <change> does not take the IRI + name pattern, or any child elements at all; its content is plain text describing the change.
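The two permitted forms of @when can be sketched side by side. The first <change> is the one from our file; the second, including its time of day and its description, is hypothetical:

```xml
<!-- Date only: [YYYY]-[MM]-[DD] -->
<change when="2014-08-13" who="park">Started file</change>
<!-- Date + time, joined by T; hypothetical entry for illustration -->
<change when="2014-08-13T16:45:00" who="park">Corrected typos in the second line</change>
```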
So now we have finished one transcription file's metadata. The next one will look similar, but we'll take a couple of shortcuts:
<TAN-T xmlns="tag:textalign.net,2015:ns" TAN-version="2020"
   id="tag:parkj@textalign.net,2015:ring02">
   <head>
      <name>TAN transcription of Ring around the Rosie</name>
      <master-location>ring-o-roses.eng.1987.xml</master-location>
      <license which="by 4.0" licensor="park"/>
      <work>
         <IRI>http://dbpedia.org/resource/Ring_a_Ring_o%27_Roses</IRI>
         <name>Ring around the Rosie</name>
      </work>
      <source>
         <IRI>http://lccn.loc.gov/87042504</IRI>
         <name>Mother Goose, from nursery to literature / by Gloria T. Delama, 1987.</name>
      </source>
      <adjustments>
         <normalization which="no hyphens"/>
      </adjustments>
      <vocabulary-key>
         <div-type xml:id="l" which="line (verse)"/>
         <person xml:id="park" roles="creator">
            <IRI>tag:parkj@textalign.net,2015:self</IRI>
            <name xml:lang="eng">Jenny Park</name>
         </person>
      </vocabulary-key>
      <resp roles="creator" who="park"/>
      <change when="2014-10-24" who="park">Started file</change>
      <comment when="2014-10-24" who="park">See p. 39 of source.</comment>
      <to-do/>
   </head>
   . . . . . .
</TAN-T>
In this example, <name>, <master-location>, and <source> have been modified to describe this file. Note that we haven't had to change the <IRI> of <work>, because both files transcribe the same work.
<license> looks different, but it is equivalent to our previous example, because the IRI + name pattern has been replaced with @which. You may replace any IRI + name pattern with @which; its value should match a <name> in a customized or standard vocabulary (a TAN-voc file). In TAN's standard vocabulary for licenses (see the section called “TAN keywords for types of rights (<license>)”) is the following item:
<TAN-voc xmlns="tag:textalign.net,2015:ns" TAN-version="2020"
   id="tag:textalign.net,2015:tan-voc:licenses">
   . . . . . . .
   <body affects-element="license">
      <item>
         <IRI>http://creativecommons.org/licenses/by/4.0/</IRI>
         <IRI>tag:textalign.net,2015:license:by/4.0/</IRI>
         <name>by 4.0</name>
         <desc>attribution 4.0 international</desc>
      </item>
      . . . . . . .
   </body>
</TAN-voc>
Because the validation rules for TAN-voc files require every <name> to be unique, that element can be treated as a unique identifier, similar to @xml:id. We could have repeated the <license> from the previous TAN-T file. But the @which method is much quicker and cleaner.
Before <vocabulary-key> comes a new element, <adjustments>, which contains a <normalization> statement whose @which says no hyphens. That too points to a standard TAN vocabulary for normalizations that provides an item with an IRI + name pattern for eliminating discretionary hyphens (see the section called “TAN keywords for types of normalizations (<normalization>)”):
<TAN-voc xmlns="tag:textalign.net,2015:ns" TAN-version="2020"
   id="tag:textalign.net,2015:tan-voc:normalizations">
   . . . . . . .
   <body affects-element="normalization">
      <item>
         <IRI>tag:textalign.net,2015:normalization:hyphens-discretionary-removed</IRI>
         <name>no hyphens</name>
         <desc>Discretionary word-break line-end hyphens have been deleted.</desc>
      </item>
      . . . . . . .
   </body>
</TAN-voc>
As you might have inferred, the element <normalization> specifies how we have changed the data, namely, that we have opted to remove word-break line-end hyphenation. In other transcriptions we could use <normalization> to declare other kinds of changes we felt compelled to make, such as removing editorial comments or footnote signals. A healthy list of <normalization>s is a courtesy to users of our data, some of whom might passionately care about keeping or removing line-end hyphenation.
Back to our example. <div-type> has a new value for @xml:id, the letter l, and in it too the IRI + name pattern has been replaced by @which, whose value, line (verse), is a standard vocabulary item (see the section called “TAN keywords for types of divisions (<div-type>)”).
There is also a new <comment> element, which is built much the same as <change>. (A <change>, after all, is just a comment about what has been changed.)
That seems to be all there is. But if you've been attentive, you will have noticed that <role> from our first TAN-T file (inside <vocabulary-key>) is missing. That's because we don't need it, based on the same principle that lets us resolve @which. A vocabulary <name> can be invoked not only in @which, but in any attribute that points to values of @xml:id, in this case @roles. There is already a standard TAN vocabulary item with the <name> creator, so we can use it directly without having to go through an intermediate vocabulary item with an @xml:id. If we had defined something else in <vocabulary-key> with an @xml:id of creator, that item would take precedence and override the built-in TAN vocabulary. But we haven't, so the standard TAN vocabularies are the default.
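To make the precedence rule concrete, here is a hypothetical variant of the second file's <vocabulary-key> in which we do define a local creator. The Dublin Core IRI and the wording of the <name> are choices made for this sketch, not part of the original example:

```xml
<vocabulary-key>
   <!-- Hypothetical local definition: within this file, this xml:id takes
        precedence over the standard TAN vocabulary item named "creator" -->
   <role xml:id="creator">
      <IRI>http://purl.org/dc/terms/creator</IRI>
      <name xml:lang="eng">creator (Dublin Core)</name>
   </role>
   . . . . . . .
</vocabulary-key>
<!-- roles="creator" now points to the local <role> above -->
<resp roles="creator" who="park"/>
```

If the local <role> were removed, the same @roles value would fall back to the standard TAN vocabulary item, as in the file as we actually wrote it.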