Annotation of storage/meta/meta-format.tex, revision 1.18
1.1 casties 1: \documentclass[a4paper]{article}
2:
3: \usepackage[latin1]{inputenc}
4: \usepackage[T1]{fontenc}
5: \usepackage{ae}
6: %\usepackage{times}
7: %\usepackage{courier}
8:
9: % create in-text links black (with PDF)
1.6 casties 10: \usepackage[colorlinks=true,linkcolor=black]{hyperref}
1.1 casties 11: % Format URLs nicely (without PDF)
1.6 casties 12: %\usepackage{url}
1.1 casties 13:
14:
15: \title{A simple metadata format for resource bundles}
16:
1.4 casties 17: \author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess}
1.1 casties 18:
1.18 ! casties 19: \date{V1.3.2 of 2.4.2007}
1.1 casties 20:
21: \begin{document}
22:
23: \maketitle
24:
25: \tableofcontents
26:
27:
28: \section{File and directory names}
29: \label{sec:file-directory-names}
30:
31: File and directory names should not contain spaces. Allowed characters
32: in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen
33: ``-'', underscore ``\_'' and dot ``.''.
34:
1.12 casties 35: Files and directories with names that contain illegal characters must
36: be transformed to allowed names. A simple transformation rule that
37: can be used is:
38:
39: \begin{itemize}
40: \item whitespace characters (e.g. blank, tab, cr, lf) are replaced by
41: hyphens ``-''
42:
43: \item other illegal characters are replaced by underscores ``\_''.
44: \end{itemize}
45:
46: This rule does not provide a reversible mapping to the original
47: illegal file name and it does not provide a collision-free mapping,
48: i.e. two different illegal file names might be mapped to the same
49: allowed file name. Additional precautions for these cases must be
50: taken.
1.1 casties 51:
1.4 casties 52:
53: \section{Metadata files}
54: \label{sec:metadata-files}
55:
56: The metadata information is stored in the XML format documented below
57: in special files in the resource directory. Two forms of metadata
58: files are possible:
59: \begin{itemize}
60: \item a file named \texttt{index.meta} in a directory.
61:
1.16 casties 62: \item a file with the same name as the data file it describes and an
1.4 casties 63: additional extension \texttt{.meta}. For example metadata for the
1.16 casties 64: file \texttt{p0001.tif} would be in a file \texttt{p0001.tif.meta}.
1.4 casties 65: \end{itemize}
66:
67: The resource directory must contain an \texttt{index.meta} file with
1.16 casties 68: information about the resource as a whole. Subdirectories can
69: contain additional \texttt{index.meta} files.
1.4 casties 70:
71: Additional information about single data files that are part of the
72: resource can either be put in \texttt{file} tags in the
73: \texttt{index.meta} file or in separate \emph{filename}\texttt{.meta}
74: files for each data file. Information from the directory level file is
1.16 casties 75: inherited at the file level when it is not overwritten.
1.4 casties 76:
77:
1.1 casties 78: \section{Resource format}
79: \label{sec:mpiwg-doc}
80:
81: In this description elements marked ``optional'' need not be supplied
82: by the provider of the resource and may be absent in all versions of
83: the metadata file. Elements marked ``required'' must be supplied by
84: the provider of the resource. Elements marked ``deduced'' can be
85: supplied by the provider of the resource but can also be provided by
1.4 casties 86: automatic scripts later in the process; these elements must be present
1.1 casties 87: in the final file.
88:
1.12 casties 89: File and directory paths in the metadata file use the conventional
90: Unix file separator slash ``/''.
91:
1.11 casties 92: The outer container element is \texttt{resource}. It has the following
93: \textbf{attributes}:
94:
95: \begin{description}
1.12 casties 96: \item[type] sub-type of resource (e.g. ``ECHO'', ``MPIWG'') --
97: optional.
1.11 casties 98:
1.16 casties 99: \item[version] version number of metadata format (currently 1.2) --
1.11 casties 100: required.
101: \end{description}
102:
103: \noindent The allowed \textbf{elements} inside \texttt{resource} are:
1.1 casties 104:
105: \begin{description}
1.14 casties 106: \item[description] An informal textual description of the resource --
107: optional\footnote{At least one description of the resource's content
108: is required. The description can be an informal
109: \texttt{description} element or a descriptive element (like
110: \texttt{bib}) in a \texttt{meta} container.}.
1.1 casties 111:
112: \item[name] The filename of the resource (name of the directory this
113: file is contained in) -- required.
114:
115: \item[creator] The name of the project or person that created the
116: resource -- optional.
1.4 casties 117:
118: \item[archive-creation-date] The time and date the archive collection
119: was created -- deduced.
1.1 casties 120:
1.4 casties 121: \item[archive-storage-date] The time and date the archive was written
122: to permanent storage -- deduced (must not be set by the user).
1.1 casties 123:
124: \item[archive-path] The full path to the resource directory inside the
1.5 casties 125: whole archive collection, including the resource directory -- deduced.
1.12 casties 126:
127: \item[archive-id] The ID for this document in the archive --
1.16 casties 128: optional.
1.1 casties 129:
130: \item[derived-from] Container for the description of the original
131: resource if this resource is a modified version of another resource
132: -- optional.
133:
134: \begin{description}
1.12 casties 135: \item[archive-id] The ID of the original resource
1.16 casties 136: -- required (or archive-path).
1.12 casties 137:
1.1 casties 138: \item[archive-path] The full path to the original resource
1.16 casties 139: -- required (or archive-id).
140:
141: \item[description] An informal textual description of the relation
142: of this resource to the original resource -- optional.
143: \end{description}
144:
145: \item[used-by] Container for the description of modified resources
146: if this resource is the source of another resource
147: -- optional.
148:
149: \begin{description}
150: \item[archive-id] The ID of the derived resource
151: -- required (or archive-path).
152:
153: \item[archive-path] The full path to the derived resource
154: -- required (or archive-id).
1.1 casties 155:
156: \item[description] An informal textual description of the relation
157: of this resource to the derived resource -- optional.
158: \end{description}
159:
160: \item[linked-with] Container for the description of another
161: resource when this resource is a linked copy of another resource
162: -- optional.
163:
164: \begin{description}
1.12 casties 165: \item[archive-id] The ID of the linked resource
1.16 casties 166: -- required (or archive-path).
1.12 casties 167:
1.1 casties 168: \item[archive-path] The full path to the linked resource
1.16 casties 169: -- required (or archive-id).
1.1 casties 170:
171: \item[description] An informal textual description of the relation
172: of this resource to the linked resource -- optional.
173: \end{description}
174:
1.12 casties 175: \item[media-type] \label{tag-media-type} The main media type of this
176: resource -- required.\\ The main media type can be overridden by
177: \texttt{media-type}s in subdirectories. Possible types are
178: \begin{itemize}
179: \item \texttt{image}
180:
181: \item \texttt{text}
182:
183: \item \texttt{audio}
184:
185: \item \texttt{video}
186:
187: \item \texttt{data} for other types of data
188: \end{itemize}
1.1 casties 189:
190: \item[meta] Additional metadata information about the resource --
191: optional.\\ For a description of additional metadata see below.
192:
193: \item[dir] Container for the description of a subdirectory -- required
194: (when there are subdirectories).\\ \texttt{dir} tags should not be
195: nested. Directories at lower levels are identified by their
196: \texttt{path}.
197:
198: \begin{description}
199: \item[description] An informal textual description of the
200: subdirectory -- optional.
201:
202: \item[name] The name of the subdirectory -- required.
203:
1.12 casties 204: \item[original-name] A text string associated with the directory as
205: original name -- optional. (E.g. if the data in this directory
206: came from an external source and had a name that had to be changed
207: according to section~\ref{sec:file-directory-names} but it should
208: be possible to reference the original name.)
209:
1.1 casties 210: \item[path] The directory path of this subdirectory relative to the
1.5 casties 211: resource's root directory (excluding the directory itself) --
212: required (may be empty or omitted if the directory is a direct
213: child of the resource's root directory).
1.1 casties 214:
215: \item[meta] Additional metadata information about the directory --
216: optional.\\ For a description of additional metadata see below.
217: \end{description}
218:
219: \item[file] Container for the description of a file -- deduced.\\
220: \texttt{file} tags should not be nested in \texttt{dir} tags. Files
221: at lower directory levels are identified by their \texttt{path}.
222:
223: \begin{description}
224: \item[description] An informal textual description of the
225: file -- optional.
226:
227: \item[name] The name of the file -- required.
228:
1.12 casties 229: \item[original-name] A text string associated with the file as
1.16 casties 230: original name -- optional. (E.g. if this file came from an
1.12 casties 231: external source and had a name that had to be changed according to
1.16 casties 232: section~\ref{sec:file-directory-names} it is possible
233: to preserve the original name.)
1.12 casties 234:
1.1 casties 235: \item[path] The directory path of this file relative to the
1.5 casties 236: resource's root directory (excluding the file itself) -- required
237: (may be empty or omitted if the file is in the resource's root
238: directory).
1.7 casties 239:
240: \item[date] The file's modification or creation date\footnote{The
241: preferred time and date format is ``YYYY/MM/DD HH:MM:SS''.},
242: whichever is more recent -- optional.
1.1 casties 243:
244: \item[modification-date] The file's modification date -- optional.
245:
246: \item[creation-date] The file's creation date -- optional.
1.7 casties 247:
1.1 casties 248: \item[size] The file size -- deduced.
249:
250: \item[mime-type] The file's mime-type -- optional.
251:
252: \item[md5cs] MD5 checksum of the file content -- optional.
253:
254: \item[meta] Additional metadata information about the file --
255: optional. For a description of additional metadata see below.
256: \end{description}
257:
258: \end{description}
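
The elements described in this section can be combined as in the
following fragment. It is only a minimal sketch of an
\texttt{index.meta} file: all names, dates and descriptions are
invented for illustration, and the order of the elements is an
assumption. The \texttt{file} entry also shows a name that was
transformed according to section~\ref{sec:file-directory-names}
(blanks replaced by hyphens, other illegal characters by underscores)
together with its \texttt{original-name}.

\begin{verbatim}
<!-- hypothetical example; all values are placeholders -->
<resource type="MPIWG" version="1.2">
  <name>example_resource</name>
  <description>Page scans of an example book.</description>
  <creator>Example project</creator>
  <media-type>image</media-type>
  <dir>
    <name>pageimg</name>
    <description>High-resolution page images</description>
    <!-- path omitted: direct child of the resource root -->
  </dir>
  <file>
    <name>Plate-5-_recto_.tif</name>
    <original-name>Plate 5 (recto).tif</original-name>
    <path>pageimg</path>
    <date>2007/04/02 12:00:00</date>
    <mime-type>image/tiff</mime-type>
  </file>
</resource>
\end{verbatim}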
259:
260:
261:
262: \section{Additional metadata}
263: \label{sec:additional-metadata}
264:
265: All elements with \texttt{meta} tags can contain an arbitrary number
1.12 casties 266: of the following additional metadata elements.
267:
1.16 casties 268: \subsection{Workflow state}
1.12 casties 269: \label{sec:workflow-state}
270:
271: All additional metadata elements can have a \texttt{workflow-state}
272: \textbf{attribute}. This attribute reflects the state of the
273: corresponding metadata element. The possible values for the
274: \texttt{workflow-state} attribute are
275: \begin{itemize}
276: \item \texttt{preliminary}: this information is preliminary. It must
277: be checked in further workflow steps.
278:
279: \item \texttt{inwork}
280:
281: \item \texttt{final}
282: \end{itemize}
283:
284: Workflow states other than \texttt{preliminary} are part of the
285: workflow handling of the respective projects.
286:
287: Metadata elements can appear multiple times with different
288: \texttt{workflow-state} attributes. This enables metadata versioning.
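
As an illustration, the following sketch shows the same metadata
element stored in two versions. It uses the \texttt{lang} element of
section~\ref{sec:lang} only as a convenient example; the values are
invented.

\begin{verbatim}
<meta>
  <!-- hypothetical values: a first guess and the checked result -->
  <lang workflow-state="preliminary">it</lang>
  <lang workflow-state="final">la</lang>
</meta>
\end{verbatim}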
289:
290:
291:
292: \subsection{Content type}
293: \label{sec:content-type}
294:
295: \begin{description}
296: \item[content-type] \label{tag-content-type} The content type of this
297: resource -- required.\\
298: The content type enables the choice of tools to manipulate and
299: display the resource. There should be a common list of content
300: types. For digital documents (books, manuscripts) this would be
301: ``scanned document'', for other image data ``scanned
302: images''.\footnote{The criterion for documents is an ordered
303: succession of image files (pages) and equal image size and
304: resolution throughout the images of a resource.}
305: \end{description}
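
A content type for a scanned book could then be recorded as in the
following sketch, assuming that the value is given as the plain text
content of the element:

\begin{verbatim}
<meta>
  <!-- value taken from the examples in the text above -->
  <content-type>scanned document</content-type>
</meta>
\end{verbatim}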
306:
307:
1.1 casties 308:
1.4 casties 309: \subsection{Language}
310: \label{sec:lang}
311:
312: The language of a resource (e.g. a text) can be specified with a
313: \texttt{lang} tag. Languages have to be described using the
314: international codes for the representation of names of languages
315: either in two-letter form (ISO 639-1) or in three-letter form (ISO
316: 639-2). The entire catalogue of languages is documented on the page
317:
318: \url{http://www.loc.gov/standards/iso639-2/englangn.html}
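
For a Latin text, for example, the tag could look like one of the two
variants in this sketch (``la'' is the ISO 639-1 code and ``lat'' the
ISO 639-2 code for Latin). Only one of the two forms would normally be
used.

\begin{verbatim}
<meta>
  <lang>la</lang>    <!-- two-letter form (ISO 639-1) -->
</meta>

<meta>
  <lang>lat</lang>   <!-- three-letter form (ISO 639-2) -->
</meta>
\end{verbatim}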
319:
1.1 casties 320:
321: \subsection{DRI}
322: \label{sec:dri}
323:
324: The \emph{digital resource identifier} for the resource is specified
1.4 casties 325: in a \texttt{dri} element. Digital resource identifiers are documented
1.1 casties 326: on the page
327:
328: \url{http://pythia.mpiwg-berlin.mpg.de/projects/standards/dri}.
329:
330:
1.4 casties 331:
332: \subsection{Collection context}
333: \label{sec:collection-context}
334:
1.15 casties 335: The context of a resource as part of a collection or part of a project
336: can be specified in the \texttt{context} element. The context element
337: can appear multiple times if the resource is part of multiple
338: collections or projects.
1.4 casties 339:
340: \begin{description}
1.5 casties 341: \item[context] information on collection or project context.
1.4 casties 342:
1.5 casties 343: \begin{description}
1.15 casties 344: \item[link] URL to additional context information -- optional.
1.5 casties 345:
1.15 casties 346: \item[name] Textual description of project or collection -- optional.
347:
348: \item[meta-datalink] description of external sources of canonical meta
349: information -- optional
350: \begin{description}
351: \item[db] \textbf{attribute} to identify different sets of meta data
352: links to the same resource -- optional
353:
354: \item[object] \textbf{attribute} to identify different objects or
355: parts of the same resource -- optional
356:
357: \item[label] textual label for the link -- optional
358:
359: \item[url] URL to present to the client -- optional
360:
361: \item[metadata-url] URL to an external server to be queried -- optional
362: \end{description}
363:
364: \item[meta-baselink] description of external server for canonical meta
365: information -- optional
366: \begin{description}
367: \item[db] \textbf{attribute} to identify different sets of meta data
368: links to the same resource -- optional
369:
370: \item[label] textual label for the link -- optional
371:
372: \item[url] URL to present to the client -- optional
373:
374: \item[metadata-url] URL to an external server to be queried --
375: required (the parameter \texttt{object=} with an object id has
376: to be appended to this URL)
377: \end{description}
1.5 casties 378: \end{description}
1.4 casties 379: \end{description}
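
A \texttt{context} element using both kinds of links might look like
the following sketch. The server \texttt{www.example.org}, the
\texttt{db} and \texttt{object} values and all labels are
hypothetical. How the \texttt{object=} parameter is joined to the
\texttt{metadata-url} of a \texttt{meta-baselink} (e.g. with ``?'' or
``\&'') is not specified here and is left to the client.

\begin{verbatim}
<meta>
  <context>
    <!-- hypothetical URLs and identifiers -->
    <link>http://www.example.org/project</link>
    <name>Example collection</name>
    <meta-datalink db="catalogue" object="vol1">
      <label>Catalogue record</label>
      <url>http://www.example.org/record/vol1</url>
      <metadata-url>http://www.example.org/query?id=vol1</metadata-url>
    </meta-datalink>
    <meta-baselink db="catalogue">
      <label>Catalogue</label>
      <url>http://www.example.org/</url>
      <!-- the client appends object=... to this URL -->
      <metadata-url>http://www.example.org/query</metadata-url>
    </meta-baselink>
  </context>
</meta>
\end{verbatim}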
1.5 casties 380:
1.4 casties 381:
382:
383:
1.1 casties 384: \subsection{Bibliographic information}
385: \label{sec:bibliographic-data}
386:
1.5 casties 387: Bibliographic information is presented in a \texttt{bib} container with
1.1 casties 388: a \texttt{type} parameter, giving the type of bibliographic resource.
1.4 casties 389: The \texttt{type} field can be repeated as a tag in the container.
390:
1.5 casties 391: The format is based on the ECHO scheme for bibliographic data (cf.
392: content workflow), the MPIWG ``Projektbibliografie'' and the format of
393: the commonly used program ``EndNote''.
394:
1.4 casties 395:
396: \subsubsection{Book}
397:
398: \begin{description}
399:
400: \item [bib type="book"] a published book.
401:
402: \begin{description}
403: \item [author] The author of the book.
404: \item [year] The year of publication.
405: \item [title] Title of the book.
406: \item [series-editor] Name of the series editor, if the book appears
407: in a series.
408: \item [series-title] Title of the series, if the book appears in a
409: series.
410: \item [series-volume] Volume number, if the book appears in a
411: series.
412: \item [number-of-pages] Number of pages of the entire book.
413: \item [city] City where the book was published.
414: \item [publisher] Name of the publishing company
415: \item [edition] Edition of the book (e.g. third edition)
416: \item [number-of-volumes] Number of volumes, if the book is
417: published in multiple volumes.
418: \item [translator] Name of the translator.
419: \item [isbn-issn]
1.18 ! casties 420: \item[call-number] Call number in holding library
! 421: \item[holding-library] Holding library
1.4 casties 422: \end{description}
423: \end{description}
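
A book entry could be written as in the following sketch. Only some of
the elements are used. The notation of the author's name (``last name,
first name'') is an assumption, since the format of names is not
prescribed for \texttt{bib} entries.

\begin{verbatim}
<meta>
  <bib type="book">
    <!-- name notation "Last, First" is an assumption -->
    <author>Galilei, Galileo</author>
    <year>1638</year>
    <title>Discorsi e dimostrazioni matematiche
      intorno a due nuove scienze</title>
    <city>Leiden</city>
    <publisher>Elzevir</publisher>
  </bib>
</meta>
\end{verbatim}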
424:
425: \subsubsection{In Book}
426:
427: \begin{description}
428: \item [bib type="inbook"] an article as part of a book.
429:
430: \begin{description}
431: \item [author] The author of the article.
432: \item [year] The year of publication.
433: \item [title] Title of the article.
434: \item [editor] Name of the book's editor.
435: \item [book-title] Title of the book.
436: \item [series-volume] Volume number, if the book appears in a
437: series.
438: \item [pages] Number of pages of the article.
439: \item [city] City where the book was published.
440: \item [publisher] Name of the publishing company
441: \item [edition] Edition of the book (e. g. third edition)
442: \item [series-author] Name of the series editor, if the book appears
443: in a series.
444: \item [series-title] Title of the series, if the book appears in a
445: series.
446: \item [number-of-volumes] Number of volumes, if the book is
447: published in multiple volumes.
448: \item [translator] Name of the translator
449: \item [isbn-issn]
1.18 ! casties 450: \item[call-number] Call number in holding library
! 451: \item[holding-library] Holding library
1.4 casties 452: \end{description}
453: \end{description}
454:
455: \subsubsection{Proceedings}
456:
457: \begin{description}
458: \item [bib type="proceedings"] a conference proceedings publication.
459:
460: \begin{description}
461: \item [author] The author of the article.
462: \item [year] The year of publication.
463: \item [title] Title of the article.
464: \item [editor] Name of the book's editor.
465: \item [conference-name] Name of the conference the proceedings are
466: related to.
467: \item [volume] Volume number.
468: \item [pages] Number of pages of the article.
469: \item [date] Date of the conference the proceedings are related to.
470: \item [conference-location] City where the conference was held.
471: \item [publisher] Name of the publishing company
472: \item [edition] Edition of the book (e. g. third edition)
473: \item [series-editor] Name of the series editor, if the book appears
474: in a series.
475: \item [series-title] Title of the series, if the book appears in a
476: series.
477: \item [number-of-volumes] Number of volumes, if the book is
478: published as multiple volumes.
479: \item [isbn-issn]
1.18 ! casties 480: \item[call-number] Call number in holding library
! 481: \item[holding-library] Holding library
1.4 casties 482: \end{description}
483: \end{description}
484:
485: \subsubsection{Edited Book}
486:
487: \begin{description}
488: \item[bib type="edited-book"] a book that is the edition of another
489: work.
490:
491: \begin{description}
492: \item [editor] Name of the editor of the book.
493: \item [year] The year of publication.
494: \item [title] Title of the book.
495: \item [series-editor] Name of the editor of the series the book is
496: part of.
497: \item [series-title] Title of the series, if the book is part of a
498: series.
499: \item [series-volume] Volume number, if the book appears in a series.
500: \item [number-of-pages] Number of pages of the book.
501: \item [city] City where the book was published.
502: \item [publisher] Name of the publishing company
503: \item [edition] Information about the edition (e.g. ``Repr. of the London ed. 1652'')
504: \item [number-of-volumes] Number of volumes, if the book is
505: published as multiple volumes.
506: \item [isbn-issn]
1.18 ! casties 507: \item[call-number] Call number in holding library
! 508: \item[holding-library] Holding library
1.4 casties 509: \end{description}
510: \end{description}
511:
1.17 casties 512: \subsubsection{Journal Volume}
513:
514: \begin{description}
515: \item [bib type="journal-volume"] a volume of a scientific journal.
516: \begin{description}
517: \item [title] Name of the journal.
518: \item [editor] The editor of the journal.
519: \item [publisher] Name of the publishing company.
520: \item [city] City where the journal is published.
521: \item [year] The year of publication.
522: \item [volume] Volume number.
523: \item [number-of-pages] Number of pages of the volume.
524: \item [isbn-issn]
1.18 ! casties 525: \item[call-number] Call number in holding library
! 526: \item[holding-library] Holding library
1.17 casties 527: \end{description}
528: \end{description}
529:
1.4 casties 530: \subsubsection{Journal Article}
531:
532: \begin{description}
533: \item [bib type="journal-article"] an article in a scientific journal.
534: \begin{description}
535: \item [author] The author of the article.
536: \item [year] The year of publication.
537: \item [title] Title of the article.
538: \item [journal] Name of the journal.
539: \item [volume] Volume number, if the journal appears in a series.
540: \item [issue] Number of the issue the article is part of.
541: \item [pages] Number of pages of the article.
542: \item [alternate-journal] Alternate Journal
543: \item [isbn-issn]
1.18 ! casties 544: \item[call-number] Call number in holding library
! 545: \item[holding-library] Holding library
1.4 casties 546: \end{description}
547: \end{description}
548:
549: \subsubsection{Magazine Article}
550:
551: \begin{description}
552: \item [bib type="magazine-article"] an article in a popular magazine.
553: \begin{description}
554: \item [author] The author of the article.
555: \item [year] The year of publication.
556: \item [title] Title of the article.
557: \item [magazine] Name of the magazine.
558: \item [volume] Volume number, if the book appears in a series.
559: \item [issue-number] Number of the issue the article is part of.
560: \item [pages] Number of pages of the article.
561: \item [date] Date when the article appeared.
1.18 ! casties 562: \item[call-number] Call number in holding library
! 563: \item[holding-library] Holding library
1.4 casties 564: \end{description}
565: \end{description}
566:
567: \subsubsection{Newspaper Article}
568:
569: \begin{description}
570: \item [bib type="newspaper-article"] an article in a newspaper.
571: \begin{description}
572: \item [author] The author of the article.
573: \item [year] The year of publication.
574: \item [title] Title of the article.
575: \item [newspaper] Name of the newspaper the article appeared in.
576: \item [pages] Number of pages of the article.
577: \item [issue-date] Date of the issue the article is part of.
578: \item [city] City of the newspaper.
1.18 ! casties 579: \item[call-number] Call number in holding library
! 580: \item[holding-library] Holding library
1.4 casties 581: \end{description}
582: \end{description}
583:
584: \subsubsection{Thesis}
585:
586: \begin{description}
587: \item [bib type="thesis"] a master/doctorate/etc. thesis.
588: \begin{description}
589: \item [author] The author of the thesis.
590: \item [year] The year of publication.
591: \item [title] Title of the thesis.
592: \item [academic-department] Name of the academic department where
593: the thesis was handed in.
594: \item [number-of-pages] Number of pages of the thesis.
595: \item [city] City where the thesis was published.
596: \item [university] Name of the university where the thesis was
597: handed in.
598: \item [isbn-issn]
1.18 ! casties 599: \item[call-number] Call number in holding library
! 600: \item[holding-library] Holding library
1.4 casties 601: \end{description}
602: \end{description}
603:
604: \subsubsection{Report}
605:
606: \begin{description}
607: \item [bib type="report"] a scientific report.
608: \begin{description}
609: \item [author] The author of the report.
610: \item [year] The year of publication.
611: \item [title] Title of the report.
612: \item [pages] Number of pages of the report.
613: \item [date] Date when the report appeared.
614: \item [city] City where the report was published.
615: \item [institution] Institution where the report was produced.
616: \item [type] Type of report.
617: \item [report-number] Report number.
1.18 ! casties 618: \item[call-number] Call number in holding library
! 619: \item[holding-library] Holding library
1.4 casties 620: \end{description}
621: \end{description}
622:
1.5 casties 623: \subsubsection{Manuscript}
624:
625: \begin{description}
626: \item [bib type="manuscript"] a handwritten/typewritten manuscript.
627:
628: \begin{description}
629: \item [title] Title of the manuscript.
630: \item [author] The author of the text.
631: \item [location] Name of the library where the manuscript is
632: currently located.
633: \item [year] The year or century of publication.
634: \item [pages] Number of pages of the manuscript.
635: \item [signature] Signature of the manuscript.
636: \item [editorial-remarks] Remarks related to the online
637: publication of the manuscript. This could be notes about
638: annotations etc.
639: \item [description] This can be any kind of description.
640: \item [keywords] Keywords related to the manuscript.
1.18 ! casties 641: \item[call-number] Call number in holding library
! 642: \item[holding-library] Holding library
1.5 casties 643: \end{description}
644: \end{description}
645:
646:
1.4 casties 647: \subsubsection{Generic}
648:
649: \begin{description}
650: \item [bib type="generic"] a generic bibliographic type. This type
651: should only be used in rare cases.
652: \begin{description}
653: \item [author]
654: \item [year]
655: \item [title]
656: \item [secondary-author]
657: \item [secondary-title]
658: \item [volume]
659: \item [number]
660: \item [pages]
661: \item [date]
662: \item [place-published]
663: \item [publisher]
664: \item [edition]
665: \item [tertiary-author]
666: \item [tertiary-title]
667: \item [number-of-volumes]
668: \item [type-of-work]
669: \item [subsidiary-author]
670: \item [alternate-title]
671: \item [isbn-issn]
672: \item [call-number]
673: \item [label]
674: \item [keywords]
675: \item [abstract]
676: \item [notes]
677: \item [url]
1.5 casties 678: \end{description}
1.4 casties 679: \end{description}
680:
681:
682: \subsection{Architectural drawings}
683: \label{sec:doc}
684:
685: Specific information for architectural drawings is presented in a
1.5 casties 686: \texttt{doc} container with an additional \texttt{type} attribute
687: giving the type of drawing. All elements inside the container can
688: appear multiple times.
1.4 casties 689:
690: \begin{description}
1.5 casties 691:
692: \item[doc type="Architectural Drawing"] architectural drawing.
693:
694: \begin{description}
695: \item [person] last name and first name of a person, separated by a
696: comma. A further common name for the person can be put in front,
697: separated by a semicolon.
698: \item [location] Name of a place in its common notation. This can be
699: a city or an institution.
700: \item [date] This can be a year (or several years, separated by
701: commas) or a period (1706-1714). Years are noted with four digits.
702: \item [object] Short description of an object or signatures.
703: \item [keywords] Keywords related to the object.
704: \end{description}
1.4 casties 705: \end{description}
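
The notation rules above can be illustrated by the following sketch.
All values are placeholders. The \texttt{date} element is repeated to
show both a period and a list of years.

\begin{verbatim}
<meta>
  <doc type="Architectural Drawing">
    <!-- placeholder values only -->
    <person>Common Name; Lastname, Firstname</person>
    <location>Example City</location>
    <date>1706-1714</date>
    <date>1706, 1710</date>
    <object>Placeholder description of the drawn object</object>
    <keywords>placeholder keyword</keywords>
  </doc>
</meta>
\end{verbatim}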
1.1 casties 706:
707:
1.10 casties 708: \subsection{Document structure (table of contents)}
1.1 casties 709: \label{sec:toc}
710:
1.4 casties 711: Information on the structure of a document like the division into
712: parts and chapters in the way of a table of contents is presented in a
713: \texttt{toc} container.
714:
715: The scheme allows multiple logical pages on a single page image
716: as is often the case with scanned books or manuscripts. The scheme
717: also allows for ``loose'' numbering schemes with roman, arabic or
718: other page numbers consecutively or mixed and changes in the numbering
719: within the document.
720:
721: The flexibility comes from the fact that no additional assumptions
722: about the mapping between logical pages and page images are made in
723: the format. All mapping information is specified by the user.
724:
725: The logical page numbering or naming that can be presented to the user
726: is specified in the \texttt{name} tags while the physical numbering of
727: the page images is specified in the \texttt{index} or \texttt{url}
728: tags.
1.1 casties 729:
1.4 casties 730: \begin{description}
1.5 casties 731: \item[toc] container for document structure
732:
1.4 casties 733: \begin{description}
1.5 casties 734: \item[page] describes a single logical page
735:
736: \begin{description}
737: \item[name] the ``name'' of the logical page. This can be any string
738: like a page number (arabic, roman, etc.) or a special designation
739: like ``Table 5''.
740:
741: \item[index] the \texttt{digilib} index number\footnote{The index
742: number for digilib is the index in the alphabetical order of the
743: scan file names.} of the scan image of the page.
744:
745: \item[url] alternatively to the \texttt{digilib} index number the
746: full URL of the scan image of the page can be used.
747: \end{description}
1.4 casties 748:
1.5 casties 749: \item[chapter] describes a section or chapter of the text.
750: \texttt{chapter} elements can be nested.
1.1 casties 751:
1.4 casties 752: \begin{description}
1.5 casties 753: \item[name] the title of the chapter or section.
754:
755: \item[start] the beginning of a page range (usually the first page
756: of the chapter). The \texttt{start} element has an optional
757: \texttt{increment} attribute to indicate the number of logical
758: pages on a scan image.\footnote{This information is only needed by
759: additional tools that try to generate lists of all page and
760: image numbers.}
761:
762: \begin{description}
763: \item[name] the ``name'' of the first page (see \texttt{page}).
764:
765: \item[index] the index of the first page (see \texttt{page}).
766:
767: \item[url] the URL of the first page (see \texttt{page}).
768: \end{description}
769:
770: \item[end] the end of a page range (usually the last page of the
771: chapter).
772:
773: \begin{description}
774: \item[name] the ``name'' of the last page (see \texttt{page}).
775:
776: \item[index] the index of the last page (see \texttt{page}).
777:
778: \item[url] the URL of the last page (see \texttt{page}).
779: \end{description}
780:
781: \item[page] as an alternative (or in addition) to
782: \texttt{start}/\texttt{end} page ranges, single \texttt{page}
783: elements can be used inside \texttt{chapter}.
1.4 casties 784: \end{description}
785: \end{description}
786: \end{description}
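
The following sketch shows how such a structure might be written down
for a small book. All names and index numbers are invented. The
preface has one logical page per scan image (pages i to iv on images 2
to 5), while the first chapter has two logical pages per image (pages
1 to 20 on images 6 to 15), which is indicated by the
\texttt{increment} attribute.

\begin{verbatim}
<meta>
  <toc>
    <!-- hypothetical page names and index numbers -->
    <page>
      <name>Title page</name>
      <index>1</index>
    </page>
    <chapter>
      <name>Preface</name>
      <start><name>i</name><index>2</index></start>
      <end><name>iv</name><index>5</index></end>
    </chapter>
    <chapter>
      <name>Chapter 1</name>
      <start increment="2"><name>1</name><index>6</index></start>
      <end><name>20</name><index>15</index></end>
      <!-- a single named page inside the chapter -->
      <page><name>Table 5</name><index>10</index></page>
    </chapter>
  </toc>
</meta>
\end{verbatim}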
787:
788: %%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise}
1.1 casties 789:
790:
1.12 casties 791: \subsection{Digital images}
1.1 casties 792: \label{sec:inform-scann-imag}
793:
794: Image files representing scanned images can have an \texttt{img}
795: container tag with information about the scan resolution and the size
796: of the original image. This information is used by the
797: \texttt{digilib} image viewing tool.
798:
799: One of three possible sets of tags is required:
800:
801: \begin{description}
1.5 casties 802: \item[img] digital image information.
1.1 casties 803:
1.5 casties 804: \begin{description}
1.12 casties 805: \item[original-size-x] The width of the original
806: image -- required. \\
807: The unit of measure can be contained as parameter \texttt{unit},
808: the default is meter ``m''. The width to be considered is the
809: total width of the scanned area.
1.5 casties 810:
1.12 casties 811: \item[original-size-y] The height of the original image -- required.
1.5 casties 812:
1.12 casties 813: \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced.
1.5 casties 814:
1.12 casties 815: \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced.
1.5 casties 816: \end{description}
1.1 casties 817: \end{description}
818:
819: or
820:
821: \begin{description}
1.5 casties 822: \item[img] digital image information.
823:
824: \begin{description}
825: \item[original-dpi-x] The resolution of the hi-res scan in its width
1.12 casties 826: in pixels per inch -- required.
1.1 casties 827:
1.5 casties 828: \item[original-dpi-y] The resolution of the hi-res scan in its height
1.12 casties 829: in pixels per inch -- required.
830:
831: \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced.
832:
833: \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced.
1.5 casties 834: \end{description}
1.1 casties 835: \end{description}
836:
837: or
838:
839: \begin{description}
1.5 casties 840: \item[img] digital image information.
841:
842: \begin{description}
843: \item[original-dpi] The resolution of the hi-res scan in pixels per
1.12 casties 844: inch if the resolutions in width and height are the same -- required.
845:
846: \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced.
847:
848: \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced.
1.5 casties 849: \end{description}
1.1 casties 850: \end{description}
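
For illustration, the first and the third variant could be written as
in the following sketch. The numeric values are invented (an original
of roughly A4 size, given in the default unit), and the
\texttt{original-pixel-x} and \texttt{original-pixel-y} tags are left
out because they are marked as deduced.

\begin{small}
\begin{verbatim}
<img>
  <original-size-x>0.210</original-size-x>
  <original-size-y>0.297</original-size-y>
</img>
\end{verbatim}
\end{small}

\begin{small}
\begin{verbatim}
<img>
  <original-dpi>300</original-dpi>
</img>
\end{verbatim}
\end{small}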



\subsection{Digital image acquisition}
\label{sec:inform-about-image}

A description of the technology used in the process of producing a
digital image.

\begin{description}
\item[image-acquisition] description of the image production process
  \begin{description}
  \item[device] acquisition device (e.g. ``flatbed scanner'')

  \item[image-type] type and color-depth of the image -- required
    (e.g. ``RGB 24 bit'')

  \item[production-comment] additional textual information about the
    production process
  \end{description}
\end{description}
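
A minimal sketch of such a container, reusing the example values given
above; the text of the \texttt{production-comment} is invented for
this example.

\begin{small}
\begin{verbatim}
<image-acquisition>
  <device>flatbed scanner</device>
  <image-type>RGB 24 bit</image-type>
  <production-comment>scanned from the bound volume</production-comment>
</image-acquisition>
\end{verbatim}
\end{small}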



\subsection{Full text with images}
\label{sec:full-text-with}

Full text in an XML format should be specified with a
\texttt{content-type}\footnote{see section~\ref{tag-content-type}
  on page~\pageref{tag-content-type}} of ``fulltext''.

The relation between the full text and optional images of
whole pages or parts of pages must be specified in a
\texttt{text-tool} container.

\begin{description}
\item[text-tool] representation of full text with images

  \begin{description}
  \item[text-file] the file name of the full text file (with path
    inside document directory)

  \item[page-images] the name of the directory containing the
    page image files (with path inside document directory)

  \item[xslt-file] the file name of an additional XSL transformation
    file

  \item[text-config] container for configuration options
    \begin{description}
    \item[container-tag] the name of the text root element (default
      ``text'')

    \item[ref-element-tag] the name of the element that is used as
      unit of reference when results are presented

    \item[pagebreak-tag] the name of the element that indicates page
      breaks (default ``pb'')
    \end{description}
  \end{description}
\end{description}
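
The following sketch shows how a \texttt{text-tool} container might be
filled in. The file and directory names are hypothetical; the two
\texttt{text-config} entries simply spell out the documented defaults.

\begin{small}
\begin{verbatim}
<text-tool>
  <text-file>fulltext/fleck.xml</text-file>
  <page-images>img</page-images>
  <text-config>
    <container-tag>text</container-tag>
    <pagebreak-tag>pb</pagebreak-tag>
  </text-config>
</text-tool>
\end{verbatim}
\end{small}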



\subsection{Copyright and access conditions}
\label{sec:access-conditions}

If the access to a resource is bound to conditions for technical or legal
reasons then the conditions can be put in an \texttt{access-conditions}
container. Other usage conditions like copyright can also be
documented in this container.

\begin{description}
\item[access-conditions] legal and technical conditions for access to
  this resource

  \begin{description}
  \item[attribution] The name or institution this resource should be
    attributed to when it is publicly presented

    \begin{description}
    \item[name] a name (free text)

    \item[url] a URL (with an optional \texttt{label} attribute to show
      as text)

    \item[description] more information (free text, e.g. holding
      library call number)
    \end{description}

  \item[copyright] the copyright holder and its conditions
    \begin{description}
    \item[owner] the name of the copyright holder
      \begin{description}
      \item[name] a name (free text)

      \item[url] a URL (with an optional \texttt{label} attribute to show
        as text)
      \end{description}

    \item[date] the date when the copyright was issued

    \item[duration] the duration of the copyright term (if known)

    \item[description] free-text field for special or additional
      conditions
    \end{description}


  \item[publish-metadata] metadata about this resource can be made
    freely available when this tag is present (otherwise metadata has
    the same access conditions as the rest of the resource). Access to
    the resource itself is regulated separately by the \texttt{access}
    element.

  \item[access] conditions of access to this resource. Different
    access types are specified by a \texttt{type} attribute:
    \begin{description}
    \item[type=group] access restricted to the members of this named
      group. The method to identify a user as belonging to a named group
      is not specified in this document.
      \begin{description}
      \item[name] name of the group.

      \item[only-before] the access condition is only valid before the
        given date (format: ``YYYY/MM/DD'').

      \item[only-after] the access condition is only valid after the
        given date (format: ``YYYY/MM/DD'').
      \end{description}

    \item[type=institution] access restricted to the members of this
      institution. The method to identify a user as belonging to the
      institution is not specified in this document.
      \begin{description}
      \item[name] name of the institution.

      \item[only-before] the access condition is only valid before the
        given date (format: ``YYYY/MM/DD'').

      \item[only-after] the access condition is only valid after the
        given date (format: ``YYYY/MM/DD'').
      \end{description}


    \item[type=subnet] access restricted to all computers with an
      IP address in this subnet.
      \begin{description}
      \item[range] subnet range defined in
        truncated-quad (e.g. ``141.14''), network-netmask
        (e.g. ``141.14.0.0/255.255.0.0''), or network-range
        (e.g. ``141.14.0.0/16'') notation.

      \item[only-before] the access condition is only valid before the
        given date (format: ``YYYY/MM/DD'').

      \item[only-after] the access condition is only valid after the
        given date (format: ``YYYY/MM/DD'').
      \end{description}


    \item[type=scientific] access to this resource should be restricted to
      scientific work
      \begin{description}
      \item[only-before] the access condition is only valid before the
        given date (format: ``YYYY/MM/DD'').

      \item[only-after] the access condition is only valid after the
        given date (format: ``YYYY/MM/DD'').
      \end{description}


    \item[type=free] access to this resource is not restricted
      \begin{description}
      \item[only-before] the access condition is only valid before the
        given date (format: ``YYYY/MM/DD'').

      \item[only-after] the access condition is only valid after the
        given date (format: ``YYYY/MM/DD'').
      \end{description}


    \item[type=special] if none of the above conditions seems appropriate,
      a free-form text can be specified here.
      \begin{description}
      \item[description] description of special access conditions.

      \item[only-before] the access condition is only valid before the
        given date (format: ``YYYY/MM/DD'').

      \item[only-after] the access condition is only valid after the
        given date (format: ``YYYY/MM/DD'').
      \end{description}

    \end{description}
  \end{description}
\end{description}
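
As a sketch, a resource that is attributed to a holding institution,
publishes its metadata freely and is restricted to one subnet until a
given date could be described as follows. All names, URLs and dates are
invented for this example; the subnet range is the example value from
the list above.

\begin{small}
\begin{verbatim}
<access-conditions>
  <attribution>
    <name>Example Library</name>
    <url label="Example Library">http://www.example.org/</url>
  </attribution>
  <copyright>
    <owner>
      <name>Example Library</name>
    </owner>
    <description>Reproduction only with permission.</description>
  </copyright>
  <publish-metadata/>
  <access type="subnet">
    <range>141.14.0.0/16</range>
    <only-before>2010/01/01</only-before>
  </access>
</access-conditions>
\end{verbatim}
\end{small}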

\noindent
Note that control over access to the resource has to be provided by
additional technical measures. Access conditions in the metadata file
only state that conditions \emph{should} be observed; they do not
imply that the conditions \emph{are} observed, because enforcement
depends on those additional measures.



\subsection{Acquisition of raw data}
\label{sec:acqu-inform}

Information about the acquisition source for raw data resources can be
provided in an \texttt{acquisition} container.

\begin{description}
\item[acquisition] the acquisition source of this resource -- required
  for raw data.
  \begin{description}
  \item[provider] where this resource came from -- required
    \begin{description}
    \item[name] free-text name of the provider (institution or
      individual)

    \item[address] address of the provider

    \item[contact] contact person at the provider (i.e. name and email)

    \item[url] URL related to the provider

    \item[provider-id] id of the provider (internally used) -- deduced
    \end{description}

  \item[date] date of acquisition -- required

  \item[description] free-text description of the acquisition source or
    additional information
  \end{description}
\end{description}
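
A possible \texttt{acquisition} entry is sketched below. The provider
details are invented, and since this section does not prescribe a date
format, the plain year is only an assumption. The
\texttt{provider-id} tag is left out because it is marked as deduced.

\begin{small}
\begin{verbatim}
<acquisition>
  <provider>
    <name>Example Institute</name>
    <address>1 Example Street, Example City</address>
    <contact>Jane Doe, jane.doe@example.org</contact>
    <url>http://www.example.org/</url>
  </provider>
  <date>2007</date>
  <description>Delivered on DVD.</description>
</acquisition>
\end{verbatim}
\end{small}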



\subsection{Documentary films}
\label{sec:documentary-films}

Documentary films can be described using a \texttt{film-acquisition}
container.

\begin{description}
\item[film-acquisition] description of a (documentary) film --
  required for documentary film
  \begin{description}
  \item[recording] specification of the recording process
    \begin{description}
    \item[author] the person or persons doing the recording

    \item[date] the date or time span when the film was recorded

    \item[location] the place where the film was recorded

    \item[device] recording device used (e.g. ``Sony CP-DV8 Camcorder'')

    \item[format] format of the recorded film -- required (e.g. ``DV
      720x524 25fps interlaced'')
    \end{description}

  \item[description] free-form description of the recording and the
    content of the film
  \end{description}
\end{description}
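
For illustration, a \texttt{film-acquisition} container might look like
the following sketch. The device and format strings are the example
values from the list above; author, date, location and description are
invented.

\begin{small}
\begin{verbatim}
<film-acquisition>
  <recording>
    <author>Jane Doe</author>
    <date>2006</date>
    <location>Berlin</location>
    <device>Sony CP-DV8 Camcorder</device>
    <format>DV 720x524 25fps interlaced</format>
  </recording>
  <description>Interview recorded for the project.</description>
</film-acquisition>
\end{verbatim}
\end{small}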

(More information about the digitization step could be added in a
\texttt{digitization} tag similar to the \texttt{recording} tag.)




\section{Sample metadata files for ECHO resources}

The following is a sample metadata index file for a directory containing a
scanned document.

\begin{small}
\begin{verbatim}
<resource type="ECHO" version="1.0">
  <description>Fleck, 1980</description>
  <name>fleck.1980</name>
  <creator>University of Bern</creator>
  <archive-path>ubern/wiss-theorie</archive-path>
  <content-type>scanned images</content-type>
  <meta>
    <dri>echo23a45e2329x</dri>
    <lang>ger</lang>
    <bib type="book">
      <author>Fleck, Ludwik</author>
      <year>1980</year>
      <title>Entstehung und Entwicklung einer
        wissenschaftlichen Tatsache</title>
      <series-editor></series-editor>
      <series-title></series-title>
      <series-volume></series-volume>
      <number-of-pages></number-of-pages>
      <city>Frankfurt am Main</city>
      <publisher>Suhrkamp</publisher>
      <edition></edition>
      <number-of-volumes></number-of-volumes>
      <translator></translator>
      <isbn-issn></isbn-issn>
      <keywords>Wissenschaftstheorie, Fleck, Tatsache</keywords>
      <abstract></abstract>
    </bib>
  </meta>
  <dir>
    <description>Scanned images (300dpi)</description>
    <name>img</name>
  </dir>
</resource>
\end{verbatim}
\end{small}

The following is a sample metadata file for a single image of an
architectural drawing.

\begin{small}
\begin{verbatim}
<resource type="ECHO" version="1.0">
  <creator>Bibliotheca Hertziana</creator>
  <content-type>scanned images</content-type>
  <file>
    <name>00000271-asl-160-r-full.tif</name>
    <meta>
      <img>
        <original-dpi>315</original-dpi>
      </img>
      <dri>echo45a67bc4367d</dri>
      <lang>ita</lang>
      <doc type="Architectural Drawing">
        <person>Ciolli, Giacomo</person>
        <person>Urban VIII; Barberini, Maffeo</person>
        <location>Accademia di San Luca</location>
        <location>Roma</location>
        <date>1706</date>
        <object>Concorso Clementino</object>
        <object>Fontana Pubblica</object>
        <object>Brunnen</object>
        <object>ASL 160</object>
        <keywords></keywords>
      </doc>
      <context>
        <url>http://colosseum.biblhertz.it:8080/Lineamenta/
          1033478408.39/1035196181.35/1035196204.09/1035394121.83
        </url>
      </context>
    </meta>
  </file>
</resource>
\end{verbatim}
\end{small}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: