1: \documentclass[a4paper]{article}
2:
3: \usepackage[latin1]{inputenc}
4: \usepackage[T1]{fontenc}
5: \usepackage{ae}
6: %\usepackage{times}
7: %\usepackage{courier}
8:
9: % create in-text links black (with PDF)
10: \usepackage[colorlinks=true,linkcolor=black]{hyperref}
11: % Format URLs nicely (without PDF)
12: %\usepackage{url}
13:
14:
15: \title{A simple metadata format for resource bundles}
16:
17: \author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess}
18:
19: \date{V1.0 of 24.7.2003}
20:
21: \begin{document}
22:
23: \maketitle
24:
25: \tableofcontents
26:
27:
28: \section{File and directory names}
29: \label{sec:file-directory-names}
30:
31: File and directory names should not contain spaces. Allowed characters
32: in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen
33: ``-'', underscore ``\_'' and dot ``.''.
34:
35: File and directory paths in the metadata file use the conventional
36: Unix file separator slash ``/''.
37:
38:
39: \section{Metadata files}
40: \label{sec:metadata-files}
41:
42: The metadata information is stored in the XML format documented below
43: in special files in the resource directory. Two forms of metadata
44: files are possible:
45: \begin{itemize}
46: \item a file named \texttt{index.meta} in a directory.
47:
\item a file named like the data file it describes, with the
additional extension \texttt{.meta}. For example, the metadata for
the file \texttt{0001.tif} would be in the file \texttt{0001.tif.meta}.
51: \end{itemize}
52:
53: The resource directory must contain an \texttt{index.meta} file with
54: information about the resource as a whole. Other directories can
55: contain \texttt{index.meta} files.
56:
57: Additional information about single data files that are part of the
58: resource can either be put in \texttt{file} tags in the
59: \texttt{index.meta} file or in separate \emph{filename}\texttt{.meta}
60: files for each data file. Information from the directory level file is
61: inherited at the file level.
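
As a sketch of the second form, a per-file metadata file
\texttt{0001.tif.meta} might look like the following. The nesting of a
\texttt{file} element inside \texttt{resource} is assumed here by
analogy with the single-image sample at the end of this document, and
the resolution value is purely illustrative.

\begin{small}
\begin{verbatim}
<resource>
  <file>
    <name>0001.tif</name>
    <meta>
      <img>
        <original-dpi>300</original-dpi>
      </img>
    </meta>
  </file>
</resource>
\end{verbatim}
\end{small}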
62:
63:
64: \section{Resource format}
65: \label{sec:mpiwg-doc}
66:
67: In this description elements marked ``optional'' need not be supplied
68: by the provider of the resource and may be absent in all versions of
69: the metadata file. Elements marked ``required'' must be supplied by
the provider of the resource. Elements marked ``deduced'' can be
supplied by the provider of the resource but can also be filled in by
automatic scripts later in the process; these elements must be present
in the final file.
74:
The outer container element is \texttt{resource}. Sub-types (``ECHO'',
``MPIWG'') can be specified if necessary with a \texttt{type}
attribute. The sub-elements of \texttt{resource} are:
78:
79: \begin{description}
80: \item[description] An informal textual description of the
81: resource -- optional.
82:
83: \item[name] The filename of the resource (name of the directory this
84: file is contained in) -- required.
85:
86: \item[creator] The name of the project or person that created the
87: resource -- optional.
88:
89: \item[archive-creation-date] The time and date the archive collection
90: was created -- deduced.
91:
92: \item[archive-storage-date] The time and date the archive was written
93: to permanent storage -- deduced (must not be set by the user).
94:
95: \item[archive-path] The full path to the resource directory inside the
96: whole archive collection, including the resource directory -- deduced.
97:
98: \item[derived-from] Container for the description of the original
99: resource if this resource is a modified version of another resource
100: -- optional.
101:
102: \begin{description}
\item[archive-path] The full path to the original resource
-- required.
105:
106: \item[description] An informal textual description of the relation
107: of this resource to the original resource -- optional.
108: \end{description}
109:
110: \item[linked-with] Container for the description of another
111: resource when this resource is a linked copy of another resource
112: -- optional.
113:
114: \begin{description}
\item[archive-path] The full path to the linked resource
-- required.
117:
118: \item[description] An informal textual description of the relation
119: of this resource to the linked resource -- optional.
120: \end{description}
121:
122: \item[content-type] The content type of this resource -- required.\\
123: The content type enables the choice of tools to manipulate and
124: display the resource. There should be a common list of content
types. For digital documents (books, manuscripts) this would be
``scanned document'', for other image data ``scanned
images''.\footnote{The criterion for documents is an ordered
succession of image files (pages) and equal image size and
resolution throughout the images of a resource.}
130:
131: \item[meta] Additional metadata information about the resource --
132: optional.\\ For a description of additional metadata see below.
133:
134: \item[dir] Container for the description of a subdirectory -- required
135: (when there are subdirectories).\\ \texttt{dir} tags should not be
136: nested. Directories at lower levels are identified by their
137: \texttt{path}.
138:
139: \begin{description}
140: \item[description] An informal textual description of the
141: subdirectory -- optional.
142:
143: \item[name] The name of the subdirectory -- required.
144:
145: \item[path] The directory path of this subdirectory relative to the
146: resource's root directory (excluding the directory itself) --
147: required (may be empty or omitted if the directory is a direct
148: child of the resource's root directory).
149:
150: \item[meta] Additional metadata information about the directory --
151: optional.\\ For a description of additional metadata see below.
152: \end{description}
153:
154: \item[file] Container for the description of a file -- deduced.\\
155: \texttt{file} tags should not be nested in \texttt{dir} tags. Files
156: at lower directory levels are identified by their \texttt{path}.
157:
158: \begin{description}
159: \item[description] An informal textual description of the
160: file -- optional.
161:
162: \item[name] The name of the file -- required.
163:
164: \item[path] The directory path of this file relative to the
165: resource's root directory (excluding the file itself) -- required
166: (may be empty or omitted if the file is in the resource's root
167: directory).
168:
169: \item[modification-date] The file's modification date -- optional.
170:
171: \item[creation-date] The file's creation date -- optional.
172:
\item[date] The file's creation date if it has not been modified --
optional.
175:
176: \item[size] The file size -- deduced.
177:
\item[mime-type] The file's MIME type -- optional.
179:
180: \item[md5cs] MD5 checksum of the file content -- optional.
181:
182: \item[meta] Additional metadata information about the file --
183: optional. For a description of additional metadata see below.
184: \end{description}
185:
186: \end{description}
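
The following sketch illustrates how \texttt{dir} and \texttt{file}
entries refer to nested directories through \texttt{path} rather than
through nesting of tags. All names and values are hypothetical, and
deduced elements such as \texttt{archive-path} and \texttt{size} are
left out, as they might be in a file supplied by the provider before
the automatic scripts have run.

\begin{small}
\begin{verbatim}
<resource type="MPIWG">
  <name>example-resource</name>
  <content-type>scanned document</content-type>
  <dir>
    <name>pageimg</name>
  </dir>
  <dir>
    <name>details</name>
    <path>pageimg</path>
  </dir>
  <file>
    <name>0001.tif</name>
    <path>pageimg</path>
  </file>
  <file>
    <name>0001-a.tif</name>
    <path>pageimg/details</path>
  </file>
</resource>
\end{verbatim}
\end{small}

Here the directory \texttt{details} lies inside \texttt{pageimg}, so
its \texttt{path} is \texttt{pageimg}, while \texttt{pageimg} itself is
a direct child of the resource's root directory and needs no
\texttt{path}.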
187:
188:
189:
190: \section{Additional metadata}
191: \label{sec:additional-metadata}
192:
193: All elements with \texttt{meta} tags can contain an arbitrary number
194: of additional metadata elements.
195:
196: \subsection{Language}
197: \label{sec:lang}
198:
199: The language of a resource (e.g. a text) can be specified with a
200: \texttt{lang} tag. Languages have to be described using the
201: international codes for the representation of names of languages
202: either in two-letter form (ISO 639-1) or in three-letter form (ISO
203: 639-2). The entire catalogue of languages is documented on the page
204:
205: \url{http://www.loc.gov/standards/iso639-2/englangn.html}
206:
207:
208: \subsection{DRI}
209: \label{sec:dri}
210:
211: The \emph{digital resource identifier} for the resource is specified
212: in a \texttt{dri} element. Digital resource identifiers are documented
213: on the page
214:
215: \url{http://pythia.mpiwg-berlin.mpg.de/projects/standards/dri}.
216:
217:
218:
219: \subsection{Collection context}
220: \label{sec:collection-context}
221:
222: The context of a resource as part of a collection or part of a project can be
223: specified in the \texttt{context} element. All elements in the
224: container can appear multiple times.
225:
226: \begin{description}
227: \item[context] information on collection or project context.
228:
229: \begin{description}
230: \item[link] URL to additional context information.
231:
232: \item[name] Textual description of project or collection.
233: \end{description}
234: \end{description}
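
A minimal sketch of a \texttt{context} entry inside a \texttt{meta}
container could look as follows. The URL and the collection name are
placeholders invented for this illustration.

\begin{small}
\begin{verbatim}
<meta>
  <context>
    <link>http://www.example.org/projects/example-collection</link>
    <name>Example collection of scanned manuscripts</name>
  </context>
</meta>
\end{verbatim}
\end{small}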
235:
236:
237:
238:
239: \subsection{Bibliographic information}
240: \label{sec:bibliographic-data}
241:
Bibliographic information is presented in a \texttt{bib} container with
a \texttt{type} attribute giving the type of bibliographic resource.
The \texttt{type} field can be repeated as a tag in the container.
245:
246: The format is based on the ECHO scheme for bibliographic data (cf.
247: content workflow), the MPIWG ``Projektbibliografie'' and the format of
248: the commonly used program ``EndNote''.
249:
250:
251: \subsubsection{Book}
252:
253: \begin{description}
254:
255: \item [bib type="book"] a published book.
256:
257: \begin{description}
258: \item [author] The author of the book.
259: \item [year] The year of publication.
260: \item [title] Title of the book.
261: \item [series-editor] Name of the series editor, if the book appears
262: in a series.
\item [series-title] Title of the series, if the book appears in a
series.
265: \item [series-volume] Volume number, if the book appears in a
266: series.
267: \item [number-of-pages] Number of pages of the entire book.
268: \item [city] City where the book was published.
269: \item [publisher] Name of the publishing company
270: \item [edition] Edition of the book (e.g. third edition)
\item [number-of-volumes] Number of volumes, if the book is
published in multiple volumes.
273: \item [translator] Name of the translator.
274: \item [isbn-issn]
275: \end{description}
276: \end{description}
277:
278: \subsubsection{In Book}
279:
280: \begin{description}
281: \item [bib type="inbook"] an article as part of a book.
282:
283: \begin{description}
\item [author] The author of the article.
285: \item [year] The year of publication.
286: \item [title] Title of the article.
287: \item [editor] Name of the book's editor.
288: \item [book-title] Title of the book.
289: \item [series-volume] Volume number, if the book appears in a
290: series.
291: \item [pages] Number of pages of the article.
292: \item [city] City where the book was published.
293: \item [publisher] Name of the publishing company
294: \item [edition] Edition of the book (e. g. third edition)
295: \item [series-author] Name of the series editor, if the book appears
296: in a series.
297: \item [series-title] Title of the series, if the book appears in a
298: series.
\item [number-of-volumes] Number of volumes, if the book is
published in multiple volumes.
301: \item [translator] Name of the translator
302: \item [isbn-issn]
303: \end{description}
304: \end{description}
305:
306: \subsubsection{Proceedings}
307:
308: \begin{description}
309: \item [bib type="proceedings"] a conference proceedings publication.
310:
311: \begin{description}
312: \item [author] The author of the article.
313: \item [year] The year of publication.
314: \item [title] Title of the article.
315: \item [editor] Name of the book's editor.
316: \item [conference-name] Name of the conference the proceedings are
317: related to.
318: \item [volume] Volume number.
319: \item [pages] Number of pages of the article.
320: \item [date] Date of the conference the proceedings are related to.
\item [conference-location] City where the conference was held.
322: \item [publisher] Name of the publishing company
323: \item [edition] Edition of the book (e. g. third edition)
324: \item [series-editor] Name of the series editor, if the book appears
325: in a series.
326: \item [series-title] Title of the series, if the book appears in a
327: series.
\item [number-of-volumes] Number of volumes, if the book is
published as multiple volumes.
330: \item [isbn-issn]
331: \end{description}
332: \end{description}
333:
334: \subsubsection{Edited Book}
335:
336: \begin{description}
337: \item[bib type="edited-book"] a book that is the edition of another
338: work.
339:
340: \begin{description}
341: \item [editor] Name of the editor of the book.
342: \item [year] The year of publication.
343: \item [title] Title of the book.
344: \item [series-editor] Name of the editor of the series the book is
345: part of.
346: \item [series-title] Title of the series, if the book is part of a
347: series.
348: \item [series-volume] Volume number, if the book appears in a series.
\item [number-of-pages] Number of pages of the book.
350: \item [city] City where the book was published.
351: \item [publisher] Name of the publishing company
352: \item [edition] Information about the edition (e.g. ``Repr. of the London ed. 1652'')
\item [number-of-volumes] Number of volumes, if the book is
published as multiple volumes.
355: \item [isbn-issn]
356: \end{description}
357: \end{description}
358:
359: \subsubsection{Journal Article}
360:
361: \begin{description}
362: \item [bib type="journal-article"] an article in a scientific journal.
363: \begin{description}
364: \item [author] The author of the article.
365: \item [year] The year of publication.
366: \item [title] Title of the article.
367: \item [journal] Name of the journal.
368: \item [volume] Volume number, if the journal appears in a series.
369: \item [issue] Number of the issue the article is part of.
370: \item [pages] Number of pages of the article.
371: \item [alternate-journal] Alternate Journal
372: \item [isbn-issn]
373: \end{description}
374: \end{description}
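
As an illustration, a \texttt{bib} container for a journal article
might be filled in as sketched below. The bibliographic values are
invented placeholders, not a real reference. Elements without a value
are simply left out here, which is an assumption; the book sample at
the end of this document keeps them as empty elements instead.

\begin{small}
\begin{verbatim}
<meta>
  <bib type="journal-article">
    <author>Doe, Jane</author>
    <year>2001</year>
    <title>An example article title</title>
    <journal>Example Journal of Placeholder Studies</journal>
    <volume>12</volume>
    <issue>3</issue>
    <pages>23</pages>
  </bib>
</meta>
\end{verbatim}
\end{small}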
375:
376: \subsubsection{Magazine Article}
377:
378: \begin{description}
379: \item [bib type="magazine-article"] an article in a popular magazine.
380: \begin{description}
\item [author] The author of the article.
382: \item [year] The year of publication.
383: \item [title] Title of the article.
384: \item [magazine] Name of the magazine.
\item [volume] Volume number of the magazine.
386: \item [issue-number] Number of the issue the article is part of.
\item [pages] Number of pages of the article.
388: \item [date] Date when the article appeared.
389: \end{description}
390: \end{description}
391:
392: \subsubsection{Newspaper Article}
393:
394: \begin{description}
395: \item [bib type="newspaper-article"] an article in a newspaper.
396: \begin{description}
397: \item [author] The author of the article.
398: \item [year] The year of publication.
399: \item [title] Title of the article.
\item [newspaper] Name of the newspaper the article appeared in.
401: \item [pages] Number of pages of the article.
402: \item [issue-date] Date of the issue the article is part of.
403: \item [city] City of the newspaper.
404: \end{description}
405: \end{description}
406:
407: \subsubsection{Thesis}
408:
409: \begin{description}
\item [bib type="thesis"] a master's, doctoral or other thesis.
411: \begin{description}
412: \item [author] The author of the thesis.
413: \item [year] The year of publication.
414: \item [title] Title of the thesis.
415: \item [academic-department] Name of the academic department where
416: the thesis was handed in.
417: \item [number-of-pages] Number of pages of the thesis.
418: \item [city] City where the thesis was published.
\item [university] Name of the university where the thesis was
420: handed in.
421: \item [isbn-issn]
422: \end{description}
423: \end{description}
424:
425: \subsubsection{Report}
426:
427: \begin{description}
428: \item [bib type="report"] a scientific report.
429: \begin{description}
430: \item [author] The author of the report.
431: \item [year] The year of publication.
432: \item [title] Title of the report.
433: \item [pages] Number of pages of the report.
434: \item [date] Date when the report appeared.
\item [city] City where the report was published.
436: \item [institution] Institution where the report was produced.
437: \item [type] Type of report.
438: \item [report-number] Report number.
439: \end{description}
440: \end{description}
441:
442: \subsubsection{Manuscript}
443:
444: \begin{description}
445: \item [bib type="manuscript"] a handwritten/typewritten manuscript.
446:
447: \begin{description}
448: \item [title] Title of the manuscript.
449: \item [author] The author of the text.
450: \item [location] Name of the library where the manuscript is
451: currently located.
452: \item [year] The year or century of publication.
453: \item [pages] Number of pages of the manuscript.
454: \item [signature] Signature of the manuscript.
455: \item [editorial-remarks] Remarks related to the online
456: publication of the manuscript. This could be notes about
457: annotations etc.
458: \item [description] This can be any kind of description.
459: \item [keywords] Keywords related to the manuscript.
460: \end{description}
461: \end{description}
462:
463:
464: \subsubsection{Generic}
465:
466: \begin{description}
467: \item [bib type="generic"] a generic bibliographic type. This type
468: should only be used in rare cases.
469: \begin{description}
470: \item [author]
471: \item [year]
472: \item [title]
473: \item [secondary-author]
474: \item [secondary-title]
475: \item [volume]
476: \item [number]
477: \item [pages]
478: \item [date]
479: \item [place-published]
480: \item [publisher]
481: \item [edition]
\item [tertiary-author]
483: \item [tertiary-title]
484: \item [number-of-volumes]
485: \item [type-of-work]
\item [subsidiary-author]
487: \item [alternate-title]
488: \item [isbn-issn]
489: \item [call-number]
490: \item [label]
491: \item [keywords]
492: \item [abstract]
493: \item [notes]
494: \item [url]
495: \end{description}
496: \end{description}
497:
498:
499: \subsection{Architectural drawings}
500: \label{sec:doc}
501:
502: Specific information for architectural drawings is presented in a
503: \texttt{doc} container with an additional \texttt{type} attribute
504: giving the type of drawing. All elements inside the container can
505: appear multiple times.
506:
507: \begin{description}
508:
509: \item[doc type="Architectural Drawing"] architectural drawing.
510:
511: \begin{description}
\item [person] Last name and first name of a person, separated by a
comma. A further common name for the person can be put in front,
separated by a semicolon.
\item [location] Name of a place in its common notation. This can be
a city or an institution.
517: \item [date] This can be a year (or several years, separated by
518: commas) or a period (1706-1714). Years are noted with four digits.
519: \item [object] Short description of an object or signatures.
520: \item [keywords] Keywords related to the object.
521: \end{description}
522: \end{description}
523:
524:
525: \subsection{Information on the document structure (table of contents)}
526: \label{sec:toc}
527:
528: Information on the structure of a document like the division into
529: parts and chapters in the way of a table of contents is presented in a
530: \texttt{toc} container.
531:
The scheme allows multiple logical pages on a single page image,
as is often the case with scanned books or manuscripts. The scheme
also allows for ``loose'' numbering schemes with roman, arabic or
other page numbers, used consecutively or mixed, and for changes in
the numbering within the document.
537:
538: The flexibility comes from the fact that no additional assumptions
539: about the mapping between logical pages and page images are made in
540: the format. All mapping information is specified by the user.
541:
542: The logical page numbering or naming that can be presented to the user
543: is specified in the \texttt{name} tags while the physical numbering of
544: the page images is specified in the \texttt{index} or \texttt{url}
545: tags.
546:
547: \begin{description}
548: \item[toc] container for document structure
549:
550: \begin{description}
551: \item[page] describes a single logical page
552:
553: \begin{description}
554: \item[name] the ``name'' of the logical page. This can be any string
555: like a page number (arabic, roman, etc.) or a special designation
556: like ``Table 5''.
557:
558: \item[index] the \texttt{digilib} index number\footnote{The index
559: number for digilib is the index in the alphabetical order of the
560: scan file names.} of the scan image of the page.
561:
\item[url] as an alternative to the \texttt{digilib} index number,
the full URL of the scan image of the page can be given.
564: \end{description}
565:
566: \item[chapter] describes a section or chapter of the text.
567: \texttt{chapter} elements can be nested.
568:
569: \begin{description}
570: \item[name] the title of the chapter or section.
571:
572: \item[start] the beginning of a page range (usually the first page
573: of the chapter). The \texttt{start} element has an optional
574: \texttt{increment} attribute to indicate the number of logical
575: pages on a scan image.\footnote{This information is only needed by
576: additional tools that try to generate lists of all page and
577: image numbers.}
578:
579: \begin{description}
580: \item[name] the ``name'' of the first page (see \texttt{page}).
581:
582: \item[index] the index of the first page (see \texttt{page}).
583:
584: \item[url] the URL of the first page (see \texttt{page}).
585: \end{description}
586:
587: \item[end] the end of a page range (usually the last page of the
588: chapter).
589:
590: \begin{description}
591: \item[name] the ``name'' of the last page (see \texttt{page}).
592:
593: \item[index] the index of the last page (see \texttt{page}).
594:
595: \item[url] the URL of the last page (see \texttt{page}).
596: \end{description}
597:
\item[page] as an alternative (or in addition) to
\texttt{start}/\texttt{end} page ranges, single \texttt{page}
elements can be used inside \texttt{chapter}.
601: \end{description}
602: \end{description}
603: \end{description}
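
As an illustration of the elements above, the following sketch
describes a hypothetical scan in which every image shows a double-page
spread. The page names, chapter titles and index numbers are invented
for the example: the first chapter covers the logical pages 1 to 10 on
the images with index 3 to 7, a nested chapter covers pages 5 to 10,
and a single plate is listed as a separate \texttt{page}.

\begin{small}
\begin{verbatim}
<toc>
  <page>
    <name>Title page</name>
    <index>1</index>
  </page>
  <chapter>
    <name>Introduction</name>
    <start increment="2">
      <name>1</name>
      <index>3</index>
    </start>
    <end>
      <name>10</name>
      <index>7</index>
    </end>
    <chapter>
      <name>Historical background</name>
      <start increment="2">
        <name>5</name>
        <index>5</index>
      </start>
      <end>
        <name>10</name>
        <index>7</index>
      </end>
    </chapter>
  </chapter>
  <chapter>
    <name>Plates</name>
    <page>
      <name>Table 5</name>
      <index>8</index>
    </page>
  </chapter>
</toc>
\end{verbatim}
\end{small}

With an \texttt{increment} of 2, the pages 1 and 2 fall on image 3,
the pages 3 and 4 on image 4, and so on up to the pages 9 and 10 on
image 7. How a particular tool evaluates the attribute is not defined
here, so this reading is an assumption.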
604:
605: %%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise}
606:
607:
608: \subsection{Information on scanned images}
609: \label{sec:inform-scann-imag}
610:
611: Image files representing scanned images can have an \texttt{img}
612: container tag with information about the scan resolution and the size
613: of the original image. This information is used by the
614: \texttt{digilib} image viewing tool.
615:
One of the following three sets of tags is required:
617:
618: \begin{description}
619: \item[img] digital image information.
620:
621: \begin{description}
\item[original-size-x] The width of the original image. The unit of
measure can be given in the \texttt{unit} attribute; the default
is meters (``m''). The width to be considered is the total width of
the scanned area.
626:
627: \item[original-size-y] The height of the original image.
628:
629: \item[original-pixel-x] The width of the hi-res scan in pixels.
630:
631: \item[original-pixel-y] The height of the hi-res scan in pixels.
632: \end{description}
633: \end{description}
634:
635: or
636:
637: \begin{description}
638: \item[img] digital image information.
639:
640: \begin{description}
641: \item[original-dpi-x] The resolution of the hi-res scan in its width
642: in pixels per inch.
643:
644: \item[original-dpi-y] The resolution of the hi-res scan in its height
645: in pixels per inch.
646: \end{description}
647: \end{description}
648:
649: or
650:
651: \begin{description}
652: \item[img] digital image information.
653:
654: \begin{description}
655: \item[original-dpi] The resolution of the hi-res scan in pixels per
656: inch if the resolutions in width and height are the same.
657: \end{description}
658: \end{description}
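
As a sketch of the first set of tags, an \texttt{img} container for a
hypothetical A4-sized original (0.21\,m by 0.297\,m, relying on the
default unit) scanned to 2480 by 3508 pixels could look like this. The
values are illustrative only.

\begin{small}
\begin{verbatim}
<img>
  <original-size-x>0.21</original-size-x>
  <original-size-y>0.297</original-size-y>
  <original-pixel-x>2480</original-pixel-x>
  <original-pixel-y>3508</original-pixel-y>
</img>
\end{verbatim}
\end{small}

These values correspond to roughly 300 pixels per inch in both
directions (0.21\,m is about 8.27 inches, and 2480 divided by 8.27 is
about 300), so under the third set of tags the same scan could be
described by a single \texttt{original-dpi} element with the value 300.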
659:
660:
661: \subsection{Access restrictions}
662: \label{sec:access-restrictions}
663:
If access to a resource is restricted for technical or legal
reasons, the restrictions can be put in an
\texttt{access-restrictions} container. The format of the information
inside the container is yet to be specified.
668:
669:
670: \section{Sample metadata files for ECHO resources}
671:
The following is a sample metadata index file for a directory containing a
scanned document.
674:
675: \begin{small}
676: \begin{verbatim}
677: <resource type="ECHO">
678: <description>Fleck, 1980</description>
679: <name>fleck.1980</name>
680: <creator>University of Bern</creator>
681: <archive-path>ubern/wiss-theorie</archive-path>
<content-type>scanned document</content-type>
683: <meta>
684: <dri>echo23a45e2329x</dri>
685: <lang>ger</lang>
686: <bib type="book">
687: <author>Fleck, Ludwik</author>
688: <year>1980</year>
689: <title>Entstehung und Entwicklung einer
690: wissenschaftlichen Tatsache</title>
691: <series-editor></series-editor>
692: <series-title></series-title>
693: <series-volume></series-volume>
694: <number-of-pages></number-of-pages>
695: <city>Frankfurt am Main</city>
696: <publisher>Suhrkamp</publisher>
697: <edition></edition>
698: <number-of-volumes></number-of-volumes>
699: <translator></translator>
700: <isbn-issn></isbn-issn>
701: <keywords>Wissenschaftstheorie, Fleck, Tatsache</keywords>
702: <abstract></abstract>
703: </bib>
704: </meta>
705: <dir>
706: <description>Scanned images (300dpi)</description>
707: <name>img</name>
708: </dir>
709: </resource>
710: \end{verbatim}
711: \end{small}
712:
713: The following is a sample metadata file for a single image of an
714: architectural drawing.
715:
716: \begin{small}
717: \begin{verbatim}
718: <resource type="ECHO">
719: <creator>Bibliotheca Hertziana</creator>
720: <content-type>scanned images</content-type>
721: <file>
722: <name>00000271-asl-160-r-full.tif</name>
723: <meta>
724: <img>
725: <original-dpi>315</original-dpi>
726: </img>
727: <dri>echo45a67bc4367d</dri>
728: <lang>ita</lang>
729: <doc type="Architectural Drawing">
730: <person>Ciolli, Giacomo</person>
731: <person>Urban VIII; Barberini, Maffeo</person>
732: <location>Accademia di San Luca</location>
733: <location>Roma</location>
734: <date>1706</date>
735: <object>Concorso Clementino</object>
736: <object>Fontana Pubblica</object>
737: <object>Brunnen</object>
738: <object>ASL 160</object>
739: <keywords></keywords>
740: </doc>
741: <context>
<link>http://colosseum.biblhertz.it:8080/Lineamenta/
1033478408.39/1035196181.35/1035196204.09/1035394121.83
</link>
745: </context>
746: </meta>
747: </file>
748: </resource>
749: \end{verbatim}
750: \end{small}
751:
752: \end{document}
753:
754: %%% Local Variables:
755: %%% mode: latex
756: %%% TeX-master: t
757: %%% End:
758: