% version 1.3, 2003/07/01 17:51:40
% version 1.4, 2003/07/23 10:35:06
%\usepackage{courier}

% create in-text links black (with PDF)
%\usepackage[colorlinks=true,linkcolor=black]{hyperref}
% Format URLs nicely (without PDF)
\usepackage{url}

\title{A simple metadata format for resource bundles}

\author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess}

\date{V0.3pre2 of \today}

\begin{document}
|
|
File and directory paths in the metadata file use the conventional
Unix file separator, the slash ``/''.

|
|
|
\section{Metadata files}
\label{sec:metadata-files}

The metadata information is stored in the XML format documented below
in special files in the resource directory. Two forms of metadata
files are possible:
\begin{itemize}
\item a file named \texttt{index.meta} in a directory.
\item a file named like the data file it describes, with the
additional extension \texttt{.meta}. For example, metadata for the
file \texttt{0001.tif} would be stored in a file named
\texttt{0001.tif.meta}.
\end{itemize}

The resource directory must contain an \texttt{index.meta} file with
information about the resource as a whole. Other directories may also
contain \texttt{index.meta} files.

Additional information about single data files that are part of the
resource can be put either in \texttt{file} tags in the
\texttt{index.meta} file or in separate \emph{filename}\texttt{.meta}
files for each data file. Information from the directory-level file is
inherited at the file level.
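
The lookup of metadata files described above can be sketched in
Python. This is a minimal sketch under assumptions not stated in this
document: the helper name \texttt{metadata\_chain} is hypothetical, and
it returns the applicable metadata files ordered from the most general
(the \texttt{index.meta} of the resource directory) to the most
specific (the \emph{filename}\texttt{.meta} file), so that a consumer
could let later files override earlier ones. Metadata given in
\texttt{file} tags inside an \texttt{index.meta} file is not handled.

\begin{verbatim}
from pathlib import Path


def metadata_chain(resource_root, data_file):
    """Return the existing metadata files for data_file, most general first.

    Hypothetical helper: collects the index.meta of every directory from
    the resource root down to the directory holding data_file, followed
    by the per-file metadata file (data file name plus ".meta").
    """
    root = Path(resource_root).resolve()
    target = (root / data_file).resolve()
    # Raises ValueError if data_file lies outside the resource directory.
    rel_parent = target.parent.relative_to(root)
    dirs = [root]
    for part in rel_parent.parts:
        dirs.append(dirs[-1] / part)
    candidates = [d / "index.meta" for d in dirs]
    candidates.append(target.with_name(target.name + ".meta"))
    # Keep only the candidates that actually exist on disk.
    return [p for p in candidates if p.is_file()]
\end{verbatim}

\noindent For a data file \texttt{img/0001.tif} this returns the
resource-level \texttt{index.meta}, then \texttt{img/index.meta} and
\texttt{img/0001.tif.meta}, each only if it exists.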
|
|
|
|
\section{Resource format}
\label{sec:mpiwg-doc}

the metadata file. Elements marked ``required'' must be supplied by
the provider of the resource. Elements marked ``deduced'' can be
supplied by the provider of the resource, but can also be provided by
automatic scripts later in the process; these elements must be present
in the final file.

The outer container element is \texttt{resource}. Sub-types (``ECHO'',
``MPIWG'') can be specified if necessary with a \texttt{type}
parameter. Its sub-elements are:

\item[creator] The name of the project or person that created the
resource -- optional.

\item[archive-creation-date] The time and date the archive collection
was created -- deduced.

\item[archive-storage-date] The time and date the archive was written
to permanent storage -- deduced (must not be set by the user).

\item[archive-path] The full path to the resource directory inside the
whole archive collection -- deduced.

All elements with \texttt{meta} tags can contain an arbitrary number
of additional metadata elements.
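
Since deduced elements must be present in the final file, a processing
script might check for them before archiving. The following Python
sketch uses the standard \texttt{xml.etree.ElementTree} module; the
function name \texttt{missing\_deduced} is hypothetical, it assumes the
deduced elements are direct children of \texttt{resource} (as
\texttt{archive-path} is in the samples at the end of this document),
and it only covers the three deduced elements listed above.

\begin{verbatim}
import xml.etree.ElementTree as ET

# Elements marked "deduced" in the list above; other elements of
# <resource> are not checked by this sketch.
DEDUCED = ("archive-creation-date", "archive-storage-date", "archive-path")


def missing_deduced(xml_text):
    """Return the deduced elements that are absent or empty.

    Assumes the deduced elements are direct children of <resource>.
    """
    resource = ET.fromstring(xml_text)
    if resource.tag != "resource":
        raise ValueError("the outer container must be <resource>")
    missing = []
    for name in DEDUCED:
        elem = resource.find(name)
        # An element that is present but empty counts as missing.
        if elem is None or not (elem.text or "").strip():
            missing.append(name)
    return missing
\end{verbatim}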
|
|
|
\subsection{Language}
\label{sec:lang}

The language of a resource (e.g.\ a text) can be specified with a
\texttt{lang} tag. Languages have to be described using the
international codes for the representation of names of languages,
either in two-letter form (ISO 639-1) or in three-letter form (ISO
639-2). The entire catalogue of languages is documented on the page

\url{http://www.loc.gov/standards/iso639-2/englangn.html}.
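
A script can check that a \texttt{lang} value at least has the form of
an ISO 639 code. This Python sketch (the function name
\texttt{has\_lang\_code\_form} is hypothetical) checks only the shape,
two or three lowercase letters; it does not verify that the code is
actually listed in ISO 639, and it assumes lowercase codes as in the
samples (\texttt{ger}, \texttt{ita}).

\begin{verbatim}
import re

# Two lowercase letters (ISO 639-1, e.g. "it") or three (ISO 639-2, e.g. "ita").
LANG_FORM = re.compile(r"[a-z]{2,3}")


def has_lang_code_form(value):
    """True if value looks like an ISO 639-1 or ISO 639-2 code.

    Only the form is checked; whether the code is actually assigned in
    ISO 639 is not verified here.
    """
    return LANG_FORM.fullmatch(value.strip()) is not None
\end{verbatim}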
|
|
|
|
\subsection{DRI}
\label{sec:dri}

The \emph{digital resource identifier} for the resource is specified
in a \texttt{dri} element. Digital resource identifiers are documented
on the page

\url{http://pythia.mpiwg-berlin.mpg.de/projects/standards/dri}.
|
|
|
|
|
|
|
\subsection{Collection context}
\label{sec:collection-context}

The context of a resource as part of a collection or of a project can
be specified in the \texttt{context} element:

\begin{description}
\item[link] URL to additional context information.
\item[name] Textual description of the project or collection.
\end{description}

\noindent Multiple \texttt{link} or \texttt{name} elements are
possible.
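
Because \texttt{link} and \texttt{name} may occur several times, a
reader should collect all of them rather than only the first. A
minimal Python sketch using \texttt{xml.etree.ElementTree}; the
function name \texttt{read\_context} is hypothetical, it assumes that
\texttt{context} is a direct child of the element passed in (for
example a \texttt{meta} element), and the URL in the usage test is a
placeholder.

\begin{verbatim}
import xml.etree.ElementTree as ET


def read_context(meta_xml):
    """Collect all link and name values of a <context> element.

    Assumes <context> is a direct child of the parsed element; it may
    hold any number of <link> and <name> children.
    """
    root = ET.fromstring(meta_xml)
    context = root.find("context")
    if context is None:
        return {"link": [], "name": []}
    return {
        "link": [(e.text or "").strip() for e in context.findall("link")],
        "name": [(e.text or "").strip() for e in context.findall("name")],
    }
\end{verbatim}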
|
|
|
|
|
|
\subsection{Bibliographic information}
\label{sec:bibliographic-data}

bibliographic data (cf.\ content workflow) or the MPIWG
``Projektbibliografie'' is presented in a \texttt{bib} container with
a \texttt{type} parameter giving the type of bibliographic resource.
The \texttt{type} field can be repeated as a tag in the container.
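
The following Python sketch shows one way a script might read a
\texttt{bib} container into its type and a dictionary of fields. The
function name \texttt{read\_bib} is hypothetical; empty elements are
skipped, and a repeated tag keeps only its last value, which is a
simplification.

\begin{verbatim}
import xml.etree.ElementTree as ET


def read_bib(bib_xml):
    """Return (type, fields) for a <bib type="..."> container.

    fields maps each child tag (e.g. "author", "year") to its text.
    Empty elements are skipped; if a tag occurs more than once, only
    the last value is kept (a simplification in this sketch).
    """
    bib = ET.fromstring(bib_xml)
    if bib.tag != "bib":
        raise ValueError("expected a <bib> container")
    bib_type = bib.get("type")
    fields = {}
    for child in bib:
        text = (child.text or "").strip()
        if text:
            fields[child.tag] = text
    return bib_type, fields
\end{verbatim}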
|
|
|
|
\subsubsection{Book}

\begin{description}
\item [bib type="book"] a published book.

\begin{description}
\item [author] The author of the book.
\item [year] The year of publication.
\item [title] Title of the book.
\item [series-editor] Name of the series editor, if the book appears
in a series.
\item [series-title] Title of the series, if the book appears in a
series.
\item [series-volume] Volume number, if the book appears in a
series.
\item [number-of-pages] Number of pages of the entire book.
\item [city] City where the book was published.
\item [publisher] Name of the publishing company.
\item [edition] Edition of the book (e.g.\ third edition).
\item [number-of-volumes] Number of volumes, if the book is
published in multiple volumes.
\item [translator] Name of the translator.
\item [isbn-issn] ISBN or ISSN.
\end{description}
\end{description}
|
|
|
\subsubsection{In Book}

\begin{description}
\item [bib type="inbook"] an article as part of a book.

\begin{description}
\item [author] The author of the article.
\item [year] The year of publication.
\item [title] Title of the article.
\item [editor] Name of the book's editor.
\item [book-title] Title of the book.
\item [series-volume] Volume number, if the book appears in a
series.
\item [pages] Number of pages of the article.
\item [city] City where the book was published.
\item [publisher] Name of the publishing company.
\item [edition] Edition of the book (e.g.\ third edition).
\item [series-author] Name of the series editor, if the book appears
in a series.
\item [series-title] Title of the series, if the book appears in a
series.
\item [number-of-volumes] Number of volumes, if the book is
published in multiple volumes.
\item [translator] Name of the translator.
\item [isbn-issn] ISBN or ISSN.
\end{description}
\end{description}
|
|
|
\subsubsection{Proceedings}

\begin{description}
\item [bib type="proceedings"] a conference proceedings publication.

\begin{description}
\item [author] The author of the article.
\item [year] The year of publication.
\item [title] Title of the article.
\item [editor] Name of the editor of the proceedings.
\item [conference-name] Name of the conference the proceedings are
related to.
\item [volume] Volume number.
\item [pages] Number of pages of the article.
\item [date] Date of the conference the proceedings are related to.
\item [conference-location] City where the conference was held.
\item [publisher] Name of the publishing company.
\item [edition] Edition of the proceedings (e.g.\ third edition).
\item [series-editor] Name of the series editor, if the proceedings
appear in a series.
\item [series-title] Title of the series, if the proceedings appear in
a series.
\item [number-of-volumes] Number of volumes, if the proceedings are
published as multiple volumes.
\item [isbn-issn] ISBN or ISSN.
\end{description}
\end{description}
|
|
|
\subsubsection{Edited Book}

\begin{description}
\item[bib type="edited-book"] a book that is an edition of another
work.

\begin{description}
\item [editor] Name of the editor of the book.
\item [year] The year of publication.
\item [title] Title of the book.
\item [series-editor] Name of the editor of the series, if the book is
part of a series.
\item [series-title] Title of the series, if the book is part of a
series.
\item [series-volume] Volume number, if the book appears in a series.
\item [number-of-pages] Number of pages of the book.
\item [city] City where the book was published.
\item [publisher] Name of the publishing company.
\item [edition] Information about the edition (e.g.\ ``Repr.\ of the
London ed.\ 1652'').
\item [number-of-volumes] Number of volumes, if the book is
published as multiple volumes.
\item [isbn-issn] ISBN or ISSN.
\end{description}
\end{description}
|
|
|
\subsubsection{Journal Article}

\begin{description}
\item [bib type="journal-article"] an article in a scientific journal.
\begin{description}
\item [author] The author of the article.
\item [year] The year of publication.
\item [title] Title of the article.
\item [journal] Name of the journal.
\item [volume] Volume number of the journal.
\item [issue] Number of the issue the article is part of.
\item [pages] Number of pages of the article.
\item [alternate-journal] Alternate name of the journal.
\item [isbn-issn] ISBN or ISSN.
\end{description}
\end{description}
|
|
|
\subsubsection{Magazine Article}

\begin{description}
\item [bib type="magazine-article"] an article in a popular magazine.
\begin{description}
\item [author] The author of the article.
\item [year] The year of publication.
\item [title] Title of the article.
\item [magazine] Name of the magazine.
\item [volume] Volume number of the magazine.
\item [issue-number] Number of the issue the article is part of.
\item [pages] Number of pages of the article.
\item [date] Date when the article appeared.
\end{description}
\end{description}
|
|
|
\subsubsection{Newspaper Article}

\begin{description}
\item [bib type="newspaper-article"] an article in a newspaper.
\begin{description}
\item [author] The author of the article.
\item [year] The year of publication.
\item [title] Title of the article.
\item [newspaper] Name of the newspaper the article appeared in.
\item [pages] Number of pages of the article.
\item [issue-date] Date of the issue the article is part of.
\item [city] City of the newspaper.
\end{description}
\end{description}
|
|
|
\subsubsection{Thesis}

\begin{description}
\item [bib type="thesis"] a master's, doctoral or similar thesis.
\begin{description}
\item [author] The author of the thesis.
\item [year] The year of publication.
\item [title] Title of the thesis.
\item [academic-department] Name of the academic department where
the thesis was submitted.
\item [number-of-pages] Number of pages of the thesis.
\item [city] City where the thesis was published.
\item [university] Name of the university where the thesis was
submitted.
\item [isbn-issn] ISBN or ISSN.
\end{description}
\end{description}
|
|
|
\subsubsection{Report}

\begin{description}
\item [bib type="report"] a scientific report.
\begin{description}
\item [author] The author of the report.
\item [year] The year of publication.
\item [title] Title of the report.
\item [pages] Number of pages of the report.
\item [date] Date when the report appeared.
\item [city] City where the report was published.
\item [institution] Institution where the report was produced.
\item [type] Type of report.
\item [report-number] Report number.
\end{description}
\end{description}
|
|
|
\subsubsection{Generic}

\begin{description}
\item [bib type="generic"] a generic bibliographic type. This type
should only be used in rare cases.
\begin{description}
\item [author]
\item [year]
\item [title]
\item [secondary-author]
\item [secondary-title]
\item [volume]
\item [number]
\item [pages]
\item [date]
\item [place-published]
\item [publisher]
\item [edition]
\item [tertiary-author]
\item [tertiary-title]
\item [number-of-volumes]
\item [type-of-work]
\item [subsidiary-author]
\item [alternate-title]
\item [isbn-issn]
\item [call-number]
\item [label]
\item [keywords]
\item [abstract]
\item [notes]
\item [url]
\end{description}
\end{description}
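
The field lists above can serve as a simple schema. This Python sketch
copies the lists for two types (\texttt{book} and
\texttt{journal-article}) into a dictionary and reports tags that are
not listed for a given type. The names \texttt{BIB\_FIELDS} and
\texttt{unknown\_fields} are hypothetical, the remaining types would be
added the same way, and accepting a \texttt{type} tag for every type is
an interpretation of the statement that the type can be repeated as a
tag in the container.

\begin{verbatim}
# Field names per bib type, copied from the lists above. Only two types
# are included in this sketch; the others would be added the same way.
BIB_FIELDS = {
    "book": {
        "author", "year", "title", "series-editor", "series-title",
        "series-volume", "number-of-pages", "city", "publisher",
        "edition", "number-of-volumes", "translator", "isbn-issn",
    },
    "journal-article": {
        "author", "year", "title", "journal", "volume", "issue",
        "pages", "alternate-journal", "isbn-issn",
    },
}


def unknown_fields(bib_type, tags):
    """Return the tags not listed for bib_type, in their original order.

    A "type" tag is always accepted (assumed: the type may be repeated
    as a tag). Types missing from BIB_FIELDS have no known fields, so
    every other tag is reported.
    """
    allowed = BIB_FIELDS.get(bib_type, set()) | {"type"}
    return [tag for tag in tags if tag not in allowed]
\end{verbatim}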
|
|
|
|
|
\subsection{Architectural drawings}
\label{sec:doc}

Specific information for architectural drawings is presented in a
\texttt{doc} container. All elements can appear multiple times.

\begin{description}
\item [person] Last name and first name of a person, separated by a
comma. A further common name for the person can be put in front,
separated by a semicolon.
\item [location] Name of a place in its common notation. This can
be a city or an institution.
\item [date] This can be a year (or several years, separated by
commas) or a period (e.g.\ 1706-1714). Years are noted with four
digits.
\item [object] Short description of an object, or its signature
(shelf mark).
\item [keywords] Keywords related to the object.
\end{description}
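
The \texttt{person} and \texttt{date} conventions are regular enough
to be parsed mechanically. A minimal Python sketch with hypothetical
function names (\texttt{parse\_person}, \texttt{parse\_years}); it
accepts a period and comma-separated years in the same value, which is
more lenient than the description above, and returns every date as a
list of (start, end) year pairs.

\begin{verbatim}
import re


def parse_person(value):
    """Split a <person> value of the form "[common name;] last, first".

    Returns a dict with "common", "last" and "first"; missing parts
    are None. Whitespace around the separators is ignored.
    """
    common = None
    if ";" in value:
        common, value = (part.strip() for part in value.split(";", 1))
    last, _, first = value.partition(",")
    return {"common": common, "last": last.strip(), "first": first.strip() or None}


def parse_years(value):
    """Turn a <date> value into a list of (start, end) year pairs.

    Accepts a single year ("1706"), several years separated by commas
    ("1706, 1710") or a period ("1706-1714"); years have four digits.
    """
    spans = []
    for part in value.split(","):
        m = re.fullmatch(r"\s*(\d{4})\s*(?:-\s*(\d{4})\s*)?", part)
        if m is None:
            raise ValueError(f"unexpected date part: {part!r}")
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        spans.append((start, end))
    return spans
\end{verbatim}

\noindent Applied to the architectural drawing sample at the end of
this document, \texttt{Urban VIII; Barberini, Maffeo} yields the common
name ``Urban VIII'', last name ``Barberini'' and first name ``Maffeo''.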
|
|
|
|
\subsection{Information on the document structure (table of contents)}
\label{sec:toc}

Information on the structure of a document, such as its division into
parts and chapters as in a table of contents, is presented in a
\texttt{toc} container.

The scheme allows multiple logical pages on a single page image, as is
often the case with scanned books or manuscripts. The scheme also
allows for ``loose'' numbering schemes with roman, arabic or other
page numbers, used consecutively or mixed, and for changes in the
numbering within the document.

The flexibility comes from the fact that the format makes no
additional assumptions about the mapping between logical pages and
page images. All mapping information is specified by the user.

The logical page numbering or naming that can be presented to the user
is specified in the \texttt{name} tags, while the physical numbering of
the page images is specified in the \texttt{index} or \texttt{url}
tags.

\begin{description}
\item[page] describes a single logical page.
\begin{description}
\item[name] the ``name'' of the logical page. This can be any string,
such as a page number (arabic, roman, etc.) or a special designation
like ``Table 5''.
\item[index] the \texttt{digilib} index number\footnote{The index
number for digilib is the index in the alphabetical order of the
scan file names.} of the scan image of the page.
\item[url] as an alternative to the \texttt{digilib} index number, the
full URL of the scan image of the page can be used.
\end{description}

\item[chapter] describes a section or chapter of the text.
\texttt{chapter} elements can be nested.
\begin{description}
\item[name] the title of the chapter or section.
\item[start] the beginning of a page range (usually the first page
of the chapter). The \texttt{start} element has an optional
\texttt{increment} attribute to indicate the number of logical
pages on a scan image.\footnote{This information is only needed by
additional tools that try to generate lists of all page and
image numbers.}
\begin{description}
\item[name] the ``name'' of the first page (see \texttt{page}).
\item[index] the index of the first page (see \texttt{page}).
\item[url] the URL of the first page (see \texttt{page}).
\end{description}
\item[end] the end of a page range (usually the last page of the
chapter).
\begin{description}
\item[name] the ``name'' of the last page (see \texttt{page}).
\item[index] the index of the last page (see \texttt{page}).
\item[url] the URL of the last page (see \texttt{page}).
\end{description}
\item[page] as an alternative (or in addition) to
\texttt{start}/\texttt{end} page ranges, single \texttt{page}
elements can be used inside \texttt{chapter}.
\end{description}
\end{description}
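
Since the format leaves the mapping between logical pages and page
images entirely to the user, a tool can build that mapping directly
from the \texttt{page} elements. This Python sketch (the function name
\texttt{page\_index\_map} is hypothetical) maps each logical page
\texttt{name} to its \texttt{digilib} \texttt{index}, including pages
nested inside \texttt{chapter} elements; it assumes that indices are
integers and skips pages given only by \texttt{url}. Several logical
pages on one scan image simply map to the same index.

\begin{verbatim}
import xml.etree.ElementTree as ET


def page_index_map(toc_xml):
    """Map logical page names to digilib image indices.

    Reads every <page> element (also those nested in <chapter>) that
    has both a <name> and an <index>. Pages given only by <url> are
    skipped in this sketch.
    """
    toc = ET.fromstring(toc_xml)
    mapping = {}
    for page in toc.iter("page"):
        name = page.findtext("name")
        index = page.findtext("index")
        if name is not None and index is not None:
            mapping[name.strip()] = int(index)
    return mapping
\end{verbatim}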
|
|
|
%%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise}
|
|
|
|
\subsection{Information on scanned images}

inside the container has to be further specified.
|
|
|
|
\section{Sample metadata files for ECHO resources}

The following is a sample structure for a scanned document resource.

\begin{verbatim}
<resource type="ECHO">
  <description>Fleck, 1980</description>
  <name>fleck.1980</name>
  <creator>University of Bern</creator>
  <archive-path>ubern/wiss-theorie</archive-path>
  <content-type>scanned images</content-type>
  <meta>
    <dri>echo23a45e2329x</dri>
    <lang>ger</lang>
    <bib type="book">
      <author>Fleck, Ludwik</author>
      <year>1980</year>
      <title>Entstehung und Entwicklung einer
        wissenschaftlichen Tatsache</title>
      <series-editor></series-editor>
      <series-title></series-title>
      <series-volume></series-volume>
      <number-of-pages></number-of-pages>
      <city>Frankfurt am Main</city>
      <publisher>Suhrkamp</publisher>
      <edition></edition>
      <number-of-volumes></number-of-volumes>
      <translator></translator>
      <isbn-issn></isbn-issn>
      <keywords>Wissenschaftstheorie, Fleck, Tatsache</keywords>
  <dir>
    <description>Scanned images (300dpi)</description>
    <name>img</name>
  </dir>
</resource>
\end{verbatim}
|
|
|
The following is a sample metadata structure for an architectural
drawing.

\begin{verbatim}
<resource type="ECHO">
  <creator>Bibliotheca Hertziana</creator>
  <content-type>scanned images</content-type>
  <file>
    <name>00000271-asl-160-r-full.tif</name>
    <meta>
      <img>
        <original-dpi>315</original-dpi>
      </img>
      <dri>echo45a67bc4367d</dri>
      <lang>ita</lang>
      <doc type="Architectural Drawing">
        <person>Ciolli, Giacomo</person>
        <person>Urban VIII; Barberini, Maffeo</person>
        <location>Accademia di San Luca</location>
        <location>Roma</location>
        <date>1706</date>
        <object>Concorso Clementino</object>
        <object>Fontana Pubblica</object>
        <object>Brunnen</object>
        <object>ASL 160</object>
        <keywords></keywords>
      </doc>
      <collection-context>
        <url>http://colosseum.biblhertz.it:8080/Lineamenta/
1033478408.39/1035196181.35/1035196204.09/1035394121.83
        </url>
      </collection-context>
    </meta>
  </file>
</resource>
\end{verbatim}
|
|
\end{document}

%%% Local Variables: