Diff for /storage/meta/meta-format.tex between versions 1.3 and 1.21

version 1.3, 2003/07/01 17:51:40 version 1.21, 2010/01/28 17:51:15
Line 14 Line 14
   
 \title{A simple metadata format for resource bundles}  \title{A simple metadata format for resource bundles}
   
 \author{Robert Casties, Dirk Wintergrün, Christoph Liess}  \author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess}
   
 \date{V0.2.2 of \today}  \date{V1.3.5 of 28.1.2010}
   
 \begin{document}  \begin{document}
   
Line 32  File and directory names should not cont Line 32  File and directory names should not cont
 in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen  in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen
 ``-'', underscore ``\_'' and dot ``.''.  ``-'', underscore ``\_'' and dot ``.''.
   
 File and directory paths in the metadata file use the conventional  Files and directories with names that contain illegal characters must
 Unix file separator slash ``/''.  be transformed to allowed names. A proposition for a simple
   transformation rule is
   
   \begin{itemize}
   \item whitespace characters (e.g. blank, tab, cr, lf) are replaced by
     hyphens ``-''
   
   \item other illegal characters are replaced by underscores ``\_''.
   \end{itemize}
   
   This rule does not provide a reversible mapping to the original
   illegal file name and it does not provide a collision-free mapping,
   i.e. two different illegal file names might be mapped to the same
   allowed file name. Additional precautions for these cases must be
   taken.
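   
   As an illustration of the proposed rule, the following table applies
   it to a few file names. The names are made-up examples and are not
   taken from any existing resource.
   
   \begin{center}
   \begin{tabular}{ll}
   original name & transformed name \\
   \hline
   \texttt{my scan.tif} & \texttt{my-scan.tif} \\
   \texttt{page(1).tif} & \texttt{page\_1\_.tif} \\
   \texttt{a+b.txt} & \texttt{a\_b.txt} \\
   \texttt{a*b.txt} & \texttt{a\_b.txt} \\
   \end{tabular}
   \end{center}
   
   The last two rows show a collision: two different original names
   are mapped to the same allowed name. Since \texttt{a\_b.txt} is
   itself an allowed name, the original name cannot be recovered from
   the result either.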
   
   
   \section{Metadata files}
   \label{sec:metadata-files}
   
   The metadata information is stored in the XML format documented below
   in special files in the resource directory. Two forms of metadata
   files are possible:
   \begin{itemize}
   \item a file named \texttt{index.meta} in a directory.
   
   \item a file with the same name as the data file it describes and an
     additional extension \texttt{.meta}. For example metadata for the
     file \texttt{p0001.tif} would be in a file \texttt{p0001.tif.meta}.
   \end{itemize}
   
   The resource directory must contain an \texttt{index.meta} file with
   information about the resource as a whole. Subdirectories can
   contain additional \texttt{index.meta} files.
   
   Additional information about single data files that are part of the
   resource can either be put in \texttt{file} tags in the
   \texttt{index.meta} file or in separate \emph{filename}\texttt{.meta}
   files for each data file. Information from the directory level file is
   inherited at the file level when it is not overwritten.
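   
   A resource directory that follows these rules could look like the
   sketch below. The directory names \texttt{example-resource} and
   \texttt{pageimg} and the file \texttt{p0002.tif} are hypothetical;
   the remarks in parentheses are annotations, not part of the names.
   
   \begin{verbatim}
   example-resource/
     index.meta          (required, describes the whole resource)
     pageimg/
       index.meta        (optional, describes the subdirectory)
       p0001.tif
       p0001.tif.meta    (optional, describes p0001.tif)
       p0002.tif
   \end{verbatim}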
   
   
 \section{Resource format}  \section{Resource format}
 \label{sec:mpiwg-doc}  \label{sec:mpiwg-doc}
Line 43  by the provider of the resource and may Line 83  by the provider of the resource and may
 the metadata file. Elements marked ``required'' must be supplied by  the metadata file. Elements marked ``required'' must be supplied by
 the provider of the resource. Elements marked ``deduced'' can be  the provider of the resource. Elements marked ``deduced'' can be
 supplied by the provider of the resource but can also be provided by  supplied by the provider of the resource but can also be provided by
 automatic scripts later in the process, the elements must be present  automatic scripts later in the process, these elements must be present
 in the final file.  in the final file.
   
 The outer container is named \texttt{resource}. Sub-types (``ECHO'',  File and directory paths in the metadata file use the conventional
 ``MPIWG'') can be specified if necessary with a \texttt{type}  Unix file separator slash ``/''.
 parameter. Its sub-elements are:  
   The outer container element is \texttt{resource}. It has the following
   \textbf{attributes}:
   
 \begin{description}  \begin{description}
 \item[description] An informal textual description of the  \item[type] sub-type of resource (e.g. ``ECHO'', ``MPIWG'') --
   resource -- optional.    optional.
     
   \item[version] version number of metadata format (currently 1.2) --
     required.
   \end{description}
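   
   \noindent A minimal sketch of the outer element with both
   attributes set. The \texttt{type} value is one of the examples
   named above and the element content is elided:
   
   \begin{verbatim}
   <resource type="MPIWG" version="1.2">
     <!-- elements of the resource go here -->
   </resource>
   \end{verbatim}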
   
   \noindent The allowed \textbf{elements} inside \texttt{resource} are:
   
   \begin{description}
   \item[description] An informal textual description of the resource --
     optional\footnote{At least one description of the resource's content
       is required. The description can be an informal
       \texttt{description} element or a descriptive element (like
       \texttt{bib}) in a \texttt{meta} container.}.
   
 \item[name] The filename of the resource (name of the directory this  \item[name] The filename of the resource (name of the directory this
   file is contained in) -- required.    file is contained in) -- required.
Line 60  parameter. Its sub-elements are: Line 115  parameter. Its sub-elements are:
 \item[creator] The name of the project or person that created the  \item[creator] The name of the project or person that created the
   resource -- optional.    resource -- optional.
   
 \item[archive-creation-date] The time and date the archive was created  \item[archive-creation-date] The time and date the archive collection
   -- deduced.    was created -- deduced.
   
   \item[archive-storage-date] The time and date the archive was written
     to permanent storage -- deduced (must not be set by the user).
   
 \item[archive-path] The full path to the resource directory inside the  \item[archive-path] The full path to the resource directory inside the
   whole archive collection -- deduced.    whole archive collection, including the resource directory -- deduced.
   
   \item[archive-id] The ID for this document in the archive --
     optional.
       
 \item[derived-from] Container for the description of the original  \item[derived-from] Container for the description of the original
   resource if this resource is a modified version of another resource    resource if this resource is a modified version of another resource
   -- optional.    -- optional.
   
   \begin{description}    \begin{description}
     \item[archive-id] The ID of the original resource
       -- required (or archive-path).
   
   \item[archive-path] The full path to the original resource    \item[archive-path] The full path to the original resource
     --required.      -- required (or archive-id).
   
     \item[description] An informal textual description of the relation
     of this resource to the original resource -- optional.
     \end{description}
     
   \item[used-by] Container for the description of modified resources
     if this resource is the source of another resource
     -- optional.
   
     \begin{description}
     \item[archive-id] The ID of the derived resource
       -- required (or archive-path).
   
     \item[archive-path] The full path to the derived resource
       -- required (or archive-id).
   
   \item[description] An informal textual description of the relation    \item[description] An informal textual description of the relation
   of this resource to the original resource -- optional.    of this resource to the original resource -- optional.
Line 83  parameter. Its sub-elements are: Line 162  parameter. Its sub-elements are:
   -- optional.    -- optional.
   
   \begin{description}    \begin{description}
     \item[archive-id] The ID of the linked resource
       -- required (or archive-path).
   
   \item[archive-path] The full path to the linked resource    \item[archive-path] The full path to the linked resource
     --required.      -- required (or archive-id).
   
   \item[description] An informal textual description of the relation    \item[description] An informal textual description of the relation
   of this resource to the linked resource -- optional.    of this resource to the linked resource -- optional.
   \end{description}    \end{description}
       
 \item[content-type] The content type of this resource -- required.\\  \item[media-type] \label{tag-media-type} The main media type of this
   The content type enables the choice of tools to manipulate and    resource -- required.\\ The main media type can be overridden by
   display the resource. There should be a common list of content    \texttt{media-type}s in subdirectories. Possible types are
   types. For digital documents (books, manuscripts) this would be    \begin{itemize}
   "scanned document", for other image data "scanned    \item \texttt{image}
   images".\footnote{The criterion for documents is a ordered  
     succession of image files (pages) and equal image size and    \item \texttt{text}
     resolution throughout the images of a resource.}  
     \item \texttt{audio}
   
     \item \texttt{video}
   
     \item \texttt{data} for other types of data
     \end{itemize}
       
 \item[meta] Additional metadata information about the resource --  \item[meta] Additional metadata information about the resource --
   optional.\\ For a description of additional metadata see below.    optional.\\ For a description of additional metadata see below.
Line 113  parameter. Its sub-elements are: Line 201  parameter. Its sub-elements are:
   
   \item[name] The name of the subdirectory -- required.    \item[name] The name of the subdirectory -- required.
           
     \item[original-name] A text string associated with the directory as
       original name -- optional. (E.g. if the data in this directory
       came from an external source and had a name that had to be changed
       according to section~\ref{sec:file-directory-names} but it should
       be possible to reference the original name.)
       
   \item[path] The directory path of this subdirectory relative to the    \item[path] The directory path of this subdirectory relative to the
     resource's root directory (containing the directory itself) --      resource's root directory (excluding the directory itself) --
     required (may be identical to \texttt{name} or omitted if the      required (may be empty or omitted if the directory is a direct
     directory is a direct child of the resource's root directory).      child of the resource's root directory).
           
   \item[meta] Additional metadata information about the directory --    \item[meta] Additional metadata information about the directory --
     optional.\\ For a description of additional metadata see below.      optional.\\ For a description of additional metadata see below.
Line 132  parameter. Its sub-elements are: Line 226  parameter. Its sub-elements are:
   
   \item[name] The name of the file -- required.    \item[name] The name of the file -- required.
           
     \item[original-name] A text string associated with the file as
       original name -- optional. (e.g. if this file came from an
       external source and had a name that had to be changed according to
       section~\ref{sec:file-directory-names} it is possible
       to preserve the original name.)
       
   \item[path] The directory path of this file relative to the    \item[path] The directory path of this file relative to the
     resource's root directory (containing the file itself) -- required      resource's root directory (excluding the file itself) -- required
     (may be identical to \texttt{name} or omitted if the file is in the      (may be empty or omitted if the file is in the resource's root
     resource's root directory).      directory).
       
     \item[date] The file's modification or creation date\footnote{The
         preferred time and date format is ``YYYY/MM/DD HH:MM:SS''},
       whichever is more recent -- optional.
   
   \item[modification-date] The file's modification date -- optional.    \item[modification-date] The file's modification date -- optional.
   
   \item[creation-date] The file's creation date -- optional.    \item[creation-date] The file's creation date -- optional.
   
   \item[date] The file's creation date if is has not been modified --  
     optional.  
   
   \item[size] The file size -- deduced.    \item[size] The file size -- deduced.
           
   \item[mime-type] The file's mime-type -- optional.    \item[mime-type] The file's mime-type -- optional.
Line 162  parameter. Its sub-elements are: Line 263  parameter. Its sub-elements are:
 \label{sec:additional-metadata}  \label{sec:additional-metadata}
   
 All elements with \texttt{meta} tags can contain an arbitrary number  All elements with \texttt{meta} tags can contain an arbitrary number
 of additional metadata elements.  of the following additional metadata elements.
   
   \subsection{Workflow state}
   \label{sec:workflow-state}
   
   All additional metadata elements can have a \texttt{workflow-state}
   \textbf{attribute}. This attribute reflects the state of the
   corresponding metadata element. The possible values for the
   \texttt{workflow-state} attribute are
   \begin{itemize}
   \item \texttt{preliminary} this information is preliminary. It must
     be checked in further workflow steps.
   
   \item \texttt{inwork}
   
   \item \texttt{final}
   \end{itemize}
   
   Workflow states other than \texttt{preliminary} are part of the
   workflow handling of the respective projects.
   
   Metadata elements can appear multiple times with different
   \texttt{workflow-state} attributes. This enables metadata versioning.
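   
   For illustration, the same element could be given twice with
   different states. The \texttt{lang} element (see
   section~\ref{sec:lang}) and its values are used here only as an
   arbitrary example; how a project resolves the two versions is not
   specified by this sketch.
   
   \begin{verbatim}
   <meta>
     <lang workflow-state="preliminary">la</lang>
     <lang workflow-state="final">it</lang>
   </meta>
   \end{verbatim}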
   
   
   
   \subsection{Content type}
   \label{sec:content-type}
   
   \begin{description}
   \item[content-type] \label{tag-content-type} The content type of this
     resource -- required.\\
     The content type enables the choice of tools to manipulate and
     display the resource. There should be a common list of content
     types. For digital documents (books, manuscripts) this would be
     "scanned document", for other image data "scanned
   images".\footnote{The criterion for documents is an ordered
       succession of image files (pages) and equal image size and
       resolution throughout the images of a resource.}
   \end{description}  
   
   
   
   \subsection{Language}
   \label{sec:lang}
   
   The language of a resource (e.g. a text) can be specified with a
   \texttt{lang} tag. Languages have to be described using the
   international codes for the representation of names of languages
   either in two-letter form (ISO 639-1) or in three-letter form (ISO
   639-2).  The entire catalogue of languages is documented on the page
   
   \url{http://www.loc.gov/standards/iso639-2/englangn.html}
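   
   As a sketch, a Latin text could be marked with either the
   two-letter or the three-letter code. Placing the element inside a
   \texttt{meta} container follows
   section~\ref{sec:additional-metadata}; normally only one of the two
   forms would be given.
   
   \begin{verbatim}
   <meta>
     <lang>la</lang>   <!-- ISO 639-1 -->
   </meta>
   
   <meta>
     <lang>lat</lang>  <!-- ISO 639-2 -->
   </meta>
   \end{verbatim}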
   
   
 \subsection{DRI}  \subsection{DRI}
 \label{sec:dri}  \label{sec:dri}
   
 The \emph{digital resource identifier} for the resource is specified  The \emph{digital resource identifier} for the resource is specified
 with a \texttt{dri} tag. Digital resource identifiers are documented  in a \texttt{dri} element. Digital resource identifiers are documented
 on the page  on the page
   
 \url{http://pythia.mpiwg-berlin.mpg.de/projects/standards/dri}.  \url{http://pythia.mpiwg-berlin.mpg.de/projects/standards/dri}.
   
   
   
   \subsection{Collection context}
   \label{sec:collection-context}
   
   The context of a resource as part of a collection or part of a project
   can be specified in the \texttt{context} element. The context element
   can appear multiple times if the resource is part of multiple
   collections or projects.
   
   \begin{description}
   \item[context] information on collection or project context.
   
     \begin{description}
     \item[link] URL to additional context information -- optional.
       
     \item[name] Textual description of project or collection -- optional.
   
     \item[meta-datalink] description of external sources of canonical meta
       information -- optional
       \begin{description}
       \item[db] \textbf{attribute} to identify different sets of meta data
         links to the same resource -- optional
   
       \item[object] \textbf{attribute} to identify different objects or
         parts of the same resource -- optional
   
       \item[label] textual label for the link -- optional
   
       \item[url] URL to present to the client -- optional
   
       \item[metadata-url] URL to an external server to be queried -- optional
       \end{description}
   
     \item[meta-baselink] description of external server for canonical meta
       information -- optional
       \begin{description}
       \item[db] \textbf{attribute} to identify different sets of meta data
         links to the same resource -- optional
   
       \item[label] textual label for the link -- optional
   
       \item[url] URL to present to the client -- optional
         
       \item[metadata-url] URL to an external server to be queried --
         required (the parameter \texttt{object=} with an object id has
         to be appended to this URL)
       \end{description}
     \end{description}
   \end{description}
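   
   A \texttt{context} element with one \texttt{meta-datalink} might
   look like the following sketch. All URLs, names and attribute
   values are hypothetical placeholders chosen for illustration.
   
   \begin{verbatim}
   <context>
     <link>http://example.org/project</link>
     <name>Example project</name>
     <meta-datalink db="exampledb" object="obj1">
       <label>Catalogue entry</label>
       <url>http://example.org/view/obj1</url>
       <metadata-url>http://example.org/query/obj1</metadata-url>
     </meta-datalink>
   </context>
   \end{verbatim}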
   
   
   
   
 \subsection{Bibliographic information}  \subsection{Bibliographic information}
 \label{sec:bibliographic-data}  \label{sec:bibliographic-data}
   
 Bibliographic information in the format of the ECHO scheme for  Bibliographic information is presented in a \texttt{bib} container with
 bibliographic data (cf. content workflow) or the MPIWG  
 ``Projektbibliografie'' is presented in a \texttt{bib} container with  
 a \texttt{type} parameter, giving the type of bibliographic resource.  a \texttt{type} parameter, giving the type of bibliographic resource.
 The \texttt{type} field is repeated as a tag in the container. The  The \texttt{type} field can be repeated as a tag in the container.
 tags have the variable ``human-readable'' field names.  
   The format is based on the ECHO scheme for bibliographic data (cf.
   content workflow), the MPIWG ``Projektbibliografie'' and the format of
   the commonly used program ``EndNote''.
   
   
   \subsubsection{Book}
   
   \begin{description}
   
   \item [bib type="book"] a published book.
   
     \begin{description}
     \item [author] The author of the book.
     \item [year] The year of publication.
     \item [title] Title of the book.
     \item [series-editor] Name of the series editor, if the book appears
       in a series.
     \item [series-title] Title of the series, if the book appears in a
       series.
     \item [series-volume] Volume number, if the book appears in a
       series.
     \item [number-of-pages] Number of pages of the entire book.
     \item [city] City where the book was published.
     \item [publisher] Name of the publishing company
     \item [edition] Edition of the book (e.g. third edition)
     \item [number-of-volumes] Number of volumes, if the book is
       published in multiple volumes.
     \item [translator] Name of the translator.
     \item [isbn-issn]
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
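   
   A sketch of a \texttt{bib} container for a book, using a subset of
   the elements listed above. The element values are placeholders, not
   a real bibliographic record, and the surrounding \texttt{meta}
   container follows section~\ref{sec:additional-metadata}.
   
   \begin{verbatim}
   <meta>
     <bib type="book">
       <author>Name of author</author>
       <year>1900</year>
       <title>Title of the book</title>
       <city>Name of city</city>
       <publisher>Name of publisher</publisher>
       <edition>third edition</edition>
     </bib>
   </meta>
   \end{verbatim}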
   
   \subsubsection{In Book}
   
   \begin{description}
   \item [bib type="inbook"] an article as part of a book.
   
     \begin{description}
     \item [author] The author of the article.
     \item [year] The year of publication.
     \item [title] Title of the article.
     \item [editor] Name of the book's editor.
     \item [book-title] Title of the book.
     \item [series-volume] Volume number, if the book appears in a
       series.
     \item [pages] Number of pages of the article.
     \item [city] City where the book was published.
     \item [publisher] Name of the publishing company
     \item [edition] Edition of the book (e. g. third edition)
     \item [series-author] Name of the series editor, if the book appears
       in a series.
     \item [series-title] Title of the series, if the book appears in a
       series.
     \item [number-of-volumes] Number of volumes, if the book is
       published in multiple volumes.
     \item [translator] Name of the translator
     \item [isbn-issn]
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
   
   \subsubsection{Proceedings}
   
   \begin{description}
   \item [bib type="proceedings"] a conference proceedings publication.
   
     \begin{description}
     \item [author] The author of the article.
     \item [year] The year of publication.
     \item [title] Title of the article.
     \item [editor] Name of the book's editor.
     \item [conference-name] Name of the conference the proceedings are
       related to.
     \item [volume] Volume number.
     \item [pages] Number of pages of the article.
     \item [date] Date of the conference the proceedings are related to.
     \item [conference-location] City where the conference was held.
     \item [publisher] Name of the publishing company
     \item [edition] Edition of the book (e. g. third edition)
     \item [series-editor] Name of the series editor, if the book appears
       in a series.
     \item [series-title] Title of the series, if the book appears in a
       series.
     \item [number-of-volumes] Number of volumes, if the book is
       published as multiple volumes.
     \item [isbn-issn]
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
   
   \subsubsection{Edited Book}
   
   \begin{description}
   \item[bib type="edited-book"] a book that is the edition of another
     work.
   
     \begin{description}
     \item [editor] Name of the editor of the book.
     \item [year] The year of publication.
     \item [title] Title of the book.
     \item [series-editor] Name of the editor of the series the book is
       part of.
     \item [series-title] Title of the series, if the book is part of a
       series.
     \item [series-volume] Volume number, if the book appears in a series.
     \item [number-of-pages] Number of pages of the book.
     \item [city] City where the book was published.
     \item [publisher] Name of the publishing company
     \item [edition] Information about the edition (e.g. ``Repr. of the London ed. 1652'')
     \item [number-of-volumes] Number of volumes, if the book is
       published as multiple volumes.
     \item [isbn-issn]
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
   
   \subsubsection{Journal Volume}
   
   \begin{description}
   \item [bib type="journal-volume"] a volume of a scientific journal.
     \begin{description}
     \item [title] Name of the journal.
     \item [editor] The editor of the journal.
     \item [publisher] Name of the publishing company.
     \item [city] City where the journal is published.
     \item [year] The year of publication.
     \item [volume] Volume number.
     \item [numer-of-pages] Number of pages of the volume.
     \item [isbn-issn]
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
   
   \subsubsection{Journal Article}
   
   \begin{description}
   \item [bib type="journal-article"] an article in a scientific journal.
     \begin{description}
     \item [author] The author of the article.
     \item [year] The year of publication.
     \item [title] Title of the article.
     \item [journal] Name of the journal.
     \item [volume] Volume number, if the journal appears in a series.
     \item [issue] Number of the issue the article is part of.
     \item [pages] Number of pages of the article.
     \item [alternate-journal] Alternate Journal
     \item [isbn-issn]
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
   
   \subsubsection{Magazine Article}
   
   \begin{description}
   \item [bib type="magazine-article"] an article in a popular magazine.
     \begin{description}
     \item [author] The author of the article.
     \item [year] The year of publication.
     \item [title] Title of the article.
     \item [magazine] Name of the magazine.
     \item [volume] Volume number, if the book appears in a series.
     \item [issue-number] Number of the issue the article is part of.
     \item [pages] Number of pages of the article.
     \item [date] Date when the article appeared.
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
   
   \subsubsection{Newspaper Article}
   
   \begin{description}
   \item [bib type="newspaper-article"] an article in a newspaper.
     \begin{description}
     \item [author] The author of the article.
     \item [year] The year of publication.
     \item [title] Title of the article.
     \item [Newspaper] Name of the newspaper the article appeared in.
     \item [pages] Number of pages of the article.
     \item [issue-date] Date of the issue the article is part of.
     \item [city] City of the newspaper.
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
   
 \subsection{Information on the document structure (table of contents)}  \subsubsection{Thesis}
   
   \begin{description}
   \item [bib type="thesis"] a master/doctorate/etc. thesis.
     \begin{description}
     \item [author] The author of the thesis.
     \item [year] The year of publication.
     \item [title] Title of the thesis.
     \item [academic-department] Name of the academic department where
       the thesis was handed in.
     \item [number-of-pages] Number of pages of the thesis.
     \item [city] City where the thesis was published.
     \item [University] Name of the university where the thesis was
       handed in.
     \item [isbn-issn]
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
   
   \subsubsection{Report}
   
   \begin{description}
   \item [bib type="report"] a scientific report.
     \begin{description}
     \item [author] The author of the report.
     \item [year] The year of publication.
     \item [title] Title of the report.
     \item [pages] Number of pages of the report.
     \item [date] Date when the report appeared.
     \item [city] City where the report was published.
     \item [institution] Institution where the report was produced.
     \item [type] Type of report.
     \item [report-number] Report number.
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
   
   \subsubsection{Manuscript}
   
   \begin{description}
   \item [bib type="manuscript"] a handwritten/typewritten manuscript.
   
     \begin{description}
     \item [title] Title of the manuscript.
     \item [author] The author of the text.
     \item [location] Name of the library where the manuscript is
       currently located.
     \item [year] The year or century of publication.
     \item [pages] Number of pages of the manuscript.
     \item [signature] Signature of the manuscript.
     \item [editorial-remarks] Remarks related to the online
       publication of the manuscript. This could be notes about
       annotations etc.
     \item [description] This can be any kind of description.
     \item [keywords] Keywords related to the manuscript.
     \item[call-number] Call number in holding library
     \item[holding-library] Holding library
     \end{description}
   \end{description}
   
   
   \subsubsection{Correspondence}
   
   \begin{description}
   \item [bib type="correspondence"] a piece of correspondence (e.g. a letter or telegram), called ``letter'' in the following.
   
     \begin{description}
     \item[type] The type of correspondence, e.g. ``letter'', ``postcard'', ``telegram'', ``letter draft''
     \item [author] The author/sender of the letter.
     \item [recipient] The recipient of the letter.
     \item [date] normalised date of the letter.
     \item [date-range-end] end of range of uncertain dating -- optional.
     \item [date-original] the date in its original form as noted on the letter -- optional.
     \item [place] place where the letter was written/sent.
     \item [title] Title of the letter -- optional.
     \item[incipit] The opening phrase of the letter -- optional.
     \item[excipit] The closing phrase of the letter -- optional.
     \item [pages] Number of pages of the manuscript.
     \item [signature] Canonical signature/call number of the manuscript.
     \item [description] This can be any kind of description.
     \item [keywords] Keywords related to the manuscript.
     \item[call-number] Call number in the current holding library
     \item[holding-library] current holding library
     \end{description}
   \end{description}
   
   
   \subsubsection{Generic}
   
   \begin{description}
   \item [bib type="generic"] a generic bibliographic type. This type
     should only be used in rare cases.
     \begin{description}
     \item [author]
     \item [year]
     \item [title]
     \item [secondary-author]
     \item [secondary-title]
     \item [volume]
     \item [number]
     \item [pages]
     \item [date]
     \item [place-published]
     \item [publisher]
     \item [edition]
     \item [tertiary-author]
     \item [tertiary-title]
     \item [number-of-volumes]
     \item [type-of-work]
     \item [subsidiary-author]
     \item [alternate-title]
     \item [isbn-issn]
     \item [call-number]
     \item [label]
     \item [keywords]
     \item [abstract]
     \item [notes]
     \item [url]
     \end{description}
   \end{description}
   
   
   \subsection{Architectural drawings}
   \label{sec:doc}
   
   Specific information for architectural drawings is presented in a
   \texttt{doc} container with an additional \texttt{type} attribute
   giving the type of drawing. All elements inside the container can
   appear multiple times.
   
   \begin{description}
   
   \item[doc type="Architectural Drawing"] architectural drawing.
   
     \begin{description}
     \item [person] last name and first name of a person, separated by a
       comma. A further common name for the person can be put in front,
       separated by a semicolon.
     \item [location] Name of a place in its common notation. This can be
       a city or an institution.
     \item [date] This can be a year (or several years, separated by
       commas) or a period (1706-1714). Years are noted with four digits.
     \item [object] Short description of an object or signatures.
     \item [keywords] Keywords related to the object.
   \end{description}
   \end{description}
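   
   The notation rules above can be illustrated with the following
   sketch. All values are placeholders; the \texttt{person} entry
   shows the optional common name put in front, and the \texttt{date}
   entry reuses the period given as an example above.
   
   \begin{verbatim}
   <doc type="Architectural Drawing">
     <person>Common Name; Lastname, Firstname</person>
     <location>Name of city</location>
     <date>1706-1714</date>
     <object>Short description of the object</object>
     <keywords>keyword</keywords>
   </doc>
   \end{verbatim}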
   
   
   \subsection{Document structure (table of contents)}
 \label{sec:toc}  \label{sec:toc}
   
 Document structure information like a table of contents for a scanned  Information on the structure of a document like the division into
 document is presented in a \texttt{toc} container. The format to be  parts and chapters in the way of a table of contents is presented in a
 used has to be further specified. The format could be based on the so  \texttt{toc} container. 
 called ``LiSe-XML'' format. For a detailed description and an  
 exemplary set of TOC information see:  The scheme allows multiple logical pages on a single page image
   as is often the case with scanned books or manuscripts. The scheme
   also allows for ``loose'' numbering schemes with roman, arabic or
   other page numbers consecutively or mixed and changes in the numbering
   within the document.
   
   The flexibility comes from the fact that no additional assumptions
   about the mapping between logical pages and page images are made in
   the format. All mapping information is specified by the user.
   
   The logical page numbering or naming that can be presented to the user
   is specified in the \texttt{name} tags while the physical numbering of
   the page images is specified in the \texttt{index} or \texttt{url}
   tags.
   
 \url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise}  \begin{description}
   \item[toc] container for document structure
   
     \begin{description}
     \item[page] describes a single logical page
   
 \url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TSlise/lise_downloads/deimel1929.xml}      \begin{description}
       \item[name] the ``name'' of the logical page. This can be any string
         like a page number (arabic, roman, etc.) or a special designation
         like ``Table 5''.
         
       \item[index] the \texttt{digilib} index number\footnote{The index
           number for digilib is the index in the alphabetical order of the
           scan file names.} of the scan image of the page.
         
       \item[url] as an alternative to the \texttt{digilib} index number,
         the full URL of the scan image of the page can be used.
       \end{description}
   
     \item[chapter] describes a section or chapter of the text.
       \texttt{chapter} elements can be nested.
   
       \begin{description}
       \item[name] the title of the chapter or section.
         
       \item[start] the beginning of a page range (usually the first page
         of the chapter). The \texttt{start} element has an optional
         \texttt{increment} attribute to indicate the number of logical
         pages on a scan image.\footnote{This information is only needed by
           additional tools that try to generate lists of all page and
           image numbers.}
   
         \begin{description}
         \item[name] the ``name'' of the first page (see \texttt{page}).
   
 \subsection{Information on scanned images}        \item[index] the index of the first page (see \texttt{page}).
           
         \item[url] the URL of the first page (see \texttt{page}).
         \end{description}
         
       \item[end] the end of a page range (usually the last page of the
         chapter).
   
         \begin{description}
         \item[name] the ``name'' of the last page (see \texttt{page}).
           
         \item[index] the index of the last page (see \texttt{page}).
           
         \item[url] the URL of the last page (see \texttt{page}).
         \end{description}
         
       \item[page] as an alternative (or in addition) to
         \texttt{start}/\texttt{end} page ranges, single \texttt{page}
         elements can be used inside \texttt{chapter}.
       \end{description}
     \end{description}
   \end{description}
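
   The following fragment is a minimal sketch of a \texttt{toc}
   container, assuming that the nested description lists above map
   directly to nested XML elements, as in the sample files at the end
   of this document. All page names, index numbers and the chapter
   title are invented for illustration, and writing \texttt{increment}
   as an XML attribute of \texttt{start} is an assumption. With two
   logical pages per scan image, pages 1 to 12 occupy the six images 5
   to 10.

   \begin{small}
   \begin{verbatim}
   <!-- illustrative sketch only; all values are hypothetical -->
   <toc>
     <page>
       <name>iv</name>
       <index>4</index>
     </page>
     <chapter>
       <name>Einleitung</name>
       <start increment="2">
         <name>1</name>
         <index>5</index>
       </start>
       <end>
         <name>12</name>
         <index>10</index>
       </end>
       <page>
         <name>Table 5</name>
         <index>11</index>
       </page>
     </chapter>
   </toc>
   \end{verbatim}
   \end{small}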
   
   %%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise}
   
   
   \subsection{Digital images}
 \label{sec:inform-scann-imag}  \label{sec:inform-scann-imag}
   
 Image files representing scanned images can have an \texttt{img}  Image files representing scanned images can have an \texttt{img}
Line 211  of the original image. This information Line 825  of the original image. This information
 Required is one of three possible sets of tags:  Required is one of three possible sets of tags:
   
 \begin{description}  \begin{description}
 \item[original-size-x] The width of the original image. The unit of  \item[img] digital image information.
   measure can be contained as parameter \texttt{unit}, the default is  
   meter ``m''. The width to be considered is the total width of the    \begin{description}
   scanned area.    \item[original-size-x] The width of the original
       image -- required. \\
       The unit of measure can be contained as parameter \texttt{unit},
       the default is meter ``m''. The width to be considered is the
       total width of the scanned area.
   
 \item[original-size-y] The height of the original image.    \item[original-size-y] The height of the original image -- required.
   
 \item[original-pixel-x] The width of the hi-res scan in pixels.    \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced.
   
 \item[original-pixel-y] The height of the hi-res scan in pixels.    \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced.
     \end{description}
 \end{description}  \end{description}
   
 or  or
   
 \begin{description}  \begin{description}
   \item[img] digital image information.
   
     \begin{description}
 \item[original-dpi-x] The resolution of the hi-res scan in its width  \item[original-dpi-x] The resolution of the hi-res scan in its width
   in pixels per inch.      in pixels per inch -- required.
   
 \item[original-dpi-y] The resolution of the hi-res scan in its height  \item[original-dpi-y] The resolution of the hi-res scan in its height
   in pixels per inch.      in pixels per inch -- required.
   
     \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced.
       
     \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced.
     \end{description}
 \end{description}  \end{description}
   
 or  or
   
 \begin{description}  \begin{description}
   \item[img] digital image information.
   
     \begin{description}
 \item[original-dpi] The resolution of the hi-res scan in pixels per  \item[original-dpi] The resolution of the hi-res scan in pixels per
   inch if the resolutions in width and height are the same.      inch if the resolutions in width and height are the same -- required.
   
     \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced.
       
     \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced.
     \end{description}
 \end{description}  \end{description}
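
   As an illustration of the first set of tags, the following sketch
   describes a scanned area of 0.21 by 0.297 meters. The sizes are
   invented, and writing the \texttt{unit} parameter as an XML
   attribute (and using it on \texttt{original-size-y} as well) is an
   assumption. The pixel tags are left out here because they are marked
   as deduced.

   \begin{small}
   \begin{verbatim}
   <!-- illustrative sketch only; sizes are hypothetical -->
   <img>
     <original-size-x unit="m">0.21</original-size-x>
     <original-size-y unit="m">0.297</original-size-y>
   </img>
   \end{verbatim}
   \end{small}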
   
   
 \subsection{Access restrictions}  
 \label{sec:access-restrictions}  
   
 If the access to a resource is restricted for technical or legal  \subsection{Digital image acquisition}
 reasons then the restrictions can be put in a  \label{sec:inform-about-image}
 \texttt{access-restrictions} container. The format of the information  
 inside the container has to be further specified.  A description of the technology used in the process of producing a
   digital image.
   
   \begin{description}
   \item[image-acquisition] description of the image production process
     \begin{description}
     \item[device] acquisition device (e.g. ``flatbed scanner'') 
   
     \item[image-type] type and color-depth of the image -- required (e.g. ``RGB 24
       bit'')
   
     \item[production-comment] additional textual information about the
       production process
     \end{description}
   \end{description}
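
   A possible \texttt{image-acquisition} container could look like the
   following sketch. The device and image type are the example values
   given above; the production comment is invented for illustration.

   \begin{small}
   \begin{verbatim}
   <!-- illustrative sketch only; the comment text is hypothetical -->
   <image-acquisition>
     <device>flatbed scanner</device>
     <image-type>RGB 24 bit</image-type>
     <production-comment>scanned page by page</production-comment>
   </image-acquisition>
   \end{verbatim}
   \end{small}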
   
   
   
   \subsection{Full text with images}
   \label{sec:full-text-with}
   
   Full text in an XML format should be specified with a
   \texttt{content-type}\footnote{see section~\ref{tag-content-type}
   on page~\pageref{tag-content-type}} ``fulltext''.
   
   The relation between the full text and optional images of
   whole pages or parts of pages must be specified in a
   \texttt{texttool} container.
   
   \begin{description}
   \item[texttool] representation of full text with images
     
     \begin{description}
     \item[text] the file name of the full text file (with path
       inside document directory)
       
     \item[text-url-path] a characteristic part of the URL with which the
       full text can be retrieved (the form and content of this element
       is dependent on the specific text retrieval mechanism)
   
     \item[image] the name of the directory containing the
       page image files (with path inside document directory)
   
     \item[xslt] the file name of an additional XSL transformation
       file
   
     \item[pagebreak] the name of the element that indicates page breaks
       (default ``pb'')
     \end{description}
   \end{description}
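
   The following fragment sketches a \texttt{texttool} container for a
   full text file with accompanying page images. The file and directory
   names are hypothetical; the \texttt{pagebreak} element is given
   explicitly here although ``pb'' is the documented default.

   \begin{small}
   \begin{verbatim}
   <!-- illustrative sketch only; file names are hypothetical -->
   <texttool>
     <text>fulltext/fleck.xml</text>
     <image>img</image>
     <pagebreak>pb</pagebreak>
   </texttool>
   \end{verbatim}
   \end{small}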
   
   
   
   \subsection{Copyright and access conditions}
   \label{sec:access-conditions}
   
   If access to a resource is bound to conditions for technical or legal
   reasons, the conditions can be put in an \texttt{access-conditions}
   container. Other usage conditions like copyright can also be
   documented in this container.
   
   \begin{description}
   \item[access-conditions] legal and technical conditions for access to
     this resource
   
     \begin{description}
     \item[attribution] The name or institution this resource should be
       attributed to when it's publicly presented
   
       \begin{description}
       \item[name] a name (free text)
   
       \item[url] a URL (with an optional \texttt{label} attribute to show
         as text)
   
       \item[description] more information (free text, e.g. holding
         library call number)
       \end{description}
   
     \item[copyright] the copyright holder and its conditions
       \begin{description}
       \item[owner] the name of the copyright holder
         \begin{description}
         \item[name] a name (free text)
   
         \item[url] a URL (with an optional \texttt{label} attribute to show
           as text)
         \end{description}
   
 \section{Sample metadata file for an ECHO resource}      \item[date] the date when the copyright was issued
   
 The following is the sample structure for a scanned document resource.      \item[duration] the duration of the copyright term (if known)
   
       \item[description] free-text field for special or additional
         conditions
       \end{description}
   
   
     \item[publish-metadata] metadata about this resource can be made
       freely available when this tag is present (otherwise metadata has
       the same access conditions as the rest of the resource). Access to
       the resource itself is regulated separately by the \texttt{access}
       element.
   
     \item[access] conditions of access to this resource. Different
       access types are specified by a \texttt{type} attribute:
       \begin{description}
       \item[type=group] access restricted to the members of this named
         group. The method to identify a user belonging to a named group
         is not specified in this document.
         \begin{description}
         \item[name] name of the group.
   
         \item[only-before] the access condition is only valid before the
           given date (format: ``YYYY/MM/DD'').
   
         \item[only-after] the access condition is only valid after the
           given date (format: ``YYYY/MM/DD'').
         \end{description}
       
       \item[type=institution] access restricted to the members of this
         institution. The method to identify a user as belonging to the
         institution is not specified in this document.
         \begin{description}
         \item[name] name of the institution.
   
         \item[only-before] the access condition is only valid before the
           given date (format: ``YYYY/MM/DD'').
   
         \item[only-after] the access condition is only valid after the
           given date (format: ``YYYY/MM/DD'').
         \end{description}
       
   
       \item[type=subnet] access restricted to all computers with an
         IP-address in this subnet.
         \begin{description}
         \item[range] subnet range defined in
         truncated-quad (e.g. ``141.14''), network-netmask
         (e.g. ``141.14.0.0/255.255.0.0''), or network-range
         (e.g. ``141.14.0.0/16'') notation.
   
         \item[only-before] the access condition is only valid before the
           given date (format: ``YYYY/MM/DD'').
   
         \item[only-after] the access condition is only valid after the
           given date (format: ``YYYY/MM/DD'').
         \end{description}
       
           
       \item[type=scientific] access to this resource should be restricted to
         scientific work
         \begin{description}
         \item[only-before] the access condition is only valid before the
           given date (format: ``YYYY/MM/DD'').
   
         \item[only-after] the access condition is only valid after the
           given date (format: ``YYYY/MM/DD'').
         \end{description}
       
   
       \item[type=free] access to this resource is not restricted
         \begin{description}
         \item[only-before] the access condition is only valid before the
           given date (format: ``YYYY/MM/DD'').
   
         \item[only-after] the access condition is only valid after the
           given date (format: ``YYYY/MM/DD'').
         \end{description}
       
         
       \item[type=special] if none of the above conditions seems appropriate,
         a free-form text can be specified here.
         \begin{description}
         \item[description] description of special access conditions.
   
         \item[only-before] the access condition is only valid before the
           given date (format: ``YYYY/MM/DD'').
   
         \item[only-after] the access condition is only valid after the
           given date (format: ``YYYY/MM/DD'').
         \end{description}
       
       \end{description}
     \end{description}
   \end{description}
   
   \noindent
   It should be noted that control over access to the resource has to be
   provided by additional technical measures. Access conditions in the
   metadata file only state that conditions \emph{should} be observed; it
   is not implied that they \emph{are} necessarily observed, as the
   enforcement of conditions depends on additional measures.
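
   As a sketch of how these elements could be combined, the following
   fragment attributes a resource to an institution, names a copyright
   holder and restricts access to one subnet until a given date. The
   names and dates are invented for illustration; the subnet is the
   example value from the \texttt{range} description above. Writing
   \texttt{publish-metadata} as an empty element is an assumption, since
   only the presence of the tag is specified.

   \begin{small}
   \begin{verbatim}
   <!-- illustrative sketch only; names and dates are hypothetical -->
   <access-conditions>
     <attribution>
       <name>Bibliotheca Hertziana</name>
     </attribution>
     <copyright>
       <owner>
         <name>Bibliotheca Hertziana</name>
       </owner>
       <date>2003</date>
     </copyright>
     <publish-metadata/>
     <access type="subnet">
       <range>141.14.0.0/16</range>
       <only-before>2012/12/31</only-before>
     </access>
   </access-conditions>
   \end{verbatim}
   \end{small}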
   
   
   
   \subsection{Acquisition of raw-data}
   \label{sec:acqu-inform}
   
   Information about the acquisition source for raw data resources can be
   provided in an \texttt{acquisition} container.
   
   \begin{description}
   \item[acquisition] the acquisition source of this resource -- required
     for raw data.
     \begin{description}
     \item[provider] where this resource came from -- required
       \begin{description}
       \item[name] free-text name of the provider (institution or
         individual)
   
       \item[address] address of the provider
   
       \item[contact] contact person at the provider (i.e. name and email)
   
       \item[url] URL related to the provider
   
       \item[provider-id] id of the provider (internally used) -- deduced
       \end{description}
   
     \item[date] date of acquisition -- required
   
     \item[description] free-text description of the acquisition source or
     additional information
     \end{description}
   \end{description}
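
   A minimal \texttt{acquisition} container with the required elements
   could look like the following sketch. The provider name is borrowed
   from the sample file below; the date, its format and the description
   are assumptions, since no date format is specified for this element.

   \begin{small}
   \begin{verbatim}
   <!-- illustrative sketch only; date and description are hypothetical -->
   <acquisition>
     <provider>
       <name>University of Bern</name>
     </provider>
     <date>2003</date>
     <description>delivered on CD-ROM</description>
   </acquisition>
   \end{verbatim}
   \end{small}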
   
   
   
   \subsection{Documentary Films}
   \label{sec:documentary-films}
   
   Documentary films can be described using a \texttt{film-acquisition}
   container.
   
   \begin{description}
   \item[film-acquisition] description of a (documentary) film --
     required for documentary film
     \begin{description}
     \item[recording] specification of the recording process
       \begin{description}
       \item[author] the person or persons doing the recording
   
       \item[date] the date or time span when the film was recorded
   
       \item[location] the place where the film was recorded
   
       \item[device] recording device used (e.g. ``Sony CP-DV8 Camcorder'')
         
       \item[format] format of the recorded film -- required (e.g. ``DV
         720x524 25fps interlaced'')
       \end{description}
    
     \item[description] free-form description of the recording and the
       content of the film
     \end{description}
   \end{description}
   
   (More information about the digitization step could be added in a
   \texttt{digitization} tag similar to the \texttt{recording} tag.)
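
   The following fragment sketches a \texttt{film-acquisition}
   container. The device and format are the example values given above;
   the author, date, location and description are invented for
   illustration.

   \begin{small}
   \begin{verbatim}
   <!-- illustrative sketch only; author, date, location and
        description are hypothetical -->
   <film-acquisition>
     <recording>
       <author>Doe, Jane</author>
       <date>2002</date>
       <location>Berlin</location>
       <device>Sony CP-DV8 Camcorder</device>
       <format>DV 720x524 25fps interlaced</format>
     </recording>
     <description>Interview recorded in one session</description>
   </film-acquisition>
   \end{verbatim}
   \end{small}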
   
   
   
   
   \section{Sample metadata files for ECHO resources}
   
   The following is a sample metadata index file for a directory containing a
   scanned document.
   
   \begin{small}
 \begin{verbatim}  \begin{verbatim}
 <resource type="ECHO">  <resource type="ECHO" version="1.0">
     <description></description>    <description>Fleck, 1980</description>
     <name>fleck.1980</name>      <name>fleck.1980</name>
     <creator>University of Bern</creator>      <creator>University of Bern</creator>
     <archive-creation-date></archive-creation-date>  
     <archive-path>ubern/wiss-theorie</archive-path>      <archive-path>ubern/wiss-theorie</archive-path>
     <content-type>scanned images</content-type>      <content-type>scanned images</content-type>
     <meta>      <meta>
         <dri>echo23a45e2329x</dri>          <dri>echo23a45e2329x</dri>
       <lang>ger</lang>
         <bib type="book">          <bib type="book">
             <author>Fleck, Ludwik</author>              <author>Fleck, Ludwik</author>
             <year>1980</year>              <year>1980</year>
             <title>Entstehung und Entwicklung einer               <title>Entstehung und Entwicklung einer 
                    wissenschaftlichen Tatsache</title>                     wissenschaftlichen Tatsache</title>
             <series_editor></series_editor>        <series-editor></series-editor>
             <series_title></series_title>        <series-title></series-title>
             <series_volume></series_volume>        <series-volume></series-volume>
             <number_of_pages></number_of_pages>        <number-of-pages></number-of-pages>
             <city>Frankfurt am Main</city>              <city>Frankfurt am Main</city>
             <publisher>Suhrkamp</publisher>              <publisher>Suhrkamp</publisher>
             <edition></edition>              <edition></edition>
             <number_of_volumes></number_of_volumes>        <number-of-volumes></number-of-volumes>
             <translator></translator>              <translator></translator>
             <isbn></isbn>        <isbn-issn></isbn-issn>
             <keywords>Wissenschaftstheorie, Fleck, Tatsache</keywords>              <keywords>Wissenschaftstheorie, Fleck, Tatsache</keywords>
             <abstract></abstract>              <abstract></abstract>
         </bib>          </bib>
Line 286  The following is the sample structure fo Line 1182  The following is the sample structure fo
     <dir>      <dir>
          <description>Scanned images (300dpi)</description>           <description>Scanned images (300dpi)</description>
          <name>img</name>           <name>img</name>
          <path></path>  
          <meta></meta>  
     </dir>      </dir>
 </resource>  </resource>
 \end{verbatim}  \end{verbatim}
   \end{small}
   
   The following is a sample metadata file for a single image of an
   architectural drawing.
   
   \begin{small}
   \begin{verbatim}
   <resource type="ECHO" version="1.0">
     <creator>Bibliotheca Hertziana</creator>
     <content-type>scanned images</content-type>
     <file>
       <name>00000271-asl-160-r-full.tif</name>
       <meta>
         <img>
           <original-dpi>315</original-dpi>
         </img>
         <dri>echo45a67bc4367d</dri>
         <lang>ita</lang>
         <doc type="Architectural Drawing">
           <person>Ciolli, Giacomo</person>
           <person>Urban VIII; Barberini, Maffeo</person>
           <location>Accademia di San Luca</location>
           <location>Roma</location>
           <date>1706</date>
           <object>Concorso Clementino</object>
           <object>Fontana Pubblica</object>
           <object>Brunnen</object>
           <object>ASL 160</object>
           <keywords></keywords>
         </doc>
         <context>
            <url>http://colosseum.biblhertz.it:8080/Lineamenta/
            1033478408.39/1035196181.35/1035196204.09/1035394121.83
            </url>
         </context>
       </meta>
     </file>
   </resource>
   \end{verbatim}
   \end{small}
   
 \end{document}  \end{document}
   
