% meta-format.tex, revision 1.3, Tue Jul 1 17:51:40 2003 UTC, by casties: clarified <path> element

\documentclass[a4paper]{article}

\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{ae}
%\usepackage{times}
%\usepackage{courier}

% create in-text links black (with PDF)
\usepackage[colorlinks=true,linkcolor=black]{hyperref}
% Format URLs nicely (without PDF)
%\usepackage{url}


\title{A simple metadata format for resource bundles}

\author{Robert Casties, Dirk Wintergrün, Christoph Liess}

\date{V0.2.2 of \today}

\begin{document}

\maketitle

\tableofcontents


\section{File and directory names}
\label{sec:file-directory-names}

File and directory names should not contain spaces. The only
characters allowed in file names are the alphanumeric set a-z, A-Z,
0-9, hyphen ``-'', underscore ``\_'' and dot ``.''.

File and directory paths in the metadata file use the conventional
Unix file separator slash ``/''.

\section{Resource format}
\label{sec:mpiwg-doc}

In this description, elements marked ``optional'' need not be supplied
by the provider of the resource and may be absent in all versions of
the metadata file. Elements marked ``required'' must be supplied by
the provider of the resource. Elements marked ``deduced'' can be
supplied by the provider of the resource but can also be filled in
later in the process by automatic scripts; they must be present in the
final file.
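
As a rough illustration of these three categories, the following
sketch shows what a provider might submit and what the file could look
like after the automatic scripts have run. It uses elements described
in the list below. The values are modelled on the sample at the end of
this document, the file name \texttt{title.txt} is hypothetical, and
the ``deduced'' elements are left empty here only because their value
formats are not specified in this document.

\begin{verbatim}
<!-- as supplied by the provider: required elements only -->
<resource type="ECHO">
    <name>fleck.1980</name>
    <content-type>scanned document</content-type>
</resource>

<!-- final file: deduced elements added by scripts -->
<resource type="ECHO">
    <name>fleck.1980</name>
    <archive-creation-date></archive-creation-date>
    <archive-path></archive-path>
    <content-type>scanned document</content-type>
    <file>
        <name>title.txt</name>
        <size></size>
    </file>
</resource>
\end{verbatim}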

The outer container is named \texttt{resource}. Sub-types (``ECHO'',
``MPIWG'') can be specified if necessary with a \texttt{type}
parameter. Its sub-elements are:

\begin{description}
\item[description] An informal textual description of the
  resource -- optional.

\item[name] The filename of the resource (name of the directory this
  file is contained in) -- required.

\item[creator] The name of the project or person that created the
  resource -- optional.

\item[archive-creation-date] The time and date the archive was created
  -- deduced.

\item[archive-path] The full path to the resource directory inside the
  whole archive collection -- deduced.
  
\item[derived-from] Container for the description of the original
  resource if this resource is a modified version of another resource
  -- optional.

  \begin{description}
  \item[archive-path] The full path to the original resource
    -- required.

  \item[description] An informal textual description of the relation
  of this resource to the original resource -- optional.
  \end{description}
  
\item[linked-with] Container for the description of another
  resource when this resource is a linked copy of another resource
  -- optional.

  \begin{description}
  \item[archive-path] The full path to the linked resource
    -- required.

  \item[description] An informal textual description of the relation
  of this resource to the linked resource -- optional.
  \end{description}
  
\item[content-type] The content type of this resource -- required.\\
  The content type enables the choice of tools to manipulate and
  display the resource. There should be a common list of content
  types. For digital documents (books, manuscripts) this would be
  ``scanned document'', for other image data ``scanned
  images''.\footnote{The criterion for documents is an ordered
    succession of image files (pages) and equal image size and
    resolution throughout the images of a resource.}
  
\item[meta] Additional metadata information about the resource --
  optional.\\ For a description of additional metadata see below.

\item[dir] Container for the description of a subdirectory -- required
  (when there are subdirectories).\\ \texttt{dir} tags should not be
  nested. Directories at lower levels are identified by their
  \texttt{path}.

  \begin{description}
  \item[description] An informal textual description of the
    subdirectory -- optional.

  \item[name] The name of the subdirectory -- required.
    
  \item[path] The directory path of this subdirectory relative to the
    resource's root directory (containing the directory itself) --
    required (may be identical to \texttt{name} or omitted if the
    directory is a direct child of the resource's root directory).
    
  \item[meta] Additional metadata information about the directory --
    optional.\\ For a description of additional metadata see below.
  \end{description}
  
\item[file] Container for the description of a file -- deduced.\\
  \texttt{file} tags should not be nested in \texttt{dir} tags. Files
  at lower directory levels are identified by their \texttt{path}.

  \begin{description}
  \item[description] An informal textual description of the
    file -- optional.

  \item[name] The name of the file -- required.
    
  \item[path] The directory path of this file relative to the
    resource's root directory (containing the file itself) -- required
    (may be identical to \texttt{name} or omitted if the file is in the
    resource's root directory).

  \item[modification-date] The file's modification date -- optional.

  \item[creation-date] The file's creation date -- optional.

  \item[date] The file's creation date if it has not been modified --
    optional.

  \item[size] The file size -- deduced.
    
  \item[mime-type] The file's mime-type -- optional.

  \item[md5cs] MD5 checksum of the file content -- optional.
    
  \item[meta] Additional metadata information about the file --
    optional.\\ For a description of additional metadata see below.
  \end{description}
  
\end{description}
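
The rule that \texttt{dir} and \texttt{file} tags are not nested, and
that entries at lower levels are identified by their \texttt{path},
might look as follows in practice. This is only a sketch: the
directory \texttt{thumbs}, the file name \texttt{page-0001.tif} and
the size value are hypothetical, and the size is assumed to be given
in bytes, which this document does not state.

\begin{verbatim}
<resource type="ECHO">
    <name>fleck.1980</name>
    <content-type>scanned document</content-type>
    <!-- direct child of the root directory: path may be omitted -->
    <dir>
        <name>img</name>
    </dir>
    <!-- lower-level directory: not nested, identified by its path -->
    <dir>
        <name>thumbs</name>
        <path>img/thumbs</path>
    </dir>
    <!-- file inside img: not nested in dir, path includes the file -->
    <file>
        <name>page-0001.tif</name>
        <path>img/page-0001.tif</path>
        <size>26101234</size>
    </file>
</resource>
\end{verbatim}

All names in this sketch follow the restrictions of
section~\ref{sec:file-directory-names}, and the paths use the slash as
separator.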



\section{Additional metadata}
\label{sec:additional-metadata}

All elements with \texttt{meta} tags can contain an arbitrary number
of additional metadata elements.


\subsection{DRI}
\label{sec:dri}

The \emph{digital resource identifier} for the resource is specified
with a \texttt{dri} tag. Digital resource identifiers are documented
on the page

\url{http://pythia.mpiwg-berlin.mpg.de/projects/standards/dri}.


\subsection{Bibliographic information}
\label{sec:bibliographic-data}

Bibliographic information in the format of the ECHO scheme for
bibliographic data (cf. content workflow) or the MPIWG
``Projektbibliografie'' is presented in a \texttt{bib} container with
a \texttt{type} parameter, giving the type of bibliographic resource.
The \texttt{type} field is repeated as a tag in the container. The
tags have the variable ``human-readable'' field names.


\subsection{Information on the document structure (table of contents)}
\label{sec:toc}

Document structure information like a table of contents for a scanned
document is presented in a \texttt{toc} container. The format to be
used has to be further specified. The format could be based on the
so-called ``LiSe-XML'' format. For a detailed description and an
exemplary set of TOC information see:

\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise}

\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TSlise/lise_downloads/deimel1929.xml}


\subsection{Information on scanned images}
\label{sec:inform-scann-imag}

Image files representing scanned images can have an \texttt{img}
container tag with information about the scan resolution and the size
of the original image. This information is used by the
\texttt{digilib} image viewing tool.

One of the following three sets of tags is required:

\begin{description}
\item[original-size-x] The width of the original image. The unit of
  measure can be given in a \texttt{unit} parameter; the default is
  meter ``m''. The width to be considered is the total width of the
  scanned area.

\item[original-size-y] The height of the original image.

\item[original-pixel-x] The width of the hi-res scan in pixels.

\item[original-pixel-y] The height of the hi-res scan in pixels.
\end{description}

or

\begin{description}
\item[original-dpi-x] The resolution of the hi-res scan in its width
  in pixels per inch.

\item[original-dpi-y] The resolution of the hi-res scan in its height
  in pixels per inch.
\end{description}

or

\begin{description}
\item[original-dpi] The resolution of the hi-res scan in pixels per
  inch if the resolutions in width and height are the same.
\end{description}
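
The three alternatives could be written as in the following sketch. It
assumes that the \texttt{img} container sits inside the \texttt{meta}
element of a \texttt{file} entry, which follows from its place in this
section but is not stated explicitly. The values are hypothetical: an
A4 page of $0.21 \times 0.297$ meters scanned at about 300 pixels per
inch, which gives $2480 \times 3508$ pixels, since
$2480 / (0.21 / 0.0254) \approx 300$. The sizes use the default unit,
so no \texttt{unit} parameter is given.

\begin{verbatim}
<!-- set 1: size of the original and size of the scan in pixels -->
<file>
    <name>page-0001.tif</name>
    <path>img/page-0001.tif</path>
    <meta>
        <img>
            <original-size-x>0.21</original-size-x>
            <original-size-y>0.297</original-size-y>
            <original-pixel-x>2480</original-pixel-x>
            <original-pixel-y>3508</original-pixel-y>
        </img>
    </meta>
</file>

<!-- set 2: separate resolutions for width and height -->
<img>
    <original-dpi-x>300</original-dpi-x>
    <original-dpi-y>300</original-dpi-y>
</img>

<!-- set 3: one resolution for both directions -->
<img>
    <original-dpi>300</original-dpi>
</img>
\end{verbatim}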


\subsection{Access restrictions}
\label{sec:access-restrictions}

If access to a resource is restricted for technical or legal
reasons then the restrictions can be put in an
\texttt{access-restrictions} container. The format of the information
inside the container has to be further specified.


\section{Sample metadata file for an ECHO resource}

The following is the sample structure for a scanned document resource.

\begin{verbatim}
<resource type="ECHO">
    <description></description>
    <name>fleck.1980</name>
    <creator>University of Bern</creator>
    <archive-creation-date></archive-creation-date>
    <archive-path>ubern/wiss-theorie</archive-path>
    <content-type>scanned document</content-type>
    <meta>
        <dri>echo23a45e2329x</dri>
        <bib type="book">
            <author>Fleck, Ludwik</author>
            <year>1980</year>
            <title>Entstehung und Entwicklung einer 
                   wissenschaftlichen Tatsache</title>
            <series_editor></series_editor>
            <series_title></series_title>
            <series_volume></series_volume>
            <number_of_pages></number_of_pages>
            <city>Frankfurt am Main</city>
            <publisher>Suhrkamp</publisher>
            <edition></edition>
            <number_of_volumes></number_of_volumes>
            <translator></translator>
            <isbn></isbn>
            <keywords>Wissenschaftstheorie, Fleck, Tatsache</keywords>
            <abstract></abstract>
        </bib>
    </meta>
    <dir>
         <description>Scanned images (300dpi)</description>
         <name>img</name>
         <path></path>
         <meta></meta>
    </dir>
</resource>
\end{verbatim}

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: 
