Diff for /storage/meta/meta-format.tex between versions 1.9 and 1.15

version 1.9, 2003/09/01 11:00:08 version 1.15, 2004/07/16 13:45:49
Line 16 Line 16
   
 \author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess}  \author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess}
   
 \date{V1.0.2a of 20.8.2003}  \date{V1.2 of 16.7.2004}
   
 \begin{document}  \begin{document}
   
Line 32  File and directory names should not cont Line 32  File and directory names should not cont
 in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen  in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen
 ``-'', underscore ``\_'' and dot ``.''.  ``-'', underscore ``\_'' and dot ``.''.
   
 File and directory paths in the metadata file use the conventional  Files and directories with names that contain illegal characters must
 Unix file separator slash ``/''.  be transformed to allowed names. A proposed simple
   transformation rule is
   
   \begin{itemize}
   \item whitespace characters (e.g. blank, tab, cr, lf) are replaced by
     hyphens ``-''
   
   \item other illegal characters are replaced by underscores ``\_''.
   \end{itemize}
   
   This rule does not provide a reversible mapping to the original
   illegal file name and it does not provide a collision-free mapping,
   i.e. two different illegal file names might be mapped to the same
   allowed file name. Additional precautions for these cases must be
   taken.
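
The proposed transformation rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the specification: the function names `sanitize_name` and `sanitize_all` are hypothetical, and "whitespace" is taken to mean whatever Python's `\s` matches. Because the rule is neither reversible nor collision-free, the second helper reports collisions instead of silently overwriting earlier entries.

```python
import re

# Allowed characters per the naming rule: a-z, A-Z, 0-9, hyphen, underscore, dot.
_WHITESPACE = re.compile(r"\s")
_ILLEGAL = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_name(name):
    """Map an arbitrary name to an allowed one (hypothetical helper, not reversible)."""
    name = _WHITESPACE.sub("-", name)  # blank, tab, cr, lf -> hyphen
    return _ILLEGAL.sub("_", name)     # any other illegal character -> underscore


def sanitize_all(names):
    """Sanitize several names and collect the ones that collide."""
    mapping = {}     # allowed name -> first original name that produced it
    collisions = {}  # allowed name -> all original names that produced it
    for original in names:
        allowed = sanitize_name(original)
        if allowed in mapping and mapping[allowed] != original:
            collisions.setdefault(allowed, [mapping[allowed]]).append(original)
        else:
            mapping[allowed] = original
    return mapping, collisions
```

A caller would still have to decide how to resolve the reported collisions, for example by appending a counter, and could record the original string separately if it must remain retrievable.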
   
   
 \section{Metadata files}  \section{Metadata files}
Line 72  supplied by the provider of the resource Line 86  supplied by the provider of the resource
 automatic scripts later in the process, these elements must be present  automatic scripts later in the process, these elements must be present
 in the final file.  in the final file.
   
 The outer container element is \texttt{resource}. Sub-types (``ECHO'',  File and directory paths in the metadata file use the conventional
 ``MPIWG'') can be specified if necessary with a \texttt{type}  Unix file separator slash ``/''.
 parameter. Its sub-elements are:  
   The outer container element is \texttt{resource}. It has the following
   \textbf{attributes}:
   
 \begin{description}  \begin{description}
 \item[description] An informal textual description of the  \item[type] sub-type of resource (e.g. ``ECHO'', ``MPIWG'') --
   resource -- optional.    optional.
     
   \item[version] version number of metadata format (currently 1.1) --
     required.
   \end{description}
   
   \noindent The allowed \textbf{elements} inside \texttt{resource} are:
   
   \begin{description}
   \item[description] An informal textual description of the resource --
     optional\footnote{At least one description of the resource's content
       is required. The description can be an informal
       \texttt{description} element or a descriptive element (like
       \texttt{bib}) in a \texttt{meta} container.}.
   
 \item[name] The filename of the resource (name of the directory this  \item[name] The filename of the resource (name of the directory this
   file is contained in) -- required.    file is contained in) -- required.
Line 95  parameter. Its sub-elements are: Line 124  parameter. Its sub-elements are:
 \item[archive-path] The full path to the resource directory inside the  \item[archive-path] The full path to the resource directory inside the
   whole archive collection, including the resource directory -- deduced.    whole archive collection, including the resource directory -- deduced.
       
   \item[archive-id] The ID for this document in the archive --
     required.
     
 \item[derived-from] Container for the description of the original  \item[derived-from] Container for the description of the original
   resource if this resource is a modified version of another resource    resource if this resource is a modified version of another resource
   -- optional.    -- optional.
   
   \begin{description}    \begin{description}
   \item[archive-path] The full path to the original resource    \item[archive-id] The ID of the original resource
     -- required.      -- required.
   
     \item[archive-path] The full path to the original resource
       -- deduced.
   
   \item[description] An informal textual description of the relation    \item[description] An informal textual description of the relation
   of this resource to the original resource -- optional.    of this resource to the original resource -- optional.
   \end{description}    \end{description}
Line 112  parameter. Its sub-elements are: Line 147  parameter. Its sub-elements are:
   -- optional.    -- optional.
   
   \begin{description}    \begin{description}
   \item[archive-path] The full path to the linked resource    \item[archive-id] The ID of the linked resource
     -- required.      -- required.
   
     \item[archive-path] The full path to the linked resource
       -- deduced.
   
   \item[description] An informal textual description of the relation    \item[description] An informal textual description of the relation
   of this resource to the linked resource -- optional.    of this resource to the linked resource -- optional.
   \end{description}    \end{description}
       
 \item[content-type] The content type of this resource -- required.\\  \item[media-type] \label{tag-media-type} The main media type of this
   The content type enables the choice of tools to manipulate and    resource -- required.\\ The main media type can be overridden by
   display the resource. There should be a common list of content    \texttt{media-type}s in subdirectories. Possible types are
   types. For digital documents (books, manuscripts) this would be    \begin{itemize}
   ``scanned document'', for other image data ``scanned    \item \texttt{image}
   images''.\footnote{The criterion for documents is an ordered  
     succession of image files (pages) and equal image size and    \item \texttt{text}
     resolution throughout the images of a resource.}  
     \item \texttt{audio}
   
     \item \texttt{video}
   
     \item \texttt{data} for other types of data
     \end{itemize}
       
 \item[meta] Additional metadata information about the resource --  \item[meta] Additional metadata information about the resource --
   optional.\\ For a description of additional metadata see below.    optional.\\ For a description of additional metadata see below.
Line 142  parameter. Its sub-elements are: Line 186  parameter. Its sub-elements are:
   
   \item[name] The name of the subdirectory -- required.    \item[name] The name of the subdirectory -- required.
           
     \item[original-name] A text string associated with the directory as
       original name -- optional. (E.g. if the data in this directory
       came from an external source and had a name that had to be changed
       according to section~\ref{sec:file-directory-names} but it should
       be possible to reference the original name.)
       
   \item[path] The directory path of this subdirectory relative to the    \item[path] The directory path of this subdirectory relative to the
     resource's root directory (excluding the directory itself) --      resource's root directory (excluding the directory itself) --
     required (may be empty or omitted if the directory is a direct      required (may be empty or omitted if the directory is a direct
Line 161  parameter. Its sub-elements are: Line 211  parameter. Its sub-elements are:
   
   \item[name] The name of the file -- required.    \item[name] The name of the file -- required.
           
     \item[original-name] A text string associated with the file as
       original name -- optional. (E.g. if this file came from an
       external source and had a name that had to be changed according to
       section~\ref{sec:file-directory-names} but it should be possible
       to reference the original name.)
       
   \item[path] The directory path of this file relative to the    \item[path] The directory path of this file relative to the
     resource's root directory (excluding the file itself) -- required      resource's root directory (excluding the file itself) -- required
     (may be empty or omitted if the file is in the resource's root      (may be empty or omitted if the file is in the resource's root
Line 192  parameter. Its sub-elements are: Line 248  parameter. Its sub-elements are:
 \label{sec:additional-metadata}  \label{sec:additional-metadata}
   
 All elements with \texttt{meta} tags can contain an arbitrary number  All elements with \texttt{meta} tags can contain an arbitrary number
 of additional metadata elements.  of the following additional metadata elements.
   
   \subsection{workflow state}
   \label{sec:workflow-state}
   
   All additional metadata elements can have a \texttt{workflow-state}
   \textbf{attribute}. This attribute reflects the state of the
   corresponding metadata element. The possible values for the
   \texttt{workflow-state} attribute are
   \begin{itemize}
   \item \texttt{preliminary} this information is preliminary. It must
     be checked in further workflow steps.
   
   \item \texttt{inwork}
   
   \item \texttt{final}
   \end{itemize}
   
   Workflow states other than \texttt{preliminary} are part of the
   workflow handling of the respective projects.
   
   Metadata elements can appear multiple times with different
   \texttt{workflow-state} attributes. This enables metadata versioning.
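
One way a consumer might use the `workflow-state` attribute for versioning is sketched below. The ranking `preliminary` < `inwork` < `final`, the treatment of elements without the attribute as lowest rank, and the function name `most_advanced` are all assumptions made for illustration; the text above does not define an ordering.

```python
import xml.etree.ElementTree as ET

# Assumed ordering of the three documented states (not defined by the format).
STATE_RANK = {"preliminary": 0, "inwork": 1, "final": 2}


def most_advanced(meta):
    """For each child tag of a meta container, keep the highest-ranked version."""
    best = {}
    for child in meta:
        # Elements without a workflow-state attribute get rank -1 (assumption).
        rank = STATE_RANK.get(child.get("workflow-state"), -1)
        if child.tag not in best or rank > best[child.tag][0]:
            best[child.tag] = (rank, child)
    return {tag: elem for tag, (rank, elem) in best.items()}


sample = ET.fromstring(
    "<meta>"
    '<content-type workflow-state="preliminary">scanned images</content-type>'
    '<content-type workflow-state="final">scanned document</content-type>'
    "</meta>"
)
chosen = most_advanced(sample)
```

A project with its own workflow handling would replace `STATE_RANK` with whatever precedence it actually uses.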
   
   
   
   \subsection{Content type}
   \label{sec:content-type}
   
   \begin{description}
   \item[content-type] \label{tag-content-type} The content type of this
     resource -- required.\\
     The content type enables the choice of tools to manipulate and
     display the resource. There should be a common list of content
     types. For digital documents (books, manuscripts) this would be
     ``scanned document'', for other image data ``scanned
     images''.\footnote{The criterion for documents is an ordered
       succession of image files (pages) and equal image size and
       resolution throughout the images of a resource.}
   \end{description}  
   
   
   
 \subsection{Language}  \subsection{Language}
 \label{sec:lang}  \label{sec:lang}
Line 220  on the page Line 317  on the page
 \subsection{Collection context}  \subsection{Collection context}
 \label{sec:collection-context}  \label{sec:collection-context}
   
 The context of a resource as part of a collection or part of a project can be  The context of a resource as part of a collection or part of a project
 specified in the \texttt{context} element. All elements in the  can be specified in the \texttt{context} element. The context element
 container can appear multiple times.  can appear multiple times if the resource is part of multiple
   collections or projects.
   
 \begin{description}  \begin{description}
 \item[context] information on collection or project context.  \item[context] information on collection or project context.
   
   \begin{description}    \begin{description}
   \item[link] URL to additional context information.    \item[link] URL to additional context information -- optional.
           
   \item[name] Textual description of project or collection.    \item[name] Textual description of project or collection -- optional.
   
     \item[meta-datalink] description of external sources of canonical meta
       information -- optional
       \begin{description}
       \item[db] \textbf{attribute} to identify different sets of meta data
         links to the same resource -- optional
   
       \item[object] \textbf{attribute} to identify different objects or
         parts of the same resource -- optional
   
       \item[label] textual label for the link -- optional
   
       \item[url] URL to present to the client -- optional
   
       \item[metadata-url] URL to an external server to be queried -- optional
       \end{description}
   
     \item[meta-baselink] description of external server for canonical meta
       information -- optional
       \begin{description}
       \item[db] \textbf{attribute} to identify different sets of meta data
         links to the same resource -- optional
   
       \item[label] textual label for the link -- optional
   
       \item[url] URL to present to the client -- optional
         
       \item[metadata-url] URL to an external server to be queried --
         required (the parameter \texttt{object=} with an object id has
         to be appended to this URL)
       \end{description}
   \end{description}    \end{description}
 \end{description}  \end{description}
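
The `meta-baselink` description says that a parameter `object=` with an object id has to be appended to the `metadata-url`. A minimal sketch of that step, using only the Python standard library, could look as follows. The function name `object_query_url` and the example URL are hypothetical, and the choice to preserve an existing query string is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def object_query_url(metadata_url, object_id):
    """Append object=<id> to a metadata-url, keeping any existing query."""
    parts = urlsplit(metadata_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("object", object_id))
    # urlencode percent-escapes the id, so ids with spaces or "&" stay intact.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Going through `urlsplit` and `urlencode` rather than plain string concatenation avoids having to check whether the base URL already contains a `?`.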
   
Line 523  appear multiple times. Line 652  appear multiple times.
 \end{description}  \end{description}
   
   
 \subsection{Information on the document structure (table of contents)}  \subsection{Document structure (table of contents)}
 \label{sec:toc}  \label{sec:toc}
   
 Information on the structure of a document like the division into  Information on the structure of a document like the division into
Line 606  tags. Line 735  tags.
 %%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise}  %%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise}
   
   
 \subsection{Information on scanned images}  \subsection{Digital images}
 \label{sec:inform-scann-imag}  \label{sec:inform-scann-imag}
   
 Image files representing scanned images can have an \texttt{img}  Image files representing scanned images can have an \texttt{img}
Line 620  Required is one of three possible sets o Line 749  Required is one of three possible sets o
 \item[img] digital image information.  \item[img] digital image information.
   
   \begin{description}    \begin{description}
   \item[original-size-x] The width of the original image. The unit of    \item[original-size-x] The width of the original
     measure can be contained as parameter \texttt{unit}, the default      image -- required. \\
     is meter ``m''. The width to be considered is the total width of      The unit of measure can be contained as parameter \texttt{unit},
     the scanned area.      the default is meter ``m''. The width to be considered is the
       total width of the scanned area.
           
   \item[original-size-y] The height of the original image.    \item[original-size-y] The height of the original image -- required.
           
   \item[original-pixel-x] The width of the hi-res scan in pixels.    \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced.
           
   \item[original-pixel-y] The height of the hi-res scan in pixels.    \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced.
   \end{description}    \end{description}
 \end{description}  \end{description}
   
Line 640  or Line 770  or
   
   \begin{description}    \begin{description}
   \item[original-dpi-x] The resolution of the hi-res scan in its width    \item[original-dpi-x] The resolution of the hi-res scan in its width
     in pixels per inch.      in pixels per inch -- required.
   
   \item[original-dpi-y] The resolution of the hi-res scan in its height    \item[original-dpi-y] The resolution of the hi-res scan in its height
     in pixels per inch.      in pixels per inch -- required.
   
     \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced.
       
     \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced.
   \end{description}    \end{description}
 \end{description}  \end{description}
   
Line 654  or Line 788  or
   
   \begin{description}    \begin{description}
   \item[original-dpi] The resolution of the hi-res scan in pixels per    \item[original-dpi] The resolution of the hi-res scan in pixels per
     inch if the resolutions in width and height are the same.      inch if the resolutions in width and height are the same -- required.
   
     \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced.
       
     \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced.
   \end{description}    \end{description}
 \end{description}  \end{description}
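
The three alternative parameter sets describe the same relation between original size, pixel count and resolution. The following sketch shows the arithmetic, assuming sizes in meters by default as stated above. The function names and the unit table for values other than ``m'' are assumptions; the document only names meter as the default unit.

```python
INCH_IN_M = 0.0254  # one inch in meters (exact by definition)

# Conversion factors to meters; only "m" is named in the text, the rest are assumed.
UNIT_IN_M = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def resolution_dpi(original_size, pixels, unit="m"):
    """Resolution in pixels per inch from original size and pixel count."""
    size_m = original_size * UNIT_IN_M[unit]
    if size_m <= 0:
        raise ValueError("original size must be positive")
    return pixels * INCH_IN_M / size_m


def original_size_m(dpi, pixels):
    """Original size in meters from resolution and pixel count."""
    if dpi <= 0:
        raise ValueError("resolution must be positive")
    return pixels / dpi * INCH_IN_M
```

With `original-size-x` and the deduced `original-pixel-x`, the first function yields the value that would otherwise be given as `original-dpi-x`; the second function goes the other way.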
   
   
   
   \subsection{Digital image acquisition}
   \label{sec:inform-about-image}
   
   A description of the technology used in the process of producing a
   digital image.
   
   \begin{description}
   \item[image-acquisition] description of the image production process
     \begin{description}
     \item[device] acquisition device (e.g. ``flatbed scanner'') 
   
     \item[image-type] type and color-depth of the image -- required (e.g. ``RGB 24
       bit'')
   
     \item[production-comment] additional textual information about the
       production process
     \end{description}
   \end{description}
   
   
   
 \subsection{Full text with images}  \subsection{Full text with images}
 \label{sec:full-text-with}  \label{sec:full-text-with}
   
 Full text in an XML format will be specified with a  Full text in an XML format should be specified with a
 \texttt{content-type} ``fulltext''.  \texttt{content-type}\footnote{See section~\ref{tag-content-type}
   on page~\pageref{tag-content-type}.} ``fulltext''.
   
 The relation between the full text and optional images of  The relation between the full text and optional images of
 whole pages or parts of pages must be specified in a  whole pages or parts of pages must be specified in a
Line 677  whole pages or parts of pages must be sp Line 838  whole pages or parts of pages must be sp
     inside document directory)      inside document directory)
   
   \item[page-images] the directory name of the directory containing the    \item[page-images] the directory name of the directory containing the
     page image files (with path      page image files (with path inside document directory)
     inside document directory)  
   
   \item[xslt-file] the file name of an additional XSL transformation    \item[xslt-file] the file name of an additional XSL transformation
     file      file
   
   \item[text-config] container for configuration options    \item[text-config] container for configuration options
       \begin{description}
       \item[container-tag] the name of the text root element (default
         ``text'')
   
   \item[container-tag] the name of the text root element (default ``text'')      \item[ref-element-tag] the name of the element that is used as
         unit of reference when results are presented
   \item[ref-element-tag] the name of the element that is used as unit of  
     reference when results are presented  
           
   \item[pagebreak-tag] the name of the element that indicates page    \item[pagebreak-tag] the name of the element that indicates page
     breaks (default ``pb'')      breaks (default ``pb'')
   \end{description}    \end{description}
 \end{description}  \end{description}
   \end{description}
   
   
   
   \subsection{Copyright and access conditions}
   \label{sec:access-conditions}
   
   If the access to a resource is bound to conditions for technical or legal
   reasons then the conditions can be put in an \texttt{access-conditions}
   container. Other access rights conditions like copyright can also be
   documented in this container.
   
   \begin{description}
   \item[access-conditions] legal and technical conditions for access to
     this resource
   
     \begin{description}
     \item[attribution] The name or institution this resource should be
       attributed to when it's publicly presented
   
       \begin{description}
       \item[name] a name (free text)
   
       \item[url] a URL (with an optional \texttt{label} attribute to show
         as text)
       \end{description}
   
     \item[copyright] the copyright owner and its conditions
       \begin{description}
       \item[owner] the name of the copyright owner
         \begin{description}
         \item[name] a name (free text)
   
         \item[url] a URL (with an optional \texttt{label} attribute to show
           as text)
         \end{description}
   
       \item[date] the date when the copyright was issued
   
       \item[duration] the duration of the copyright (if known)
   
       \item[description] free-text field for special or additional
         conditions
       \end{description}
   
   
     \item[publish-metadata] metadata about this resource can be made
     freely available when this tag is present. Access to the resource
     itself is regulated separately by the \texttt{access} element.
   
     \item[access] conditions of access to this resource
       \begin{description}
       \item[internal] access should be restricted to a group of users. The
         type of group is defined by one of the following
         \begin{description}
         \item[institution] the members of this institution. The method
           for identifying a user as a member of the institution is not
           specified in this document.
   
         \item[subnet] all computers with an IP address in this subnet. The
           subnet is defined in ``truncated-quad'' (e.g. ``141.14'') or
           ``address/netmask'' (e.g. ``141.14.0.0/255.255.0.0'') notation.
           
         \item[group] the members of this named group. The method for
           identifying a user as a member of a named group is not specified
           in this document.
         \end{description}
   
       \item[scientific] access to this resource should be restricted to
         scientific work
   
       \item[free] access to this resource is not restricted
         
       \item[special] if none of the above conditions seems appropriate,
         a free-form text can be specified here.
       \end{description}
     \end{description}
   \end{description}
   
   \noindent
   It should be noted that access conditions in the metadata file only
   state that conditions \emph{should} be observed, not that they
   \emph{are} necessarily observed. Control over the access to the
   resource has to be enforced by additional technical measures.
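
The two notations allowed for the `subnet` element can both be turned into a network object with Python's standard `ipaddress` module. The sketch below is an illustration only: the function name `parse_subnet` is hypothetical, and reading a truncated quad as a prefix of whole octets (so ``141.14'' means 141.14.0.0 with a 16-bit prefix) is an assumption about what the notation is intended to mean.

```python
import ipaddress


def parse_subnet(spec):
    """Parse 'truncated-quad' or 'address/netmask' notation into an IPv4Network."""
    spec = spec.strip()
    if "/" in spec:
        # e.g. "141.14.0.0/255.255.0.0"; IPv4Network accepts dotted netmasks.
        return ipaddress.IPv4Network(spec)
    octets = spec.split(".")
    if not 1 <= len(octets) <= 4:
        raise ValueError("expected one to four octets: %r" % spec)
    prefix = 8 * len(octets)                      # one octet = 8 prefix bits
    padded = octets + ["0"] * (4 - len(octets))   # fill the host part with zeros
    return ipaddress.IPv4Network("%s/%d" % (".".join(padded), prefix))


def in_subnet(address, spec):
    """True if the given IPv4 address lies inside the subnet."""
    return ipaddress.IPv4Address(address) in parse_subnet(spec)
```

Such a check only evaluates the stated condition; as noted above, actually restricting access still requires separate technical measures.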
   
   
   
   \subsection{Acquisition of raw data}
   \label{sec:acqu-inform}
   
   Information about the acquisition source for raw data resources can be
   provided in an \texttt{acquisition} container.
   
   \begin{description}
   \item[acquisition] the acquisition source of this resource -- required
     for raw data.
     \begin{description}
     \item[provider] where this resource came from -- required
       \begin{description}
       \item[name] free-text name of the provider (institution or
         individual)
   
       \item[address] address of the provider
   
       \item[contact] contact person at the provider (i.e. name and email)
   
       \item[url] URL related to the provider
   
       \item[provider-id] id of the provider (internally used) -- deduced
       \end{description}
   
     \item[date] date of acquisition -- required
   
     \item[description] free-text description of the acquisition source or
     additional information
     \end{description}
   \end{description}
   
   
   
   \subsection{Documentary Films}
   \label{sec:documentary-films}
   
   Documentary films can be described using a \texttt{film-acquisition}
   container.
   
   \begin{description}
   \item[film-acquisition] description of a (documentary) film --
     required for documentary film
     \begin{description}
     \item[recording] specification of the recording process
       \begin{description}
       \item[author] the person or persons doing the recording
   
       \item[date] the date or time span when the film was recorded
   
       \item[location] the place where the film was recorded
   
       \item[device] recording device used (e.g. ``Sony CP-DV8 Camcorder'')
         
       \item[format] format of the recorded film -- required (e.g. ``DV
         720x524 25fps interlaced'')
       \end{description}
    
     \item[description] free-form description of the recording and the
       content of the film
     \end{description}
   \end{description}
   
   (More information about the digitization step could be added in a
   \texttt{digitization} tag similar to the \texttt{recording} tag.)
   
 \subsection{Access restrictions}  
 \label{sec:access-restrictions}  
   
 If the access to a resource is restricted for technical or legal  
 reasons then the restrictions can be put in a  
 \texttt{access-restrictions} container. The format of the information  
 inside the container has to be further specified.  
   
   
 \section{Sample metadata files for ECHO resources}  \section{Sample metadata files for ECHO resources}
Line 713  scanned document. Line 1018  scanned document.
   
 \begin{small}  \begin{small}
 \begin{verbatim}  \begin{verbatim}
 <resource type="ECHO">  <resource type="ECHO" version="1.0">
   <description>Fleck, 1980</description>    <description>Fleck, 1980</description>
   <name>fleck.1980</name>    <name>fleck.1980</name>
   <creator>University of Bern</creator>    <creator>University of Bern</creator>
Line 754  architectural drawing. Line 1059  architectural drawing.
   
 \begin{small}  \begin{small}
 \begin{verbatim}  \begin{verbatim}
 <resource type="ECHO">  <resource type="ECHO" version="1.0">
   <creator>Bibliotheca Hertziana</creator>    <creator>Bibliotheca Hertziana</creator>
   <content-type>scanned images</content-type>    <content-type>scanned images</content-type>
   <file>    <file>
