version 1.9, 2003/09/01 11:00:08
|
version 1.15, 2004/07/16 13:45:49
|
Line 16
|
Line 16
|
|
|
\author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess} |
\author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess} |
|
|
\date{V1.0.2a of 20.8.2003} |
\date{V1.2 of 16.7.2004} |
|
|
\begin{document} |
\begin{document} |
|
|
Line 32 File and directory names should not cont
|
Line 32 File and directory names should not cont
|
in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen |
in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen |
``-'', underscore ``\_'' and dot ``.''. |
``-'', underscore ``\_'' and dot ``.''. |
|
|
File and directory paths in the metadata file use the conventional |
Files and directories with names that contain illegal characters must |
Unix file separator slash ``/''. |
be transformed to allowed names. A proposal for a simple |
|
transformation rule is |
|
|
|
\begin{itemize} |
|
\item whitespace characters (e.g. blank, tab, cr, lf) are replaced by |
|
hyphens ``-'' |
|
|
|
\item other illegal characters are replaced by underscores ``\_''. |
|
\end{itemize} |
|
|
|
This rule does not provide a reversible mapping to the original |
|
illegal file name and it does not provide a collision-free mapping, |
|
i.e. two different illegal file names might be mapped to the same |
|
allowed file name. Additional precautions for these cases must be |
|
taken. |
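
The proposed rule can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function names \texttt{sanitize\_name} and \texttt{find\_collisions} are hypothetical, and the collision check is one possible form of the ``additional precautions'' mentioned above.

```python
import re

# Allowed characters: a-z, A-Z, 0-9, hyphen, underscore and dot.
_WHITESPACE = re.compile(r"\s")
_ILLEGAL = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_name(name: str) -> str:
    """Apply the proposed rule: whitespace -> '-', other illegal chars -> '_'."""
    name = _WHITESPACE.sub("-", name)
    return _ILLEGAL.sub("_", name)


def find_collisions(names):
    """Group original names that map to the same sanitized name.

    Hypothetical helper: returns only those sanitized names that are
    produced by more than one original name.
    """
    seen = {}
    for original in names:
        seen.setdefault(sanitize_name(original), []).append(original)
    return {new: olds for new, olds in seen.items() if len(olds) > 1}
```

Because the mapping is not reversible, the original name would have to be recorded separately if it is still needed.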
|
|
|
|
\section{Metadata files} |
\section{Metadata files} |
Line 72 supplied by the provider of the resource
|
Line 86 supplied by the provider of the resource
|
automatic scripts later in the process, these elements must be present |
automatic scripts later in the process, these elements must be present |
in the final file. |
in the final file. |
|
|
The outer container element is \texttt{resource}. Sub-types (``ECHO'', |
File and directory paths in the metadata file use the conventional |
``MPIWG'') can be specified if necessary with a \texttt{type} |
Unix file separator slash ``/''. |
parameter. Its sub-elements are: |
|
|
The outer container element is \texttt{resource}. It has the following |
|
\textbf{attributes}: |
|
|
\begin{description} |
\begin{description} |
\item[description] An informal textual description of the |
\item[type] sub-type of resource (e.g. ``ECHO'', ``MPIWG'') -- |
resource -- optional. |
optional. |
|
|
|
\item[version] version number of metadata format (currently 1.1) -- |
|
required. |
|
\end{description} |
|
|
|
\noindent The allowed \textbf{elements} inside \texttt{resource} are: |
|
|
|
\begin{description} |
|
\item[description] An informal textual description of the resource -- |
|
optional\footnote{At least one description of the resource's content |
|
is required. The description can be an informal |
|
\texttt{description} element or a descriptive element (like |
|
\texttt{bib}) in a \texttt{meta} container.}. |
|
|
\item[name] The filename of the resource (name of the directory this |
\item[name] The filename of the resource (name of the directory this |
file is contained in) -- required. |
file is contained in) -- required. |
Line 95 parameter. Its sub-elements are:
|
Line 124 parameter. Its sub-elements are:
|
\item[archive-path] The full path to the resource directory inside the |
\item[archive-path] The full path to the resource directory inside the |
whole archive collection, including the resource directory -- deduced. |
whole archive collection, including the resource directory -- deduced. |
|
|
|
\item[archive-id] The ID for this document in the archive -- |
|
required. |
|
|
\item[derived-from] Container for the description of the original |
\item[derived-from] Container for the description of the original |
resource if this resource is a modified version of another resource |
resource if this resource is a modified version of another resource |
-- optional. |
-- optional. |
|
|
\begin{description} |
\begin{description} |
\item[archive-path] The full path to the original resource |
\item[archive-id] The ID of the original resource |
-- required. |
-- required. |
|
|
|
\item[archive-path] The full path to the original resource |
|
-- deduced. |
|
|
\item[description] An informal textual description of the relation |
\item[description] An informal textual description of the relation |
of this resource to the original resource -- optional. |
of this resource to the original resource -- optional. |
\end{description} |
\end{description} |
Line 112 parameter. Its sub-elements are:
|
Line 147 parameter. Its sub-elements are:
|
-- optional. |
-- optional. |
|
|
\begin{description} |
\begin{description} |
\item[archive-path] The full path to the linked resource |
\item[archive-id] The ID of the linked resource |
-- required. |
-- required. |
|
|
|
\item[archive-path] The full path to the linked resource |
|
-- deduced. |
|
|
\item[description] An informal textual description of the relation |
\item[description] An informal textual description of the relation |
of this resource to the linked resource -- optional. |
of this resource to the linked resource -- optional. |
\end{description} |
\end{description} |
|
|
\item[content-type] The content type of this resource -- required.\\ |
\item[media-type] \label{tag-media-type} The main media type of this |
The content type enables the choice of tools to manipulate and |
resource -- required.\\ The main media type can be overridden by |
display the resource. There should be a common list of content |
\texttt{media-type}s in subdirectories. Possible types are |
types. For digital documents (books, manuscripts) this would be |
\begin{itemize} |
``scanned document'', for other image data ``scanned |
\item \texttt{image} |
images''.\footnote{The criterion for documents is an ordered |
|
succession of image files (pages) and equal image size and |
\item \texttt{text} |
resolution throughout the images of a resource.} |
|
|
\item \texttt{audio} |
|
|
|
\item \texttt{video} |
|
|
|
\item \texttt{data} for other types of data |
|
\end{itemize} |
|
|
\item[meta] Additional metadata information about the resource -- |
\item[meta] Additional metadata information about the resource -- |
optional.\\ For a description of additional metadata see below. |
optional.\\ For a description of additional metadata see below. |
Line 142 parameter. Its sub-elements are:
|
Line 186 parameter. Its sub-elements are:
|
|
|
\item[name] The name of the subdirectory -- required. |
\item[name] The name of the subdirectory -- required. |
|
|
|
\item[original-name] A text string associated with the directory as |
|
original name -- optional. (E.g. if the data in this directory |
|
came from an external source and had a name that had to be changed |
|
according to section~\ref{sec:file-directory-names} but it should |
|
be possible to reference the original name.) |
|
|
\item[path] The directory path of this subdirectory relative to the |
\item[path] The directory path of this subdirectory relative to the |
resource's root directory (excluding the directory itself) -- |
resource's root directory (excluding the directory itself) -- |
required (may be empty or omitted if the directory is a direct |
required (may be empty or omitted if the directory is a direct |
Line 161 parameter. Its sub-elements are:
|
Line 211 parameter. Its sub-elements are:
|
|
|
\item[name] The name of the file -- required. |
\item[name] The name of the file -- required. |
|
|
|
\item[original-name] A text string associated with the file as |
|
original name -- optional. (E.g. if this file came from an |
|
external source and had a name that had to be changed according to |
|
section~\ref{sec:file-directory-names} but it should be possible |
|
to reference the original name.) |
|
|
\item[path] The directory path of this file relative to the |
\item[path] The directory path of this file relative to the |
resource's root directory (excluding the file itself) -- required |
resource's root directory (excluding the file itself) -- required |
(may be empty or omitted if the file is in the resource's root |
(may be empty or omitted if the file is in the resource's root |
Line 192 parameter. Its sub-elements are:
|
Line 248 parameter. Its sub-elements are:
|
\label{sec:additional-metadata} |
\label{sec:additional-metadata} |
|
|
All elements with \texttt{meta} tags can contain an arbitrary number |
All elements with \texttt{meta} tags can contain an arbitrary number |
of additional metadata elements. |
of the following additional metadata elements. |
|
|
|
\subsection{workflow state} |
|
\label{sec:workflow-state} |
|
|
|
All additional metadata elements can have a \texttt{workflow-state} |
|
\textbf{attribute}. This attribute reflects the state of the |
|
corresponding metadata element. The possible values for the |
|
\texttt{workflow-state} attribute are |
|
\begin{itemize} |
|
\item \texttt{preliminary} this information is preliminary. It must |
|
be checked in further workflow steps. |
|
|
|
\item \texttt{inwork} |
|
|
|
\item \texttt{final} |
|
\end{itemize} |
|
|
|
Workflow states other than \texttt{preliminary} are part of the |
|
workflow handling of the respective projects. |
|
|
|
Metadata elements can appear multiple times with different |
|
\texttt{workflow-state} attributes. This enables metadata versioning. |
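
As a rough illustration of how a consumer might handle several versions of the same metadata element, the following Python sketch picks one \texttt{bib} element from a \texttt{meta} container. The preference order (final before inwork before preliminary), the \texttt{title} sub-element and the function name \texttt{pick\_element} are assumptions made for this example; the specification does not define them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <title> sub-element is illustrative only.
SAMPLE = """
<resource type="ECHO" version="1.1">
  <meta>
    <bib workflow-state="preliminary"><title>Draft title</title></bib>
    <bib workflow-state="final"><title>Checked title</title></bib>
  </meta>
</resource>
"""

# Assumed preference order; not defined by the metadata format.
PREFERENCE = ["final", "inwork", "preliminary"]


def pick_element(meta, tag, preference=PREFERENCE):
    """Return the element named `tag` with the most preferred workflow-state."""
    candidates = meta.findall(tag)
    for state in preference:
        for element in candidates:
            if element.get("workflow-state") == state:
                return element
    # Fall back to the first element if no state matches (or None if absent).
    return candidates[0] if candidates else None


root = ET.fromstring(SAMPLE)
chosen = pick_element(root.find("meta"), "bib")
```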
|
|
|
|
|
|
|
\subsection{Content type} |
|
\label{sec:content-type} |
|
|
|
\begin{description} |
|
\item[content-type] \label{tag-content-type} The content type of this |
|
resource -- required.\\ |
|
The content type enables the choice of tools to manipulate and |
|
display the resource. There should be a common list of content |
|
types. For digital documents (books, manuscripts) this would be |
|
``scanned document'', for other image data ``scanned |
|
images''.\footnote{The criterion for documents is an ordered |
|
succession of image files (pages) and equal image size and |
|
resolution throughout the images of a resource.} |
|
\end{description} |
|
|
|
|
|
|
\subsection{Language} |
\subsection{Language} |
\label{sec:lang} |
\label{sec:lang} |
Line 220 on the page
|
Line 317 on the page
|
\subsection{Collection context} |
\subsection{Collection context} |
\label{sec:collection-context} |
\label{sec:collection-context} |
|
|
The context of a resource as part of a collection or part of a project can be |
The context of a resource as part of a collection or part of a project |
specified in the \texttt{context} element. All elements in the |
can be specified in the \texttt{context} element. The context element |
container can appear multiple times. |
can appear multiple times if the resource is part of multiple |
|
collections or projects. |
|
|
\begin{description} |
\begin{description} |
\item[context] information on collection or project context. |
\item[context] information on collection or project context. |
|
|
\begin{description} |
\begin{description} |
\item[link] URL to additional context information. |
\item[link] URL to additional context information -- optional. |
|
|
\item[name] Textual description of project or collection. |
\item[name] Textual description of project or collection -- optional. |
|
|
|
\item[meta-datalink] description of external sources of canonical meta |
|
information -- optional |
|
\begin{description} |
|
\item[db] \textbf{attribute} to identify different sets of meta data |
|
links to the same resource -- optional |
|
|
|
\item[object] \textbf{attribute} to identify different objects or |
|
parts of the same resource -- optional |
|
|
|
\item[label] textual label for the link -- optional |
|
|
|
\item[url] URL to present to the client -- optional |
|
|
|
\item[metadata-url] URL to an external server to be queried -- optional |
|
\end{description} |
|
|
|
\item[meta-baselink] description of external server for canonical meta |
|
information -- optional |
|
\begin{description} |
|
\item[db] \textbf{attribute} to identify different sets of meta data |
|
links to the same resource -- optional |
|
|
|
\item[label] textual label for the link -- optional |
|
|
|
\item[url] URL to present to the client -- optional |
|
|
|
\item[metadata-url] URL to an external server to be queried -- |
|
required (the parameter \texttt{object=} with an object id has |
|
to be appended to this URL) |
|
\end{description} |
\end{description} |
\end{description} |
\end{description} |
\end{description} |
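
The rule that an \texttt{object=} parameter has to be appended to the \texttt{metadata-url} of a \texttt{meta-baselink} could be implemented along the following lines. This is a minimal sketch: the function name \texttt{object\_query\_url} and the example URLs are hypothetical, and it assumes that the object id should be URL-encoded and that existing query parameters are kept.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def object_query_url(metadata_url: str, object_id: str) -> str:
    """Append object=<id> to a metadata-url, keeping any existing query."""
    parts = urlsplit(metadata_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("object", object_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, \texttt{object\_query\_url("http://example.org/meta", "42")} yields a URL ending in \texttt{?object=42}.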
|
|
Line 523 appear multiple times.
|
Line 652 appear multiple times.
|
\end{description} |
\end{description} |
|
|
|
|
\subsection{Information on the document structure (table of contents)} |
\subsection{Document structure (table of contents)} |
\label{sec:toc} |
\label{sec:toc} |
|
|
Information on the structure of a document like the division into |
Information on the structure of a document like the division into |
Line 606 tags.
|
Line 735 tags.
|
%%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise} |
%%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise} |
|
|
|
|
\subsection{Information on scanned images} |
\subsection{Digital images} |
\label{sec:inform-scann-imag} |
\label{sec:inform-scann-imag} |
|
|
Image files representing scanned images can have an \texttt{img} |
Image files representing scanned images can have an \texttt{img} |
Line 620 Required is one of three possible sets o
|
Line 749 Required is one of three possible sets o
|
\item[img] digital image information. |
\item[img] digital image information. |
|
|
\begin{description} |
\begin{description} |
\item[original-size-x] The width of the original image. The unit of |
\item[original-size-x] The width of the original |
measure can be contained as parameter \texttt{unit}, the default |
image -- required. \\ |
is meter ``m''. The width to be considered is the total width of |
The unit of measure can be contained as parameter \texttt{unit}, |
the scanned area. |
the default is meter ``m''. The width to be considered is the |
|
total width of the scanned area. |
|
|
\item[original-size-y] The height of the original image. |
\item[original-size-y] The height of the original image -- required. |
|
|
\item[original-pixel-x] The width of the hi-res scan in pixels. |
\item[original-pixel-x] The width of the hi-res scan in pixels -- deduced. |
|
|
\item[original-pixel-y] The height of the hi-res scan in pixels. |
\item[original-pixel-y] The height of the hi-res scan in pixels -- deduced. |
\end{description} |
\end{description} |
\end{description} |
\end{description} |
|
|
Line 640 or
|
Line 770 or
|
|
|
\begin{description} |
\begin{description} |
\item[original-dpi-x] The resolution of the hi-res scan in its width |
\item[original-dpi-x] The resolution of the hi-res scan in its width |
in pixels per inch. |
in pixels per inch -- required. |
|
|
\item[original-dpi-y] The resolution of the hi-res scan in its height |
\item[original-dpi-y] The resolution of the hi-res scan in its height |
in pixels per inch. |
in pixels per inch -- required. |
|
|
|
\item[original-pixel-x] The width of the hi-res scan in pixels -- deduced. |
|
|
|
\item[original-pixel-y] The height of the hi-res scan in pixels -- deduced. |
\end{description} |
\end{description} |
\end{description} |
\end{description} |
|
|
Line 654 or
|
Line 788 or
|
|
|
\begin{description} |
\begin{description} |
\item[original-dpi] The resolution of the hi-res scan in pixels per |
\item[original-dpi] The resolution of the hi-res scan in pixels per |
inch if the resolutions in width and height are the same. |
inch if the resolutions in width and height are the same -- required. |
|
|
|
\item[original-pixel-x] The width of the hi-res scan in pixels -- deduced. |
|
|
|
\item[original-pixel-y] The height of the hi-res scan in pixels -- deduced. |
\end{description} |
\end{description} |
\end{description} |
\end{description} |
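
The three alternative sets describe the same relation between physical size, resolution and pixel count, so one quantity can be computed from the other two. The following Python sketch shows the arithmetic, assuming sizes in the default unit meter and resolutions in pixels per inch; the function names are hypothetical.

```python
METERS_PER_INCH = 0.0254  # exact by definition of the inch


def dpi_from_size(pixels: int, size_m: float) -> float:
    """Resolution in pixels per inch from pixel count and size in meters."""
    return pixels / (size_m / METERS_PER_INCH)


def size_from_dpi(pixels: int, dpi: float) -> float:
    """Physical size in meters from pixel count and resolution in dpi."""
    return pixels / dpi * METERS_PER_INCH
```

A scan that is 3000 pixels wide and covers 0.254 m (10 inches) therefore has a horizontal resolution of about 300 pixels per inch.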
|
|
|
|
|
|
|
\subsection{Digital image acquisition} |
|
\label{sec:inform-about-image} |
|
|
|
A description of the technology used in the process of producing a |
|
digital image. |
|
|
|
\begin{description} |
|
\item[image-acquisition] description of the image production process |
|
\begin{description} |
|
\item[device] acquisition device (e.g. ``flatbed scanner'') |
|
|
|
\item[image-type] type and color-depth of the image -- required (e.g. ``RGB 24 |
|
bit'') |
|
|
|
\item[production-comment] additional textual information about the |
|
production process |
|
\end{description} |
|
\end{description} |
|
|
|
|
|
|
\subsection{Full text with images} |
\subsection{Full text with images} |
\label{sec:full-text-with} |
\label{sec:full-text-with} |
|
|
Full text in an XML format will be specified with a |
Full text in an XML format should be specified with a |
\texttt{content-type} ``fulltext''. |
\texttt{content-type}\footnote{see section~\ref{tag-content-type} |
|
on page~\pageref{tag-content-type}} ``fulltext''. |
|
|
The relation between the full text and optional images of |
The relation between the full text and optional images of |
whole pages or parts of pages must be specified in a |
whole pages or parts of pages must be specified in a |
Line 677 whole pages or parts of pages must be sp
|
Line 838 whole pages or parts of pages must be sp
|
inside document directory) |
inside document directory) |
|
|
\item[page-images] the directory name of the directory containing the |
\item[page-images] the directory name of the directory containing the |
page image files (with path |
page image files (with path inside document directory) |
inside document directory) |
|
|
|
\item[xslt-file] the file name of an additional XSL transformation |
\item[xslt-file] the file name of an additional XSL transformation |
file |
file |
|
|
\item[text-config] container for configuration options |
\item[text-config] container for configuration options |
|
\begin{description} |
|
\item[container-tag] the name of the text root element (default |
|
``text'') |
|
|
\item[container-tag] the name of the text root element (default ``text'') |
\item[ref-element-tag] the name of the element that is used as |
|
unit of reference when results are presented |
\item[ref-element-tag] the name of the element that is used as unit of |
|
reference when results are presented |
|
|
|
\item[pagebreak-tag] the name of the element that indicates page |
\item[pagebreak-tag] the name of the element that indicates page |
breaks (default ``pb'') |
breaks (default ``pb'') |
\end{description} |
\end{description} |
\end{description} |
\end{description} |
|
\end{description} |
|
|
|
|
|
|
|
\subsection{Copyright and access conditions} |
|
\label{sec:access-conditions} |
|
|
|
If the access to a resource is bound to conditions for technical or legal |
|
reasons then the conditions can be put in an \texttt{access-conditions} |
|
container. Other access rights conditions like copyright can also be |
|
documented in this container. |
|
|
|
\begin{description} |
|
\item[access-conditions] legal and technical conditions for access to |
|
this resource |
|
|
|
\begin{description} |
|
\item[attribution] The name or institution this resource should be |
|
attributed to when it's publicly presented |
|
|
|
\begin{description} |
|
\item[name] a name (free text) |
|
|
|
\item[url] a URL (with an optional \texttt{label} attribute to show |
|
as text) |
|
\end{description} |
|
|
|
\item[copyright] the copyright owner and its conditions |
|
\begin{description} |
|
\item[owner] the name of the copyright owner |
|
\begin{description} |
|
\item[name] a name (free text) |
|
|
|
\item[url] a URL (with an optional \texttt{label} attribute to show |
|
as text) |
|
\end{description} |
|
|
|
\item[date] the date when the copyright was issued |
|
|
|
\item[duration] the duration of the copyright (if known) |
|
|
|
\item[description] free-text field for special or additional |
|
conditions |
|
\end{description} |
|
|
|
|
|
\item[publish-metadata] metadata about this resource can be made |
|
freely available when this tag is present. Access to the resource |
|
itself is regulated separately by the \texttt{access} element. |
|
|
|
\item[access] conditions of access to this resource |
|
\begin{description} |
|
\item[internal] access should be restricted to a group of users. The |
|
type of group is defined by one of the following |
|
\begin{description} |
|
\item[institution] the members of this institution. The method |
|
for identifying a user as a member of the institution is not |
|
specified in this document. |
|
|
|
\item[subnet] all computers with an IP-address in this subnet. The |
|
subnet is defined in ``truncated-quad'' (e.g. ``141.14'') or |
|
``address/netmask'' (e.g. ``141.14.0.0/255.255.0.0'') notation. |
|
|
|
\item[group] the members of this named group. The method for |
|
identifying a user as a member of a named group is not specified in |
|
this document. |
|
\end{description} |
|
|
|
\item[scientific] access to this resource should be restricted to |
|
scientific work |
|
|
|
\item[free] access to this resource is not restricted |
|
|
|
\item[special] if none of the above conditions seems appropriate, |
|
a free-form text can be specified here. |
|
\end{description} |
|
\end{description} |
|
\end{description} |
|
|
|
\noindent |
|
It should be noted that access conditions in the metadata file only |
|
state that conditions \emph{should} be observed, not that they |
|
\emph{are} necessarily observed. Control over the access to the |
|
resource has to be enforced by additional technical measures. |
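
As one example of such a technical measure, a server could compare the IP address of a client with the \texttt{subnet} value. The Python sketch below accepts both notations named above and uses the standard \texttt{ipaddress} module. The function names are hypothetical, and the interpretation of the truncated-quad form (each given octet contributes eight bits to the network prefix) is an assumption.

```python
import ipaddress


def parse_subnet(spec: str) -> ipaddress.IPv4Network:
    """Parse 'truncated-quad' (141.14) or 'address/netmask' notation."""
    if "/" in spec:
        # e.g. "141.14.0.0/255.255.0.0"; host bits, if any, are masked off.
        return ipaddress.IPv4Network(spec, strict=False)
    octets = spec.strip(".").split(".")
    if not 1 <= len(octets) <= 4:
        raise ValueError(f"not a valid truncated quad: {spec!r}")
    # Assumption: pad missing octets with zeros, 8 prefix bits per octet.
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.IPv4Network(f"{'.'.join(padded)}/{8 * len(octets)}")


def is_internal(client_ip: str, spec: str) -> bool:
    """True if the client address lies inside the given subnet."""
    return ipaddress.IPv4Address(client_ip) in parse_subnet(spec)
```

With this reading, ``141.14'' and ``141.14.0.0/255.255.0.0'' describe the same network.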
|
|
|
|
|
|
|
\subsection{Acquisition of raw-data} |
|
\label{sec:acqu-inform} |
|
|
|
Information about the acquisition source for raw data resources can be |
|
provided in an \texttt{acquisition} container. |
|
|
|
\begin{description} |
|
\item[acquisition] the acquisition source of this resource -- required |
|
for raw data. |
|
\begin{description} |
|
\item[provider] where this resource came from -- required |
|
\begin{description} |
|
\item[name] free-text name of the provider (institution or |
|
individual) |
|
|
|
\item[address] address of the provider |
|
|
|
\item[contact] contact person at the provider (i.e. name and email) |
|
|
|
\item[url] URL related to the provider |
|
|
|
\item[provider-id] id of the provider (internally used) -- deduced |
|
\end{description} |
|
|
|
\item[date] date of acquisition -- required |
|
|
|
\item[description] free-text description of the acquisition source or |
|
additional information |
|
\end{description} |
|
\end{description} |
|
|
|
|
|
|
|
\subsection{Documentary Films} |
|
\label{sec:documentary-films} |
|
|
|
Documentary films can be described using a \texttt{film-acquisition} |
|
container. |
|
|
|
\begin{description} |
|
\item[film-acquisition] description of a (documentary) film -- |
|
required for documentary film |
|
\begin{description} |
|
\item[recording] specification of the recording process |
|
\begin{description} |
|
\item[author] the person or persons doing the recording |
|
|
|
\item[date] the date or time span when the film was recorded |
|
|
|
\item[location] the place where the film was recorded |
|
|
|
\item[device] recording device used (e.g. ``Sony CP-DV8 Camcorder'') |
|
|
|
\item[format] format of the recorded film -- required (e.g. ``DV |
|
720x524 25fps interlaced'') |
|
\end{description} |
|
|
|
\item[description] free-form description of the recording and the |
|
content of the film |
|
\end{description} |
|
\end{description} |
|
|
|
(More information about the digitization step could be added in a |
|
\texttt{digitization} tag similar to the \texttt{recording} tag.) |
|
|
\subsection{Access restrictions} |
|
\label{sec:access-restrictions} |
|
|
|
If the access to a resource is restricted for technical or legal |
|
reasons then the restrictions can be put in a |
|
\texttt{access-restrictions} container. The format of the information |
|
inside the container has to be further specified. |
|
|
|
|
|
\section{Sample metadata files for ECHO resources} |
\section{Sample metadata files for ECHO resources} |
Line 713 scanned document.
|
Line 1018 scanned document.
|
|
|
\begin{small} |
\begin{small} |
\begin{verbatim} |
\begin{verbatim} |
<resource type="ECHO"> |
<resource type="ECHO" version="1.0"> |
<description>Fleck, 1980</description> |
<description>Fleck, 1980</description> |
<name>fleck.1980</name> |
<name>fleck.1980</name> |
<creator>University of Bern</creator> |
<creator>University of Bern</creator> |
Line 754 architectural drawing.
|
Line 1059 architectural drawing.
|
|
|
\begin{small} |
\begin{small} |
\begin{verbatim} |
\begin{verbatim} |
<resource type="ECHO"> |
<resource type="ECHO" version="1.0"> |
<creator>Bibliotheca Hertziana</creator> |
<creator>Bibliotheca Hertziana</creator> |
<content-type>scanned images</content-type> |
<content-type>scanned images</content-type> |
<file> |
<file> |