version 1.11, 2003/09/11 14:52:43
|
version 1.12, 2003/12/05 16:11:59
|
Line 16
|
Line 16
|
|
|
\author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess} |
\author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess} |
|
|
\date{V1.0.3 of 11.9.2003} |
\date{V1.1.0 of 5.12.2003} |
|
|
\begin{document} |
\begin{document} |
|
|
Line 32 File and directory names should not cont
|
Line 32 File and directory names should not cont
|
in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen |
in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen |
``-'', underscore ``\_'' and dot ``.''. |
``-'', underscore ``\_'' and dot ``.''. |
|
|
File and directory paths in the metadata file use the conventional |
Files and directories with names that contain illegal characters must |
Unix file separator slash ``/''. |
be transformed to allowed names. A proposed simple |
|
transformation rule is: |
|
|
|
\begin{itemize} |
|
\item whitespace characters (e.g. blank, tab, cr, lf) are replaced by |
|
hyphens ``-'' |
|
|
|
\item other illegal characters are replaced by underscores ``\_''. |
|
\end{itemize} |
|
|
|
This rule does not provide a reversible mapping to the original |
|
illegal file name and it does not provide a collision-free mapping, |
|
i.e. two different illegal file names might be mapped to the same |
|
allowed file name. Additional precautions for these cases must be |
|
taken. |
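The transformation rule proposed above can be sketched in Python. This is only an illustration, not part of the specification: the function names \texttt{sanitize\_name} and \texttt{sanitize\_unique} are hypothetical, and appending a numeric suffix is just one possible precaution against collisions.

```python
import re

# Allowed characters per the text above: a-z, A-Z, 0-9, hyphen, underscore, dot.
_WHITESPACE = re.compile(r"\s")
_ILLEGAL = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_name(name):
    """Apply the proposed rule: whitespace -> '-', other illegal characters -> '_'."""
    name = _WHITESPACE.sub("-", name)
    return _ILLEGAL.sub("_", name)


def sanitize_unique(name, taken):
    """Hypothetical collision precaution: append a numeric suffix until unused.

    `taken` is the set of names already used in the target directory;
    the chosen name is added to it.
    """
    base = sanitize_name(name)
    candidate = base
    counter = 1
    while candidate in taken:
        candidate = "{}-{}".format(base, counter)
        counter += 1
    taken.add(candidate)
    return candidate
```

Because the mapping is not reversible, the original name has to be recorded separately if it must remain available.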
|
|
|
|
\section{Metadata files} |
\section{Metadata files} |
Line 72 supplied by the provider of the resource
|
Line 86 supplied by the provider of the resource
|
automatic scripts later in the process, these elements must be present |
automatic scripts later in the process, these elements must be present |
in the final file. |
in the final file. |
|
|
|
File and directory paths in the metadata file use the conventional |
|
Unix file separator slash ``/''. |
|
|
The outer container element is \texttt{resource}. It has the following |
The outer container element is \texttt{resource}. It has the following |
\textbf{attributes}: |
\textbf{attributes}: |
|
|
\begin{description} |
\begin{description} |
\item[type] sub-type of resource (e.g. ``ECHO'', |
\item[type] sub-type of resource (e.g. ``ECHO'', ``MPIWG'') -- |
``MPIWG'') -- optional. |
optional. |
|
|
\item[version] version number of metadata format (currently 1.0) -- |
\item[version] version number of metadata format (currently 1.1) -- |
required. |
required. |
\end{description} |
\end{description} |
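As a minimal sketch of how a script might check the attributes of the outer element, assuming the metadata file is well-formed XML: the function name \texttt{check\_resource\_attributes} is hypothetical and the check covers only the two attributes listed above.

```python
import xml.etree.ElementTree as ET


def check_resource_attributes(xml_text):
    """Return a list of problems found on the outer element (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "resource":
        problems.append("outer element is not 'resource'")
    # 'version' is required; 'type' is optional and therefore not checked.
    if "version" not in root.attrib:
        problems.append("required attribute 'version' is missing")
    return problems
```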
|
|
Line 104 The outer container element is \texttt{r
|
Line 121 The outer container element is \texttt{r
|
\item[archive-path] The full path to the resource directory inside the |
\item[archive-path] The full path to the resource directory inside the |
whole archive collection, including the resource directory -- deduced. |
whole archive collection, including the resource directory -- deduced. |
|
|
|
\item[archive-id] The ID for this document in the archive -- |
|
required. |
|
|
\item[derived-from] Container for the description of the original |
\item[derived-from] Container for the description of the original |
resource if this resource is a modified version of another resource |
resource if this resource is a modified version of another resource |
-- optional. |
-- optional. |
|
|
\begin{description} |
\begin{description} |
\item[archive-path] The full path to the original resource |
\item[archive-id] The ID of the original resource |
-- required. |
-- required. |
|
|
|
\item[archive-path] The full path to the original resource |
|
-- deduced. |
|
|
\item[description] An informal textual description of the relation |
\item[description] An informal textual description of the relation |
of this resource to the original resource -- optional. |
of this resource to the original resource -- optional. |
\end{description} |
\end{description} |
Line 121 The outer container element is \texttt{r
|
Line 144 The outer container element is \texttt{r
|
-- optional. |
-- optional. |
|
|
\begin{description} |
\begin{description} |
\item[archive-path] The full path to the linked resource |
\item[archive-id] The ID of the linked resource |
-- required. |
-- required. |
|
|
|
\item[archive-path] The full path to the linked resource |
|
-- deduced. |
|
|
\item[description] An informal textual description of the relation |
\item[description] An informal textual description of the relation |
of this resource to the linked resource -- optional. |
of this resource to the linked resource -- optional. |
\end{description} |
\end{description} |
|
|
\item[content-type] The content type of this resource -- required.\\ |
\item[media-type] \label{tag-media-type} The main media type of this |
The content type enables the choice of tools to manipulate and |
resource -- required.\\ The main media type can be overridden by |
display the resource. There should be a common list of content |
\texttt{media-type}s in subdirectories. Possible types are |
types. For digital documents (books, manuscripts) this would be |
\begin{itemize} |
``scanned document'', for other image data ``scanned |
\item \texttt{image} |
images''.\footnote{The criterion for documents is an ordered |
|
succession of image files (pages) and equal image size and |
\item \texttt{text} |
resolution throughout the images of a resource.} |
|
|
\item \texttt{audio} |
|
|
|
\item \texttt{video} |
|
|
|
\item \texttt{data} for other types of data |
|
\end{itemize} |
|
|
\item[meta] Additional metadata information about the resource -- |
\item[meta] Additional metadata information about the resource -- |
optional.\\ For a description of additional metadata see below. |
optional.\\ For a description of additional metadata see below. |
Line 151 The outer container element is \texttt{r
|
Line 183 The outer container element is \texttt{r
|
|
|
\item[name] The name of the subdirectory -- required. |
\item[name] The name of the subdirectory -- required. |
|
|
|
\item[original-name] A text string associated with the directory as |
|
original name -- optional. (E.g. if the data in this directory |
|
came from an external source and had a name that had to be changed |
|
according to section~\ref{sec:file-directory-names} but it should |
|
be possible to reference the original name.) |
|
|
\item[path] The directory path of this subdirectory relative to the |
\item[path] The directory path of this subdirectory relative to the |
resource's root directory (excluding the directory itself) -- |
resource's root directory (excluding the directory itself) -- |
required (may be empty or omitted if the directory is a direct |
required (may be empty or omitted if the directory is a direct |
Line 170 The outer container element is \texttt{r
|
Line 208 The outer container element is \texttt{r
|
|
|
\item[name] The name of the file -- required. |
\item[name] The name of the file -- required. |
|
|
|
\item[original-name] A text string associated with the file as |
|
original name -- optional. (E.g. if this file came from an |
|
external source and had a name that had to be changed according to |
|
section~\ref{sec:file-directory-names} but it should be possible |
|
to reference the original name.) |
|
|
\item[path] The directory path of this file relative to the |
\item[path] The directory path of this file relative to the |
resource's root directory (excluding the file itself) -- required |
resource's root directory (excluding the file itself) -- required |
(may be empty or omitted if the file is in the resource's root |
(may be empty or omitted if the file is in the resource's root |
Line 201 The outer container element is \texttt{r
|
Line 245 The outer container element is \texttt{r
|
\label{sec:additional-metadata} |
\label{sec:additional-metadata} |
|
|
All elements with \texttt{meta} tags can contain an arbitrary number |
All elements with \texttt{meta} tags can contain an arbitrary number |
of additional metadata elements. |
of the following additional metadata elements. |
|
|
|
\subsection{workflow state} |
|
\label{sec:workflow-state} |
|
|
|
All additional metadata elements can have a \texttt{workflow-state} |
|
\textbf{attribute}. This attribute reflects the state of the |
|
corresponding metadata element. The possible values for the |
|
\texttt{workflow-state} attribute are |
|
\begin{itemize} |
|
\item \texttt{preliminary} this information is preliminary. It must |
|
be checked in further workflow steps. |
|
|
|
\item \texttt{inwork} |
|
|
|
\item \texttt{final} |
|
\end{itemize} |
|
|
|
Workflow states other than \texttt{preliminary} are part of the |
|
workflow handling of the respective projects. |
|
|
|
Metadata elements can appear multiple times with different |
|
\texttt{workflow-state} attributes. This enables metadata versioning. |
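The following Python sketch shows one way a tool could pick a single version of a metadata element that appears several times. It rests on assumptions not stated in this document: that the states are ordered \texttt{preliminary}, \texttt{inwork}, \texttt{final}, that elements without the attribute rank lowest, and that the versions are direct children of the \texttt{meta} element. The function name \texttt{most\_advanced} is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed ordering of the workflow states; the document does not define one.
STATE_RANK = {"preliminary": 0, "inwork": 1, "final": 2}


def most_advanced(meta, tag):
    """Return the child element `tag` of `meta` with the highest-ranked state.

    Elements without a workflow-state attribute (or with an unknown value)
    are ranked below 'preliminary'. Returns None if no such child exists.
    """
    candidates = meta.findall(tag)
    if not candidates:
        return None
    return max(candidates,
               key=lambda e: STATE_RANK.get(e.get("workflow-state"), -1))
```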
|
|
|
|
|
|
|
\subsection{Content type} |
|
\label{sec:content-type} |
|
|
|
\begin{description} |
|
\item[content-type] \label{tag-content-type} The content type of this |
|
resource -- required.\\ |
|
The content type enables the choice of tools to manipulate and |
|
display the resource. There should be a common list of content |
|
types. For digital documents (books, manuscripts) this would be |
|
``scanned document'', for other image data ``scanned |
|
images''.\footnote{The criterion for documents is an ordered |
|
succession of image files (pages) and equal image size and |
|
resolution throughout the images of a resource.} |
|
\end{description} |
|
|
|
|
|
|
\subsection{Language} |
\subsection{Language} |
\label{sec:lang} |
\label{sec:lang} |
Line 615 tags.
|
Line 700 tags.
|
%%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise} |
%%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise} |
|
|
|
|
\subsection{Scanned images} |
\subsection{Digital images} |
\label{sec:inform-scann-imag} |
\label{sec:inform-scann-imag} |
|
|
Image files representing scanned images can have an \texttt{img} |
Image files representing scanned images can have an \texttt{img} |
Line 629 Required is one of three possible sets o
|
Line 714 Required is one of three possible sets o
|
\item[img] digital image information. |
\item[img] digital image information. |
|
|
\begin{description} |
\begin{description} |
\item[original-size-x] The width of the original image. The unit of |
\item[original-size-x] The width of the original |
measure can be contained as parameter \texttt{unit}, the default |
image -- required. \\ |
is meter ``m''. The width to be considered is the total width of |
The unit of measure can be contained as parameter \texttt{unit}, |
the scanned area. |
the default is meter ``m''. The width to be considered is the |
|
total width of the scanned area. |
|
|
\item[original-size-y] The height of the original image. |
\item[original-size-y] The height of the original image -- required. |
|
|
\item[original-pixel-x] The width of the hi-res scan in pixels. |
\item[original-pixel-x] The width of the hi-res scan in pixels -- deduced. |
|
|
\item[original-pixel-y] The height of the hi-res scan in pixels. |
\item[original-pixel-y] The height of the hi-res scan in pixels -- deduced. |
\end{description} |
\end{description} |
\end{description} |
\end{description} |
|
|
Line 649 or
|
Line 735 or
|
|
|
\begin{description} |
\begin{description} |
\item[original-dpi-x] The resolution of the hi-res scan in its width |
\item[original-dpi-x] The resolution of the hi-res scan in its width |
in pixels per inch. |
in pixels per inch -- required. |
|
|
\item[original-dpi-y] The resolution of the hi-res scan in its height |
\item[original-dpi-y] The resolution of the hi-res scan in its height |
in pixels per inch. |
in pixels per inch -- required. |
|
|
|
\item[original-pixel-x] The width of the hi-res scan in pixels -- deduced. |
|
|
|
\item[original-pixel-y] The height of the hi-res scan in pixels -- deduced. |
\end{description} |
\end{description} |
\end{description} |
\end{description} |
|
|
Line 663 or
|
Line 753 or
|
|
|
\begin{description} |
\begin{description} |
\item[original-dpi] The resolution of the hi-res scan in pixels per |
\item[original-dpi] The resolution of the hi-res scan in pixels per |
inch if the resolutions in width and height are the same. |
inch if the resolutions in width and height are the same -- required. |
|
|
|
\item[original-pixel-x] The width of the hi-res scan in pixels -- deduced. |
|
|
|
\item[original-pixel-y] The height of the hi-res scan in pixels -- deduced. |
\end{description} |
\end{description} |
\end{description} |
\end{description} |
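The three alternative sets describe the same relation between physical size, pixel count and resolution. A minimal Python sketch of the conversion, assuming the default unit meter for the original size; the function names are hypothetical:

```python
METERS_PER_INCH = 0.0254  # exact by definition of the inch


def dpi_from_size(size_m, pixels):
    """Resolution in pixels per inch from original size (meters) and pixel count."""
    return pixels / (size_m / METERS_PER_INCH)


def size_from_dpi(dpi, pixels):
    """Original size in meters from resolution (pixels per inch) and pixel count."""
    return pixels / dpi * METERS_PER_INCH
```

For example, a scanned area 0.254 m wide that yields 3000 pixels in width corresponds to a horizontal resolution of about 300 pixels per inch.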
|
|
|
|
|
|
\subsection{Image acquisition} |
\subsection{Digital image acquisition} |
\label{sec:inform-about-image} |
\label{sec:inform-about-image} |
|
|
A description of the technology used in the process of producing a |
A description of the technology used in the process of producing a |
Line 680 digital image.
|
Line 774 digital image.
|
\begin{description} |
\begin{description} |
\item[device] acquisition device (e.g. ``flatbed scanner'') |
\item[device] acquisition device (e.g. ``flatbed scanner'') |
|
|
\item[image-type] type and color-depth of the image (e.g. ``RGB 24 |
\item[image-type] type and color-depth of the image -- required (e.g. ``RGB 24 |
bit'') |
bit'') |
|
|
\item[postproduction] additional operations on the image |
|
(e.g. ``sharpening, color correction'') |
|
|
|
\item[production-comment] additional textual information about the |
\item[production-comment] additional textual information about the |
production process |
production process |
\end{description} |
\end{description} |
\end{description} |
\end{description} |
|
|
|
|
|
|
\subsection{Full text with images} |
\subsection{Full text with images} |
\label{sec:full-text-with} |
\label{sec:full-text-with} |
|
|
Full text in an XML format will be specified with a |
Full text in an XML format should be specified with a |
\texttt{content-type} ``fulltext''. |
\texttt{content-type}\footnote{See section~\ref{tag-content-type} |
|
on page~\pageref{tag-content-type}.} ``fulltext''. |
|
|
The relation between the full text and optional images of |
The relation between the full text and optional images of |
whole pages or parts of pages must be specified in a |
whole pages or parts of pages must be specified in a |
Line 710 whole pages or parts of pages must be sp
|
Line 803 whole pages or parts of pages must be sp
|
inside document directory) |
inside document directory) |
|
|
\item[page-images] the directory name of the directory containing the |
\item[page-images] the directory name of the directory containing the |
page image files (with path |
page image files (with path inside document directory) |
inside document directory) |
|
|
|
\item[xslt-file] the file name of an additional XSL transformation |
\item[xslt-file] the file name of an additional XSL transformation |
file |
file |
Line 732 whole pages or parts of pages must be sp
|
Line 824 whole pages or parts of pages must be sp
|
|
|
|
|
|
|
\subsection{Access restrictions} |
\subsection{Copyright and access conditions} |
\label{sec:access-restrictions} |
\label{sec:access-conditions} |
|
|
|
If the access to a resource is bound to conditions for technical or legal |
|
reasons then the conditions can be put in a \texttt{access-conditions} |
|
container. Other access rights conditions like copyright can also be |
|
documented in this container. |
|
|
|
\begin{description} |
|
\item[access-conditions] legal and technical conditions for access to |
|
this resource |
|
|
|
\begin{description} |
|
\item[attribution] The name or institution this resource should be |
|
attributed to when it's publicly presented |
|
|
|
\begin{description} |
|
\item[name] a name (free text) |
|
|
|
\item[url] a URL (with an optional \texttt{label} attribute to show |
|
as text) |
|
\end{description} |
|
|
|
\item[copyright] the copyright owner and its conditions |
|
\begin{description} |
|
\item[owner] the name of the copyright owner |
|
\begin{description} |
|
\item[name] a name (free text) |
|
|
|
\item[url] a URL (with an optional \texttt{label} attribute to show |
|
as text) |
|
\end{description} |
|
|
|
\item[date] the date when the copyright was issued |
|
|
|
\item[duration] the duration of the copyright (if known) |
|
|
|
\item[description] free-text field for special or additional |
|
conditions |
|
\end{description} |
|
|
|
\item[access] conditions of access to this resource |
|
\begin{description} |
|
\item[internal] access should be restricted to a group of users. The |
|
type of group is defined by one of the following |
|
\begin{description} |
|
\item[institution] the members of this institution. How a user is |
|
identified as a member of the institution is not |
|
specified in this document. |
|
|
|
\item[subnet] all computers with an IP-address in this subnet. The |
|
subnet is defined in ``truncated-quad'' (e.g. ``141.14'') or |
|
``address/netmask'' (e.g. ``141.14.0.0/255.255.0.0'') notation. |
|
|
|
\item[group] the members of this named group. How a user is |
|
identified as a member of a named group is not specified in |
|
this document. |
|
\end{description} |
|
|
|
\item[scientific] access to this resource should be restricted to |
|
scientific work |
|
|
|
\item[free] access to this resource is not restricted |
|
|
|
\item[special] if none of the above conditions seems appropriate, |
|
a free-form text can be specified here. |
|
\end{description} |
|
\end{description} |
|
\end{description} |
|
|
|
\noindent |
|
It should be noted that control over the access to the resource has to |
|
be provided by additional technical measures. Access conditions in the |
|
metadata file only state that conditions \emph{should} be observed, |
|
not that they \emph{are} necessarily observed, as the enforcement of |
|
conditions depends on additional technical measures. |
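As an illustration of the two \texttt{subnet} notations, the following Python sketch parses both forms with the standard \texttt{ipaddress} module. It assumes that a truncated quad stands for a network whose prefix covers exactly the octets given (so ``141.14'' means 141.14.0.0 with a 16-bit prefix); this reading and the function names are assumptions, not part of the specification.

```python
import ipaddress


def parse_subnet(spec):
    """Parse a subnet in truncated-quad or address/netmask notation."""
    if "/" in spec:
        # e.g. "141.14.0.0/255.255.0.0"; ip_network accepts a dotted netmask.
        return ipaddress.ip_network(spec)
    # Truncated quad, e.g. "141.14": pad with zero octets and use
    # 8 prefix bits per octet that was given (assumed interpretation).
    octets = spec.split(".")
    if not 1 <= len(octets) <= 4:
        raise ValueError("not a truncated-quad subnet: %r" % spec)
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.ip_network(
        "{}/{}".format(".".join(padded), 8 * len(octets)))


def in_subnet(address, spec):
    """True if the IP address lies inside the subnet described by `spec`."""
    return ipaddress.ip_address(address) in parse_subnet(spec)
```

Such a check only evaluates the condition recorded in the metadata; enforcing it remains the task of the serving software.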
|
|
|
|
|
|
|
\subsection{Acquisition of raw-data} |
|
\label{sec:acqu-inform} |
|
|
|
Information about the acquisition source for raw data resources can be |
|
provided in an \texttt{acquisition} container. |
|
|
|
\begin{description} |
|
\item[acquisition] the acquisition source of this resource -- required |
|
for raw data. |
|
\begin{description} |
|
\item[provider] where this resource came from -- required |
|
\begin{description} |
|
\item[name] free-text name of the provider (institution or |
|
individual) |
|
|
|
\item[address] address of the provider |
|
|
|
\item[contact] contact person at the provider (i.e. name and email) |
|
|
|
\item[url] URL related to the provider |
|
\end{description} |
|
|
|
\item[date] date of acquisition -- required |
|
|
|
\item[description] free-text description of the acquisition source or |
|
additional information |
|
\end{description} |
|
\end{description} |
|
|
|
|
|
|
|
\subsection{Documentary Films} |
|
\label{sec:documentary-films} |
|
|
|
Documentary films can be described using a \texttt{film-acquisition} |
|
container. |
|
|
|
\begin{description} |
|
\item[film-acquisition] description of a (documentary) film -- |
|
required for documentary film |
|
\begin{description} |
|
\item[recording] specification of the recording process |
|
\begin{description} |
|
\item[author] the person or persons doing the recording |
|
|
|
\item[date] the date or time span when the film was recorded |
|
|
|
\item[location] the place where the film was recorded |
|
|
|
\item[device] recording device used (e.g. ``Sony CP-DV8 Camcorder'') |
|
|
|
\item[format] format of the recorded film -- required (e.g. ``DV |
|
720x524 25fps interlaced'') |
|
\end{description} |
|
|
|
\item[description] free-form description of the recording and the |
|
content of the film |
|
\end{description} |
|
\end{description} |
|
|
|
(More information about the digitization step could be added in a |
|
\texttt{digitization} tag similar to the \texttt{recording} tag.) |
|
|
|
|
If the access to a resource is restricted for technical or legal |
|
reasons then the restrictions can be put in a |
|
\texttt{access-restrictions} container. The format of the information |
|
inside the container has to be further specified. |
|
|
|
|
|
\section{Sample metadata files for ECHO resources} |
\section{Sample metadata files for ECHO resources} |