--- storage/meta/meta-format.tex 2003/09/11 14:52:43 1.11 +++ storage/meta/meta-format.tex 2003/12/05 16:11:59 1.12 @@ -16,7 +16,7 @@ \author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess} -\date{V1.0.3 of 11.9.2003} +\date{V1.1.0 of 5.12.2003} \begin{document} @@ -32,8 +32,22 @@ File and directory names should not cont in filenames are only the alphanumeric set a-z, A-Z, 0-9, hyphen ``-'', underscore ``\_'' and dot ``.''. -File and directory paths in the metadata file use the conventional -Unix file separator slash ``/''. +Files and directories with names that contain illegal characters must +be transformed to allowed names. A proposed simple +transformation rule is + +\begin{itemize} +\item whitespace characters (e.g. blank, tab, cr, lf) are replaced by + hyphens ``-'' + +\item other illegal characters are replaced by underscores ``\_''. +\end{itemize} + +This rule does not provide a reversible mapping to the original +illegal file name and it does not provide a collision-free mapping, +i.e. two different illegal file names might be mapped to the same +allowed file name. Additional precautions for these cases must be +taken. \section{Metadata files} @@ -72,14 +86,17 @@ supplied by the provider of the resource automatic scripts later in the process, these elements must be present in the final file. +File and directory paths in the metadata file use the conventional +Unix file separator slash ``/''. + The outer container element is \texttt{resource}. It has the following \textbf{attributes}: \begin{description} -\item[type] sub-type of resource (e.g. ``ECHO'', - ``MPIWG'') -- optional. +\item[type] sub-type of resource (e.g. ``ECHO'', ``MPIWG'') -- + optional. -\item[version] version number of metadata format (currently 1.0) -- +\item[version] version number of metadata format (currently 1.1) -- required. 
\end{description} @@ -103,14 +120,20 @@ The outer container element is \texttt{r \item[archive-path] The full path to the resource directory inside the whole archive collection, including the resource directory -- deduced. + +\item[archive-id] The ID for this document in the archive -- + required. \item[derived-from] Container for the description of the original resource if this resource is a modified version of another resource -- optional. \begin{description} + \item[archive-id] The ID of the original resource + -- required. + \item[archive-path] The full path to the original resource - --required. + -- deduced. \item[description] An informal textual description of the relation of this resource to the original resource -- optional. @@ -121,21 +144,30 @@ The outer container element is \texttt{r -- optional. \begin{description} + \item[archive-id] The ID of the linked resource + -- required. + \item[archive-path] The full path to the linked resource - --required. + -- deduced. \item[description] An informal textual description of the relation of this resource to the linked resource -- optional. \end{description} -\item[content-type] The content type of this resource -- required.\\ - The content type enables the choice of tools to manipulate and - display the resource. There should be a common list of content - types. For digital documents (books, manuscripts) this would be - "scanned document", for other image data "scanned - images".\footnote{The criterion for documents is a ordered - succession of image files (pages) and equal image size and - resolution throughout the images of a resource.} +\item[media-type] \label{tag-media-type} The main media type of this + resource -- required.\\ The main media type can be overridden by + \texttt{media-type}s in subdirectories. 
Possible types are + \begin{itemize} + \item \texttt{image} + + \item \texttt{text} + + \item \texttt{audio} + + \item \texttt{video} + + \item \texttt{data} for other types of data + \end{itemize} \item[meta] Additional metadata information about the resource -- optional.\\ For a description of additional metadata see below. @@ -151,6 +183,12 @@ The outer container element is \texttt{r \item[name] The name of the subdirectory -- required. + \item[original-name] A text string associated with the directory as + original name -- optional. (E.g. if the data in this directory + came from an external source and had a name that had to be changed + according to section~\ref{sec:file-directory-names} but it should + be possible to reference the original name.) + \item[path] The directory path of this subdirectory relative to the resource's root directory (excluding the directory itself) -- required (may be empty or omitted if the directory is a direct @@ -170,6 +208,12 @@ The outer container element is \texttt{r \item[name] The name of the file -- required. + \item[original-name] A text string associated with the file as + original name -- optional. (E.g. if this file came from an + external source and had a name that had to be changed according to + section~\ref{sec:file-directory-names} but it should be possible + to reference the original name.) + \item[path] The directory path of this file relative to the resource's root directory (excluding the file itself) -- required (may be empty or omitted if the file is in the resource's root @@ -201,7 +245,48 @@ The outer container element is \texttt{r \label{sec:additional-metadata} All elements with \texttt{meta} tags can contain an arbitrary number -of additional metadata elements. +of the following additional metadata elements. + +\subsection{Workflow state} +\label{sec:workflow-state} + +All additional metadata elements can have a \texttt{workflow-state} +\textbf{attribute}. 
This attribute reflects the state of the +corresponding metadata element. The possible values for the +\texttt{workflow-state} attribute are +\begin{itemize} +\item \texttt{preliminary}: this information is preliminary. It must + be checked in further workflow steps. + +\item \texttt{inwork} + +\item \texttt{final} +\end{itemize} + +Workflow states other than \texttt{preliminary} are part of the +workflow handling of the respective projects. + +Metadata elements can appear multiple times with different +\texttt{workflow-state} attributes. This enables metadata versioning. + + + +\subsection{Content type} +\label{sec:content-type} + +\begin{description} +\item[content-type] \label{tag-content-type} The content type of this + resource -- required.\\ + The content type enables the choice of tools to manipulate and + display the resource. There should be a common list of content + types. For digital documents (books, manuscripts) this would be + ``scanned document'', for other image data ``scanned + images''.\footnote{The criterion for documents is an ordered + succession of image files (pages) and equal image size and + resolution throughout the images of a resource.} +\end{description} + + \subsection{Language} \label{sec:lang} @@ -615,7 +700,7 @@ tags. %%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise} -\subsection{Scanned images} +\subsection{Digital images} \label{sec:inform-scann-imag} Image files representing scanned images can have an \texttt{img} @@ -629,16 +714,17 @@ Required is one of three possible sets o \item[img] digital image information. \begin{description} - \item[original-size-x] The width of the original image. The unit of - measure can be contained as parameter \texttt{unit}, the default - is meter ``m''. The width to be considered is the total width of - the scanned area. + \item[original-size-x] The width of the original + image -- required. \\ + The unit of measure can be contained as parameter \texttt{unit}, + the default is meter ``m''. 
The width to be considered is the + total width of the scanned area. - \item[original-size-y] The height of the original image. + \item[original-size-y] The height of the original image -- required. - \item[original-pixel-x] The width of the hi-res scan in pixels. + \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced. - \item[original-pixel-y] The height of the hi-res scan in pixels. + \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced. \end{description} \end{description} @@ -649,10 +735,14 @@ or \begin{description} \item[original-dpi-x] The resolution of the hi-res scan in its width - in pixels per inch. + in pixels per inch -- required. \item[original-dpi-y] The resolution of the hi-res scan in its height - in pixels per inch. + in pixels per inch -- required. + + \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced. + + \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced. \end{description} \end{description} @@ -663,13 +753,17 @@ or \begin{description} \item[original-dpi] The resolution of the hi-res scan in pixels per - inch if the resolutions in width and height are the same. + inch if the resolutions in width and height are the same -- required. + + \item[original-pixel-x] The width of the hi-res scan in pixels -- deduced. + + \item[original-pixel-y] The height of the hi-res scan in pixels -- deduced. \end{description} \end{description} -\subsection{Image acquisition} +\subsection{Digital image acquisition} \label{sec:inform-about-image} A description of the technology used in the process of producing a @@ -678,25 +772,24 @@ digital image. \begin{description} \item[image-acquisition] description of the image production process \begin{description} - \item[device] acquisition device (e.g. ``flatbed scanner'') + \item[device] acquisition device (e.g. ``flatbed scanner'') - \item[image-type] type and color-depth of the image (e.g. 
``RGB 24 + \item[image-type] type and color-depth of the image -- required (e.g. ``RGB 24 bit'') - \item[postproduction] additional operations on the image - (e.g. ``sharpening, color correction'') - \item[production-comment] additional textual information about the production process \end{description} \end{description} + \subsection{Full text with images} \label{sec:full-text-with} -Full text in a XML format will be specified with a -\texttt{content-type} ``fulltext''. +Full text in an XML format should be specified with a +\texttt{content-type}\footnote{See section~\ref{tag-content-type} +on page~\pageref{tag-content-type}.} ``fulltext''. The relation between the full text and optional images of whole pages or parts of pages must be specified in a @@ -708,10 +801,9 @@ whole pages or parts of pages must be sp \begin{description} \item[text-file] the file name of the full text file (with path inside document directory) - + \item[page-images] the directory name of the directory containig the - page image files (with path - inside document directory) + page image files (with path inside document directory) \item[xslt-file] the file name of an additional XSL transformation file @@ -732,13 +824,149 @@ whole pages or parts of pages must be sp -\subsection{Access restrictions} -\label{sec:access-restrictions} +\subsection{Copyright and access conditions} +\label{sec:access-conditions} + +If the access to a resource is bound to conditions for technical or legal +reasons, then the conditions can be put in an \texttt{access-conditions} +container. Other access rights conditions like copyright can also be +documented in this container. 
+ +\begin{description} +\item[access-conditions] legal and technical conditions for access to + this resource + + \begin{description} + \item[attribution] The name or institution this resource should be + attributed to when it is publicly presented + + \begin{description} + \item[name] a name (free text) + + \item[url] a URL (with an optional \texttt{label} attribute to show + as text) + \end{description} + + \item[copyright] the copyright owner and its conditions + \begin{description} + \item[owner] the name of the copyright owner + \begin{description} + \item[name] a name (free text) + + \item[url] a URL (with an optional \texttt{label} attribute to show + as text) + \end{description} + + \item[date] the date when the copyright was issued + + \item[duration] the duration of the copyright (if known) + + \item[description] free-text field for special or additional + conditions + \end{description} + + \item[access] conditions of access to this resource + \begin{description} + \item[internal] access should be restricted to a group of users. The + type of group is defined by one of the following + \begin{description} + \item[institution] the members of this institution. The method + for identifying a user as a member of the institution is not + specified in this document. + + \item[subnet] all computers with an IP address in this subnet. The + subnet is defined in ``truncated-quad'' (e.g. ``141.14'') or + ``address/netmask'' (e.g. ``141.14.0.0/255.255.0.0'') notation. + + \item[group] the members of this named group. The method for + identifying a user as a member of a named group is not specified in + this document. + \end{description} + + \item[scientific] access to this resource should be restricted to + scientific work + + \item[free] access to this resource is not restricted + + \item[special] if none of the above conditions seems appropriate, + a free-form text can be specified here. 
+ \end{description} + \end{description} +\end{description} + +\noindent +It should be noted that control over the access to the resource has to +be provided by additional technical measures. Access conditions in the +metadata file only state that conditions \emph{should} be observed, +not that they \emph{are} necessarily observed, as the enforcement of +conditions depends on additional technical measures. + + + +\subsection{Acquisition of raw-data} +\label{sec:acqu-inform} + +Information about the acquisition source for raw data resources can be +provided in an \texttt{acquisition} container. + +\begin{description} +\item[acquisition] the acquisition source of this resource -- required + for raw data. + \begin{description} + \item[provider] where this resource came from -- required + \begin{description} + \item[name] free-text name of the provider (institution or + individual) + + \item[address] address of the provider + + \item[contact] contact person at the provider (i.e. name and email) + + \item[url] URL related to the provider + \end{description} + + \item[date] date of acquisition -- required + + \item[description] free-text description of the acquisition source or + additional information + \end{description} +\end{description} + + + +\subsection{Documentary Films} +\label{sec:documentary-films} + +Documentary films can be described using a \texttt{film-acquisition} +container. + +\begin{description} +\item[film-acquisition] description of a (documentary) film -- + required for documentary film + \begin{description} + \item[recording] specification of the recording process + \begin{description} + \item[author] the person or persons doing the recording + + \item[date] the date or time span when the film was recorded + + \item[location] the place where the film was recorded + + \item[device] recording device used (e.g. ``Sony CP-DV8 Camcorder'') + + \item[format] format of the recorded film -- required (e.g. 
``DV + 720x524 25fps interlaced'') + \end{description} + + \item[description] free-form description of the recording and the + content of the film + \end{description} +\end{description} + +(More information about the digitization step could be added in a +\texttt{digitization} tag similar to the \texttt{recording} tag.) + -If the access to a resource is restricted for technical or legal -reasons then the restrictions can be put in a -\texttt{access-restrictions} container. The format of the information -inside the container has to be further specified. \section{Sample metadata files for ECHO resources}