Diff for /storage/meta/meta-format.tex between versions 1.3 and 1.4

version 1.3, 2003/07/01 17:51:40 version 1.4, 2003/07/23 10:35:06
Line 7 Line 7
 %\usepackage{courier}  %\usepackage{courier}
   
 % create in-text links black (with PDF)  % create in-text links black (with PDF)
 \usepackage[colorlinks=true,linkcolor=black]{hyperref}  %\usepackage[colorlinks=true,linkcolor=black]{hyperref}
 % Format URLs nicely (without PDF)  % Format URLs nicely (without PDF)
 %\usepackage{url}  \usepackage{url}
   
   
 \title{A simple metadata format for resource bundles}  \title{A simple metadata format for resource bundles}
   
 \author{Robert Casties, Dirk Wintergrün, Christoph Liess}  \author{Robert Casties, Dirk Wintergrün, Hans-Christoph Liess}
   
 \date{V0.2.2 of \today}  \date{V0.3pre2 of \today}
   
 \begin{document}  \begin{document}
   
Line 35  in filenames are only the alphanumeric s Line 35  in filenames are only the alphanumeric s
 File and directory paths in the metadata file use the conventional  File and directory paths in the metadata file use the conventional
 Unix file separator slash ``/''.  Unix file separator slash ``/''.
   
   
   \section{Metadata files}
   \label{sec:metadata-files}
   
   The metadata information is stored in the XML format documented below
   in special files in the resource directory. Two forms of metadata
   files are possible:
   \begin{itemize}
   \item a file named \texttt{index.meta} in a directory.
   
   \item a file named like the data file it describes with an
     additional extension \texttt{.meta}. For example, metadata for the
     file \texttt{0001.tif} would be in a file \texttt{0001.tif.meta}.
   \end{itemize}
   
   The resource directory must contain an \texttt{index.meta} file with
   information about the resource as a whole. Other directories can
   contain \texttt{index.meta} files.
   
   Additional information about single data files that are part of the
   resource can either be put in \texttt{file} tags in the
   \texttt{index.meta} file or in separate \emph{filename}\texttt{.meta}
   files for each data file. Information from the directory level file is
   inherited at the file level.
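   
   As an illustration, a resource directory following these rules
   might be laid out as in the sketch below. The names
   \texttt{fleck.1980} and \texttt{img} are borrowed from the sample
   file at the end of this document; the layout itself is an assumption
   made for illustration, not a prescribed structure.
   \begin{verbatim}
   fleck.1980/              resource directory
       index.meta           required: describes the whole resource
       img/
           index.meta       optional: describes this directory
           0001.tif
           0001.tif.meta    optional: describes 0001.tif only
   \end{verbatim}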
   
   
 \section{Resource format}  \section{Resource format}
 \label{sec:mpiwg-doc}  \label{sec:mpiwg-doc}
   
Line 43  by the provider of the resource and may Line 69  by the provider of the resource and may
 the metadata file. Elements marked ``required'' must be supplied by  the metadata file. Elements marked ``required'' must be supplied by
 the provider of the resource. Elements marked ``deduced'' can be  the provider of the resource. Elements marked ``deduced'' can be
 supplied by the provider of the resource but can also be provided by  supplied by the provider of the resource but can also be provided by
 automatic scripts later in the process, the elements must be present  automatic scripts later in the process, these elements must be present
 in the final file.  in the final file.
   
 The outer container is named \texttt{resource}. Sub-types (``ECHO'',  The outer container element is \texttt{resource}. Sub-types (``ECHO'',
 ``MPIWG'') can be specified if necessary with a \texttt{type}  ``MPIWG'') can be specified if necessary with a \texttt{type}
 parameter. Its sub-elements are:  parameter. Its sub-elements are:
   
Line 60  parameter. Its sub-elements are: Line 86  parameter. Its sub-elements are:
 \item[creator] The name of the project or person that created the  \item[creator] The name of the project or person that created the
   resource -- optional.    resource -- optional.
   
 \item[archive-creation-date] The time and date the archive was created  \item[archive-creation-date] The time and date the archive collection
   -- deduced.    was created -- deduced.
   
   \item[archive-storage-date] The time and date the archive was written
     to permanent storage -- deduced (must not be set by the user).
   
 \item[archive-path] The full path to the resource directory inside the  \item[archive-path] The full path to the resource directory inside the
   whole archive collection -- deduced.    whole archive collection -- deduced.
Line 164  parameter. Its sub-elements are: Line 193  parameter. Its sub-elements are:
 All elements with \texttt{meta} tags can contain an arbitrary number  All elements with \texttt{meta} tags can contain an arbitrary number
 of additional metadata elements.  of additional metadata elements.
   
   \subsection{Language}
   \label{sec:lang}
   
   The language of a resource (e.g. a text) can be specified with a
   \texttt{lang} tag. Languages have to be described using the
   international codes for the representation of names of languages
   either in two-letter form (ISO 639-1) or in three-letter form (ISO
   639-2).  The entire catalogue of languages is documented on the page
   
   \url{http://www.loc.gov/standards/iso639-2/englangn.html}
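   
   For example, a German text could be marked in either of the two
   forms sketched below (``de'' is the ISO 639-1 code, ``ger'' the ISO
   639-2 code). Placing the tag inside \texttt{meta} follows the sample
   files at the end of this document.
   \begin{verbatim}
   <meta>
       <lang>de</lang>   <!-- two-letter form -->
   </meta>
   
   <meta>
       <lang>ger</lang>  <!-- three-letter form -->
   </meta>
   \end{verbatim}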
   
   
 \subsection{DRI}  \subsection{DRI}
 \label{sec:dri}  \label{sec:dri}
   
 The \emph{digital resource identifier} for the resource is specified  The \emph{digital resource identifier} for the resource is specified
 with a \texttt{dri} tag. Digital resource identifiers are documented  in a \texttt{dri} element. Digital resource identifiers are documented
 on the page  on the page
   
 \url{http://pythia.mpiwg-berlin.mpg.de/projects/standards/dri}.  \url{http://pythia.mpiwg-berlin.mpg.de/projects/standards/dri}.
   
   
   
   \subsection{Collection context}
   \label{sec:collection-context}
   
   The context of a resource as part of a collection or part of a project can be
   specified in the \texttt{context} element:
   
   \begin{description}
   \item[link] URL to additional context information.
   
   \item[name] Textual description of project or collection.
   \end{description}
   \noindent Multiple \texttt{link} or \texttt{name} elements are
   possible.
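   
   A minimal sketch of a \texttt{context} element is shown below. The
   URL and the collection name are placeholders invented for this
   illustration, and placing the element inside \texttt{meta} is an
   assumption.
   \begin{verbatim}
   <meta>
       <context>
           <link>http://www.example.org/collection/</link>
           <name>Example collection of scanned books</name>
       </context>
   </meta>
   \end{verbatim}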
   
   
   
 \subsection{Bibliographic information}  \subsection{Bibliographic information}
 \label{sec:bibliographic-data}  \label{sec:bibliographic-data}
   
Line 182  Bibliographic information in the format Line 239  Bibliographic information in the format
 bibliographic data (cf. content workflow) or the MPIWG  bibliographic data (cf. content workflow) or the MPIWG
 ``Projektbibliografie'' is presented in a \texttt{bib} container with  ``Projektbibliografie'' is presented in a \texttt{bib} container with
 a \texttt{type} parameter, giving the type of bibliographic resource.  a \texttt{type} parameter, giving the type of bibliographic resource.
 The \texttt{type} field is repeated as a tag in the container. The  The \texttt{type} field can be repeated as a tag in the container.
 tags have the variable ``human-readable'' field names.  
   
   \subsubsection{Book}
   
   \begin{description}
   
   \item [bib type="book"] a published book.
   
     \begin{description}
     \item [author] The author of the book.
     \item [year] The year of publication.
     \item [title] Title of the book.
     \item [series-editor] Name of the series editor, if the book appears
       in a series.
     \item [series-title] Title of the series, if the book appears in a
       series.
     \item [series-volume] Volume number, if the book appears in a
       series.
     \item [number-of-pages] Number of pages of the entire book.
     \item [city] City where the book was published.
     \item [publisher] Name of the publishing company
     \item [edition] Edition of the book (e.g. third edition)
     \item [number-of-volumes] Number of volumes, if the book is
       published in multiple volumes.
     \item [translator] Name of the translator.
     \item [isbn-issn]
     \end{description}
   \end{description}
   
   \subsubsection{In Book}
   
   \begin{description}
   \item [bib type="inbook"] an article as part of a book.
   
     \begin{description}
     \item [author] The author of the article.
     \item [year] The year of publication.
     \item [title] Title of the article.
     \item [editor] Name of the book's editor.
     \item [book-title] Title of the book.
     \item [series-volume] Volume number, if the book appears in a
       series.
     \item [pages] Number of pages of the article.
     \item [city] City where the book was published.
     \item [publisher] Name of the publishing company
     \item [edition] Edition of the book (e. g. third edition)
     \item [series-author] Name of the series editor, if the book appears
       in a series.
     \item [series-title] Title of the series, if the book appears in a
       series.
     \item [number-of-volumes] Number of volumes, if the book is
       published in multiple volumes.
     \item [translator] Name of the translator
     \item [isbn-issn]
     \end{description}
   \end{description}
   
   \subsubsection{Proceedings}
   
   \begin{description}
   \item [bib type="proceedings"] a conference proceedings publication.
   
     \begin{description}
     \item [author] The author of the article.
     \item [year] The year of publication.
     \item [title] Title of the article.
     \item [editor] Name of the book's editor.
     \item [conference-name] Name of the conference the proceedings are
       related to.
     \item [volume] Volume number.
     \item [pages] Number of pages of the article.
     \item [date] Date of the conference the proceedings are related to.
     \item [conference-location] City where the conference was held.
     \item [publisher] Name of the publishing company
     \item [edition] Edition of the book (e. g. third edition)
     \item [series-editor] Name of the series editor, if the book appears
       in a series.
     \item [series-title] Title of the series, if the book appears in a
       series.
     \item [number-of-volumes] Number of volumes, if the book is
       published as multiple volumes.
     \item [isbn-issn]
     \end{description}
   \end{description}
   
   \subsubsection{Edited Book}
   
   \begin{description}
   \item[bib type="edited-book"] a book that is the edition of another
     work.
   
     \begin{description}
     \item [editor] Name of the editor of the book.
     \item [year] The year of publication.
     \item [title] Title of the book.
     \item [series-editor] Name of the editor of the series the book is
       part of.
     \item [series-title] Title of the series, if the book is part of a
       series.
     \item [series-volume] Volume number, if the book appears in a series.
     \item [number-of-pages] Number of pages of the entire book.
     \item [city] City where the book was published.
     \item [publisher] Name of the publishing company
     \item [edition] Information about the edition (e.g. ``Repr. of the London ed. 1652'')
     \item [number-of-volumes] Number of volumes, if the book is
       published as multiple volumes.
     \item [isbn-issn]
     \end{description}
   \end{description}
   
   \subsubsection{Journal Article}
   
   \begin{description}
   \item [bib type="journal-article"] an article in a scientific journal.
     \begin{description}
     \item [author] The author of the article.
     \item [year] The year of publication.
     \item [title] Title of the article.
     \item [journal] Name of the journal.
     \item [volume] Volume number, if the journal appears in a series.
     \item [issue] Number of the issue the article is part of.
     \item [pages] Number of pages of the article.
     \item [alternate-journal] Alternate Journal
     \item [isbn-issn]
     \end{description}
   \end{description}
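   
   As a sketch, a journal article could be recorded as follows. All
   field values are placeholders invented for this illustration; only
   the element names come from the list above.
   \begin{verbatim}
   <bib type="journal-article">
       <author>Doe, Jane</author>
       <year>2003</year>
       <title>An example article</title>
       <journal>Example Journal</journal>
       <volume>12</volume>
       <issue>3</issue>
       <pages>20</pages>
   </bib>
   \end{verbatim}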
   
   \subsubsection{Magazine Article}
   
   \begin{description}
   \item [bib type="magazine-article"] an article in a popular magazine.
     \begin{description}
     \item [author] The author of the article.
     \item [year] The year of publication.
     \item [title] Title of the article.
     \item [magazine] Name of the magazine.
     \item [volume] Volume number of the magazine.
     \item [issue-number] Number of the issue the article is part of.
     \item [pages] Number of pages of the article.
     \item [date] Date when the article appeared.
     \end{description}
   \end{description}
   
   \subsubsection{Newspaper Article}
   
   \begin{description}
   \item [bib type="newspaper-article"] an article in a newspaper.
     \begin{description}
     \item [author] The author of the article.
     \item [year] The year of publication.
     \item [title] Title of the article.
     \item [newspaper] Name of the newspaper the article appeared in.
     \item [pages] Number of pages of the article.
     \item [issue-date] Date of the issue the article is part of.
     \item [city] City of the newspaper.
     \end{description}
   \end{description}
   
   \subsubsection{Thesis}
   
   \begin{description}
   \item [bib type="thesis"] a master's, doctoral or similar thesis.
     \begin{description}
     \item [author] The author of the thesis.
     \item [year] The year of publication.
     \item [title] Title of the thesis.
     \item [academic-department] Name of the academic department where
       the thesis was handed in.
     \item [number-of-pages] Number of pages of the thesis.
     \item [city] City where the thesis was published.
     \item [university] Name of the university where the thesis was
       handed in.
     \item [isbn-issn]
     \end{description}
   \end{description}
   
   \subsubsection{Report}
   
   \begin{description}
   \item [bib type="report"] a scientific report.
     \begin{description}
     \item [author] The author of the report.
     \item [year] The year of publication.
     \item [title] Title of the report.
     \item [pages] Number of pages of the report.
     \item [date] Date when the report appeared.
     \item [city] City where the report was published.
     \item [institution] Institution where the report was produced.
     \item [type] Type of report.
     \item [report-number] Report number.
     \end{description}
   \end{description}
   
   \subsubsection{Generic}
   
   \begin{description}
   \item [bib type="generic"] a generic bibliographic type. This type
     should only be used in rare cases.
     \begin{description}
     \item [author]
     \item [year]
     \item [title]
     \item [secondary-author]
     \item [secondary-title]
     \item [volume]
     \item [number]
     \item [pages]
     \item [date]
     \item [place-published]
     \item [publisher]
     \item [edition]
     \item [tertiary-author]
     \item [tertiary-title]
     \item [number-of-volumes]
     \item [type-of-work]
     \item [subsidiary-author]
     \item [alternate-title]
     \item [isbn-issn]
     \item [call-number]
     \item [label]
     \item [keywords]
     \item [abstract]
     \item [notes]
     \item [url]
   \end{description}
   \end{description}
   
   
   \subsection{Architectural drawings}
   \label{sec:doc}
   
   Specific information for architectural drawings is presented in a
   \texttt{doc} container. All elements can appear multiple times.
   
   \begin{description}
   \item [person] Last name and first name of a person, separated by a
     comma. A further common name for the person can be put in front,
     separated by a semicolon.
   \item [location] Name of a place in its common notation. This can
     be a city or an institution.
   \item [date] This can be a year (or several years, separated by commas) or a period
     (1706-1714). Years are noted with four digits.
   \item [object] Short description of an object or signatures.
   \item [keywords] Keywords related to the object.
   \end{description}
   
   
 \subsection{Information on the document structure (table of contents)}  \subsection{Information on the document structure (table of contents)}
 \label{sec:toc}  \label{sec:toc}
   
 Document structure information like a table of contents for a scanned  Information on the structure of a document like the division into
 document is presented in a \texttt{toc} container. The format to be  parts and chapters in the way of a table of contents is presented in a
 used has to be further specified. The format could be based on the so  \texttt{toc} container. 
 called ``LiSe-XML'' format. For a detailed description and an  
 exemplary set of TOC information see:  The scheme allows multiple logical pages on a single page image
   as is often the case with scanned books or manuscripts. The scheme
   also allows for ``loose'' numbering schemes that use roman, arabic or
   other page numbers, consecutively or mixed, and for changes in the
   numbering within the document.
   
   The flexibility comes from the fact that no additional assumptions
   about the mapping between logical pages and page images are made in
   the format. All mapping information is specified by the user.
   
   The logical page numbering or naming that can be presented to the user
   is specified in the \texttt{name} tags while the physical numbering of
   the page images is specified in the \texttt{index} or \texttt{url}
   tags.
   
 \url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise}  \begin{description}
   \item[page] describes a single logical page
     \begin{description}
     \item[name] the ``name'' of the logical page. This can be any string
       like a page number (arabic, roman, etc.) or a special designation
       like ``Table 5''.
       
     \item[index] the \texttt{digilib} index number\footnote{The index
         number for digilib is the index in the alphabetical order of the
         scan file names.} of the scan image of the page.
       
     \item[url] as an alternative to the \texttt{digilib} index number,
       the full URL of the scan image of the page can be used.
     \end{description}
   
 \url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TSlise/lise_downloads/deimel1929.xml}  \item[chapter] describes a section or chapter of the text.
     \texttt{chapter} elements can be nested.
     \begin{description}
     \item[name] the title of the chapter or section.
       
     \item[start] the beginning of a page range (usually the first page
       of the chapter). The \texttt{start} element has an optional
       \texttt{increment} attribute to indicate the number of logical
       pages on a scan image.\footnote{This information is only needed by
         additional tools that try to generate lists of all page and
         image numbers.}
       \begin{description}
       \item[name] the ``name'' of the first page (see \texttt{page}).
   
       \item[index] the index of the first page (see \texttt{page}).
   
       \item[url] the URL of the first page (see \texttt{page}).
       \end{description}
     
     \item[end] the end of a page range (usually the last page of the
       chapter).
       \begin{description}
       \item[name] the ``name'' of the last page (see \texttt{page}).
   
       \item[index] the index of the last page (see \texttt{page}).
   
       \item[url] the URL of the last page (see \texttt{page}).
       \end{description}
     
     \item[page] as an alternative (or in addition) to
       \texttt{start}/\texttt{end} page ranges, single \texttt{page}
       elements can be used inside \texttt{chapter}.
     \end{description}
   \end{description}
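   
   The following sketch shows how these elements might be combined.
   All names and index values are invented for this illustration, and
   it assumes that \texttt{page} and \texttt{chapter} elements are
   placed directly inside \texttt{toc}.
   \begin{verbatim}
   <toc>
       <page>
           <name>Title page</name>
           <index>1</index>
       </page>
       <chapter>
           <name>Preface</name>
           <start>
               <name>i</name>
               <index>3</index>
           </start>
           <end>
               <name>iv</name>
               <index>6</index>
           </end>
       </chapter>
       <chapter>
           <name>Chapter 1</name>
           <start increment="2">
               <name>1</name>
               <index>7</index>
           </start>
           <end>
               <name>20</name>
               <index>16</index>
           </end>
           <page>
               <name>Table 5</name>
               <index>12</index>
           </page>
       </chapter>
   </toc>
   \end{verbatim}
   In this sketch the preface uses roman page names with one logical
   page per scan image, while the first chapter has two logical pages
   on each scan image, so pages 1 to 20 span the images 7 to 16.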
   
   %%\url{http://pythia.mpiwg-berlin.mpg.de/toolserver/TS_lise}
   
   
 \subsection{Information on scanned images}  \subsection{Information on scanned images}
Line 250  reasons then the restrictions can be put Line 612  reasons then the restrictions can be put
 inside the container has to be further specified.  inside the container has to be further specified.
   
   
 \section{Sample metadata file for an ECHO resource}  \section{Sample metadata files for ECHO resources}
   
 The following is the sample structure for a scanned document resource.  
   
   The following is a sample structure for a scanned document.
 \begin{verbatim}  \begin{verbatim}
 <resource type="ECHO">  <resource type="ECHO">
     <description></description>      <description>Fleck, 1980</description>
     <name>fleck.1980</name>      <name>fleck.1980</name>
     <creator>University of Bern</creator>      <creator>University of Bern</creator>
     <archive-creation-date></archive-creation-date>  
     <archive-path>ubern/wiss-theorie</archive-path>      <archive-path>ubern/wiss-theorie</archive-path>
     <content-type>scanned images</content-type>      <content-type>scanned images</content-type>
     <meta>      <meta>
         <dri>echo23a45e2329x</dri>          <dri>echo23a45e2329x</dri>
           <lang>ger</lang>
         <bib type="book">          <bib type="book">
             <author>Fleck, Ludwik</author>              <author>Fleck, Ludwik</author>
             <year>1980</year>              <year>1980</year>
             <title>Entstehung und Entwicklung einer               <title>Entstehung und Entwicklung einer 
                    wissenschaftlichen Tatsache</title>                     wissenschaftlichen Tatsache</title>
             <series_editor></series_editor>              <series-editor></series-editor>
             <series_title></series_title>              <series-title></series-title>
             <series_volume></series_volume>              <series-volume></series-volume>
             <number_of_pages></number_of_pages>              <number-of-pages></number-of-pages>
             <city>Frankfurt am Main</city>              <city>Frankfurt am Main</city>
             <publisher>Suhrkamp</publisher>              <publisher>Suhrkamp</publisher>
             <edition></edition>              <edition></edition>
             <number_of_volumes></number_of_volumes>              <number-of-volumes></number-of-volumes>
             <translator></translator>              <translator></translator>
             <isbn></isbn>              <isbn></isbn>
             <keywords>Wissenschaftstheorie, Fleck, Tatsache</keywords>              <keywords>Wissenschaftstheorie, Fleck, Tatsache</keywords>
Line 286  The following is the sample structure fo Line 647  The following is the sample structure fo
     <dir>      <dir>
          <description>Scanned images (300dpi)</description>           <description>Scanned images (300dpi)</description>
          <name>img</name>           <name>img</name>
          <path></path>  
          <meta></meta>  
     </dir>      </dir>
 </resource>  </resource>
 \end{verbatim}  \end{verbatim}
   
   The following is a sample metadata structure for an architectural
   drawing.
   
   \begin{verbatim}
   <resource type="ECHO">
       <creator>Bibliotheca Hertziana</creator>
       <content-type>scanned images</content-type>
       <file>
           <name>00000271-asl-160-r-full.tif</name>
           <meta>
               <img>
                   <original-dpi>315</original-dpi>
               </img>
               <dri>echo45a67bc4367d</dri>
               <lang>ita</lang>
               <doc type="Architectural Drawing">
                       <person>Ciolli, Giacomo</person>
                       <person>Urban VIII; Barberini, Maffeo</person>
                       <location>Accademia di San Luca</location>
                       <location>Roma</location>
                       <date>1706</date>
                       <object>Concorso Clementino</object>
                       <object>Fontana Pubblica</object>
                       <object>Brunnen</object>
                       <object>ASL 160</object>
                       <keywords></keywords>
               </doc>
               <context>
                      <link>http://colosseum.biblhertz.it:8080/Lineamenta/
                      1033478408.39/1035196181.35/1035196204.09/1035394121.83
                      </link>
               </context>
           </meta>
       </file>
   </resource>
   \end{verbatim}
   
 \end{document}  \end{document}
   
 %%% Local Variables:   %%% Local Variables: 
