Annotation of worldwide-digilib/worldwide-digilib.tex, revision 1.1
1.1 ! casties 1: \documentclass[a4paper]{article}
! 2:
! 3: \usepackage[latin1]{inputenc}
! 4: \usepackage[T1]{fontenc}
! 5: \usepackage{ae}
! 6:
! 7: \usepackage{url}
! 8: %\usepackage{hyperref}
! 9:
! 10: %% for latex2rtf :-(
! 11: % remember to replace "%" in URL!
! 12: %\newcommand{\url}[1]{\verb!#1!}
! 13: %\renewenvironment{footnotesize}{}{}
! 14:
! 15:
! 16: \newcommand{\digilib}{\texttt{digilib}}
! 17:
! 18: \title{Draft: World Wide digilib -- Resource Identifier in ECHO}
! 19:
! 20: \author{Robert Casties\thanks{IT-Group, Max Planck Institute for the
! 21: History of Science}}
! 22:
! 23: \date{Version~0.6 of \today}
! 24:
! 25: \begin{document}
! 26:
! 27: \maketitle
! 28:
! 29: \tableofcontents
! 30:
! 31: \section{Digital Resource Identifier DRI}
! 32:
! 33: The \emph{Digital Resource Identifier} is a worldwide unique
! 34: identifier for a digital resource. The resource may be an electronic
! 35: text, single or multiple digital images, an audiovisual media file or
! 36: other type of electronic resource that is accessible over the
! 37: Internet.
! 38:
! 39: The identifier provides a stable point of reference for digital
! 40: resources on the Internet. The identifier is therefore independent
! 41: of the address, implementation and directory layout of the location
! 42: of the resource. The identifier is unique and constant and it can be
! 43: used in other documents to reference the resource without the risk of
! 44: having a broken reference in the future because the address or filename
! 45: of the resource has changed.
! 46:
! 47: The identifier supports an infrastructure for the ``sustainability'' of
! 48: digital resources that ensures not only that the identifier always
! 49: points to the same resource but also that the resource stays available
! 50: on the Internet. The infrastructure supports backup copies and load
! 51: balancing mechanisms. The implementation and enduring support of the
! 52: actual servers and digital resources is in itself mostly an
! 53: organisational and social challenge that cannot be solved by
! 54: technological measures alone.
! 55:
! 56:
! 57:
! 58: \subsection{Structure of the DRI}
! 59: \label{sec:structure-dri}
! 60:
! 61: The \emph{Digital Resource Identifier} has the following properties:
! 62:
! 63: \begin{itemize}
! 64: \item Total address space of 70 bit, partitioned into about a million
! 65: ($2^{20}$) subspaces of 50 bit, each holding $2^{50}$ (about
! 66: $1.1 \cdot 10^{15}$) different resources.
! 67:
! 68: \item The identifier contains only (uppercase) letters and digits.
! 69:
! 70: \item The identifier is composed of a 4-character \emph{subspace} or
! 71: \emph{namespace identifier}, a 10-character \emph{resource
! 72: address} and a 1-character checksum, giving a total of 15
! 73: characters for the full DRI.
! 74: \end{itemize}
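As a consistency check, these figures follow from the 5 bit carried by
each character (section~\ref{sec:charset}). The checksum character is
not counted in the address space:
\begin{displaymath}
  \underbrace{4 \cdot 5}_{\mathrm{namespace}}
  + \underbrace{10 \cdot 5}_{\mathrm{resource}} = 70, \qquad
  2^{20} = 1\,048\,576, \qquad
  2^{50} = 1\,125\,899\,906\,842\,624 \approx 1.1 \cdot 10^{15}.
\end{displaymath}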
! 75:
! 76:
! 77:
! 78:
! 79: \subsection{Character set}
! 80: \label{sec:charset}
! 81:
! 82: The identifier is composed only of letters and digits. Uppercase and
! 83: lowercase letters are not distinguished. The resulting character set
! 84: has $26+10=36$ characters. Four characters with ambiguous shapes that
! 85: might lead to errors are omitted: ``O'' (vs. ``0''), ``I'' (vs. ``1''
! 86: or ``l''), ``L'' (vs. ``1'' or ``I''), and ``J'' (vs. ``1'' or ``I'').
! 87: The resulting set of 32 characters can be used to represent 5 bit of
! 88: information.
! 89:
! 90: \begin{table}[htbp]
! 91: \centering
! 92: \begin{footnotesize}
! 93: \begin{tabular}{cc|cc|cc|cc}
! 94: character & value & character & value & character & value &
! 95: character & value \\ \hline
! 96: 0 & 0 & A & 10 & N & 20 & Y & 30 \\
! 97: 1 & 1 & B & 11 & P & 21 & Z & 31 \\
! 98: 2 & 2 & C & 12 & Q & 22 \\
! 99: 3 & 3 & D & 13 & R & 23 \\
! 100: 4 & 4 & E & 14 & S & 24 \\
! 101: 5 & 5 & F & 15 & T & 25 \\
! 102: 6 & 6 & G & 16 & U & 26 \\
! 103: 7 & 7 & H & 17 & V & 27 \\
! 104: 8 & 8 & K & 18 & W & 28 \\
! 105: 9 & 9 & M & 19 & X & 29 \\
! 106: \end{tabular}
! 107: \end{footnotesize}
! 108: \caption{Character set for identifier}
! 109: \label{tab:chartable}
! 110: \end{table}
! 111:
! 112: The 50 bit of the chosen address for the resource is divided into ten
! 113: pieces of 5 bit. The pieces are each encoded into one character
! 114: according to the character table in table~\ref{tab:chartable}. The
! 115: resulting string of 10 characters is called the \emph{resource
! 116: address}.
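The encoding step can be sketched in a few lines of Python. This is
only an illustration, not part of the specification: the function
names are hypothetical, and the sketch assumes that the most
significant 5 bit piece is written first, which the text above does
not fix.

```python
# Character table from the "Character set" section: index = value.
ALPHABET = "0123456789ABCDEFGHKMNPQRSTUVWXYZ"


def encode_resource_address(number: int) -> str:
    """Encode a 50 bit number as a 10 character resource address.

    Assumption: the most significant 5 bit piece comes first.
    """
    if not 0 <= number < 2**50:
        raise ValueError("resource number must fit into 50 bit")
    # Walk over the ten 5 bit pieces from the highest to the lowest.
    return "".join(
        ALPHABET[(number >> shift) & 0b11111] for shift in range(45, -1, -5)
    )


def decode_resource_address(address: str) -> int:
    """Inverse of encode_resource_address; letter case is ignored."""
    if len(address) != 10:
        raise ValueError("resource address must have 10 characters")
    number = 0
    for ch in address.upper():
        # str.index raises ValueError for characters outside the table.
        number = (number << 5) | ALPHABET.index(ch)
    return number
```

With this convention the number 32 is encoded as \texttt{0000000010}
and the largest address $2^{50}-1$ as \texttt{ZZZZZZZZZZ}.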
! 117:
! 118:
! 119:
! 120:
! 121: \subsection{Namespaces}
! 122: \label{sec:namespaces}
! 123:
! 124: The total address space of 70 bit is divided into $2^{20}$ (1048576)
! 125: subspaces of 50 bit. These subspaces, also called namespaces, can be
! 126: assigned to institutions that wish to implement their own allocation
! 127: of resource identifiers for reasons of efficiency and maintenance. All
! 128: resulting resource identifiers are only valid once they are registered
! 129: with the central \emph{resource registry}.
! 130:
! 131: Each subspace is identified by a four-character \emph{namespace
! 132: identifier}. The 10-character \emph{resource address} is
! 133: prefixed with the \emph{namespace identifier}, resulting in a
! 134: 14-character \emph{unique address} for each resource.
! 135:
! 136: Subspaces and their namespace identifiers are registered by the
! 137: central resource registry. An institution or project that wishes to
! 138: implement its own allocation of resource identifiers contacts the
! 139: resource registry and receives a namespace identifier for a currently
! 140: unused subspace. The subspace is then marked as being used by this
! 141: institution or project. New resource identifiers in this subspace can
! 142: only be assigned by the institution or project that owns the subspace.
! 143:
! 144: The central resource registry allocates and registers resource
! 145: identifiers for institutions, projects and individuals that do not
! 146: want to maintain their own subspace. Resource identifiers allocated by
! 147: the central resource registry are in the \texttt{ECHO} namespace.
! 148:
! 149: The namespaces \texttt{0000}, \texttt{TEMP} and \texttt{ECHO} are
! 150: reserved for use with the central resource registry.
! 151:
! 152:
! 153: \subsection{Checksum}
! 154: \label{sec:checksum}
! 155:
! 156: A checksum of one character (5 bit) is calculated over the 14
! 157: characters (70 bit) of the \emph{unique address}. The checksumming method is
! 158: similar to the method used for ISBN (International Standard Book
! 159: Number). The differences are the number system, which is base-32 for
! 160: the DRI (ISBN: base-10) and the modulus, which is 31 for the DRI
! 161: (ISBN: 11).
! 162:
! 163: The checksum number is calculated with the formula
! 164: \begin{displaymath}
! 165: c = \left( \sum_{i=1}^{14} i \, x_i \right) \bmod 31
! 166: \end{displaymath}
! 167: where $x_i$ is the value of the $i$-th character of the unique address.
! 168: The resulting checksum number $c$ is converted to a character
! 169: according to table~\ref{tab:chartable} and appended to the end of the
! 170: \emph{unique address} giving the full \emph{Digital Resource
! 171: Identifier}.
! 172:
! 173: The DRI is only valid if the checksum calculated over the unique
! 174: address part of the identifier (the first 14 characters) matches the
! 175: checksum value (the last character).
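A minimal Python sketch of the checksum calculation and the validity
test follows. The function names are hypothetical, and the sketch
assumes that the position $i$ counts the characters of the unique
address from the left, starting at 1. The example uses the reserved
\texttt{TEMP} namespace with the resource address
\texttt{0000000001}.

```python
# Character table from the "Character set" section: index = value.
ALPHABET = "0123456789ABCDEFGHKMNPQRSTUVWXYZ"


def checksum(unique_address: str) -> str:
    """Return the checksum character for a 14 character unique address."""
    if len(unique_address) != 14:
        raise ValueError("unique address must have 14 characters")
    # Weighted sum: position (1..14, assumed left to right) times value.
    total = sum(
        i * ALPHABET.index(ch)
        for i, ch in enumerate(unique_address.upper(), start=1)
    )
    return ALPHABET[total % 31]


def make_dri(namespace: str, resource_address: str) -> str:
    """Append the checksum to namespace identifier + resource address."""
    unique_address = (namespace + resource_address).upper()
    return unique_address + checksum(unique_address)


def is_valid_dri(dri: str) -> bool:
    """Check length, character set and checksum of a full DRI."""
    dri = dri.upper()
    if len(dri) != 15:
        return False
    try:
        return checksum(dri[:14]) == dri[14]
    except ValueError:  # a character that is not in the table
        return False


print(make_dri("TEMP", "0000000001"))   # TEMP0000000001Q
print(is_valid_dri("TEMP0000000010Q"))  # False: two characters swapped
```

For the example the weighted sum is
$1 \cdot 25 + 2 \cdot 14 + 3 \cdot 19 + 4 \cdot 21 + 14 \cdot 1 = 208$
and $208 \bmod 31 = 22$, which is the character \texttt{Q}. Because
the checksum number is always smaller than 31, the character
\texttt{Z} never occurs as a checksum.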
! 176:
! 177:
! 178:
! 179:
! 180: \section{Central resource registry}
! 181: \label{sec:central-registry}
! 182:
! 183: The central resource registry is the keystone in the concept of stable
! 184: and sustainable digital resource identifiers and references. Resources
! 185: can be moved and renamed on local servers or duplicated onto other
! 186: servers, and servers can even be shut down (provided the resource has
! 187: been duplicated) without resources getting lost or links and
! 188: references to the resource breaking.
! 189:
! 190: The resource registry server acts as a switchboard between the user
! 191: requests for a resource and local servers providing the resource. URLs
! 192: and other so-called ``global'' references to a resource via its DRI
! 193: access the resource registry server that dispatches the request to the
! 194: local server. In this way only the resource registry server's address
! 195: has to remain stable.
! 196:
! 197: This places a high burden of availability on the registry server. This
! 198: challenge can be met on a technical level with standard technology
! 199: (transparent replication and load balancing) and scaled to higher
! 200: performance levels when the demand rises. More importantly, a durable
! 201: solution has to be established on the organizational and social level
! 202: for running the server.
! 203:
! 204: The resource registry maintains the mapping database between the
! 205: digital resource identifiers and the location of the resources on the
! 206: local servers. In this way it has a list of all known resource
! 207: identifiers and ensures that all resource identifiers are unique.
! 208:
! 209: The database on the resource registry server can additionally store a
! 210: minimal set of meta information on the resources and provide
! 211: searches in this metadata. One item of this minimal meta information
! 212: should be a URL to further information on the resource.
! 213:
! 214: The resource registry server provides an HTTP redirect function for
! 215: transparent HTTP access to resources and optionally other web service
! 216: access (XML-RPC, SOAP).
! 217:
! 218: Special client software for accessing resources can harvest and cache
! 219: DRI mappings from the central registry for a short time to improve
! 220: performance or to allow offline work.
! 221:
! 222: As mentioned in section~\ref{sec:namespaces}, parts of the resource
! 223: identifier address space can be assigned to institutions or projects
! 224: to implement their own allocation of resource identifiers. These
! 225: identifiers are generally valid only after they have been registered
! 226: with the central resource registry.
! 227:
! 228: The central resource registry remains the only authoritative source of
! 229: digital resource identifiers and their mapping to local resources.
! 230:
! 231: The resource registry provides interfaces to
! 232:
! 233: \begin{itemize}
! 234: \item redirect HTTP requests with resource identifiers to local
! 235: resource servers
! 236:
! 237: \item query the mapping of resource identifiers using a web service
! 238: interface
! 239:
! 240: \item hand out new resource identifiers and acquire the necessary
! 241: mapping information
! 242:
! 243: \item change resource mapping information or resource meta information
! 244:
! 245: \item query the database for meta information
! 246:
! 247: \item upload sets of externally allocated resource identifiers
! 248:
! 249: \item download sets of identifiers or the whole database for caching
! 250: purposes.
! 251: \end{itemize}
! 252:
! 253:
! 254:
! 255: \subsection{Handling of digital resource identifiers in HTTP
! 256: requests}
! 257: \label{sec:dri-resolution-http}
! 258:
! 259: A global HTTP request usually accesses a digital resource via some
! 260: kind of display tool (for example \digilib{}) that is able to render a
! 261: web representation of the resource. While the resource identifier is
! 262: embedded in the DRI part of the URL, other aspects of the rendering
! 263: (for example which tool to use) are embedded in other parts of the URL
! 264: that may be specific to the display tool. Therefore the registry
! 265: server has to treat URLs differently depending on the display tool.
! 266:
! 267: The handling of HTTP requests has three steps:
! 268: \begin{enumerate}
! 269: \item Identification of the DRI in the request string.
! 270:
! 271: \item Lookup of additional information on the handling of the request
! 272: based on the DRI.
! 273:
! 274: \item Redirect of the client to the local resource server.
! 275: \end{enumerate}
! 276:
! 277: The first part of the treatment of the URL is the identification of
! 278: the DRI in the HTTP request string. Three basic ways of handling the
! 279: DRI are envisaged:
! 280:
! 281: \begin{itemize}
! 282: \item The DRI can be embedded as part of the URI path\footnote{The
! 283: first part of the URI path, separated by slashes, that is a valid
! 284: DRI string.} (\url{http://driserver.echo.eu/dri/ECHO00001A2B3CX}),
! 285:
! 286: \item it can be provided as a special HTTP GET or POST parameter for a
! 287: defined environment like \digilib{}\footnote{The environment itself
! 288: should be identified by the first parts of the URI path.}
! 289: (\url{http://driserver.echo.eu/digilib/digilib.jsp?dri=ECHO00001A2B3CX&pn=5})
! 290: or
! 291:
! 292: \item it can be extracted from the request by a generic pattern
! 293: matching scheme (this option is computationally most expensive)
! 294: \end{itemize}
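The three ways of finding the DRI listed above can be tried one after
the other, as in the following Python sketch. It is only an
illustration under several assumptions: the function name and the
order of the attempts are hypothetical, and the DRI is recognized by
its shape alone (15 letters or digits) without checking the character
table or the checksum.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Shape test only: 15 letters or digits (assumption, no checksum test).
DRI_SHAPE = re.compile(r"[0-9A-Za-z]{15}")
# The same shape, but not embedded in a longer run of letters or digits.
DRI_ANYWHERE = re.compile(r"(?<![0-9A-Za-z])[0-9A-Za-z]{15}(?![0-9A-Za-z])")


def find_dri(url: str) -> Optional[str]:
    """Return the DRI contained in a request URL, or None."""
    parts = urlsplit(url)
    # 1. First part of the URI path that looks like a DRI.
    for segment in parts.path.split("/"):
        if DRI_SHAPE.fullmatch(segment):
            return segment.upper()
    # 2. Explicit "dri" parameter of the query string.
    for candidate in parse_qs(parts.query).get("dri", []):
        if DRI_SHAPE.fullmatch(candidate):
            return candidate.upper()
    # 3. Generic pattern matching over the whole request (most expensive).
    match = DRI_ANYWHERE.search(url)
    return match.group(0).upper() if match else None


print(find_dri("http://driserver.echo.eu/dri/ECHO00001A2B3CX"))
print(find_dri(
    "http://driserver.echo.eu/digilib/digilib.jsp?dri=ECHO00001A2B3CX&pn=5"))
```

Both calls print \texttt{ECHO00001A2B3CX}. A real registry server
would additionally validate the candidate before the database lookup.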
! 295:
! 296: Once the DRI is identified, more information about the resource can be
! 297: looked up in the central resource database. From this point on the
! 298: redirection of the request can be handled differently depending on the
! 299: record type information in the database.
! 300:
! 301: An extensible set of URL rewrite rules will be implemented by the
! 302: server. The type of rule to be used is part of the resource record of
! 303: the DRI in the central resource registry. The following rules should
! 304: be part of the first implementation of the registry server:
! 305:
! 306: \begin{description}
! 307:
! 308: \item[redirect] only the host part of the URL is replaced by the local
! 309: host name from the resource record.
! 310:
! 311: \item[replace] the full URL is replaced by the local URL from the
! 312: resource record.
! 313:
! 314: \item[\digilib{}] the host part of the URL is replaced by the local host
! 315: name from the resource record and the remaining part is replaced according
! 316: to \digilib{} rules.
! 317:
! 318: \item[rewrite] the host part of the URL is replaced by the local host
! 319: name from the resource record and the remaining part is replaced according to
! 320: generic substitution rules with wildcard patterns.
! 321: \end{description}
! 322:
! 323: Other specialized types of rewrite rules can be
! 324: implemented as extension modules to the resource server.
! 325:
! 326:
! 327:
! 328: \subsubsection{Redirect and replace type DRI resolution}
! 329: \label{sec:redirect-type-dri}
! 330:
! 331: When a DRI resource record has a resolution type of ``redirect'', then
! 332: only the host part of the URL is replaced in the redirected request by
! 333: the local host given in the resource record. See
! 334: table~\ref{tab:redirect-resolv}.
! 335:
! 336: \begin{table}[htbp]
! 337: \centering
! 338: \begin{tabular}{lp{0.7\textwidth}}
! 339: incoming request & \url{http://driserver.echo.eu/dri/ECHO00001A2B3CX} \\
! 340: \texttt{local\_host} record & \texttt{penelope.unibe.ch} \\
! 341: redirect request & \url{http://penelope.unibe.ch/dri/ECHO00001A2B3CX}
! 342: \end{tabular}
! 343: \caption{redirect type DRI resolution}
! 344: \label{tab:redirect-resolv}
! 345: \end{table}
! 346:
! 347: When a DRI resource record has a resolution type of ``replace'', then
! 348: the whole URL is replaced in the redirected request by the local URL
! 349: given in the resource record. See table~\ref{tab:replace-resolv}.
! 350:
! 351: \begin{table}[htbp]
! 352: \centering
! 353: \begin{tabular}{lp{0.7\textwidth}}
! 354: incoming request & \url{http://driserver.echo.eu/dri/ECHO00001A2B3CX} \\
! 355: \texttt{local\_url} record & \url{http://penelope.unibe.ch/docuserver/compago/compare.pl?32} \\
! 356: redirect request & \url{http://penelope.unibe.ch/docuserver/compago/compare.pl?32}
! 357: \end{tabular}
! 358: \caption{replace type DRI resolution}
! 359: \label{tab:replace-resolv}
! 360: \end{table}
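Both resolution types can be expressed in a few lines of Python. The
sketch below assumes that a resource record is available as a plain
dictionary with the field names used in this document; the function
name \texttt{resolve} is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve(request_url: str, record: dict) -> str:
    """Return the redirect target for a redirect or replace type record."""
    record_type = record["record_type"]
    if record_type == "redirect":
        # Only the host part changes; path and query are kept as they are.
        parts = urlsplit(request_url)
        return urlunsplit(parts._replace(netloc=record["local_host"]))
    if record_type == "replace":
        # The whole URL is taken from the resource record.
        return record["local_url"]
    raise ValueError("unsupported record type: %r" % record_type)


request = "http://driserver.echo.eu/dri/ECHO00001A2B3CX"
print(resolve(request, {
    "record_type": "redirect",
    "dri": "ECHO00001A2B3CX",
    "local_host": "penelope.unibe.ch",
}))
```

The call reproduces the redirect request of
table~\ref{tab:redirect-resolv}.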
! 361:
! 362:
! 363:
! 364: \subsubsection{\digilib{} type DRI resolution}
! 365: \label{sec:digilib-type-dri}
! 366:
! 367: When a DRI resource record has a resolution type of ``\digilib{}'', then
! 368: the host part of the URL is replaced by the local host in the resource
! 369: record and the remaining part is replaced according to \digilib{}
! 370: parameter format.
! 371:
! 372: In the preferred parameter-style format the DRI is given as the
! 373: parameter ``dri''. The local URL for the redirect is constructed by
! 374: replacing the URI path up to the ``?'' with the digilib path from the
! 375: resource record and adding a local filename as parameter ``fn''. See
! 376: table~\ref{tab:digilib-resolv}.
! 377:
! 378: \begin{table}[htbp]
! 379: \centering
! 380: \begin{tabular}{lp{0.7\textwidth}}
! 381: incoming request &
! 382: \url{http://driserver.echo.eu/digilib/digilib.jsp?dri=ECHO00001A2B3CX&pn=5} \\
! 383: \texttt{local\_host} record & \texttt{penelope.unibe.ch} \\
! 384: \texttt{digilib\_path} record & \texttt{/docuserver/digitallibrary/digilib.jsp} \\
! 385: \texttt{digilib\_file} record & \texttt{public/Beispiele} \\
! 386: redirect request &
! 387: \url{http://penelope.unibe.ch/docuserver/digitallibrary/digilib.jsp?dri=ECHO00001A2B3CX&fn=public/Beispiele&pn=5}
! 388: \end{tabular}
! 389: \caption{digilib type DRI resolution}
! 390: \label{tab:digilib-resolv}
! 391: \end{table}
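For the parameter-style format the construction of the local URL can
be sketched as follows. The function name is hypothetical and the
resource record is assumed to be a dictionary with the fields from the
table. Placing \texttt{fn} directly after \texttt{dri} follows the
example in the table and is otherwise an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def resolve_digilib(request_url: str, record: dict) -> str:
    """Build the local digilib URL for a parameter-style request."""
    parts = urlsplit(request_url)
    params = []
    found_dri = False
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        params.append((name, value))
        if name == "dri":
            # Add the local file name right after the DRI parameter.
            params.append(("fn", record["digilib_file"]))
            found_dri = True
    if not found_dri:
        raise ValueError("request has no dri parameter")
    # Keep "/" unescaped so that the file name stays readable.
    query = urlencode(params, safe="/")
    return urlunsplit(
        (parts.scheme, record["local_host"], record["digilib_path"], query, "")
    )


print(resolve_digilib(
    "http://driserver.echo.eu/digilib/digilib.jsp?dri=ECHO00001A2B3CX&pn=5",
    {
        "record_type": "digilib",
        "dri": "ECHO00001A2B3CX",
        "local_host": "penelope.unibe.ch",
        "digilib_path": "/docuserver/digitallibrary/digilib.jsp",
        "digilib_file": "public/Beispiele",
    },
))
```

All other parameters of the incoming request, such as \texttt{pn}, are
passed on unchanged.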
! 392:
! 393: In the deprecated plus-style format the DRI could be placed as the first
! 394: part of the parameter path, prefixed with ``dri:''. In the local URL
! 395: the local pathname is appended to the DRI part.
! 396:
! 397:
! 398: \subsubsection{Rewrite type DRI resolution}
! 399: \label{sec:rewrite-type-dri}
! 400:
! 401: When a DRI resource record has a resolution type of ``rewrite'', then
! 402: the host part of the URL is replaced by the local host name from the
! 403: resource record and the remaining part is replaced according to
! 404: generic substitution rules with wildcard patterns.
! 405:
! 406:
! 407:
! 408: \subsection{Handling of digital resource identifiers as a web service}
! 409: \label{sec:handl-dri-web}
! 410:
! 411: The basic function of resolution of a DRI as well as other maintenance
! 412: functions like the registration of new DRIs or the download of parts
! 413: or all registered DRI mappings should also be accessible with a web
! 414: service interface.
! 415:
! 416: Specifications for the web service interface have to be established.
! 417:
! 418:
! 419: \section{Resource metadata}
! 420: \label{sec:resource-metadata}
! 421:
! 422: The set of metadata about a resource that is stored on the resource
! 423: server is called a \emph{resource record}. Since the requirements of
! 424: access, structure and amount of metadata for different projects can
! 425: hardly be generalized the resource server stores only a minimal set of
! 426: fields that is sufficient for the basic functions of access to the
! 427: resource, sustainability of access, and interoperability. More
! 428: extensive and project specific metadata sets should be stored and
! 429: maintained on external servers. The optional resource information
! 430: field can be used to point to external metadata representations.
! 431:
! 432:
! 433: \subsection{Basic metadata}
! 434: \label{sec:basic-metadata}
! 435:
! 436: The amount of metadata depends on the type of resource record.
! 437: Common to all records are the \texttt{record\_type} field and the
! 438: \texttt{dri} field for the resource identifier. Redirect-type records
! 439: require an additional \texttt{local\_host} field for the local host name.
! 440: Replace-type records require a \texttt{local\_url} field for a full
! 441: URL. Digilib-type records require at least the three fields
! 442: \texttt{local\_host}, \texttt{digilib\_path}, and
! 443: \texttt{digilib\_file} and an optional parameter
! 444: \texttt{digilib\_pageno}. The basic fields can be found in
! 445: table~\ref{tab:basic-meta}.
! 446:
! 447: \begin{table}[htbp]
! 448: \centering
! 449: \begin{tabular}{lr|l}
! 450: type & field & description \\ \hline
! 451: \textbf{redirect} & & \\
! 452: & \texttt{record\_type} & type of record (``redirect'') \\
! 453: & \texttt{dri} & DRI \\
! 454: & \texttt{local\_host} & local host name \\ \hline
! 455: \textbf{replace} & & \\
! 456: & \texttt{record\_type} & type of record (``replace'') \\
! 457: & \texttt{dri} & DRI \\
! 458: & \texttt{local\_url} & full local URL \\ \hline
! 459: \textbf{digilib} & & \\
! 460: & \texttt{record\_type} & type of record (``digilib'') \\
! 461: & \texttt{dri} & DRI \\
! 462: & \texttt{local\_host} & local digilib server \\
! 463: & \texttt{digilib\_path} & URI path of the digilib installation \\
! 464: & \texttt{digilib\_file} & digilib path name (parameter fn) \\
! 465: & \texttt{digilib\_pageno} & optional page number
! 466: (parameter pn)
! 467: \end{tabular}
! 468: \caption{Basic metadata fields}
! 469: \label{tab:basic-meta}
! 470: \end{table}
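The field requirements of the table can be written down as data and
checked mechanically. The following Python sketch is one possible way
to do this. The names \texttt{REQUIRED\_FIELDS} and
\texttt{missing\_fields} are hypothetical, and records are assumed to
be plain dictionaries.

```python
# Required fields per record type, taken from the table of basic fields.
REQUIRED_FIELDS = {
    "redirect": {"record_type", "dri", "local_host"},
    "replace": {"record_type", "dri", "local_url"},
    "digilib": {
        "record_type", "dri", "local_host", "digilib_path", "digilib_file",
    },
}
# digilib_pageno is optional and therefore not listed above.


def missing_fields(record: dict) -> set:
    """Return the names of required fields that the record lacks."""
    record_type = record.get("record_type")
    if record_type not in REQUIRED_FIELDS:
        raise ValueError("unknown record type: %r" % record_type)
    return REQUIRED_FIELDS[record_type] - set(record)


print(missing_fields({"record_type": "redirect", "dri": "ECHO00001A2B3CX"}))
```

The example prints \texttt{\{'local\_host'\}} because the redirect-type
record has no local host name yet.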
! 471:
! 472: The resource server may implement additional fields like owner and
! 473: group fields for internal management and user access functions.
! 474:
! 475:
! 476: \subsection{Alternate server and backup server}
! 477: \label{sec:redund-serv-back}
! 478:
! 479: The resource server architecture is designed to fulfill high demands
! 480: on the performance and sustainability of access to the
! 481: resources. These demands can be met by a loosely coupled network of
! 482: local servers duplicating content for backup and the transparent
! 483: sharing of concurrent access to resources for enhanced
! 484: performance.
! 485:
! 486: Backup server fields give the names and paths of servers that provide
! 487: copies of the resource. Requests for the resource are diverted to a
! 488: backup server when the original server becomes unavailable.
! 489:
! 490: Alternate server fields give the names and paths of servers that provide
! 491: copies of the resource. Requests for a resource are spread among all
! 492: alternate servers for the same resource according to a load-balancing
! 493: pattern. The pattern can be a simple round-robin scheme or a more
! 494: sophisticated scheme based on server performance or the geographical
! 495: location of client and server.
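The interplay of alternate and backup servers can be illustrated with
a simple round-robin scheme in Python. The class below is a sketch
under stated assumptions, not a description of the registry server:
the class and method names are hypothetical, the original server is
assumed to take part in the rotation, and the availability test is
passed in by the caller.

```python
from itertools import cycle
from typing import Callable, Optional, Sequence


class ServerPool:
    """Round-robin over original and alternate servers, then backups."""

    def __init__(self, original: str, alternates: Sequence[str] = (),
                 backups: Sequence[str] = ()):
        servers = [original, *alternates]
        self._rotation = cycle(servers)  # endless round-robin iterator
        self._size = len(servers)
        self._backups = list(backups)

    def pick(self, is_available: Callable[[str], bool]) -> Optional[str]:
        """Return the next available server, or None if all are down."""
        # One full turn through the rotation, skipping unavailable servers.
        for _ in range(self._size):
            server = next(self._rotation)
            if is_available(server):
                return server
        # Fall back to the backup servers in the order they are listed.
        for server in self._backups:
            if is_available(server):
                return server
        return None
```

A scheme based on server performance or on the geographical location
of client and server would replace the rotation in \texttt{pick} and
leave the fallback to the backup servers unchanged.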
! 496:
! 497: A resource record can have any number of backup server and alternate
! 498: server fields. Whether a resource is required to have at least one backup
! 499: server is a policy decision of the hosting project that is not
! 500: enforced by the resource server.
! 501:
! 502:
! 503:
! 504: \subsection{Additional resource information}
! 505: \label{sec:addt-reso-inform}
! 506:
! 507: The resource server itself carries only minimal metadata on a resource
! 508: but it provides a basic mechanism to store and access more extensive
! 509: information on external servers.
! 510:
! 511: Every resource record can have a resource info URL that is stored in
! 512: the \texttt{info-url} field.
! 513:
! 514: \begin{table}[htbp]
! 515: \centering
! 516: \begin{tabular}{l|l}
! 517: field & description \\ \hline
! 518: \texttt{info-url} & URL to external information
! 519: \end{tabular}
! 520: \caption{External resource information}
! 521: \label{tab:extern-reso-inform}
! 522: \end{table}
! 523:
! 524: The external resource information can be accessed in a standardized
! 525: way on the resource server where the DRI of the resource is part of
! 526: the URI path: \url{http://driserver.echo.eu/resinfo/ECHO00001A2B3CX/}.
! 527: Requests to this URL will be redirected to the URL in the
! 528: \texttt{info-url} field in the resource record.
! 529:
! 530:
! 531: \end{document}