% texttool-architecture/soft-cgi.tex
% CVS revision 1.17, Sat Apr 3 17:19:33 2004 UTC, by casties (branch MAIN, tag HEAD)
% log message: small fixes such that it really runs through latex...

\subsubsection{reg.cgi (register text)} \label{sec:rec.cgi}

\paragraph{} On the ECHO server, new texts are registered by means of a cgi script, reg.cgi (\url{archimedes/web/cgi-bin/toc/admin/reg.cgi}). reg.cgi retrieves a metadata file in MPIWG archive metadata format from the given uri (currently only local paths are supported), constructs a toc.cgi object file (see Section~\ref{sec:toc.cgi}) from it, and writes this object file to toc.cgi's data section. [corpus???] Note that this registration procedure was developed for one particular toc.cgi installation and is not part of the core application.

\paragraph{} reg.cgi takes two parameters, ``path'' and ``show''. ``path'' gives the local path to the metadata file of the text being registered. If ``show'' is set to 1, reg.cgi returns the toc.cgi object file it has built from the submitted metadata file, so that it can be inspected.

\paragraph{input metadata file} The input metadata file must have the following form:
\begin{verbatim}
<resource>
  ...
  <meta>
    <bib type="Book">
      <title>Mainzer Untergerichtsordnung (von 1534)</title>
      <author>anon</author>
      <year>1580</year>
    </bib>
    <texttool>
      <display>yes</display>
      <image>pageimgtif</image>
      <text>/mpiwg/online/experimental/echo_DRQEdit_test/anon_Mainz_1580/fulltextDW/mainzugo02_utf8.xml</text>
      <pagebreak>pb</pagebreak>
      <presentation>01-presentation/info.xml</presentation>
    </texttool>
  </meta>
</resource>
\end{verbatim}

\paragraph{archimedes object registration}

\subsubsection{toc.cgi (display text)} \label{sec:toc.cgi}

\paragraph{plan of this section}
\begin{enumerate}
\item An overview of the toc.cgi architecture
\item A walk-through of typical cgi queries for toc.cgi
\item An index of cgi parameters and values, with short descriptions of their function
\item The TOC Perl modules
\end{enumerate}

\paragraph{Overview of toc.cgi architecture}
\subparagraph{} toc.cgi is a Perl script that displays collections of XML texts and links them to related resources such as page images, morphological analyses, commentaries and dictionaries. It implements generic resource-linking methods provided by a series of Perl modules, which in turn are based mainly on generic open-source tools for XML manipulation and networking written in C.

\subparagraph{toc.cgi collections--network transparency} Each collection in toc.cgi is a ``virtual'' collection, that is, a collection of links (uris) to resources that reside somewhere on an accessible local or remote network.

\subparagraph{toc.cgi collections--remote resources} What is at the other end of a link is of no concern to toc.cgi, as long as the referenced resource meets toc.cgi's minimal requirements; how the resource is actually implemented and exposed is up to the resource provider. A link may, for instance, point directly to an XML text, or it may point to a container that exposes an XML view of an underlying resource which is perhaps not in XML at all.
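The metadata file consumed by reg.cgi (Section~\ref{sec:rec.cgi}) already carries such a set of links in its \texttt{texttool} element. The following minimal Python sketch illustrates this. It is not the reg.cgi or toc.cgi code, and the function name \texttt{texttool\_links} and the layout of the returned record are hypothetical, since the format of the toc.cgi object file is not described here. It parses a metadata file of the form shown in Section~\ref{sec:rec.cgi} with Python's standard \texttt{xml.etree.ElementTree} module and collects the bibliographic fields and the texttool links into one flat record.

\begin{verbatim}
# Illustrative sketch only, not the reg.cgi code: reduce the <texttool>
# element of a metadata file (format as in the reg.cgi section) to a flat
# record of links.  The function name and record layout are hypothetical.
import xml.etree.ElementTree as ET

# texttool fields that appear in the example metadata file
TEXTTOOL_FIELDS = ("display", "image", "text", "pagebreak", "presentation")


def texttool_links(metadata_xml):
    """Return bibliographic fields and texttool links as one dict."""
    root = ET.fromstring(metadata_xml)      # expected root element: <resource>
    texttool = root.find("meta/texttool")
    if texttool is None:
        raise ValueError("no <meta><texttool> element in metadata file")
    bib = root.find("meta/bib")
    record = {"type": bib.get("type") if bib is not None else None}
    for field in ("title", "author", "year"):
        # findtext returns None if the element is missing
        record[field] = bib.findtext(field) if bib is not None else None
    for field in TEXTTOOL_FIELDS:
        record[field] = texttool.findtext(field)
    return record


EXAMPLE = """<resource>
  <meta>
    <bib type="Book">
      <title>Mainzer Untergerichtsordnung (von 1534)</title>
      <author>anon</author>
      <year>1580</year>
    </bib>
    <texttool>
      <display>yes</display>
      <image>pageimgtif</image>
      <text>/mpiwg/online/experimental/echo_DRQEdit_test/anon_Mainz_1580/fulltextDW/mainzugo02_utf8.xml</text>
      <pagebreak>pb</pagebreak>
      <presentation>01-presentation/info.xml</presentation>
    </texttool>
  </meta>
</resource>"""

if __name__ == "__main__":
    for key, value in texttool_links(EXAMPLE).items():
        print(f"{key}: {value}")
\end{verbatim}

Whether a field such as \texttt{text} holds a local path or a remote uri makes no difference to the record, which is the point of the network-transparent design described above.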
\subparagraph{resource registry}

\paragraph{cgi parameters -- standard queries}
\begin{enumerate}
\item \url{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=corpus}\newline
  get a listing of corpora
\item \url{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpusmanifest}\newline
  get an xml listing of corpora
\item \url{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi}\newline
  get a listing of works in the default corpus
\item \url{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?corpus=1}\newline
  get a listing of works in corpus 1 (the default corpus is 0)
\item \url{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist}\newline
  get an xml listing of works in the default corpus
\item \url{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist;corpus=1}\newline
  get an xml listing of works in corpus 1
\item \url{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=baifl_renav_006_la_1537;step=thumb}\newline
  get a work from the default corpus with the thumbnail navigation bar displayed on the left
\item \url{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=thumb;ftype=thumbright}\newline
  get a work from the default corpus with the thumbnail navigation bar displayed on the right
\item \url{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=textonly;corpus=;page=22}\newline
  get a single page of text (here page 22) of a work in the default corpus
\end{enumerate}

\paragraph{TOC Perl Modules}
\subparagraph{general} The documentation for the TOC Perl modules is embedded in the modules themselves in POD format; this POD is the definitive documentation of the modules. The modules are available to archimedes staff from the cvs repository /perseus/cvsroot on the archimedes server (141.14.236.86), in the module mpitexts/perl/perllib (the directory /perseus/cvsroot/mpitexts/perl/perllib).
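All the standard queries listed under cgi parameters follow a single pattern: the toc.cgi base uri, a question mark, and \texttt{name=value} pairs separated by semicolons rather than ampersands. The following minimal Python sketch builds such query uris. The helper \texttt{toc\_query} is hypothetical and not part of the TOC Perl modules; only the base uri and the parameter names (\texttt{step}, \texttt{corpus}, \texttt{dir}, \texttt{ftype}, \texttt{page}) are taken from the listed queries.

\begin{verbatim}
# Illustrative sketch, not part of the TOC Perl modules: build toc.cgi query
# uris in the style of the standard queries (name=value pairs joined by ';').
# The helper name toc_query is hypothetical.
from urllib.parse import quote

TOC_CGI = "http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi"


def toc_query(base=TOC_CGI, **params):
    """Return base?name=value;name=value... with parameters in call order.

    Parameters whose value is None are left out; an empty string is kept,
    which yields e.g. 'corpus=' as in the text-only page query.
    """
    pairs = [
        f"{quote(str(name), safe='')}={quote(str(value), safe='')}"
        for name, value in params.items()
        if value is not None
    ]
    return base + "?" + ";".join(pairs) if pairs else base


if __name__ == "__main__":
    print(toc_query(step="corpus"))
    print(toc_query(step="xmlcorpuslist", corpus=1))
    print(toc_query(dir="jorda_ponde_050_la_1533", step="thumb", ftype="thumbright"))
    print(toc_query(dir="jorda_ponde_050_la_1533", step="textonly", corpus="", page=22))
\end{verbatim}

Keeping empty values while dropping \texttt{None} lets the sketch reproduce the listed queries exactly, including the empty \texttt{corpus=} of the text-only page query.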
To check out the modules, log on to the archimedes server and run:
\begin{verbatim}
cvs -d /perseus/cvsroot co mpitexts/perl/perllib
\end{verbatim}
or, from a remote location (in bash):
\begin{verbatim}
export CVS_RSH=ssh
cvs -d :ext:myusername@141.14.236.86:/perseus/cvsroot co mpitexts/perl/perllib
\end{verbatim}

\subsubsection{Indexing} \label{sec:indexing}

\paragraph{Status quo ECHO} Indexing is currently not implemented on the ECHO server.

\paragraph{Plan ECHO}
\begin{enumerate}
\item build the index for each file on the remote machine (141.14.236.86), either on every change or daily
\item store the indices locally in \url{archimedes/data/db/PROJECT_NAME/CORPUS_NAME/WORK}
\item two programs on the server:
  \begin{enumerate}
  \item cgi: \texttt{indexer}
  \item backend: \texttt{da\_remote}
  \end{enumerate}
\item two programs on the client:
  \begin{enumerate}
  \item cgi: \texttt{sendindex}
  \item backend: \texttt{getindex}
  \end{enumerate}
\item the indexing transaction is handled by two cgi scripts, one on the server and one on the client [this is the first implementation because it is the easiest and has no port issues, but a separate port would probably be better].
\item client cgi (\texttt{sendindex}): sends (1) the list of files to index and (2) the uri to which the xml notification of completion is to be sent. Upon notification, it activates the backend program (\texttt{getindex}) that fetches and installs the indices.
\item server cgi (\texttt{indexer}): receives the file list and the notification address, and activates the backend, which fetches the files, indexes them, places the completed indices in a networked location, and then sends an xml notification back to the client.
\item a single script provides backend access to the indices
\item front-end issues such as display, collection and navigation are left to web-design programmers; only a sample is to be done for now.
\end{enumerate}

\subsubsection{Morphology} \label{sec:morphology}

\subsubsection{Dictionary server} \label{sec:dictionary-server}

\subsubsection{helper programs}

\paragraph{addarch.pl ARCHIMEDES} Automatically registers new texts as toc.cgi objects when they appear in cvs.
It also updates the relevant morphological indices (slow!) each time a cvs update occurs. This program is called by a hook in the cvs ``loginfo'' configuration file.

\paragraph{makelemma.pl ARCHIMEDES} Updates the lemmatization indices. Parameters: with no parameter, all lemmatization indices are updated; with one of \texttt{latin}, \texttt{ital}, \texttt{greek}, \texttt{en}, \texttt{nl} or \texttt{de}, only the indices for that language are updated.

\paragraph{makefast.pl ARCHIMEDES} Updates the toc.cgi morphology indices. Parameters: with no parameter, all morphology indices are updated; with one of \texttt{latin}, \texttt{ital}, \texttt{greek}, \texttt{en}, \texttt{nl} or \texttt{de}, only the indices for that language are updated.

\subsubsection{summary of differences between the archimedes toc.cgi implementation and the echo toc.cgi implementation (toc.x.cgi)}

\paragraph{missing in archimedes}
\begin{enumerate}
\item html templates (coded, but phased out of the cvs branch)
\end{enumerate}

\paragraph{missing in echo}
\begin{enumerate}
\item word-coloring?
\item the remote text method may work differently
\end{enumerate}

\paragraph{differences}
\begin{enumerate}
\item structure of info.xml
\item resource-discovery algorithm for info.xml
\end{enumerate}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "texttools"
%%% End: