Diff for /texttool-architecture/soft-search.tex between versions 1.1 and 1.2

version 1.1, 2004/06/01 12:00:21 version 1.2, 2004/06/01 12:14:48
Line 1 Line 1
 \subsubsection{rec.cgi (register text)}  \subsubsection{q1 (corpus-wide search)}
 \label{sec:rec.cgi}  \label{q1}
   
 \paragraph  \paragraph
 On the ECHO server, the registration of new texts is implemented by  This section describes the software associated with the ECHO
 means of a cgi script, reg.cgi  lemmatized text search 
 (archimedes/web/cgi-bin/toc/admin/reg.cgi ). reg.cgi retrieves a  \url{http://echo.mpiwg-berlin.mpg.de/ECHOVIEW/ECHO_view.css}
 metadata file in MPIWG archive metadata format from the entered uri  
 (currently only local paths are supported) and constructs from this  
 file a toc.cgi object file (see below), which it writes to toc.cgi's  
 data section. [corpus???] It should be stressed that this is a  
 registration procedure developed for a particular implementation of  
 toc.cgi and not a part of the core application.   
   
 \paragraph  
 reg.cgi takes two parameters, path and show.  Path should give the  
 local path to the metadata file for the text that is being  
 registered. If ``show'' is set to 1, reg.cgi will return for  
 inspection the toc.cgi object file that it has built out of the  
 submitted metadata file.   
   
 \paragraph{input metadata file}  
   
 The input metadata file must have the following form  
   
 \paragraph  
 \begin{verbatim}  
 <resource>  
     ...  
     <meta>  
       <meta>  
                 <bib type="Book">  
   
 <title>Mainzer Untergerichtsordnung (von 1534)</title>  
 <author>anon</author>  
 <year>1580</year>  
         <texttool><display>yes</display>  
     <image>pageimgtif</image>  
     <text>/mpiwg/online/experimental/echo_DRQEdit_test/anon_Mainz_1580/fulltextDW/mainzugo02_utf8.xml</text>  
     <pagebreak>pb</pagebreak><presentation>01-presentation/info.xml</presentation></texttool></meta>  
   
     </meta>  
   
 \end{verbatim}  
 \paragraph{archimedes object registration}  
   
 \subsubsection{toc.cgi (display text)}  
 \label{sec:toc.cgi}  
   
 \paragraph{plan of this section }  
   
 \begin{enumerate}  
 \item An overview of toc.cgi architecture  
 \item A walk-through of typical cgi queries for toc.cgi  
 \item An index of cgi parameters and values with short descriptions of function  
 \end{enumerate}  
   
 \paragraph{Overview of toc.cgi architecture}  
   
 \subparagraph{}  
 toc.cgi is a perl script for displaying collections of xml texts and   
 linking them to related resources such as page-images, morphological  
 analysis, commentaries, dictionaries, etc. It implements generic methods  
 for resource-linking provided by a series of perl modules which are in  
 turn based mainly on generic open-source tools for xml manipulation and networking  
 written in C.   
   
 \subparagraph{toc.cgi collections--Network transparency}  
 Each of the collections in toc.cgi is a ``virtual'' collection, that  
 is, a collection of links or uri's to resources that reside somewhere on an accessible  
 network, local or remote.    
   
 \subparagraph{toc.cgi collections--remote resources}  
   
 What is at the other end of the link is of no concern to toc.cgi, as  
 long as the resource referenced by the link meets minimal toc.cgi  
 requirements--how the resource is actually implemented and exposed is  
 a matter for the resource provider. The link may, for instance, point  
 directly to an xml text or it may point to a container which exposes a  
 particular xml view of an underlying resource that is perhaps not in  
 xml format at all.   
   
   
 \subparagraph{resource registry}  
   
   
   
   
 \paragraph{cgi parameters -- standard queries}  
   
 \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=corpus }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=corpus }  
 \newline  
 \newline  
 get a listing of corpora  
   
   
 \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpusmanifest }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpusmanifest }  
 \newline  
 \newline  
 get an xml listing of corpora   
   
   
 \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi }  
 \newline  
 \newline  
 get a listing of works in default corpus  
   
 \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?corpus=1 }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?corpus=1 }  
 \newline  
 \newline  
 get a listing of works in corpus 1 [default corpus = 0]  
   
 \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist }  
 \newline  
 \newline  
 get an xml listing of works in default corpus   
   
 \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist;corpus=1 }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist;corpus=1 }  
 \newline  
 \newline  
 get an xml listing of works in corpus 1  
   
 \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=baifl_renav_006_la_1537;step=thumb }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=baifl_renav_006_la_1537;step=thumb }  
 \newline  
 \newline  
 get a work from default corpus with thumbnail navbar displayed left  
   
   
 \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=thumb;ftype=thumbright }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=thumb;ftype=thumbright }  
 \newline  
 \newline  
 get a work from default corpus with thumbnail navbar displayed right  
   
 \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=textonly;corpus=;page=22 }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=textonly;corpus=;page=22 }  
 \newline  
 \newline  
 get a page of text from a work from default corpus   
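
 The standard queries above all follow one pattern: the toc.cgi address, then a list of name=value pairs joined with semicolons. A minimal sketch of a helper that assembles such URLs is given here. The function name \verb|toc_url| is hypothetical and not part of toc.cgi; the parameter names (step, corpus, dir, ftype, page) and the separator are taken from the examples above.

```python
from urllib.parse import quote

# Base address of toc.cgi, as used in the example queries above.
BASE_URL = "http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi"


def toc_url(**params):
    """Build a toc.cgi query URL (hypothetical helper, not part of toc.cgi).

    Parameters are joined with ';' in the order given, matching the
    example queries; values are percent-encoded.
    """
    if not params:
        # No parameters: listing of works in the default corpus.
        return BASE_URL
    query = ";".join(
        f"{name}={quote(str(value), safe='')}" for name, value in params.items()
    )
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(toc_url(step="corpus"))
    print(toc_url(dir="jorda_ponde_050_la_1533", step="textonly", corpus="", page=22))
```

 An empty value, as in \verb|corpus=|, is kept in the URL unchanged, as in the last example query.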
   
   
   
   
 \subsubsection{Indexing}  
 \label{sec:indexing}  
   
 \paragraph{Status quo ECHO}  
 Currently indexing is not implemented on the ECHO server.  
   
 \paragraph{Plan ECHO}  
   
 \begin{enumerate}  
 \item construct remote (141.14.236.86) index for each file at  
   per-change or daily intervals  
 \item store indices locally in  
 archimedes/data/db/PROJECT_NAME/CORPUS_NAME/WORK  
 \item two programs on the server: (1) cgi: indexer, (2) backend: da_remote  
 \item two programs on the client: (1) cgi: sendindex, (2) backend: getindex  
 \item indexing transaction handled by two cgi scripts, one on the  
   server, the other on the client [this is the first implementation because  
   it is the easiest and there are no port issues, but it would probably be  
   better to have a separate port].   
 \item client cgi: getindex -- sends 1.  list of files to index  
   2. uri to which xml notification of completion is to be sent. Upon  
   notification, activates backend prog that fetches and installs the  
   indices.    
 \item server cgi: indexer -- receives the file list and the notification  
   address. Activates a backend that fetches the files, indexes them, places  
   the completed indices in a networked location, then sends an xml  
   notification back to the client.   
 \item single script provides backend access to indices   
 \item leave front-end issues like display, collection and navigation  
   to web-design programmers. Do only a  sample for now.   
 \end{enumerate}  
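
 The exchange planned above (the client sends a file list and a notification uri; the server indexes and replies with an xml notification) can be sketched as follows. This is an illustration only: the plan does not specify any message format, so the function names and the xml element names (\verb|indexing-complete|, \verb|location|, \verb|file|) are assumptions, and no network transport is shown.

```python
import xml.etree.ElementTree as ET


def build_index_request(files, notify_uri):
    """Client side: the two items the plan says the client cgi sends."""
    return {"files": list(files), "notify_uri": notify_uri}


def build_notification(request, index_location):
    """Server side: hypothetical xml notice sent once indexing is complete.

    The element names are assumptions; the plan only says that an xml
    notification is sent back to the client.
    """
    root = ET.Element("indexing-complete")
    ET.SubElement(root, "location").text = index_location
    for name in request["files"]:
        ET.SubElement(root, "file").text = name
    return ET.tostring(root, encoding="unicode")


def parse_notification(xml_text):
    """Client side: read the notice to learn where the indices can be fetched."""
    root = ET.fromstring(xml_text)
    return root.findtext("location"), [f.text for f in root.findall("file")]


if __name__ == "__main__":
    request = build_index_request(["work1.xml"], "http://client.example/notify")
    notice = build_notification(request, "http://server.example/indices/")
    print(parse_notification(notice))
```

 In this sketch the client would start its backend fetch only after \verb|parse_notification| has returned the index location, which mirrors the "upon notification" step of the plan.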
   
 \subsubsection{Morphology}  
 \label{sec:morphology}  
   
   
 \subsubsection{Dictionary server}  
 \label{sec:dictionary-server}  
   
   
 \subsubsection{helper programs}  
   
 \paragraph{addarch.pl ARCHIMEDES}   
   
 Automatically registers new texts as toc.cgi objects when they appear in  
 cvs. Automatically updates relevant morphological indices (slow!) each  
 time a cvs update occurs. This program is called by a hook in the cvs  
 ``loginfo'' configuration file.   
   
   
 \paragraph{makelemma.pl ARCHIMEDES}  
   
 Updates lemmatization indices.   
 Parameters:   
 No parameter--update all lemmatization indices  
 [latin | ital | greek | en | nl | de]--  update this language  
   
 \paragraph{makefast.pl ARCHIMEDES}   
   
 Updates the toc.cgi morphology indices.  
 Parameters:  
 No parameter--update all lemmatization indices  
 [latin | ital | greek | en | nl | de]--  update this language  
   
 \subsubsection{summary of differences between the archimedes toc.cgi  
   implementation and the echo toc.cgi implementation (toc.x.cgi)}  
   
 \paragraph{missing in archimedes}  
 \begin{enumerate}  \begin{enumerate}
   \item xml-rpc interface to 141.14.236.86 and implementations (archimedes/bin/getindex
 \item html templates (coded but phased out of cvs branch)    and archimedes/bin/make_indices)
 \end{enumerate}  \item search module archimedes/code/IncPerl/Archim/Toc/Search.pm and
     implementation ( archimedes/web/cgi-bin/search/q1 )
 \paragraph{missing in echo}  
 \begin{enumerate}  
   
 \item word-coloring?  
 \item remote text method may work differently  
 \end{enumerate}  
   
 \paragraph{differences}  
 \begin{enumerate}  
 \item structure of info.xml  
 \item resource-discovery algorithm for info.xml  
 \end{enumerate}  \end{enumerate}
   \paragraph
   
   
   


