--- texttool-architecture/soft-cgi.tex 2004/04/03 17:19:33 1.17
+++ texttool-architecture/soft-cgi.tex 2004/08/16 22:34:04 1.20
@@ -155,37 +155,7 @@ or from a remote location
 bash; export CVS_RSH=ssh; cvs -d :ext:myusername@141.14.236.86:/perseus/cvsroot co /perseus/cvsroot/mpitexts/perl/perllib
 \end{verbatim}
-\subsubsection{Indexing}
-\label{sec:indexing}
-
-\paragraph{Status quo ECHO}
-Currently indexing is not implemented on the ECHO server.
-
-\paragraph{Plan ECHO}
-
-\begin{enumerate}
-\item construct remote (141.14.236.86) index for each file at
-  per-change or daily intervals
-\item store indices locally in
-\url{archimedes/data/db/PROJECT_NAME/CORPUS_NAME/WORK}
-\item 2 progs on server 1. cgi: \url{indexer} 2. backend \url{da_remote}
-\item 2 progs on client 1. cgi: \url{sendindex} 2. backend \url{getindex}
-\item indexing transaction handled by two cgi scripts, one on the
-  server the other on the client [this is the 1st implementation bcs
-  its easiest and there are no port issues, but probably it'd be
-  better to have a separate port].
-\item client cgi: getindex -- sends 1. list of files to index
-  2. uri to which xml notification of completion is to be sent. Upon
-  notification, activates backend prog that fetches and installs the
-  indices.
-\item server cgi: indexer receives filelist and notification
-  addess. Activates backend that fetches files, indexes, places
-  completed indexes in a networked location, then sends xml
-  notification back to client.
-\item single script provides backend access to indices
-\item leave front-end issues like display, collection and navigation
-  to web-design programmers. Do only a sample for now.
-\end{enumerate}
+\input{soft-search}
 \subsubsection{Morphology}
 \label{sec:morphology}
@@ -215,12 +185,26 @@ No parameter--update all lemmatization i
 \paragraph{makefast.pl ARCHIMEDES}
 Updates the toc.cgi morphology indices
-Parameters
+Parameters:
 No parameter--update all lemmatization indices
 [latin | ital | greek | en | nl | de]-- update this language
-\subsubsection{summary of differences btwn the archimedes toc.cgi
-  implementation and the echo toc.cgi impelementation (toc.x.cgi)}
+The indices are produced from the corpus word index 'xml:raw:norm',
+which correlates raw forms to normalized forms, and
+'\$lang:inc_lemma', which correlates incidentia to lemmata. The basic
+rule is, if exists \$raw->\$norm->\$inc_lemma, then \$raw is included
+in the 'fast' index for that language.
+
+Currently stores the indices with the name xml:hit:\$lang, where
+\$lang is one of [ital,greek,latin,de,en,fr,nl] in the directory
+/usr/share/perlobjects/wordindex in Archim::Object::Depot format
+(Storable). Access to these indices is provided by
+Archim::Toc::Utils->get_hits_hash(\$lang) .
+
+The functionality of makefast.pl is duplicated by Archim::Toc::Index->make_fast_lemma(\$lang);
+
+
+\subsubsection{summary of differences btwn the archimedes toc.cgi implementation and the echo toc.cgi impelementation (toc.x.cgi)}
 \paragraph{missing in archimedes}
 \begin{enumerate}
@@ -235,7 +219,6 @@ No parameter--update all lemmatization i
 \item remote text method may work differently
-
 \end{enumerate}
 \paragraph{differences}
 \begin{enumerate}
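
Note on the makefast.pl rule added in the second hunk: the text states that a raw form enters the 'fast' index only if the chain raw -> norm -> lemma resolves. The following is a minimal sketch of that rule, not the actual implementation. The real tool is Perl and its data layout is not shown in this diff; the function name `build_fast_index`, the dict shapes, the choice to store the lemmata as the value, and the sample words are all assumptions made for illustration, written in Python.

```python
def build_fast_index(raw_to_norm, form_to_lemmata):
    """Sketch of the stated rule: keep a raw form only if raw -> norm -> lemma exists.

    raw_to_norm     -- hypothetical stand-in for the 'xml:raw:norm' word index
    form_to_lemmata -- hypothetical stand-in for the '$lang:inc_lemma' table
    Returns a dict keyed by raw form; storing the lemmata as the value is an
    assumption, since the diff only says the raw form "is included".
    """
    fast = {}
    for raw, norm in raw_to_norm.items():
        lemmata = form_to_lemmata.get(norm)
        if lemmata:  # the chain resolves, so the raw form is included
            fast[raw] = list(lemmata)
    return fast


if __name__ == "__main__":
    # Illustrative data only; not taken from any real corpus index.
    raw_to_norm = {"Vrbs": "urbs", "xyzzy": "xyzzy"}
    form_to_lemmata = {"urbs": ["urbs"]}
    print(build_fast_index(raw_to_norm, form_to_lemmata))
```

In this sketch a raw form whose normalized form has no lemma entry is simply left out, which matches the "if exists" wording of the rule.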